RE: [Corpora-List] sentence boundary detectors

From: Joel Tetreault (tetreaul@cs.rochester.edu)
Date: Mon Feb 19 2007 - 16:00:58 MET


    hi Armin, if you scroll way down to the "Tools" section of my website, and
    then down to the "Sentence Splitters" subsection, you should find links
    to several splitters.

    http://www.cs.rochester.edu/u/tetreaul/academic.html

    (Please excuse the fact that I threw all these links up on one page :) )

    Your question was posed to corpora-list 3 or 4 years ago, so all the links
    above (including an updated link to Scott Piao's Java splitter) came from
    other researchers who emailed in their suggestions. I just ran through the
    links, and since it has been several years, a bunch are dead. But if you
    google the names of the splitters or their authors, you can probably find
    their new locations.

    I'd also check out the corpora-list archives:

    http://listserv.linguistlist.org/cgi-bin/wa?S1=corpora

    there might be some emails/links that I missed...

    Joel

    On Mon, 19 Feb 2007, Scott Songlin Piao wrote:

    > Hi Armin,
    >
    > I put my English sentence splitter on the website:
    > http://text0.mib.man.ac.uk:8080/sentencebreaker/heuristic_tool
    >
    > It is a rule-based Java program and is downloadable.
    >
    > Cheers
    >
    > Scott Piao
    > ----------------------------
    > Text Mining
    > School of Computer Science
    > The University of Manchester
    > UK
    >
    >
    > -----Original Message-----
    > From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On Behalf Of Armin Schmidt
    > Sent: 17 February 2007 19:48
    > To: corpora@uib.no
    > Subject: [Corpora-List] sentence boundary detectors
    >
    > Dear list,
    >
    > I was wondering if you could point me to good sentence splitters for the
    > following languages: German, Russian, Spanish, English. It would be
    > great if they were stand-alone programs or modules for Python (Perl
    > would be OK, too, although I'm already aware of the respective
    > CPAN modules for English and German).
    >
    > Since I have corpora in all of the above-mentioned languages, I would
    > also be very interested in available implementations (not papers) of any
    > unsupervised learning methods for detecting sentence boundaries (or
    > rather abbreviations).
    >
    > Thanks,
    > Armin

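To illustrate the two ideas in this thread, a rule-based splitter and unsupervised detection of abbreviations, here is a minimal Python sketch. It is not Scott Piao's Java tool or any published algorithm. It rests on a crude assumption: a short word type that almost always appears with a trailing period in a corpus is probably an abbreviation. The function names learn_abbreviations and split_sentences, and the thresholds min_count, min_ratio and max_len, are hypothetical choices made for illustration. The splitter applies a single rule: break after ., ! or ? when the next token starts with an uppercase letter, unless the period closes a learned abbreviation.

```python
import re
from collections import Counter

# A run of ., ! or ? followed by whitespace; group 1 captures the next
# non-space character (hypothetical rule set for this sketch).
_CANDIDATE = re.compile(r"[.!?]+(?=\s+(\S))")
# Surrounding punctuation stripped from tokens before counting.
_STRIP = "\"'()[],;:"


def learn_abbreviations(corpus, min_count=2, min_ratio=0.9, max_len=5):
    """Guess abbreviations from raw, unlabelled text.

    Assumed heuristic: a word type of at most `max_len` letters that occurs
    with a trailing period at least `min_count` times, and in at least
    `min_ratio` of all its occurrences, is treated as an abbreviation.
    """
    with_period = Counter()  # occurrences ending in "."
    total = Counter()        # all occurrences
    for raw in corpus.split():
        token = raw.lower().strip(_STRIP)
        base = token.rstrip(".")
        # Skip numbers and punctuation-only tokens; keep types like "e.g".
        if not base or not base.replace(".", "").isalpha():
            continue
        total[base] += 1
        if token.endswith("."):
            with_period[base] += 1
    return {
        base
        for base, n in with_period.items()
        if n >= min_count
        and n / total[base] >= min_ratio
        and len(base.replace(".", "")) <= max_len
    }


def split_sentences(text, abbreviations):
    """Split after . ! or ? when the next token starts with an uppercase
    letter, unless the period ends a known abbreviation."""
    sentences = []
    start = 0
    for match in _CANDIDATE.finditer(text):
        if not match.group(1).isupper():
            continue  # next token starts lowercase, with a digit, etc.
        end = match.end()
        last_token = text[start:end].split()[-1].lower().strip(_STRIP)
        if match.group(0) == "." and last_token.rstrip(".") in abbreviations:
            continue  # the period belongs to an abbreviation such as "Dr."
        sentences.append(text[start:end].strip())
        start = end
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences
```

For example, after learning "dr" from a corpus in which "Dr." recurs, split_sentences("Dr. Smith arrived. He met Dr. Jones today.", abbreviations) returns two sentences rather than four. Because str.isupper() is Unicode-aware, the uppercase check also works for German umlauts and Cyrillic. Known weaknesses of this sketch: it never breaks before a lowercase word, and a rare word seen only at sentence ends can be mistaken for an abbreviation. The Punkt method of Kiss and Strunk (2006) tackles the latter with likelihood-ratio statistics instead of a fixed ratio.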


    This archive was generated by hypermail 2b29 : Mon Feb 19 2007 - 16:26:26 MET