Re: [Corpora-List] sentence boundary detectors

From: Armin Schmidt (armin.sch@gmail.com)
Date: Tue Feb 20 2007 - 18:20:44 MET

    Joel,

    thanks. Unfortunately, many of the links on your page are indeed dead.
    But I'll post a summary of all the responses I got so far to the list,
    so you can update your link list, too.

    Of course, I searched the archives (and the web) before posting to
    the corpora list, but the responses to those earlier posts were of only
    limited use for my particular task. Also, I wanted to find out whether,
    in the meantime, sentence splitters had been developed which could be
    trained on particular corpora in a language-independent manner (more on
    this in my summary).
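
    To make "trained on particular corpora in a language-independent
    manner" concrete, here is a minimal Python sketch. It is not one of the
    tools mentioned in this thread. The function names, the thresholds and
    the heuristic itself (a short token that almost always carries a
    trailing period is probably an abbreviation) are assumptions chosen for
    illustration only.

```python
from collections import Counter


def learn_abbreviations(text, min_count=2, min_ratio=0.9, max_len=5):
    """Guess abbreviations from raw text (hypothetical heuristic).

    A token counts as an abbreviation if it is short, occurs at least
    min_count times, and carries a trailing period in at least
    min_ratio of its occurrences. The thresholds are arbitrary.
    """
    with_period = Counter()
    total = Counter()
    for token in text.split():
        word = token.rstrip(".").lower()
        if not word or not word.isalpha():
            continue  # skip numbers and tokens with other punctuation
        total[word] += 1
        if token.endswith("."):
            with_period[word] += 1
    return {
        word
        for word, count in total.items()
        if count >= min_count
        and len(word) <= max_len
        and with_period[word] / count >= min_ratio
    }


def split_sentences(text, abbreviations):
    """Split on '.', '!' and '?' unless the period ends a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token.endswith(("!", "?")):
            boundary = True
        elif token.endswith("."):
            boundary = token.rstrip(".").lower() not in abbreviations
        else:
            boundary = False
        if boundary:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    corpus = (
        "Dr. Smith met Dr. Jones. They talked to Dr. Lee about the corpus. "
        "The corpus was large. Smith liked the corpus."
    )
    abbreviations = learn_abbreviations(corpus)
    for sentence in split_sentences(corpus, abbreviations):
        print(sentence)
```

    Nothing in the sketch is specific to English, since the abbreviation
    list is derived from the corpus itself. It will still fail on
    abbreviations that end a sentence and on rare abbreviations, which is
    why a real splitter needs more evidence than this single ratio.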

    Cheers,
    Armin

    Joel Tetreault wrote:
    >
    > Hi Armin, if you scroll way down to the "Tools" section of my website,
    > and then scroll down to the "Sentence Splitters" subsection, you should
    > find links to several splitters.
    >
    > http://www.cs.rochester.edu/u/tetreaul/academic.html
    >
    > (Please excuse the fact that I threw all these links up on one page :) )
    >
    > Your question was posed to corpora-list 3 or 4 years ago, so all the
    > links above (including an updated link to Scott Piao's Java one) are
    > from other researchers emailing in with their suggestions. I just ran
    > through the links, and since it has been several years, a bunch are
    > dead. But if you google the names of the splitters or their authors, you
    > can probably find their new locations.
    >
    > I'd also check out the corpora-list archives:
    >
    > http://listserv.linguistlist.org/cgi-bin/wa?S1=corpora
    >
    > there might be some emails/links that I missed...
    >
    > Joel
    >
    >
    > On Mon, 19 Feb 2007, Scott Songlin Piao wrote:
    >
    >> Hi Armin,
    >>
    >> I put my English sentence splitter on the website:
    >> http://text0.mib.man.ac.uk:8080/sentencebreaker/heuristic_tool
    >>
    >> It is a rule-based Java program and is downloadable.
    >>
    >> Cheers
    >>
    >> Scott Piao
    >> ----------------------------
    >> Text Mining
    >> School of Computer Science
    >> The University of Manchester
    >> UK
    >>
    >>
    >>
    >>
    >> -----Original Message-----
    >> From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no]
    >> On Behalf Of Armin Schmidt
    >> Sent: 17 February 2007 19:48
    >> To: corpora@uib.no
    >> Subject: [Corpora-List] sentence boundary detectors
    >>
    >> Dear list,
    >>
    >> I was wondering if you could point me to good sentence splitters for the
    >> following languages: German, Russian, Spanish, English. It would be
    >> great if they were stand-alone programs or modules for Python (Perl
    >> would be ok, too ... although I'm already aware of the respective
    >> CPAN-modules for English and German).
    >>
    >> Since I do have corpora in all the above-mentioned languages, I would
    >> also be very interested in available implementations (not papers) of any
    >> unsupervised learning methods for detecting sentence boundaries (or
    >> rather abbreviations).
    >>
    >> Thanks,
    >> Armin

    -- 
    http://diotavelli.net/people/armin/
    


