[Corpora-List] Newspaper Corpora

From: Jan Strunk (strunk@linguistics.ruhr-uni-bochum.de)
Date: Mon Apr 14 2003 - 16:16:11 MET DST

    Hello,

    I would like to evaluate a sentence boundary
    and abbreviation detection algorithm on as
    many different languages as possible.
    Therefore, I am searching for newspaper corpora
    that are either freely available or not too expensive.

    The languages in question should use the period
    as an ambiguous token denoting either a sentence
    boundary, an abbreviation or both.
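
    To make the three cases concrete, here is a minimal sketch in Python.
    It is not the algorithm to be evaluated: the abbreviation list, the
    function name classify_period_token and the capitalization heuristic
    are all assumptions made up for this illustration.

```python
# Hypothetical abbreviation list, for illustration only.
ABBREVIATIONS = {"dr.", "mr.", "etc.", "inc.", "z.b."}


def classify_period_token(token, next_token):
    """Label a token as 'abbreviation', 'boundary', 'both', or 'none'.

    next_token is the following token, or None at the end of the text.
    """
    if not token.endswith("."):
        return "none"
    is_abbrev = token.lower() in ABBREVIATIONS
    # Naive assumption: a sentence ends if nothing follows or the next
    # token starts with an uppercase letter.
    ends_sentence = next_token is None or next_token[:1].isupper()
    if is_abbrev and ends_sentence:
        return "both"
    if is_abbrev:
        return "abbreviation"
    return "boundary"
```

    Such a naive heuristic labels "Dr." followed by "Smith" as both an
    abbreviation and a boundary, which is exactly the kind of error an
    evaluation on real newspaper text should expose.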

    I am already using parts of the Wall Street Journal Corpus,
    the Neue Zürcher Zeitung and some corpora
    included in the Multilingual Corpus I from the European Corpus Initiative.
    I also know about TRACTOR.

    I would be very thankful for any suggestions.

    Best regards,

    Jan Strunk
    strunk@linguistics.ruhr-uni-bochum.de
    Sprachwissenschaftliches Institut
    Ruhr-Universität Bochum
    Germany
