Re: [Corpora-List] Hierarchically classified corpora?

From: Armin Schmidt (armin.sch@gmail.com)
Date: Tue Jan 16 2007 - 17:48:07 MET


    Hi Daniel,

    Wikipedia (http://www.wikipedia.org) applies hierarchical categorization
    to its articles and provides very large corpora for German and
    English. You can download the corpora in XML format here:
    http://download.wikimedia.org/backup-index.html. It's all free, and you
    can quite easily generate domain-specific corpora that are of interest
    to you, e.g. those about computer science or linguistics, by simply
    extracting articles that carry a particular category tag. Also, look here:
    https://www.cs.tcd.ie/esslli2007/content/courses/id19.html
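
    The extraction step described above could look roughly like the
    following. This is only a sketch under stated assumptions: it assumes
    the dump is a MediaWiki XML export with <page>, <title> and <text>
    elements, and that category tags appear in the wikitext as
    [[Category:...]] (English) or [[Kategorie:...]] (German). The names
    pages_in_category, local_name and SAMPLE are made up for illustration.
    It matches only categories placed directly on a page; following
    subcategories down the hierarchy would need extra work.

```python
import io
import re
import xml.etree.ElementTree as ET

# Assumed form of category tags in wikitext: [[Category:Linguistics]] (English)
# or [[Kategorie:Linguistik]] (German), optionally followed by "|sortkey".
CATEGORY_LINK = re.compile(
    r"\[\[\s*(?:Category|Kategorie)\s*:\s*([^\]|]+)", re.IGNORECASE
)


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def pages_in_category(source, wanted):
    """Yield (title, wikitext) for pages tagged with any of the wanted categories.

    `source` is a file name or a binary file object holding the XML dump.
    """
    wanted = {w.lower() for w in wanted}
    title, text = None, ""
    # iterparse streams the file, so the whole dump is never held as one string.
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            cats = {c.strip().lower() for c in CATEGORY_LINK.findall(text)}
            if cats & wanted:
                yield title, text
            title, text = None, ""
            elem.clear()  # drop the page's children once it has been handled


# Tiny hand-written stand-in for a real dump (hypothetical content).
SAMPLE = b"""<mediawiki>
<page><title>Syntax</title><revision><text>Text. [[Category:Linguistics]]</text></revision></page>
<page><title>Berlin</title><revision><text>City. [[Category:Cities]]</text></revision></page>
<page><title>Compiler</title><revision><text>[[Kategorie:Informatik|C]]</text></revision></page>
</mediawiki>"""

if __name__ == "__main__":
    for page_title, _ in pages_in_category(io.BytesIO(SAMPLE), ["Linguistics", "Informatik"]):
        print(page_title)
```

    For a real dump, pass the path of the downloaded (and decompressed)
    XML file instead of the in-memory sample.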

    Best,
    Armin

    Daniel Beck wrote:
    > Hello corpora mailing list,
    >
    > I'm working on my master's thesis "Accurate Hierarchical Classification
    > using NLP Techniques". I hope to improve the accuracy of hierarchical
    > classification on English and German corpora by using additional
    > information extracted with the aid of linguistic tools.
    >
    > I would like to ask where I can obtain corpora that are already
    > classified in a hierarchy. I need several English and German corpora. I
    > would prefer corpora whose topics are linguistics or
    > computer science.
    >
    > Regards & Thanks,
    >
    > Daniel


