RE: [Corpora-List] Hierarchically classified corpora?

From: Ralf Steinberger (ralf.steinberger@jrc.it)
Date: Mon Jan 22 2007 - 08:20:06 MET

    Hello Daniel,

    You may also want to consider the hierarchically classified HEP corpus. It
    is in English only (it contains no German texts) and is not about computer
    science, but it is very well documented and of a good size. You can find
    it at:

       http://sinai.ujaen.es/wiki/index.php/HepCorpus#English_version

    Arturo Montejo Ráez (amontejo AT ujaen.es) will be happy to help you with
    any questions you may have. A useful feature of this corpus is that
    Arturo has already produced a number of benchmark values for categorisation
    with various methods.

    Ralf

    Ralf Steinberger (Ralf.Steinberger@jrc.it)
    European Commission - Joint Research Centre (JRC)
    IPSC - SeS - Language Technology (http://langtech.jrc.it,
    http://press.jrc.it/NewsExplorer)
    T.P. 267, Via Fermi 1
    21020 Ispra (VA), Italy

    -----Original Message-----

    > I'm working on my master's thesis "Accurate Hierarchical Classification
    > using NLP Techniques". I hope to improve the accuracy of hierarchical
    > classification on English and German corpora by using additional
    > information extracted with the aid of linguistic tools.
    >
    > I would like to ask where I can obtain corpora that are already
    > classified in a hierarchy. I need several English and German corpora. I
    > would prefer corpora whose topics are linguistics or computer science.
    >
    > Regards & Thanks,
    >
    > Daniel
