RE: [Corpora-List] Hierarchically classified corpora?

From: Ralf Steinberger (ralf.steinberger@jrc.it)
Date: Tue Jan 16 2007 - 17:20:59 MET


    Dear Daniel,

     

    The JRC-Acquis parallel corpus is available in 21 languages, including
    English and German. Most JRC-Acquis texts are indexed with the
    hierarchically organised Eurovoc thesaurus (you need to get a licence in
    order to receive Eurovoc and info on the hierarchical structure, but that's
    free for research purposes). Unfortunately, it is not about linguistics or
    computer science.

     

    You can find more information about the JRC-Acquis, including the download
    link, at http://langtech.jrc.it/ .

     

    Marko Grobelnik from the Jozef Stefan Institute in Ljubljana has also
    worked on hierarchical classification, using DMOZ. Would this thesaurus and
    document collection be more appropriate for you?

     

    I hope this helps.

     

    Greetings from the other side of the Alps.

     

    Ralf

     

    PS: I'd be interested in hearing about the outcome of your work, when it
    becomes available. :-)


    Ralf Steinberger (Ralf.Steinberger@jrc.it)

    European Commission - Joint Research Centre (JRC)
    IPSC - SeS - Language Technology (http://langtech.jrc.it,
    http://press.jrc.it/NewsExplorer)
    T.P. 267, Via Fermi 1
    21020 Ispra (VA), Italy

    -----Original Message-----
    From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    Behalf Of Daniel Beck
    Sent: 16 January 2007 17:02
    To: corpora@hd.uib.no
    Subject: [Corpora-List] Hierarchically classified corpora?

     

    Hello corpora mailing list,

     

    I'm working on my master's thesis, "Accurate Hierarchical Classification
    using NLP Techniques". I hope to improve the accuracy of hierarchical
    classification on English and German corpora by using additional
    information extracted with the aid of linguistic tools.

     

    I would like to ask where I can obtain corpora that are already
    classified in a hierarchy. I need several English and German corpora. I
    would prefer corpora whose topics are linguistics or computer science.

     

    Regards & Thanks,

     

    Daniel

