Dear Daniel,
The JRC-Acquis parallel corpus is available in 21 languages, including
English and German. Most JRC-Acquis texts are indexed with the
hierarchically organised Eurovoc thesaurus (you need to get a licence in
order to receive Eurovoc and info on the hierarchical structure, but that's
free for research purposes). Unfortunately, the texts are not about
linguistics or computer science.
You can find more information about the JRC-Acquis, including the download
link, at http://langtech.jrc.it/ .
Marko Grobelnik from the Jozef Stefan Institute in Ljubljana has also worked
on hierarchical classification, using DMOZ. Would this thesaurus and
document collection be more appropriate for you?
I hope this helps.
Greetings from the other side of the Alps.
Ralf
PS: I'd be interested in hearing about the outcome of your work, when it
becomes available. :-)
Ralf Steinberger (Ralf.Steinberger@jrc.it)
European Commission - Joint Research Centre (JRC)
IPSC - SeS - Language Technology (http://langtech.jrc.it,
http://press.jrc.it/NewsExplorer)
T.P. 267, Via Fermi 1
21020 Ispra (VA), Italy
-----Original Message-----
From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
Behalf Of Daniel Beck
Sent: 16 January 2007 17:02
To: corpora@hd.uib.no
Subject: [Corpora-List] Hierarchically classified corpora?
Hello corpora mailing list,
I'm working on my master's thesis "Accurate Hierarchical Classification
using NLP Techniques". I hope to improve the accuracy of hierarchical
classification on English and German corpora by using additional
information extracted with the aid of linguistic tools.
I would like to ask where I can obtain corpora which are already
classified in a hierarchy. I need several English and German corpora. I
would prefer corpora whose topics are linguistics or computer science.
Regards & Thanks,
Daniel