Hello Daniel,
You may also want to consider the hierarchically classified HEP corpus. It
is in English only (i.e. no German texts) and not about computer science, but
it is very well documented, has a good size, etc. You can find it at:
http://sinai.ujaen.es/wiki/index.php/HepCorpus#English_version
Arturo Montejo Ráez (amontejo AT ujaen.es) will be happy to help you with
any questions you may have. A useful feature of this corpus is that
Arturo has already produced a number of benchmark values for categorisation
with various methods.
Ralf
Ralf Steinberger (Ralf.Steinberger@jrc.it)
European Commission - Joint Research Centre (JRC)
IPSC - SeS - Language Technology (http://langtech.jrc.it,
http://press.jrc.it/NewsExplorer)
T.P. 267, Via Fermi 1
21020 Ispra (VA), Italy
-----Original Message-----
> I'm working on my master's thesis "Accurate Hierarchical Classification
> using NLP Techniques". I hope to improve the accuracy of hierarchical
> classification on English and German corpora by using additional
> information extracted with the aid of linguistic tools.
>
> I would like to ask where I can obtain corpora which are already
> classified in a hierarchy. I need several English and German corpora. I
> would prefer corpora whose topics are linguistics or computer science.
>
> Regards & Thanks,
>
> Daniel
This archive was generated by hypermail 2b29 : Mon Jan 22 2007 - 10:37:28 MET