Dear Daniel,
> I'm working on my master's thesis "Accurate Hierarchical Classification
> using NLP Techniques". I hope to improve the accuracy of hierarchical
> classification on English and German corpora by using additional
> information extracted with the aid of linguistic tools.
>
> I would like to ask where I can obtain corpora that are already
> classified in a hierarchy. I need several English and German corpora,
> preferably on topics in linguistics or computer science.
>
> Regards & Thanks,
>
> Daniel
The Medline database of scientific publications in the
biomedical domain contains article abstracts which are
indexed using the hierarchically organized MeSH thesaurus.
It can be obtained free of charge under a license from the US National
Library of Medicine. It currently contains over 16 million records,
the majority of which have English abstracts.
http://www.nlm.nih.gov/bsd/licensee/2007_stats/baseline_doc.html
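For hierarchical classification, the MeSH hierarchy is encoded in each descriptor's tree numbers: every dot-separated segment descends one level, so a descriptor's ancestors are simply the prefixes of its tree number. A minimal sketch, assuming you have already extracted the tree numbers of the descriptors assigned to an abstract (the tree numbers below are illustrative, not taken from a real record), that expands them into the full set of hierarchical labels:

```python
from typing import List


def ancestors(tree_number: str) -> List[str]:
    """Return every tree-number prefix, from the top-level node down to tree_number itself."""
    parts = tree_number.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def hierarchical_labels(tree_numbers: List[str]) -> List[str]:
    """Expand assigned tree numbers into all labels on their paths (deduplicated, sorted)."""
    labels = set()
    for tn in tree_numbers:
        labels.update(ancestors(tn))
    return sorted(labels)


if __name__ == "__main__":
    # Hypothetical tree numbers for one indexed abstract (illustrative only).
    doc_tree_numbers = ["C04.557.470", "C04.588.274"]
    print(hierarchical_labels(doc_tree_numbers))
```

Expanding labels to their ancestors this way turns the flat list of indexing terms into a hierarchical multi-label target, which is the usual setup for training and evaluating hierarchical classifiers.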
Greetings from the other side of the Rhine.
Pierre.
-- 
Pierre Zweigenbaum
LIMSI - CNRS, Groupe LIR / Dépt. Communication Homme-Machine
Tel: (+33) (0)1 69 85 80 04 ; Fax: (+33) (0)1 69 85 80 88
Email: pz@limsi.fr ; Web: http://www.limsi.fr/~pz/
Location: Bâtiment 508, Université Paris XI
Mail: LIMSI, BP 133, 91403 ORSAY Cedex, France
CRIM, Institut National des Langues et Civilisations Orientales
This archive was generated by hypermail 2b29 : Thu Jan 18 2007 - 15:42:52 MET