Hi Daniel,
Some datasets that come to mind are the ACM Digital Library for CS-related
publications (though you need to be careful about licensing issues) and
dmoz.org for Web pages. The Open Directory at dmoz.org is available in
several languages.
Cheers,
TAA
-----------------------------------------------------
Tony Abou-Assaleh
Email: taa@acm.org
Web site: http://tony.abou-assaleh.net
----------------------[THE END]----------------------
On Tue, 16 Jan 2007, Daniel Beck wrote:
> Hello corpora mailing list,
>
> I'm working on my master's thesis, "Accurate Hierarchical Classification
> using NLP Techniques". I hope to improve the accuracy of hierarchical
> classification on English and German corpora by using additional
> information extracted with the aid of linguistic tools.
>
> I would like to ask where I can obtain corpora that are already
> classified in a hierarchy. I need several English and German corpora. I
> would prefer corpora on topics in linguistics or computer science.
>
> Regards & Thanks,
>
> Daniel
>
>
>