Re: [Corpora-List] Special-domain corpora

From: Paul Buitelaar (paulb@dfki.de)
Date: Wed Mar 30 2005 - 12:11:54 MET DST

In reply to: Carlos Rodriguez: "[Corpora-List] Special-domain corpora"

Carlos Rodriguez wrote:

> Hi,
>
> I was wondering if anyone could point me to domain corpora with the
> following characteristics:
>
> 1.- Written texts (ASCII, xml, txt, pdf, no need to be tagged) from
> specialized or technical domains.

If 1 million tokens is ok, you can try the MuchMore corpus of medical
texts (German/English):

http://muchmore.dfki.de/resources1.htm

Cheers,

    Paul Buitelaar
    DFKI - Language Technology Lab
    Saarbruecken, Germany

> 2.- Open source, or reasonably priced, that can be downloaded to be
> processed (web-accessible through proprietary interfaces won't cut it).
> 3.- If possible, with machine-readable or electronic lexicons or
> dictionaries available for the domain represented by the corpora.
>
> I am thinking about experimenting with techniques for lexical
> acquisition.
>
> Thanks and best to all,
>
>
> Carlos Rodríguez
>
>

