Re: [Corpora-List] Special-domain corpora

From: Paul Buitelaar (paulb@dfki.de)
Date: Wed Mar 30 2005 - 12:11:54 MET DST

    Carlos Rodriguez wrote:

    > Hi,
    >
    > I was wondering if anyone could point me to domain corpora with the
    > following characteristics:
    >
    > 1.- Written texts (ASCII, XML, TXT, PDF; no need to be tagged) from
    > specialized or technical domains.

    If 1 million tokens is OK, you can try the MuchMore corpus of medical
    texts (German/English):

    http://muchmore.dfki.de/resources1.htm

    Cheers,

        Paul Buitelaar
        DFKI - Language Technology Lab
        Saarbruecken, Germany

    > 2.- Open source, or reasonably priced, that can be downloaded to be
    > processed (web-accessible through proprietary interfaces won't cut it).
    > 3.- If possible, with machine-readable or electronic lexicons or
    > dictionaries available for the domain represented by the corpora.
    >
    > I am thinking about experimenting with techniques for lexical
    > acquisition.
    >
    > Thanks and best to all,
    >
    >
    > Carlos Rodríguez
    >
    >


