Re: [Corpora-List] Incidence of MWEs

From: Jean Veronis (Jean.Veronis@up.univ-mrs.fr)
Date: Fri Mar 17 2006 - 08:05:19 MET


    Adam Kilgarriff wrote:

    > US dictionaries are ***way, way*** behind UK dictionaries in corpus use. UK
    > dictionary publishers lead the world in corpus development and use (with NLP
    > lagging behind). OUP and Longman were prime movers in developing the BNC,
    > and OUP is now on the point of launching its billion-word corpus of English.
    > Collins-COBUILD was the great pioneer in the 1980s.

    Just a small point of history outside English: to my knowledge the
    earliest instance of large corpus-based lexicography is that of the
    Trésor de la Langue Française, launched around 1960. A computer corpus of
    over 100 M words was created, which was used for the creation of the
    monumental 16-volume TLF dictionary (100,000 headwords, 230,000
    definitions, 430,000 examples).

    Online at http://atilf.atilf.fr/

    History: http://www.cnrs.fr/Cnrspresse/n96a7.html (in French)

    The corpus (Frantext) now comprises 210 M words (127 M of them POS-tagged)
    and is available online to registered users:

    http://www.atilf.fr/frantext.htm (in French)

    -- 
       jv
    

    Web: http://www.up.univ-mrs.fr/veronis Blog: http://aixtal.blogspot.com


