Re: [Corpora-List] Word frequencies in English, French, German, Spanish, Dutch, Italian and Portuguese

From: Marco Baroni (marco.baroni@unitn.it)
Date: Mon Feb 12 2007 - 19:50:30 MET


    You can also extract various types of frequency lists from the Italian la
    Repubblica corpus here:

    http://sslmitdev-online.sslmit.unibo.it/corpora/frequency.php?path=&name=Repubblica

    They are not balanced like the CoLFIS list, but they come from a much
    larger corpus (about 400M tokens).
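    Because the two lists come from corpora of very different sizes (about
    3.15M tokens for CoLFIS versus about 400M for la Repubblica), raw counts
    are usually normalized to occurrences per million tokens before they are
    compared. A minimal Python sketch, assuming naive whitespace tokenization
    and lowercasing; the function names are hypothetical, and this is not how
    either list was actually produced:

```python
from collections import Counter

def frequency_list(tokens):
    """Return (word, count) pairs, most frequent first, ties alphabetical.

    Assumption: tokens are already split; we only lowercase them here.
    """
    counts = Counter(t.lower() for t in tokens)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million tokens."""
    return count * 1_000_000 / corpus_size

if __name__ == "__main__":
    # Toy example: naive whitespace tokenization of a tiny Italian string.
    tokens = "la casa e la strada e la piazza".split()
    for word, n in frequency_list(tokens):
        print(word, n, per_million(n, len(tokens)))
```

    With this normalization, a word seen 100 times in the 3,150,075-token
    CoLFIS corpus and one seen about 12,700 times in a 400M-token corpus end
    up on the same scale (roughly 32 per million).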

    Regards,

    Marco

    Isabella Chiari wrote:
    > For Italian the largest (sigh...) word frequency list available is the list
    > from Corpus e Lessico di Frequenza dell'Italiano Scritto (CoLFIS) from a
    > corpus of 3,150,075 tokens of written language.
    > You can freely download the lists in various format at:
    > http://www.istc.cnr.it/material/database/colfis/index_eng.shtml
    > The corpus is partially available for search at:
    > http://www.ge.ilc.cnr.it/page.php?ID=archCoLFIS&lingua=it
    >
    > Ref. Laudanna, A., Thornton, A.M., Brown, G., Burani, C. and Marconi, L.
    > (1995). Un corpus dell'italiano scritto contemporaneo dalla parte del
    > ricevente. In S. Bolasco, L. Lebart and A. Salem (eds.), III Giornate
    > internazionali di Analisi Statistica dei Dati Testuali. Volume I,
    > pp. 103-109. Rome: Cisu.
    >
    > Best wishes,
    > Isabella Chiari
    >
    >
    > Isabella Chiari
    >
    > Università La Sapienza di Roma
    > Dipartimento di Studi Filologici, Linguistici e Letterari (DSFLL)
    > of the Università di Roma “La Sapienza”
    > P.le Aldo Moro, 5, 3rd floor, former Faculty of Letters and Philosophy building,
    > 00185 Roma, tel. +39 06 4991 3575
    > e-mail: isabella.chiari@uniroma1.it
    > Home page Alphabit www.alphabit.net
    > Alphabit blog / Glottophilia blog
    >
    >
    >
    > -----Original Message-----
    > From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    > Behalf Of Yorick Wilks
    > Sent: Monday, 12 February 2007 17:37
    > To: corpora@lists.uib.no
    > Subject: [Corpora-List] Word frequencies in English, French, German,
    > Spanish, Dutch, Italian and Portuguese
    >
    > Does anyone know easily accessible sources of these?
    > Yorick Wilks
    > Sheffield
    >
    >

    -- 
    Marco Baroni
    CIMeC, University of Trento
    http://www.form.unitn.it/~baroni
    