RE: [Corpora-List] word frequencies on the web

From: William Fletcher (fletcher@usna.edu)
Date: Fri Dec 08 2006 - 19:54:00 MET


    Dear Tony,

    I have lists of the words occurring 100 or more times and 10 or more times
    in the preliminary version of a dynamic Web Corpus I am compiling for
    "Phrases in English". Since you cannot reach PIE directly, I have put them
    on my KWiCFinder site:

    http://www.kwicfinder.com/WebCorpus2006_min100.html

    tab-separated text files
    http://www.kwicfinder.com/WebCorpus2006_min100.txt
    http://www.kwicfinder.com/WebCorpus2006_min10.txt
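    The message only says these are tab-separated text files; it does not give
    the column layout. The sketch below assumes one entry per line in the form
    word<TAB>frequency, skips header or malformed lines, and keeps entries at or
    above a minimum frequency. The function name load_frequency_list and the
    sample data are hypothetical, for illustration only.

```python
import io
from typing import Dict, TextIO


def load_frequency_list(stream: TextIO, min_freq: int = 1) -> Dict[str, int]:
    """Read an ASSUMED 'word<TAB>frequency' list (column order is a guess)."""
    freqs: Dict[str, int] = {}
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # skip lines without at least two tab-separated columns
        word, count = parts[0], parts[1]
        try:
            n = int(count.replace(",", ""))  # tolerate thousands separators
        except ValueError:
            continue  # skip header rows or non-numeric counts
        if n >= min_freq:
            freqs[word] = n
    return freqs


# Hypothetical sample in the assumed layout, including a header row.
sample = io.StringIO("word\tfreq\nthe\t5000\nof\t3000\nzeugma\t12\nquokka\t3\n")
common = load_frequency_list(sample, min_freq=10)
print(common)
```

    With a real download you would pass an open file handle instead of the
    StringIO sample; if the files put the frequency first, swap the two
    columns in the unpacking step.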

    The corpus currently has 97,198,272 tokens and 525,509 types, of which
    30,524 occur 100 or more times and 104,675 occur 10 or more times.
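    To make the relationship between these figures concrete: the token count
    is the sum of all word frequencies, the type count is the number of
    distinct words, and the threshold figures count types (not tokens) whose
    frequency reaches the cut-off. The sketch below computes all of these from
    a token sequence. How the Web Corpus was actually tokenized is not stated,
    so the plain whitespace split here is only a stand-in, and corpus_stats is
    a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def corpus_stats(tokens: Iterable[str], thresholds: Sequence[int] = (100, 10)) -> Dict[str, int]:
    """Token, type, and per-threshold type counts for a token stream."""
    counts = Counter(tokens)
    stats = {
        "tokens": sum(counts.values()),  # every running word
        "types": len(counts),            # distinct word forms
    }
    for t in thresholds:
        # number of TYPES whose frequency is at least t
        stats[f"types_min{t}"] = sum(1 for c in counts.values() if c >= t)
    return stats


# Toy example; a whitespace split stands in for a real tokenizer.
toy = "the cat saw the dog and the dog saw the cat".split()
print(corpus_stats(toy, thresholds=(3, 2)))
```

    On the toy sentence this reports 11 tokens and 5 types, with 1 type
    occurring at least 3 times and 4 types occurring at least 2 times. The
    lower threshold always admits at least as many types as the higher one,
    which is why the 10-or-more list is the larger of the two.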

    Regards,
    Bill Fletcher

    -----Original Message-----
    From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    Behalf Of Tony Berber Sardinha
    Sent: Friday, December 08, 2006 11:44 AM
    To: CORPORA
    Subject: [Corpora-List] word frequencies on the web

    Dear all, does anyone know of ways to estimate the frequency of words on the
    web, or whether there are search engines that supply this information (as
    AltaVista used to do)?

    thank you!
    tony
    www2.lael.pucsp.br/~tony



    This archive was generated by hypermail 2b29 : Fri Dec 08 2006 - 20:05:01 MET