[Corpora-List] Word frequencies for a large corpus of recent USENET text

From: Cyrus Shaoul (cyrus.shaoul@ualberta.ca)
Date: Thu Aug 31 2006 - 18:46:26 MET DST

    Hi All,

    I thought that this might be of interest to the list. I have also experimented with using a CC Attribution-NonCommercial-NoDerivs license for this word frequency list. Please let me know whether you think this is a good or a bad idea.

    Thanks,

    Cyrus

     *******
     Announcement: Word frequencies for a large corpus of USENET text released.
     *******
     
     The Westbury Lab at the University of Alberta does research on lexical
     semantics and other areas of psycholinguistics. Recently, as part of a
     research program investigating high-dimensional models of semantic memory,
     they collected 5,894,564,637 words from 47,860 English-language,
     non-binary-file newsgroups on USENET between October 2005 and August 2006.
     The resulting list of orthographic frequencies for 111,627 English words will be
     of use to anyone who has used older lists based on corpora from decades
     past.
     
     The list is available for download (3.3 MB file) under a Creative
     Commons Attribution-NonCommercial-NoDerivs 2.5 license at:
     
         http://www.psych.ualberta.ca/~westburylab/downloads/wlfreq.download.html
       

    =[=]={=}=[=]={=}=[=]={=}=[=]={=}=[=]={=}
    Cyrus Shaoul
    http://www.psych.ualberta.ca/~westburylab/
    University of Alberta
    780-492-5843
    =[=]={=}=[=]={=}=[=]={=}=[=]={=}=[=]={=}