Re: [Corpora-List] UPDATE: Corrected Word frequencies for a large corpus of recent USENET text, and full list of types. New query tool.

From: Cyrus Shaoul (cyrus.shaoul@ualberta.ca)
Date: Tue Sep 05 2006 - 08:45:39 MET DST

    Adam Kilgarriff wrote:
    > Just a comment about this kind of resource: wouldn't it be better to make it
    > available as a searchable resource, allowing people to specify the searches
    > they wanted and check up on anomalous frequencies, rather than distributing
    > a frequency list which will inevitably raise many questions, for anyone
    > planning to seriously use it, which they won't be able to answer (at least
    > not without coming back to you, and their questions won't be your priority)
    >
    > Adam
    >
    >
    Good point, Adam. I have now made an interactive query tool available here:

        
    http://www.psych.ualberta.ca/~westburylab/downloads/wlallfreq.download.html

    It accepts only a single one-word query per submission, but I think
    that should be sufficient for most quick searches.

    Also, the type-list was a little too big for most uses, so I trimmed it
    down to words that appear more than 20 times in the corpus (equivalent
    to more than 0.003 occurrences per million words).
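
    As a rough check on that conversion, the sketch below turns a raw count
    into a per-million frequency. The corpus size is not stated in this
    message, so it is back-calculated here from the two figures given (20
    occurrences, 0.003 per million). Since 0.003 is presumably rounded, the
    result is only an approximation, and the function names are illustrative.

```python
# Convert between raw counts and per-million frequencies.
# ASSUMPTION: the corpus size is not given in the message; it is inferred
# from the two stated figures (a count of 20 ~ 0.003 per million).

def per_million(count, corpus_tokens):
    """Frequency per million tokens for a raw occurrence count."""
    return count / corpus_tokens * 1_000_000

def implied_corpus_tokens(count, freq_per_million):
    """Corpus size implied by a count and its per-million frequency."""
    return count / freq_per_million * 1_000_000

THRESHOLD_COUNT = 20           # cutoff stated in the message
THRESHOLD_PER_MILLION = 0.003  # stated equivalent (likely rounded)

approx_tokens = implied_corpus_tokens(THRESHOLD_COUNT, THRESHOLD_PER_MILLION)
print(f"implied corpus size: about {approx_tokens / 1e9:.1f} billion tokens")
print(f"check: {per_million(THRESHOLD_COUNT, approx_tokens):.3f} per million")
```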

    Please send me any feedback that you have. (I do like answering questions! But I am
    busy with other work, as Adam surmised.)

    Thanks,

    Cyrus


