Re: Corpora: BNC word Frequency List

From: Paul Rayson (paul@comp.lancs.ac.uk)
Date: Wed Oct 18 2000 - 14:55:04 MET DST

    Neil,

    > I remember reading some time ago that a word frequency list for the BNC had
    > been produced.
    >
    > Could anybody tell me how to get hold of this?

    There was a summary posted by Philip Resnik in July, part of which
    follows.

    Regards,
    Paul.

    1a. British National Corpus (http://info.ox.ac.uk/bnc/)

       The corpus itself is available only to Europeans, but Adam
       Kilgarriff has produced word frequency lists and put them on the
       Web at http://www.itri.brighton.ac.uk/~Adam.Kilgarriff/bnc-readme.html.
       He writes, "the lists from the BNC on my web page - particularly
       the lemmatised ones - were produced with English teaching and
       dictionaries in mind, and have been quite widely used for
       experiment-type purposes. The BNC is clearly appropriate, as it
       was designed with 'general English' in mind. (though it is
       British, but I suspect the differences there are quite marginal.)
       It's been getting 200 files downloaded per month for 4 years now,
       and I think it is quite widely used."

       Adam's paper

        @article{ak-ijl,
            author  = "Adam Kilgarriff",
            title   = "Putting Frequencies into the Dictionary",
            journal = "International Journal of Lexicography",
            year    = 1997,
            volume  = 10,
            number  = 2,
            pages   = {135--155}
        }

       argues for the list and explains how it was done, and there's an
       on-line copy available from his Web page.

       Paul Rayson has been working on the BNC and writes:

         I have been working on frequency lists for the second version of
         the BNC (POS tagging and file headers updated) and short versions
         of those lists will appear in

           Leech, G., Wilson, A., Rayson, P. (forthcoming). Word Frequencies
           in Spoken and Written English: based on the British National
           Corpus. Longman, London.

         Due to the size of the lists, we plan to make the longer versions
         available on the UCREL website later this year when the book is
         published.

         http://www.comp.lancs.ac.uk/ucrel/


