Re: Corpora: Statistics in genre differences

Ted E. Dunning (ted@aptex.com)
Mon, 22 Mar 1999 11:34:25 -0800 (PST)

This isn't quite what you are talking about, but it makes a similar
point.

At a cocktail-party level of detail, about one third of running text
in newswires consists of words which occur fewer than 20 times in a
million.
There are so many caveats to this statement that it is nearly
worthless except as an illustration that "rare" words are really quite
common in text.
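
For anyone who wants to check a figure like this on their own corpus,
here is a minimal sketch of the measurement. It is not the procedure
behind the number above; the function name rare_token_share and its
per_million parameter are made up for illustration, and it assumes the
text has already been split into word tokens by whatever tokenizer you
trust. It counts each word type, then sums the tokens belonging to
types whose relative frequency is below the threshold.

```python
from collections import Counter


def rare_token_share(tokens, per_million=20.0):
    """Fraction of running tokens whose word type occurs fewer than
    `per_million` times per million tokens in this sample.

    Hypothetical helper for illustration; `tokens` is any iterable of
    already-tokenized words.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Convert the per-million rate into an absolute count for this sample.
    threshold = per_million * total / 1_000_000
    # Add up the tokens (not types) that belong to "rare" types.
    rare = sum(c for c in counts.values() if c < threshold)
    return rare / total


# Toy example with an artificially high threshold (200,000 per million,
# i.e. 20% of the sample) so that the tiny sample shows an effect.
toy = ["the"] * 6 + ["of"] * 3 + ["zebra"]
share = rare_token_share(toy, per_million=200_000)
```

Note that with the real threshold of 20 per million the sample has to
run to millions of tokens before the answer means anything: in a small
sample the threshold falls below one occurrence and no word counts as
rare. The result also depends heavily on tokenization and case folding,
which is one of the caveats mentioned above.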

>>>>> "jf" == James L Fidelholtz <jfidel@siu.buap.mx> writes:

jf> ... By the way, there are not very many words which occur
jf> 2-5/K [I don't have any counts here, so I can't check the
jf> numbers, but it's certainly not over a couple of hundred].