> You will notice that the theory and experiment at 50,000 words are
> out by a factor of less than 2 - not bad, eh?
> They're worse at the larger numbers - perhaps Zipf didn't have the
> benefit of large computers for his word counting and the 1/n rule is
> a poor approximation at large numbers!
Granted, Zipf didn't have the benefit of large computers (or even,
presumably, large corpora) when he formulated his laws.
Nevertheless, I do believe that he tested them on a large amount (for
his time) of data.
Whenever I have tested data against Zipf's laws, the agreement
has been fairly good.
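
A minimal sketch of such a check, in Python, for anyone who wants to
try it on their own text. It assumes the simplest form of the law
(frequency proportional to 1/rank, scaled so the predicted counts sum
to the number of tokens) and a crude tokenizer; the function name
zipf_table and the regular expression are illustrative choices, not
anything taken from Zipf or from the page mentioned in this message.

```python
import re
from collections import Counter


def zipf_table(text, top=10):
    """Return (rank, word, observed, predicted) rows for the top-ranked words.

    Assumption: plain 1/rank Zipf law, with no correction terms.
    """
    # Crude tokenizer (an assumption): runs of lowercase letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common()
    if not counts:
        return []
    total = len(words)
    # Harmonic number over the vocabulary size, so that the predicted
    # counts add up to the total number of tokens.
    harmonic = sum(1.0 / r for r in range(1, len(counts) + 1))
    rows = []
    for rank, (word, observed) in enumerate(counts[:top], start=1):
        predicted = total / (rank * harmonic)
        rows.append((rank, word, observed, predicted))
    return rows
```

Dividing the observed count by the predicted count for each row gives
the sort of "out by a factor of" figure quoted above; a ratio near 1
means good agreement at that rank.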
A very interesting Web page on this topic is the following:
http://sun1.bham.ac.uk/G.Landini/evmt/zipf.htm
The page is about applying Zipf's laws to the Voynich manuscript, but
it has a very good description of Zipf's laws, and several references
concerning modifications to the laws to make them more closely model
the data.
--Chris
------- end -------
christopher m. hogan language technologies institute
chogan@cs.cmu.edu carnegie mellon university
http://www.cs.cmu.edu/~chogan pittsburgh, pa