Given a language with a vocabulary of W words, what is a rough
approximation of the number of well-formed sentences of N words or
fewer? (Well-formedness is determined by any reasonably complete
grammar of the language of your choosing.)
Clearly the number of grammatical sentences is much smaller than
W^(N+1). What is a better approximation? Does anyone have an
empirical method for estimating this number using corpus-based
techniques?
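For reference, here is the unconstrained baseline that the grammatical count is being compared against. The number of all word strings of length 1 to N is the geometric sum W + W^2 + ... + W^N = (W^(N+1) - W)/(W - 1). This lies between W^N and W^(N+1), so W^(N+1) is a safe (if slightly loose) upper bound. The following is a minimal Python sketch, assuming that empty (length-0) strings are not counted as sentences. The function name is illustrative only.

```python
def count_word_strings(W: int, N: int) -> int:
    """Count all word strings of length 1..N over a W-word vocabulary.

    Uses the closed form of the geometric sum W + W^2 + ... + W^N.
    Assumption: length-0 (empty) strings are excluded.
    """
    if W == 1:
        # Degenerate vocabulary: exactly one string of each length.
        return N
    # (W^(N+1) - W) is always divisible by (W - 1), so integer division is exact.
    return (W ** (N + 1) - W) // (W - 1)


# Compare the exact count with the W^(N+1) bound for a small case.
W, N = 10, 3
print(count_word_strings(W, N), W ** (N + 1))
```

With W = 10 and N = 3 this gives 10 + 100 + 1000 = 1110 strings, compared with the bound 10^4 = 10000. Any estimate of the number of grammatical sentences should come out far below the exact count.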
--
Stephen B. Johnson, Ph.D.
Associate Professor
Department of Medical Informatics
Columbia University