>(a) assuming that a dictionary entry is analogous to a type;
>(b) dictionary x is comprehensive
>(c) dictionary x has 100,000 entries
>(d) a majority is 1/2 + 1
>A representative corpus would need to have as many tokens
>as necessary to include 50,001 types.'
I would rather argue:
of these 100,000 types, 20,000 account for 80% of the tokens
in the corpora from which the dictionary was compiled;
therefore, a corpus containing "most of" these 20,000
types can be considered to model the original corpora in
a representative way.
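
To make the two criteria comparable, here is a rough sketch in Python.
It is only an illustration under my own assumptions: the function names
are made up for this example, tokens are plain strings that have already
been split, and the toy sentence stands in for a real reference corpus.
The 80% figure is the threshold assumed above, passed in as a parameter.

```python
from collections import Counter


def core_types(tokens, coverage=0.8):
    """Most frequent types whose tokens together reach `coverage` of the corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    core, covered = [], 0
    # most_common() yields types from most to least frequent.
    for word, freq in counts.most_common():
        if covered >= coverage * total:
            break
        core.append(word)
        covered += freq
    return core


def share_of_core_present(core, sample_tokens):
    """Fraction of the core types that occur at least once in the sample."""
    if not core:
        return 0.0
    seen = set(sample_tokens)
    return sum(1 for word in core if word in seen) / len(core)


def tokens_until_n_types(tokens, n_types):
    """Tokens read until n distinct types have been seen (the quoted
    criterion); None if the corpus never reaches that many types."""
    seen = set()
    for position, token in enumerate(tokens, start=1):
        seen.add(token)
        if len(seen) >= n_types:
            return position
    return None


# Toy data standing in for the reference corpus and a candidate corpus.
reference = "the cat sat on the mat the cat ran the dog sat".split()
candidate = "the dog sat on the cat".split()

core = core_types(reference, coverage=0.8)
print(len(core), "core types;", share_of_core_present(core, candidate),
      "of them occur in the candidate")
print(tokens_until_n_types(reference, 4), "tokens needed to see 4 types")
```

What share of the core types counts as "most of" is still an open
choice; the sketch only reports the fraction and leaves the cut-off
to the reader.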
Any better solutions?
Best regards
Hellfried Sabathy