Imagine a tagged corpus of a particular language with N words, each
tagged with one of |S| tags from a set S. If need be, let the relative
frequency of each tag in the corpus be p(s) for each s in S.
Now say that I have a hypothesis about the ordering of words in a
particular construction in the language. For example, say I believe that
between a and d (members of S) the tags b and c (also members of S) only
ever appear in one order, i.e. [a b c d] is fine but *[a c b d] is not.
If a count on the corpus reveals that my proposed ordering appears n
times and that competing orders never appear, how certain (as a function
of N, n, p and whatever else) can I be that my hypothesis is correct?
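To make the quantities concrete, here is a minimal sketch of the count
described above. It only gathers N, n, p(s) and the number of competing
orderings; it does not attempt to answer the certainty question. It
assumes the corpus has been reduced to a flat list of tags and that b
and c occur adjacent to each other directly between a and d. The
function names and the toy corpus are hypothetical.

```python
from collections import Counter


def tag_frequencies(tags):
    """Return the relative frequency p(s) of each tag s in the corpus."""
    total = len(tags)
    if total == 0:
        return {}
    counts = Counter(tags)
    return {s: k / total for s, k in counts.items()}


def count_orderings(tags, a, b, c, d):
    """Count contiguous [a b c d] and the competing [a c b d] sequences."""
    proposed = competing = 0
    for i in range(len(tags) - 3):
        window = tuple(tags[i:i + 4])
        if window == (a, b, c, d):
            proposed += 1
        elif window == (a, c, b, d):
            competing += 1
    return proposed, competing


# Hypothetical toy corpus, already reduced to its tag sequence.
corpus_tags = ["a", "b", "c", "d", "e", "a", "b", "c", "d", "e"]

N = len(corpus_tags)                 # corpus size in words
p = tag_frequencies(corpus_tags)     # p(s) for each tag s
n, competing = count_orderings(corpus_tags, "a", "b", "c", "d")

print(N, n, competing)
```

A real corpus would need a looser match if other tags may intervene
between a and d; that is left out here to keep the sketch short.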
James K. Tauber <jtauber@tartarus.uwa.edu.au> currently at ALS 95
University Computing Services and Centre for Linguistics
University of Western Australia, Perth, AUSTRALIA
http://www.uwa.edu.au/student/jtauber finger for PGP key