Sorry if this is a dumb question: for a student project, we would like
to get the following stats based on the BNC:
(1) frequency (or probability) of all trigrams
(2) co-occurrence stats for all word pairs (NOT bigrams, note) based on
co-occurrence within the same sentence
I assume that this is easy to compute, though time-consuming; and of
course I understand that the data will be relatively sparse.
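To make the two counts concrete, here is a minimal sketch in Python. It
assumes the corpus has already been split into sentences, each one a list
of word tokens; reading the BNC markup itself is not covered. The function
name count_stats and the toy sentences are made up for illustration. It
also assumes a pair is counted once per sentence and that word order
within a pair does not matter, which may not be what is wanted.

```python
from collections import Counter
from itertools import combinations

def count_stats(sentences):
    """Count trigrams and same-sentence word-pair co-occurrences.

    `sentences` is an iterable of token lists (an assumption: the
    corpus has already been sentence-split and tokenized).
    """
    trigrams = Counter()
    pairs = Counter()
    for tokens in sentences:
        # Trigrams: every run of three adjacent tokens within the sentence.
        trigrams.update(zip(tokens, tokens[1:], tokens[2:]))
        # Pairs: every unordered pair of distinct word types in the
        # sentence, counted once per sentence regardless of distance.
        types = sorted(set(tokens))
        pairs.update(combinations(types, 2))
    return trigrams, pairs

# Toy data standing in for the real corpus.
toy = [["the", "cat", "sat", "down"], ["the", "dog", "sat"]]
trigrams, pairs = count_stats(toy)
```

Dividing each count by the total number of trigrams (or of sentences)
would turn the frequencies into relative frequencies. For a corpus the
size of the BNC the pair table is likely to be very large, so a real run
would probably need a frequency cutoff or on-disk storage.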
So my question is: is this data already available somewhere (e.g. has
someone already computed it), or, if not, what is the easiest way to do it?
Harold Somers