Tony,
> > Btw, have you (or anyone else) done a proper word count of the
> > corpus? (the RC distributors told me they hadn't) -- Using MP2.2 would
> > of course be a solution to that problem since it does a word count
> > whenever you load a corpus anyway.
>
> FYI you can find lots more statistics on the corpus at:
>
> http://about.reuters.com/researchandstandards/corpus/statistics/index.asp

Yes, I've seen the statistics on the Reuters pages, thanks. You offer a lot
of diagrams on interesting features like distribution of stories across days
or POS distribution, but unfortunately there is no word/token count of the
entire corpus (or maybe I missed that information). Maybe somebody else has
done such a word count?

Best... Ute