Dear corporistas,
I'm trying to compare word frequencies in our native speaker corpus and
our learner corpus. I've normalised the frequencies in both corpora to
frequencies per 10 million words, but a simple subtraction still heavily
skews the results towards high-frequency words. To get around the Zipfian
nature of word frequency distributions, I've tried taking the log of both
normalised frequencies before subtracting; this gives better results, but
is it well-motivated? I'd be grateful for any help you could give me, or
any pointers to previous work done in this area. Many thanks,
Dom
Dominic Glennon
Systems Manager
Cambridge University Press
01223 325595
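P.S. In case it helps to be concrete, here is a minimal Python sketch of the comparison I'm describing. The corpus sizes and counts are invented for illustration, and the smoothing constant is just one way of avoiding log(0) for words that are missing from one corpus:

```python
import math

def per_10m(count, corpus_size):
    """Normalise a raw count to a frequency per 10 million words."""
    return count * 10_000_000 / corpus_size

def log_ratio(count_a, size_a, count_b, size_b, smoothing=0.5):
    """Difference of log frequencies, i.e. the log of the frequency ratio.
    A small smoothing constant stops words absent from one corpus
    from producing log(0)."""
    freq_a = per_10m(count_a + smoothing, size_a)
    freq_b = per_10m(count_b + smoothing, size_b)
    return math.log2(freq_a / freq_b)

# Invented numbers: a 100M-word native corpus vs a 10M-word learner corpus.
# Plain subtraction of normalised frequencies is dominated by "the":
the_diff = per_10m(6_000_000, 100_000_000) - per_10m(550_000, 10_000_000)  # 50000.0
rare_diff = per_10m(200, 100_000_000) - per_10m(10, 10_000_000)            # 10.0

# The log ratio puts the two words on a comparable scale:
print(log_ratio(6_000_000, 100_000_000, 550_000, 10_000_000))  # ≈ 0.13
print(log_ratio(200, 100_000_000, 10, 10_000_000))             # ≈ 0.93
```

Subtracting logs is just the log of the frequency ratio, which is why the high-frequency words stop dominating.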
Search the web's favourite learner dictionaries for free at Cambridge
Dictionaries Online:
<http://dictionary.cambridge.org>
This archive was generated by hypermail 2b29 : Mon Mar 06 2006 - 22:33:49 MET