Dear corporistas,
I'm trying to compare word frequencies in our native speaker corpus and
our learner corpus. I've normalised the frequencies in both corpora to
frequencies per 10 million words, but a simple subtraction still heavily
skews the results towards high-frequency words. To get around the Zipfian
nature of word frequency distributions, I've tried taking the log of both
normalised frequencies before subtracting; this gives better results, but
is it well-motivated? I'd be grateful for any help you could give me, or
any pointers to previous work done in this area. Many thanks,
Dom
Dominic Glennon
Systems Manager
Cambridge University Press
01223 325595
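P.S. In case it helps to be concrete, here is a minimal Python sketch of the comparison I'm describing. The corpus sizes and counts are invented for illustration, and the smoothing constant is just one way of avoiding log(0) for words that are missing from one corpus:

```python
import math

def per_10m(count, corpus_size):
    """Normalise a raw count to a frequency per 10 million words."""
    return count * 10_000_000 / corpus_size

def log_ratio(count_a, size_a, count_b, size_b, smoothing=0.5):
    """Difference of log frequencies, i.e. the log of the frequency ratio.
    A small smoothing constant stops words absent from one corpus
    from producing log(0)."""
    freq_a = per_10m(count_a + smoothing, size_a)
    freq_b = per_10m(count_b + smoothing, size_b)
    return math.log2(freq_a / freq_b)

# Invented numbers: a 100M-word native corpus vs a 10M-word learner corpus.
# Plain subtraction of normalised frequencies is dominated by "the":
the_diff = per_10m(6_000_000, 100_000_000) - per_10m(550_000, 10_000_000)  # 50000.0
rare_diff = per_10m(200, 100_000_000) - per_10m(10, 10_000_000)            # 10.0

# The log ratio puts the two words on a comparable scale:
print(log_ratio(6_000_000, 100_000_000, 550_000, 10_000_000))  # ≈ 0.13
print(log_ratio(200, 100_000_000, 10, 10_000_000))             # ≈ 0.93
```

Subtracting logs is just the log of the frequency ratio, which is why the high-frequency words stop dominating.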
Search the web's favourite learner dictionaries for free at Cambridge
Dictionaries Online:
<http://dictionary.cambridge.org>
This archive was generated by hypermail 2b29 : Mon Mar 06 2006 - 22:33:49 MET