[Corpora-List] Comparing learner frequencies with native frequencies

From: Dominic Glennon (dglennon@cambridge.org)
Date: Mon Mar 06 2006 - 12:09:20 MET


    Dear corporistas,

    I'm trying to compare word frequencies in our native speaker corpus and
    our learner corpus. I have normalised the frequencies in both corpora to
    frequencies per 10 million words, but a simple subtraction still heavily
    skews the results towards high-frequency words. I've tried taking the log
    of both normalised frequencies before subtracting, to get around the
    Zipfian nature of the word frequency distribution. This gives better
    results, but is it well-motivated? I'd be grateful for any help you could
    give me, or any pointers to previous work done in this area. Many thanks,

    Dom

    Dominic Glennon
    Systems Manager
    Cambridge University Press
    01223 325595

    Search the web's favourite learner dictionaries for free at Cambridge
    Dictionaries Online:
    <http://dictionary.cambridge.org>
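
For readers following the thread, here is a minimal Python sketch of the two comparisons described in the message: plain subtraction of frequencies normalised per 10 million words, and subtraction of their logs. The function names, the `smoothing` parameter, and the counts in the usage note are all illustrative assumptions, not anything taken from the corpora in question. Since log(a) - log(b) = log(a / b), the log difference is the log of the ratio of the two normalised frequencies, so it does not grow with the absolute frequency of the word the way the plain difference does.

```python
import math


def per_ten_million(count, corpus_size):
    """Normalise a raw count to a frequency per 10 million words."""
    return count * 10_000_000 / corpus_size


def compare(native_count, native_size, learner_count, learner_size,
            smoothing=0.5):
    """Return (plain difference, log difference), learner minus native.

    `smoothing` is an assumption of this sketch: a small constant added to
    each raw count so that the log is defined for words absent from one
    corpus. Set it to 0 to reproduce the unsmoothed calculation.
    """
    native = per_ten_million(native_count + smoothing, native_size)
    learner = per_ten_million(learner_count + smoothing, learner_size)
    difference = learner - native
    # log(learner) - log(native) equals log(learner / native).
    log_difference = math.log(learner) - math.log(native)
    return difference, log_difference
```

With made-up counts, a word seen 100 times in a 1,000,000-word native corpus and 200 times in a learner corpus of the same size gets the same log difference as a word seen 10,000 and 20,000 times, while the plain differences differ by a factor of 100. Whether a ratio-based measure is the right choice for a given study is a separate question that this sketch does not settle.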


