[Corpora-List] "normalizing" frequencies for different-sized corpora

From: Jenny Eagleton (jenny@asian-emphasis.com)
Date: Mon Sep 12 2005 - 10:08:35 MET DST

    Hello Corpora and Statistics Experts,

    This is a very simple question for all the corpora/statistics experts
    out there, but this novice is not really mathematically inclined. I
    understand Biber's principle of "normalization"; however, I am not sure
    how to calculate it. I want frequency counts normalized per 1,000 words
    of text. I can see how to do it if the figures are even, i.e. if I have
    a corpus of 4,000 words and a frequency of 200, I would have a
    normalized figure of 50.

    But for figures that do not divide evenly, how would I calculate it?
    For example, if I have 2,646 instances of a certain kind of noun in a
    corpus of 55,166 words, how would I calculate the normalized figure per
    1,000 words?
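
The even case above suggests the general pattern: the normalized figure is the raw count multiplied by the base (here 1,000) and divided by the corpus size. The sketch below is only an illustration of that proportion, not a method taken from Biber or from this thread; the function name `normalize` and its parameters are hypothetical names chosen for the example.

```python
def normalize(count, corpus_size, per=1000):
    """Scale a raw frequency to occurrences per `per` words.

    `normalize`, `count`, `corpus_size`, and `per` are illustrative
    names, not part of any corpus tool.
    """
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    # Multiply first, then divide, so evenly divisible cases stay exact.
    return count * per / corpus_size


# The even case from the message: 200 occurrences in 4,000 words.
even = normalize(200, 4000)

# The uneven case from the message: 2,646 occurrences in 55,166 words.
uneven = normalize(2646, 55166)

print(f"{even:.2f} per 1,000 words")
print(f"{uneven:.2f} per 1,000 words")
```

Under this reading, the even case gives 50 and the uneven case comes to roughly 47.96 occurrences per 1,000 words. Two corpora of different sizes can then be compared on the same per-1,000 scale.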

    Regards,

    Jenny
    Research Assistant
    Dept. of English & Communication
    City University of Hong Kong
