Re: Re: [Corpora-List] "normalizing" frequencies for different-sized corpora

From: Jenny Eagleton (jenny@asian-emphasis.com)
Date: Mon Sep 12 2005 - 11:04:05 MET DST

Next message: Eric Atwell: "Re: [Corpora-List] "normalizing" frequencies for different-sized corpora"

Previous message: Jenny Eagleton: "[Corpora-List] "normalizing" frequencies for different-sized corpora"
Next in thread: Peter K Tan: "Re: Re: [Corpora-List] "normalizing" frequencies for different-sized corpora"
Reply: Peter K Tan: "Re: Re: [Corpora-List] "normalizing" frequencies for different-sized corpora"

Thanks for the quick response from everybody. I
have got the idea now.

Jenny
----- Original Message -----
SUBJECT: Re: [Corpora-List] "normalizing" frequencies for different-sized corpora
FROM: eric@comp.leeds.ac.uk
TO: jenny@asian-emphasis.com
DATE: 12-09-2005 16:59
Jenny,

I may be missing something, but I think the way to find a
per-thousand figure is simply:
( (freq of word) / (no of words in text) ) * 1000

eg (200/4000) * 1000 = 50

or (2646/55166) * 1000 = 48 (to nearest whole number)

- of course it's up to you whether to round to the nearest whole
number, or give the answer to 2 decimal places (47.96) or some other
level of accuracy; but since a text is generally only a sample or
approximation of the language you are studying, it is sensible not to
claim too much accuracy/significance.
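
For anyone who wants to script the calculation above, here is a minimal
Python sketch. The function name normalized_frequency and its base
parameter are illustrative assumptions, not part of any corpus tool; the
figures are the two examples from this thread.

```python
def normalized_frequency(freq, corpus_size, base=1000):
    """Return the frequency per `base` words: (freq / corpus_size) * base.

    The name and the `base` parameter are hypothetical, chosen for
    illustration only.
    """
    if corpus_size <= 0:
        raise ValueError("corpus_size must be a positive number of words")
    # Multiply before dividing so that even cases such as 200/4000 stay exact.
    return freq * base / corpus_size


# The two examples from the thread.
even_case = normalized_frequency(200, 4000)
mixed_case = normalized_frequency(2646, 55166)

print(even_case)             # per-thousand figure for the even case
print(round(mixed_case, 2))  # mixed case, to 2 decimal places
print(round(mixed_case))     # mixed case, to the nearest whole number
```

Rounding is kept outside the function on purpose, so the level of
accuracy reported stays a separate decision from the calculation itself.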

eric atwell
On Mon, 12 Sep 2005, Jenny Eagleton wrote:

> Hello Corpora and Statistics Experts,
>
> This is a very simple question for all the corpora/statistics experts
> out there, but this novice is not really mathematically inclined. I
> understand Biber's principle of "normalization"; however, I am not sure
> how to calculate it. I want frequency counts normalized per 1,000 words
> of text. I can see how to do it if the figures are even, i.e. if I have
> a corpus of 4,000 words and a frequency of 200, I would have a
> normalized figure of 50.
>
> But for mixed numbers, how would I calculate the following? For
> example, if I have 2,646 instances of a certain kind of noun in a
> corpus of 55,166 words, how would I calculate the normalized figure per
> 1,000 words?
>
> Regards,
>
> Jenny
> Research Assistant
> Dept. of English & Communication
> City University of Hong Kong
>
>
>

-- 
Eric Atwell, Senior Lecturer, Language research group, School of Computing,
Faculty of Engineering, University of Leeds, LEEDS LS2 9JT, England
TEL: +44-113-2335430  FAX: +44-113-2335468 
http://www.comp.leeds.ac.uk/eric


This archive was generated by hypermail 2b29 : Mon Sep 12 2005 - 11:12:23 MET DST