Hullo Jenny! Merely to add that, depending on the kind of phenomenon you are examining and how frequent it is, you can also normalise to a per-ten-thousand figure.
Thanks for the quick responses, everybody. I have got the idea now.
Jenny
- ----- Original Message -----
- Subject: Re: [Corpora-List] "normalizing" frequencies for different-sized corpora
- From: eric@comp.leeds.ac.uk
- To: jenny@asian-emphasis.com
- Date: 12-09-2005 16:59
- Jenny,
- I may be missing something, but I think the way to find a per-thousand
- figure is simply:
- ( (freq of word) / (no of words in text) ) * 1000
- eg (200/4000) * 1000 = 50
- or (2646/55166) * 1000 = 48 (to nearest whole number)
- - of course it's up to you whether to round to nearest whole number,
- or give the answer to 2 decimal places (47.96) or some other level
- of accuracy; but since generally a text is only a sample or
- approximation of the language you are studying, it is sensible not to
- claim too much accuracy/significance.
- eric atwell
- On Mon, 12 Sep 2005, Jenny Eagleton wrote:
- > Hello Corpora and Statistics Experts,
- >
- > This is a very simple question for all the
- > corpora/statistics experts
- > out there, but this novice is not really
- > mathematically inclined. I
- > understand Biber's principle of "normalization",
- > however I am not sure
- > about how to calculate it. I want frequency counts
- > normalized per
- > 1,000 words of text. I can see how to do it if the
- > figures are even,
- > i.e. if I have a corpus of 4,000 words and a
- > frequency of 200, 
- > I would have a normalized figure of 50.
- >
- > But for mixed numbers, how would I calculate the
- > following: For
- > example if I have 2,646 instances of a certain
- > kind of noun in a
- > corpus of 55,166 how would I calculate the
- > normalized figure per
- > 1,000 words?
- >
- > Regards,
- >
- > Jenny
- > Research Assistant
- > Dept. of English & Communication
- > City University of Hong Kong
- >
- >
- >
- --
- Eric Atwell, Senior Lecturer, Language research group, School of Computing,
- Faculty of Engineering, University of Leeds, LEEDS LS2 9JT, England
- TEL: +44-113-2335430 FAX: +44-113-2335468 http://www.comp.leeds.ac.uk/eric
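
For anyone who would rather script the calculation from Eric's reply, here is a minimal Python sketch of (freq of word / no of words in text) * 1000. The function name normalised_frequency and its per parameter are made up for illustration and do not come from any corpus tool; per defaults to 1,000 and can be set to 10,000 for a per-ten-thousand figure.

```python
def normalised_frequency(count, corpus_size, per=1000):
    """Return the number of occurrences per `per` words of text.

    `count` is the raw frequency and `corpus_size` is the total number
    of words in the text or corpus. Both names are illustrative only.
    """
    if corpus_size <= 0:
        raise ValueError("corpus_size must be a positive number of words")
    # Multiply before dividing: same formula as (count / corpus_size) * per.
    return count * per / corpus_size


# The two examples from the thread, per 1,000 words.
print(round(normalised_frequency(200, 4000), 2))
print(round(normalised_frequency(2646, 55166), 2))

# The same count normalised per 10,000 words instead.
print(round(normalised_frequency(2646, 55166, per=10000), 1))
```

Rounding is left to the caller, since how many decimal places are worth reporting depends on how far the text can be taken as a sample of the language being studied.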