Re: Corpora: A little late: Size of representative corpus

Tony Berber Sardinha (tony4@uol.com.br)
Thu, 27 Aug 1998 22:36:14 -0300

> I also assume that a 15 year old has acquired a high level of competency
> with some specialisation. Assuming the child's exposure rate has gone up
> to 200 words a minute for 14 hours a day (this allows for more chatting,
> TV and books - perhaps an underestimate).
>
> This adds around 500 million words to our corpus!

This is the same line of reasoning as Guy Cook's, whose estimate is 300
million words for a teenager:

Cook, Guy. "The uses of reality: A reply to Ronald Carter." ELT Journal 52
(1998): 57-63, p.59

Basically he argues that most corpora are inadequate as they are usually
smaller than that figure.

tony.
------------------------------------------------------------------------
Dr Tony Berber Sardinha
Catholic University of Sao Paulo, Brazil
tony4@uol.com.br
http://sites.uol.com.br/tony4/homepage.html
http://www.liv.ac.uk/~tony1/homepage.html
http://www.liv.ac.uk/~tony1/corpus.html
http://members.wbs.net/homepages/c/o/r/corpuslinguistics.html
------------------------------------------------------------------------
----------
> From: Iain Downs <idowns@dircon.co.uk>
> To: 'CORPORA@hd.uib.no'
> Subject: Corpora: A little late: Size of representative corpus
> Date: 27 August 1998 05:19
>
> A little late, I'm afraid, but another perspective on a 'representative
> corpus'. This time from the perspective of learning English.
>
> I should say that I have NO formal knowledge of this subject so I look
> forward to corrections!
>
> I assume that a child has learnt English to tolerable competence by the
> age of 5.
>
> I assume that that child has been exposed to 100 words a minute, 10 hours
> a day and 350 days a year.
>
> That is (ONLY!!) 21 million words.
>
> I also assume that a 15 year old has acquired a high level of competency
> with some specialisation. Assuming the child's exposure rate has gone up
> to 200 words a minute for 14 hours a day (this allows for more chatting,
> TV and books - perhaps an underestimate).
>
> This adds around 500 million words to our corpus!
>
> These figures, however, do not allow for the 'external' stimuli in
> learning (Oh THAT'S a 'mummy'), nor the role of prosody and the like but
> also assume no external expertise (Lexicographers and Corpus linguists!).
>
> However, perhaps this sort of analysis can at least put some bounds on the
> necessary size of a Corpus to be ABLE to learn a language.
>
> Any thoughts?
>
> Iain
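
The exposure estimates in the quoted message can be checked with a short
calculation. This is only a sketch of the arithmetic: the function name is
made up for illustration, and it assumes 5 years for the first period, 10
years (age 5 to 15) for the second, and 350 days a year in both, the last of
which the message states only for the first period.

```python
MINUTES_PER_HOUR = 60


def words_exposed(words_per_minute, hours_per_day, days_per_year, years):
    """Total words heard or read at a constant exposure rate."""
    return words_per_minute * MINUTES_PER_HOUR * hours_per_day * days_per_year * years


# First period: 100 words a minute, 10 hours a day, 350 days a year.
early_per_year = words_exposed(100, 10, 350, 1)
early_total = words_exposed(100, 10, 350, 5)   # assumed: birth to age 5

# Second period: 200 words a minute, 14 hours a day.
# Assumed: the same 350 days a year, over the 10 years from age 5 to 15.
later_total = words_exposed(200, 14, 350, 10)

print(f"first period, per year:   {early_per_year:,}")
print(f"first period, five years: {early_total:,}")
print(f"second period, ten years: {later_total:,}")
```

Under these assumptions the first-period rate comes to 21 million words per
year, or 105 million over five years, so the 21 million figure in the message
appears to match a single year. The second period comes to 588 million words,
which is broadly in line with "around 500 million".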