Re: Corpora: balance

Adam Kilgarriff (Adam.Kilgarriff@itri.brighton.ac.uk)
Tue, 13 Oct 1998 15:16:16 +0100

Vladimir Rykov asked:

> There is a term often used in CL - "balanced corpus". Can anybody
> tell or point me - is (if) this term strictly defined

-- emphatically not

> Does the definition "balanced corpus" include the properties
> ("balance" of them) of the inner structure of a corpus - or there is a
> balance of user's demands to its contents?

You can call a corpus "balanced" if it includes a range of the
different text types of the language, with their proportions in the
corpus reflecting, in some more-or-less principled way, their levels
of use in the language community at large.
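
As a rough illustration of the proportionality idea only (a sketch, not
a procedure taken from the BNC Manual): given some estimate of how much
each text type is used, a corpus word budget can be split in the same
proportions. The function name balanced_quotas, the text-type labels
and the weights below are all hypothetical, chosen purely for the
example; how the usage levels are estimated is exactly the part that
has no strict definition.

```python
def balanced_quotas(usage_weights, corpus_size):
    """Split corpus_size words across text types in proportion to
    integer usage weights, using the largest-remainder method so the
    quotas always add up to corpus_size exactly."""
    total = sum(usage_weights.values())
    quotas = {}
    remainders = {}
    for text_type, weight in usage_weights.items():
        # Integer division gives the guaranteed share; the remainder
        # records how much of one extra word this type is still owed.
        quotas[text_type], remainders[text_type] = divmod(
            corpus_size * weight, total
        )
    leftover = corpus_size - sum(quotas.values())
    # Hand the few leftover words to the types with the largest remainders.
    ranked = sorted(remainders, key=remainders.get, reverse=True)
    for text_type in ranked[:leftover]:
        quotas[text_type] += 1
    return quotas


# Hypothetical usage weights (percentages), invented for illustration.
HYPOTHETICAL_WEIGHTS = {
    "newspapers": 30,
    "fiction": 25,
    "academic": 15,
    "conversation": 30,
}

if __name__ == "__main__":
    print(balanced_quotas(HYPOTHETICAL_WEIGHTS, 1_000_000))
```

The arithmetic is the easy part; the "more-or-less principled" choice
of weights is where corpus designers actually disagree.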

See the BNC Manual, sections 2.3 and 3.1 --
http://info.ox.ac.uk/bnc/what/index.html

> ---
> YS Vladimir Rykov, PhD in Computational Linguistics
> OUR INSTITUTE WEB PAGE: Linguistic Institute
> WWW.GOL.RU/~iling 1/12 B.Kislovsky per., Moscow, 103009
> M_M_M_M_M_M_M_M_M_M_M_M KREMLIN WALL IS WHERE YOU MAKE IT !

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Adam Kilgarriff
Senior Research Fellow                      tel:   (44) 1273 642919
Information Technology Research Institute          (44) 1273 642900
University of Brighton                      fax:   (44) 1273 642908
Lewes Road
Brighton BN2 4GJ                            email: Adam.Kilgarriff@itri.bton.ac.uk
UK                                                 http://www.itri.bton.ac.uk/~Adam.Kilgarriff
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%