Re: Corpora: history of corpora

Varadi Tamas (varadi@nytud.hu)
Fri, 4 Dec 1998 11:46:02 +0100

On Tue, 1 Dec 1998, Bill Fisher wrote:

|On Dec 1, 2:29pm, Oliver Mason wrote:
|
|...
|> .... A corpus is a special collection of textual material
|> collected according to a certain set of criteria, like the BNC or the
|> BoE, or Brown, COLT, Flob, LOB, whatever. They all made decisions
|> about the composition of their data in advance and selected it
|> accordingly.
|...
|> ... I am worried that the term `corpus' gets watered down so
|> much that it is basically used the same way as `archive'. An archive is
|> less focussed on doing things with its data, and mainly concerned with
|> storage, archival, and retrieval of its elements.
|...
|
| "Corpus" has an older and more general use which is captured
|very well by your definition of "archive". Why don't we just
|go by dictionary definitions?

Perhaps because dictionary definitions have become slightly out of touch
with the way the term 'corpus' has come to be used by the majority of the
practitioners of the field?

If there is a discrepancy between the dictionary definition and actual
usage which has arisen through consensus (note how I am trying not to
condone any frivolous flouting of accepted usage codified in dictionaries)
then, surely, the onus is on lexicographers to bring the dictionary up to
date.

|Here are 2 relevant ones, from
|Webster's 3rd Unabridged:
|
| "3a: the whole body or total amount of writings of a particular
| kind or on a particular subject (as the total
| production of a writer or the whole literature of a subject)
| ...
| b: a collection or body esp. of knowledge or evidence;
| specif : the collection of recorded utterances that is used
| as a basis for the descriptive analysis of a language or dialect"

| Other standard dictionaries have similar definitions. Note
|that there is no reference to criteria for selection, or to
|uniformity of storage and retrieval. I think you're trying
|to water up the term 'corpus' unnecessarily.

As to how 'unnecessarily', see above.

'Water up?' If anything, Oliver's original point seemed to argue for a
specific interpretation of the term 'corpus' as against a looser one.
True, the definition he is concerned with is not (yet) recognized by the
standard dictionaries you refer to (and hence they need to be updated)
but that does not invalidate the argument for the distinction between
'corpus' and 'archive' he was making.

Best,
TV

****************************************************************************
Tamas Varadi email: varadi@nytud.hu
Linguistics Institute mail: Budapest PO.Box 19 H-1250
Hungarian Academy of Sciences Tel: 361 1758011x243 Fax: 3612122050