Re: Corpora: corpora: idiosyncracy/typicality

Marco Antonio Esteves da Rocha (marcor@cce.ufsc.br)
Fri, 3 Dec 1999 09:32:52 -0600 (CST)

On Fri, 3 Dec 1999, David Carlson wrote:

> I recall reading about a study of the work that went into the compilation of
> the OED, and about how the attention of the compilers was attracted more
> toward idiosyncratic usage rather than toward examples of typical use. I
> can't recall where I read this, however.
> Does anyone have more information (either about the OED-related study, or
> about other documented examples)?

The specific mention, as I remember, refers to the attention of the
hundreds of readers who volunteered to help in the collection of citations
for the OED, which were sent to Murray from all over England and abroad.
Murray complains:

The editor or his assistants have to search for precious hours for
examples of common words, which readers passed by...Thus, of ABUSION we
found in the slips about 50 instances; of ABUSE not five. (James Augustus
Henry Murray, Presidential Address, Philological Society Transactions
1877-9, pp.571-2, quoted by Murray, K. 1977, Caught in the Web of
Words: James Murray and the Oxford English Dictionary, Yale
University Press, p.178).

This was quoted by Church and Mercer (1993) in the Introduction to the
Special Issue on Computational Linguistics Using Large Corpora,
Computational Linguistics, Volume 19, Number 1, p.17, in the section where
the authors compare citation indexes to modern corpora. The quote is
followed by an analysis of sampling problems often found in citation
indexes, which are `skewed away from the "central and typical" facts of
language that every speaker is expected to know'.

Francis' contribution to Directions in Corpus Linguistics (1992), Jan
Svartvik (ed.), titled Language Corpora B.C., gives a more detailed
discussion of the methods used to collect the citations on which the
OED was based. Further information about these methods and the
collection process can be found in Murray (1977), mentioned above.

Marco Rocha
marcor@cce.ufsc.br