Re: Corpora: Corpus Linguistics User Needs

Paul Rayson (paul@comp.lancs.ac.uk)
Thu, 30 Jul 1998 11:57:42 +0100 (BST)

Oliver and Ylva,

You might want to look at a chapter Tony McEnery and I wrote:

A Corpus/Annotation Toolbox. pp. 194-208. In Garside, R., Leech, G., and
McEnery, A. (eds.) (1997) Corpus Annotation: Linguistic Information from
Computer Text Corpora. Longman, London.

We focus particularly on software which is 'annotation aware' and divide it into
three categories:

1. Corpus development (the input of annotation information into a corpus):
   (a) Text encoding
   (b) Annotation
   (c) Encoding of annotation
2. Corpus editing (changing annotation information in a corpus):
   (d) Correction (including correction of annotations)
   (e) Disambiguation of annotations
   (f) Conversion/transduction of annotations
3. Extraction of information (the output of annotation information from a
   corpus, whether raw or annotated):
   (g) Concordancing
   (h) Frequency analysis
   (i) Input to lexicons, grammars, etc.
   (j) Information retrieval
   (k) Bilingual/multilingual variants of (g)-(j)
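
To make "annotation aware" a little more concrete for (g) and (h), here is a
minimal sketch in Python. It is not taken from the chapter or from any tool
described there. The word_TAG corpus format, the toy tagset, and the function
names (parse, concordance, tag_frequencies) are all assumptions made up for
illustration.

```python
from collections import Counter

# Hypothetical toy corpus in word_TAG format; the tagset is illustrative only.
CORPUS = (
    "the_DET cat_NOUN sat_VERB on_PREP the_DET mat_NOUN "
    "and_CONJ the_DET dog_NOUN sat_VERB too_ADV"
)


def parse(text):
    """Split whitespace-separated word_TAG tokens into (word, tag) pairs."""
    return [tuple(tok.rsplit("_", 1)) for tok in text.split()]


def concordance(tokens, keyword, width=2):
    """(g) Keyword-in-context lines with `width` words either side of a hit."""
    lines = []
    for i, (word, _tag) in enumerate(tokens):
        if word == keyword:
            left = " ".join(w for w, _ in tokens[max(0, i - width):i])
            right = " ".join(w for w, _ in tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}")
    return lines


def tag_frequencies(tokens):
    """(h) Frequency count over the annotation (tags) rather than the words."""
    return Counter(tag for _word, tag in tokens)


tokens = parse(CORPUS)
kwic = concordance(tokens, "sat")
freqs = tag_frequencies(tokens)

for line in kwic:
    print(line)
print(freqs.most_common(3))
```

The point of the sketch is only that both functions read the annotation as
structured data: the concordancer strips the tags for display, while the
frequency count works on the tags alone.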

Regards,
Paul.