Corpora: corpus design info

Vladimir Rykov
Fri, 15 Jan 99 10:47:10 +0300

Dear List members!

I was wisely advised to send the answers I received to my
"corpus design" appeal to the list.

This is the first part of my "Rykov list".


Here are the answers (all of them) I received to my "corpus design" appeal
to the list before 2 Jan 99. I hope these references will help you.

Additions are WELCOME!

The books mentioned below:

Gerhard Leitner, Corpus design - problems and suggested solutions. In
Leitner (ed.), New directions in English Language Corpora,
Berlin & New York 1992

Douglas Biber, Representativeness in corpus design. Literary and
Linguistic Computing 8, 4:243-257

Graeme Kennedy, An Introduction to Corpus Linguistics,
London & New York: Longman 1998, pp 70-5.



From: Arne Fitschen

I'm not quite sure what exactly you're looking for, but I can give you three links, the first about
corpus linguistics in general

the other two more specific: the Text Encoding Initiative (TEI)

and the Corpus Encoding Standard (CES)

Hope that helps,
Arne Fitschen



Dear Vladimir,

Here are some links I have found.

You may have to dig a bit but they should be useful.



Lernout & Hauspie Speech Products ~ USA
Language Modeling Scientist
Tel. 781-203-5296


If possible, I would like to have copies of the pages mentioned above
(as Xerox copies or in MS-DOS PC-readable form).

They are not available here.

Actually, I need only the pages that deal with "corpus design".

Information found on the Internet is welcome as well. We have only a 386 PC
(a gift from Soros) here, with e-mail access only.

Vladimir Rykov

YS Vladimir Rykov, PhD in Computational Linguistics Linguistic Institute
WWW.GOL.RU/~iling 1/12 B.Kislovsky per., Moscow, 103009