Corpus analysis resources for Spanish

M. Paula S., Inst. de Lexicografia (paula@crea.rae.es)
Thu, 13 Jun 1996 14:22:30 +0200 (DFT)

Dear colleagues:

The Computational Linguistics Department of the Instituto de
Lexicografía of the Real Academia Española has recently undertaken a
search for corpus analysis resources, from simple tokenizers to
lemmatizers, either developed specifically for Spanish or, at least,
language-independent. We are mostly interested in shareware resources,
but would also appreciate information about commercially available ones.

The modules we are interested in are:

1) Tokenizers: capable of dealing with cliticization,
preposition-article contraction, multiword units and so on (an
illustrative sketch follows this list).

2) Taggers of any type (rule-based, stochastic, etc.), whether
developed specifically for Spanish or language-independent.

3) Lemmatizers.

4) Lexica, MRDs/MTDs, or any collection of Spanish words, as well as
any kind of utilities for the creation and maintenance of lexica.
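
By way of illustration only, the toy Python sketch below shows the kind
of tokenizer behaviour we mean in point 1, assuming a small hand-made
contraction table and multiword list (both purely hypothetical). A real
module would need proper lexical and morphological resources; clitic
splitting is not attempted here.

    import re

    # Hypothetical data; a real tokenizer would need far richer resources.
    CONTRACTIONS = {"del": ["de", "el"], "al": ["a", "el"]}
    MULTIWORD_UNITS = {("sin", "embargo"), ("a", "pesar", "de")}

    def tokenize(text):
        # Split into words and punctuation, expanding contractions.
        tokens = []
        for word in re.findall(r"\w+|[^\w\s]", text):
            tokens.extend(CONTRACTIONS.get(word.lower(), [word]))
        # Greedily merge known multiword units (longest match first).
        merged, i = [], 0
        while i < len(tokens):
            for n in (3, 2):
                if tuple(t.lower() for t in tokens[i:i + n]) in MULTIWORD_UNITS:
                    merged.append("_".join(tokens[i:i + n]))
                    i += n
                    break
            else:
                merged.append(tokens[i])
                i += 1
        return merged

    print(tokenize("Volvio del mercado; sin embargo, no compro nada."))
    # ['Volvio', 'de', 'el', 'mercado', ';', 'sin_embargo', ',',
    #  'no', 'compro', 'nada', '.']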

Any submodule or other material that would help complete any of the
modules above will also be welcome.

Please send responses directly to

paula@crea.rae.es

We will post a summary. Thank you in advance.

----------------------------------------------------------

María Paula Santalla del Río
Dpto. de Lingüística Computacional
Instituto de Lexicografía
Real Academia Española
Felipe IV, 4
Madrid 28071

Tel. 34-1-4201614, ext. 147
Fax. 34-1-4200079
e-mail: paula@crea.rae.es

-----------------------------------------------------------