Corpus analysis resources for Spanish

J.L. Sancho, INSTITUTO DE LEXICOGRAFIA (sancho@crea.rae.es)
Mon, 1 Jul 1996 14:10:31 +0200 (DFT)

Dear all:

A while back my colleague Maria Paula Santalla and I (Jose Luis
Sancho) posted an enquiry about corpus analysis resources for Spanish.
The following is a summary of what we have been referred to. We would
like to thank the following people for their kind responses (in no
particular order): Max Louwerse, Mike Scott, Carlos Subirats, Ken
Litkowski, Jean Véronis, Yorick Wilks, Sandro Pedrazzini, John Aberdeen,
Ana Martínez, Nuno Miguel Cavalheiro Marques and Ken Beesley. This list
exhausts our 'inbox'; therefore, we beg anyone else who responded and is
not mentioned above to forgive us (or our server); in that case, please
write to us again. Note that the enquiry was posted to various lists,
hence information not necessarily coming from this list may be quoted
below. We apologize for any duplicates.

##Max Louwerse (<M.M.Louwerse@stud.let.ruu.nl>) told us about the Qualrs-lst,
on which a lot of tagging software has been discussed. As for software, he
mentioned NUDIST (Sage Publishers) and Notabene, whose homepage is

http://sls-www.lcs.mit.edu/~flammia/Nb.html and
ftp://sls-www.lcs.mit.edu/pub/flammia/Nb.

You can also email Giovanni Flammia (flammia@mit.edu).

##Mike Scott (<ms2928@ac.uk>) suggested

http://www.liv.ac.uk/~ms2928/wordsmit.html

This page gives access to WordSmith Tools (Oxford Univ. Press 1996).

##Carlos Subirats (<lali1@uab.es>) pointed to an 'Etiquetador y
desambiguizador del espanol' (a tagger and disambiguator for Spanish),
developed by the Laboratorio de Linguistica Informatica de la Universidad
Autonoma de Barcelona. The address provided is

Carlos Subirats Ruggeberg
Universidad Autonoma de Barcelona
Laboratorio de Linguistica Informatica
Edificio B
08193 Bellaterra, Spain

e-mail: c.subirats@oasis.uab.es
e-mail: c.subirats@cc.uab.es
Fax: (343)-581-16-86
Tel: (343)-581-22-29

##Ken Litkowski (<71520.307@CompuServe.COM>) directed us to some dictionary
utilities for creating and maintaining lexica. A description of this
software is available at

http://www.clres.com

##Jean Véronis (<veronis@univ-aix.fr>) suggested a look at

http://www.lpl.univ-aix.fr/projects/multext/

and contacting Nuria Bel (nuria@gilcub.es).

##Yorick Wilks (<yorick@dcs.shef.ac.uk>) pointed to david@crl.nmsu.edu

##Sandro Pedrazzini (<sandro@idsia.ch>) pointed to a system with which you
can not only create and maintain lexica but also generate different kinds
of taggers and lemmatizers. A description of it can be found at

http://www.ifi.unibas.ch/grudo/grudo.html
http://www.idsia.ch/wordmanager.html

##John Aberdeen (<aberdeen@mitre.org>) mentioned a fast part-of-speech
tagger based on Eric Brill's notion of transformation-based error-driven
learning.

##Ana Martínez (<sysnet@bitmailer.net>) mentioned MABLe, a 'multilingual
letter authoring tool'.

##Nuno Miguel Cavalheiro Marques (<nmm@di.fct.unl.pt>) brought to our
attention two POS taggers, one using Viterbi tagging with a hidden Markov
model (HMM) and the other using neural networks. You can find a short
review of this work at

http://www-ia.di.fct.unl.pt/~nmm
http://www-ia.di.fct.unl.pt/~glint/Glint

There you can also access an article about POLARIS, a morphological lexical
acquisition and retrieval database system. Contacting Gabriel Lopes
(gpl@fct.unl.pt) was also suggested.
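
For readers unfamiliar with the Viterbi tagging with an HMM mentioned above, here is a minimal sketch of the general technique in Python. It is not the taggers from Lisbon: the three-tag set, the probabilities, the `unk` fallback for unseen words and the function name `viterbi` are all invented for illustration.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Most probable tag sequence for `words` under a bigram HMM.

    `unk` is an assumed fallback emission probability for unseen words.
    """
    if not words:
        return []
    # best[i][t] = (log-prob of the best path ending in tag t at word i,
    #               tag chosen at word i-1 on that path)
    best = [{t: (math.log(start_p[t] * emit_p[t].get(words[0], unk)), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = emit_p[t].get(words[i], unk)
            column[t] = max(
                (best[i - 1][s][0] + math.log(trans_p[s][t] * e), s)
                for s in tags
            )
        best.append(column)
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

# Toy model with made-up numbers (hypothetical, not estimated from a corpus).
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET":  {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.2,  "NOUN": 0.2, "VERB": 0.6},
    "VERB": {"DET": 0.6,  "NOUN": 0.3, "VERB": 0.1},
}
EMIT = {
    "DET":  {"la": 0.5, "el": 0.5},
    "NOUN": {"casa": 0.6, "gato": 0.4},
    "VERB": {"casa": 0.3, "come": 0.7},
}

print(viterbi(["la", "casa"], TAGS, START, TRANS, EMIT))
# -> ['DET', 'NOUN']
```

In this toy model "casa" can be a noun or a verb form; the preceding determiner makes the noun reading win. A real tagger would estimate the three tables from a tagged corpus and handle unknown words more carefully.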

##Ken Beesley (<Ken.Beesley@Grenoble.RXRC.Xerox.com>) noted that the Rank
Xerox Research Centre in Grenoble, France, has developed systems for
Spanish covering tokenization (word/term division); morphological analysis
(for syntax, or, less detailed, for tagging); a part-of-speech "guesser"
(for words not found by the morphological analysis); and tagging (based on
an HMM tagger, trained on a corpus). You can experiment with the
morphological analysis and tagger at

http://www.xerox.fr/grenoble/mltt/home.html

Thank you very much again. See you on the net.

Jose Luis Sancho        Maria Paula Santalla
sancho@crea.rae.es      santalla@crea.rae.es