Corpora: Summary: Spanish Corpora

Eva Remberger (eremberg@spinfo.uni-koeln.de)
Fri, 25 Sep 1998 17:52:28 +0200 (MET DST)

Dear list members,

here are the results, together with a list of the people who kindly sent
me suggestions and hints in response to my question posted to the Linguist
List on Friday, 18 September.

My question was as follows:
>Dear list members,
>
>it's a while I'm looking for Spanish Corpora of business Spanish. Does
>anybody know if there are Spanish Newspapers on CD-ROM (eg. all the
>issues of one year as it is possible for the german newspaper Sueddeutsche
>Zeitung)? I tried to contact EL PAIS but never received an answer.
>
>Actually, I would be interested in any kind of Corpus of contemporary
>Spanish (mainly european), - to buy or not to buy - but 'economia'-
>arguments would be even greater.
>
>Thank you for an answer. Of course, I will post a message with the
>results.

______________________________________________________________________

I want to thank:

Andreas Eisele
Antoine Consigny
Valerie Mapelli
Iain Downs
Purificacion Fdez-Nistal
Susana Sotelo Docio
Eva Easton
Leonel Ruiz Miyares
José Luis Sancho
M.M.W.Pollmann
Raphael Salkie
Rene' Schneider
______________________________________________________________________

The summary of the results:
_______________________________________________________________________

Among the commercial providers there is ELRA:
http://www.icp.grenet.fr/ELRA/cata/tabtext.html
They have a Multilingual Corpus (MLCC) consisting of 6 European financial
newspapers (Het Financieele Dagblad, Handelsblatt, Financial Times, Le
Monde, Il Sole 24 Ore, Expansion); the Spanish subcorpus (Expansion) has
about 10 million words (21.10.1991-24.10.91 and 14.5.94-27.12.94). The
entire corpus is available via ELRA at the following prices:
- For ELRA members for research use: 360 ECU
- For non members for research use: 750 ECU
----------------------------------------------------------------------
Another commercial publisher of research material and provider of
newspapers on CD-ROM is NewsBank. They offer Noticias en Espanol on
monthly CD-ROMs:
http://www.newsbank.com/schools/high/spanish.html
-----------------------------------------------------------------------
Yet another commercial service is ProQuest; they seem to have El Norte and
Reforma (Mexico):
http://www.umi.com/hp/WhatWeDo.html
------------------------------------------------------------------------
There is apparently a CD-ROM edition of the 1994 volume of El Mundo (on
two disks); the text is in ASCII format and classified by category
(economy, national, etc.). I'm not sure if it is still available.
-------------------------------------------------------------------------
There is a link collection to Spanish online-newspapers at:
http://www.newslink.org/euspan.html
------------------------------------------------------------------------
The Language Technology Group has a website with FAQs on corpora (the
tools section is probably the most interesting part):
http://www.ltg.ed.ac.uk/helpdesk/faq/index.html#Texts0040
-----------------------------------------------------------------------
El Observatorio Español de Industrias de la Lengua could be interesting;
it also has some more links:
http://www.cervantes.es/internet/acad/oeil/mar_oeil.htm (click on
"recursos linguisticos")
---------------------------------------------------------------------------
There are several corpora available at the Department of Romance Languages
of the University of Goeteborg (Banco de datos de Prensa Espanola 1977,
Banco de Datos de Once Novelas Espanolas 1951-1971, and a concordance based
on the Corpus oral de referencia del Espanol contemporaneo):
http://rom.gu.se/~romgb/Corpora.html
-----------------------------------------------------------------------
Professor Barry Ife, at the School of Humanities, King's College London,
is reported to be compiling a large corpus of modern Spanish:
barry.ife@kcl.ac.uk
--------------------------------------------------------------------------
There is a Spanish newspaper corpus consisting of 200 texts from Latin
American newspapers on CD-ROM (TIFF and ASCII versions). The corpus
includes 39,081 tokens and is available (for purchase) from the
Information Science Research Institute / University of Nevada at Las Vegas
4505 Maryland Parkway
Las Vegas, Nevada 89154-4201
For information contact ISRI by
Phone: +1 702 895-3338
Fax: +1 702 895-1560
E-mail: isri-info@isri.unlv.edu
---------------------------------------------------------------------------
At the University of Murcia there is the CUMBRE Corpus: Contact Prof.
Aquilino Sanchez: asanchez@fcu.um.es
--------------------------------------------------------------------------
The CRATER corpus consists of morphosyntactically tagged
telecommunications texts:
ftp.ling.lancs.ac.uk
---------------------------------------------------------------------------
Dr. Purificacion Fdez.-Nistal and the Instituto de Terminologia Bilingue
y Traduccion Especializada (ITBYTE) at the Universidad de Valladolid, Spain
are in the process of building their own corpus.
---------------------------------------------------------------------------
Ing. Leonel Ruiz Miyares (Director of the Applied Linguistics Centre,
Santiago de Cuba) maintains a Spanish corpus of children's vocabulary.
(By the way, there is also a European Spanish corpus of child language,
the MARIA corpus: http://www.sis.ucm.es/Spanish/)
------------------------------------------------------------------------
The Lingua project (an EU-funded project on multilingual concordancing)
so far has only English, French, German, Italian, Greek and Danish texts,
but they are considering adding Spanish and Portuguese:
http://www.loria.fr/equipes/dialogue/lingua
-----------------------------------------------------------------------

Thanks a lot,
Eva Remberger

-- 
________________________________________________________________________
				Sprachliche Informationsverarbeitung
Eva Maria Remberger		Philosophische Fakultaet
				Universitaet zu Koeln
				Albertus-Magnus-Platz
				D-50923 Koeln
------------------------------------------------------------------------
	Visit our web-site at:  http://www.spinfo.uni-koeln.de
________________________________________________________________________