MULTEXT-East V3: http://nl.ijs.si/ME/V3/
The MULTEXT-East resources are a multilingual dataset for language
engineering research and development. This dataset contains, for
Bulgarian, Croatian, Czech, English, Estonian, Hungarian, Lithuanian,
Resian, Romanian, Russian, Serbian, and Slovene, some or all of the
following resources:
- MULTEXT-East morphosyntactic specifications (free)
- MULTEXT-East morphosyntactic lexica (licence)
- MULTEXT-East morphosyntactically annotated "1984" corpus (licence)
- MULTEXT-East comparable corpus (licence)
- MULTEXT-East parallel speech corpus (free)
- and associated documentation (free).
The resources comply with the EAGLES and TEI recommendations and are
freely available for research use. To get access to the licensed
resources, you need to fill out and submit the on-line licence.
What's new in this edition?
- all corpora now encoded in XML TEI P4
- joins together the resources from Version 1 (1998) and Version 2 (2002)
- adds Serbian annotated "1984" and Resian morphosyntactic specifications
- an updated bibliography
- many errors from previous versions corrected
- and probably some new ones introduced...
Hope you find them useful!
--
Tomaž Erjavec                | Dept. of Knowledge Technologies
email: tomaz.erjavec@ijs.si  | Jozef Stefan Institute
www: http://nl.ijs.si/et/    | Jamova 39, SI-1000, Ljubljana
fax: (+386 1) 4251 038       | Slovenia