> Date: Sat, 23 Oct 2004 14:36:47 +0400 (MSD)
> From: "РЫКОВ В.В. (МОСКВА)" <rykov@narod.ru>
>
> I am looking for a super large Russian corpus to use in my research project.
> The corpus doesn’t require any tagging; it can be Russian text only.
http://lib.ru/ claims to have close to 5 GB of Russian-language text, in
multiple genres, from multiple sources, etc.
A substantial part of it is OCR'ed, and consequently some pieces exhibit
problems, such as end-of-page hyphenation. So you may have to do some quality
control, depending on your needs.
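As one example of such quality control, a first pass over the end-of-page
hyphenation could look something like the minimal Python sketch below. It
assumes plain text already decoded to Unicode, and the rule it applies is an
assumed heuristic, not anything lib.ru provides: a word ending in a hyphen at
the end of a line is joined with a word starting with a lowercase letter at the
beginning of the next line. The function name `dehyphenate` is hypothetical.

```python
import re

# Assumed heuristic: "word-" at the end of a line, followed by a word that
# starts with a lowercase Cyrillic or Latin letter on the next line, is one
# word that was split by hyphenation. Handles both \n and \r\n line endings.
# A genuine hyphenated word (e.g. "кто-нибудь") that happens to break at a
# line end will be joined wrongly, and page breaks with a page number between
# the two halves are not handled, so spot checks are still needed.
HYPHEN_BREAK = re.compile(r"(\w+)-[ \t]*\r?\n[ \t]*([а-яёa-z]\w*)[ \t]*")


def dehyphenate(text: str) -> str:
    """Rejoin words split across line breaks; the rest of the second line
    stays on its own line after the rejoined word."""
    return HYPHEN_BREAK.sub(r"\1\2\n", text)


if __name__ == "__main__":
    sample = "Это был пере-\nход через реку.\nКто-нибудь видел?"
    print(dehyphenate(sample))
```

Requiring a lowercase continuation is a deliberate choice: it leaves lines
such as "конец-" followed by a capitalized new sentence untouched, at the cost
of missing splits in proper names.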
-- 
Roman Yangarber
______________________________________________________________________
Research Assistant Professor                    voice +1 (212) 998-3264
Department of Computer Science                    fax +1 (212) 995-4123
Courant Institute of Mathematical Sciences
New York University                                    roman@cs.nyu.edu
715 Broadway, 7th Floor                            www.cs.nyu.edu/roman
New York, NY 10003-6806
______________________________________________________________________
mobile: +358 50 4668 383 in Finland
______________________________________________________________________
This archive was generated by hypermail 2b29 : Sat Oct 23 2004 - 22:31:55 MET DST