[Corpora-List] Summary: Corpus of translated material

From: Nomi Guthmann (nomi.guthmann@googlemail.com)
Date: Thu Mar 08 2007 - 13:46:33 MET

    Dear corpora list members,

    Here is a summary of the responses I received about corpora of translated
    material (the main requirement was that the source language of the
    translations be known):

    The EUROPARL corpus
    http://people.csail.mit.edu/koehn/publications/europarl/
    In its current form, it does not include information on the source
    language of the various texts, but I was told that its next release
    will.

    The English-Estonian and Estonian-English parallel corpus
    http://www.cl.ut.ee/korpused/paralleel/index.php?lang=en
    It includes Estonian laws and EU legislation, and their translations.

    The INTERSECT corpus
    http://www.brighton.ac.uk/languages/contact/academicstaff/intersect.html
    It includes English-French and English-German translations in several domains.

    The COMPARA corpus
    http://www.linguateca.pt/COMPARA/Welcome.html
    It includes English and Portuguese bi-directional parallel texts.

    The OPUS corpus
    http://logos.uio.no/opus/
    It is an open-source parallel corpus covering several languages.
    Jörg Tiedemann also has a corpus of aligned movie subtitles, available
    for research purposes only.

    The TEC corpus
    http://www.llc.manchester.ac.uk/Research/Centres/CentreforTranslationandInterculturalStudies/ResearchProgrammesPhDMPhil/TranslationEnglishCorpus/
    A large corpus of translated English.

    The Bible corpus
    http://www.umiacs.umd.edu/~resnik/parallel/bible.html

    Corina Forascu has a section of the TimeBank 1.2 (English) corpus
    translated into Romanian.

    JRC-Acquis multilingual parallel corpus
    http://langtech.jrc.it/JRC-Acquis.html
    A parallel corpus in several languages. The source languages in this
    corpus are unknown.

    The CroCo project
    http://fr46.uni-saarland.de/croco
    Corpus of German and English translations. The corpus is not available
    for copyright reasons.

    Many thanks for their responses to:
    Chris Callison-Burch
    Israel Cohen
    Corina Forascu
    Ana Frankenberg-Garcia
    Hieu Hoang
    Heiki Kaalep
    Andrea Mulloni
    Stella Neumann
    Sebastian Padó
    Raphael Salkie
    Armin Schmidt
    Harold Somers
    Ralf Steinberger
    Jörg Tiedemann

    Noemie Guthmann
    Translation and Interpreting Studies Department
    Bar Ilan University
