Re: [Corpora-List] English-French parallel corpus?

From: Chris Callison-Burch (callison-burch@ed.ac.uk)
Date: Thu Jan 19 2006 - 18:04:02 MET

    Dear Oana,

    You might consider constructing a parallel corpus of French novels
    and their translations into English using public domain texts from
    Project Gutenberg. As I see it, there are two advantages to doing
    this. Firstly, the text would come from a very different domain than
    the parliamentary proceedings in the Canadian Hansard and Europarl.
    Secondly, novels often have multiple translations, which you could
    potentially use with automatic MT evaluation metrics that take
    advantage of multiple reference translations.
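
    To illustrate how multiple references can enter such a metric, here
    is a minimal sketch of clipped (modified) n-gram precision, the core
    quantity in BLEU-style metrics. It is not a full metric: there is no
    brevity penalty and no combination of n-gram orders. The function
    names and the whitespace tokenization are assumptions made for the
    example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, references, n=1):
    """Clipped n-gram precision of one candidate against several references.

    Each candidate n-gram is credited at most as many times as it occurs
    in the single reference that contains it most often.
    Tokenization is plain whitespace splitting (an assumption).
    """
    cand_counts = ngrams(candidate.split(), n)
    if not cand_counts:
        return 0.0

    # For every n-gram, keep its highest count over all references.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref.split(), n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    matched = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return matched / sum(cand_counts.values())
```

    With a single reference, a valid translation that chooses different
    words is penalized. Adding a second reference lets the metric credit
    wording found in either one, which is why several translations of
    the same novel are useful.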

    Here's an example to get you started:

    Madame Bovary in the original French:
            http://www.gutenberg.org/files/14155/14155-8.txt

    Translated into English:
            http://www.gutenberg.org/dirs/etext00/mbova11.txt

    Also, here are two additional English translations that Regina
    Barzilay used in her PhD thesis on paraphrasing with monolingual
    parallel corpora:
            http://people.csail.mit.edu/regina/par/bovary1.txt
            http://people.csail.mit.edu/regina/par/bovary3.txt
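
    A likely first step in turning such files into a parallel corpus is
    to split each text into paragraphs, which can then be passed to an
    aligner. The sketch below assumes that paragraphs are separated by
    blank lines and that lines are hard-wrapped. Both are assumptions
    about the file layout, and the function name is made up for the
    example. It does not remove the Project Gutenberg header or licence
    text, and it does no alignment.

```python
import re


def split_paragraphs(text):
    """Split plain text into paragraphs separated by blank lines.

    Hard line breaks inside a paragraph are joined with single spaces,
    and empty paragraphs are dropped.
    """
    text = text.replace("\r\n", "\n")          # normalize line endings
    blocks = re.split(r"\n\s*\n", text)        # one or more blank lines
    paragraphs = [" ".join(block.split()) for block in blocks]
    return [p for p in paragraphs if p]
```

    Translators sometimes merge or split paragraphs, so the paragraph
    counts of the original and a translation will not necessarily match.
    The output is a starting point for alignment, not an aligned corpus.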

    Yours,
    Chris Callison-Burch

    On Jan 19, 2006, at 4:10 PM, ofrun083@uottawa.ca wrote:

    >
    > Hello All,
    >
    > My name is Oana, and I am an MSc student at the University of Ottawa
    > working in the field of NLP and ML.
    >
    > I am currently working on a project for French and English, and I am
    > looking for a parallel corpus other than Hansard and EuroParl. I am
    > interested in parallel text from any domain other than the ones
    > covered by Hansard and EuroParl.
    >
    > Thank you for your help,
    > Oana
    >
