Re: [Corpora-List] BILINGUAL PARALLEL CORPORA

From: Olivier Kraif (olivier.kraif@tele2.fr)
Date: Tue Nov 14 2006 - 21:53:11 MET


    Dear J.L.,

    Another parallel corpus covering Latin languages is available on the
    website of the Carmel Project (www.projetcarmel.org): it consists of
    travel stories from the 19th and early 20th centuries (Darwin, Loti,
    Stendhal, Flaubert, Dickens, London, etc.), translated into English,
    French, Italian and Spanish.
    The corpus can be queried online, and it is also possible to download
    some of the texts (all the original texts, and those translations that
    are old enough!), together with the alignment files. The texts are
    POS-tagged...
    The website is still under construction and some data are not yet
    available: I think it should be complete within a month.
    You can find a collection of related links on my page:
    http://w3.u-grenoble3.fr/kraif/index.php?option=com_content&task=view&id=20&Itemid=36

    If you need a tool to process a bilingual corpus (aligning at sentence
    or word level, editing, searching, etc.), I have also put a free program
    online (Windows only):
    Alinea:
    http://w3.u-grenoble3.fr/kraif/index.php?option=com_content&task=view&id=27&Itemid=43

    (the latest version is not yet available, but previous ones can be
    downloaded).

    Regards

    Olivier

    > Dear Corpora-List members,
    >
    > I have three questions...
    >
    > Does anyone know of any freely available, sentence-aligned bilingual
    > corpus involving several languages, namely Scandinavian (Finnish,
    > Norwegian, etc.) or Latin languages (Spanish, Italian, etc.), for
    > bilingual studies?
    >
    > My second question is: what would be the requirements for creating an
    > online/desktop software tool covering the whole processing of a
    > parallel corpus?
    >
    > Finally, would you consider one million words (in both languages) a
    > large or a small bilingual corpus?
    >
    > Any help will be appreciated.
    >
    >
    > Regards,
    >
    >
    > J. L. DeLucca (somewhere in Spain)


