[Corpora-List] European Constitution in parallel

From: Joerg Tiedemann (tiedeman@let.rug.nl)
Date: Mon Apr 25 2005 - 01:03:20 MET DST

    The EU constitution is now part of the OPUS parallel corpus:
    21 languages, aligned at the sentence level!

    download: http://logos.uio.no/opus/EUconst.html
    query: http://logos.uio.no/cgi-bin/opus/opuscqp.pl?corpus=EUconst

    Everything is machine-annotated and automatically aligned. Tokenization,
    sentence splitting, and alignment are not 100% correct.

    The query engine is the Corpus Workbench (CWB). Expect some problems
    where text could not be converted from UTF-8 to ISO-8859. Sorry about
    that.
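
    To see which characters are affected, a rough check along these lines
    may help. It is only a sketch: the message says "ISO-8859" without
    naming a part, so ISO-8859-1 is assumed here, and the helper name
    unencodable is made up for illustration.

```python
def unencodable(text, encoding="iso-8859-1"):
    """Return the distinct characters of `text` that `encoding` cannot represent.

    The default encoding is an assumption; pass another ISO-8859 part
    (for example "iso-8859-2") if that is what the target system uses.
    """
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # Keep each problem character once, in order of first appearance.
            if ch not in bad:
                bad.append(ch)
    return bad


if __name__ == "__main__":
    # Characters outside Latin-1 (here from Czech) are reported.
    print(unencodable("Jörg"))
    print(unencodable("č ř č"))
```

    Text in languages whose alphabets fall outside the chosen ISO-8859
    part is where such a check would report characters.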

    The source files are taken from:
    http://europa.eu.int/eur-lex/lex/XX/treaties/dat/12004V/htm/12004V.html
    (replace 'XX' with language codes such as 'en', 'de', ...)
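
    The substitution can be scripted. A minimal sketch, assuming
    two-letter lowercase codes like the 'en' and 'de' mentioned above;
    the function name source_url is made up for illustration, and the
    script only builds the URLs without fetching them.

```python
# Template copied from the message; 'XX' is the language-code placeholder.
TEMPLATE = "http://europa.eu.int/eur-lex/lex/XX/treaties/dat/12004V/htm/12004V.html"


def source_url(lang):
    """Return the source URL for a two-letter language code such as 'en'."""
    code = lang.strip().lower()
    # Assumption: codes are exactly two ASCII letters, as in the examples given.
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"expected a two-letter language code, got {lang!r}")
    # Replace only the path segment '/XX/' so the rest of the URL is untouched.
    return TEMPLATE.replace("/XX/", f"/{code}/", 1)


if __name__ == "__main__":
    for lang in ("en", "de"):
        print(source_url(lang))
```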

    Jörg

    ***********/\/\/\/\/\/\/\/\/\/\/\************************************
    ** Jörg Tiedemann tiedeman@let.rug.nl **
    ** Alfa-Informatica http://www.let.rug.nl/~tiedeman **
    ** Rijksuniversiteit Groningen Harmoniegebouw, room 1311-429 **
    ** Oude Kijk in 't Jatstraat 26 phone: +31 (0)50-363 5935 **
    ** 9712 EK Groningen fax: +31 (0)50-363 6855 **
    *************************************/\/\/\/\/\/\/\/\/\/\/\**********


