[Corpora-List] XML/TEI Human Rights Corpus

From: Pincemin (benie@club-internet.fr)
Date: Tue Oct 11 2005 - 09:57:07 MET DST


    We are happy to announce the release of the Human Rights Corpus / Corpus
    Droits de l'Homme, v.1, available on our website:
    Université de Paris 13 - Laboratoire de Linguistique Informatique
    http://www-lli.univ-paris13.fr/ressources

    The corpus is composed of 28 International Conventions, from 1948
    (Universal Declaration of Human Rights) up to 2000. The texts were
    selected together with an expert in the field, with the aim of giving a
    representative view of the Human Rights reference texts and of the
    language and vocabulary they use.

    Each text is given in two or three languages: English and French, plus
    Spanish when the Convention is a United Nations one. These versions are
    aligned at the level of the finest subdivision (the article) through an
    appropriate design of identifiers.
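
    The message does not spell out the identifier scheme, so what follows is
    only a sketch under a hypothetical assumption: each article is a
    div element with type="article" whose id is identical in every language
    version of a Convention. Alignment then reduces to joining the versions
    on that id. The element names, the UDHR-A1 style ids and the helpers
    articles and align are illustrative inventions, not the corpus's own
    design; the sample sentences are Articles 1 and 3 of the Universal
    Declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature files: the real corpus may use other element names
# and another id scheme. Here an article keeps the same id in every language.
EN = """<text><body>
  <div type="article" id="UDHR-A1"><p>All human beings are born free and equal in dignity and rights.</p></div>
  <div type="article" id="UDHR-A3"><p>Everyone has the right to life, liberty and security of person.</p></div>
</body></text>"""

FR = """<text><body>
  <div type="article" id="UDHR-A1"><p>Tous les êtres humains naissent libres et égaux en dignité et en droits.</p></div>
  <div type="article" id="UDHR-A3"><p>Tout individu a droit à la vie, à la liberté et à la sûreté de sa personne.</p></div>
</body></text>"""

ES = """<text><body>
  <div type="article" id="UDHR-A1"><p>Todos los seres humanos nacen libres e iguales en dignidad y derechos.</p></div>
  <div type="article" id="UDHR-A3"><p>Todo individuo tiene derecho a la vida, a la libertad y a la seguridad de su persona.</p></div>
</body></text>"""


def articles(xml_text):
    """Map each article id to its whitespace-normalised text."""
    root = ET.fromstring(xml_text)
    return {
        div.get("id"): " ".join("".join(div.itertext()).split())
        for div in root.iter("div")
        if div.get("type") == "article"
    }


def align(*versions):
    """Pair up the articles whose id occurs in every language version."""
    tables = [articles(v) for v in versions]
    shared = set(tables[0]).intersection(*tables[1:])
    return {aid: tuple(t[aid] for t in tables) for aid in sorted(shared)}


# Spanish is passed only when it exists (i.e. for United Nations texts).
for aid, texts in align(EN, FR, ES).items():
    print(aid)
    for line in texts:
        print("   ", line)
```

    Joining on a shared id keeps each language file independent: a missing
    translation simply drops out of the intersection instead of shifting
    every later article out of step, as position-based pairing would.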

    The encoding is in XML and follows the TEI guidelines. Special
    attention has been paid to the header; in particular, the "TagUsage"
    part is fully developed in order to explain the encoding choices and
    the meaning of each XML/TEI tag in our context.
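
    In TEI, per-element documentation of this kind lives in tagUsage
    entries inside the tagsDecl of the header's encodingDesc. A minimal
    sketch, assuming a P4-style file without namespaces (a P5 file would
    place elements in the TEI namespace and group tagUsage under a namespace
    element, which this code ignores), reads those entries and lists the
    elements used in the body that have none. The sample document, its
    descriptions and the helper names declared_tags and undocumented_tags are
    hypothetical placeholders, not taken from the corpus.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature TEI P4 document; the tagUsage descriptions are
# invented placeholders, not the corpus's own wording.
DOC = """<TEI.2>
  <teiHeader>
    <encodingDesc>
      <tagsDecl>
        <tagUsage gi="div">A subdivision of the Convention, e.g. an article.</tagUsage>
        <tagUsage gi="p">A paragraph of legal text.</tagUsage>
      </tagsDecl>
    </encodingDesc>
  </teiHeader>
  <text><body>
    <div type="article" id="UDHR-A1"><head>Article 1</head><p>Sample text.</p></div>
  </body></text>
</TEI.2>"""


def declared_tags(root):
    """Map each element name (gi) declared in tagsDecl to its description."""
    return {
        tu.get("gi"): " ".join("".join(tu.itertext()).split())
        for tu in root.iter("tagUsage")
    }


def undocumented_tags(root):
    """Element names used inside <body> that have no tagUsage entry."""
    body = root.find("text/body")
    used = {el.tag for child in body for el in child.iter()}
    return sorted(used - set(declared_tags(root)))


root = ET.fromstring(DOC)
for gi, note in declared_tags(root).items():
    print(f"<{gi}>: {note}")
print("undocumented:", undocumented_tags(root))
```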

    Please contact us with your interests or remarks:
    corpus@lli.univ-paris13.fr

    Fabrice ISSAC, Computational Linguist
    Christine CHODKIEWICZ, Lawyer and Linguist
    Bénédicte PINCEMIN, Linguist


