[Corpora-List] Teaching corpora for romance languages

From: Carlos Rodriguez (crodriguezp@gmail.com)
Date: Tue Apr 26 2005 - 16:25:20 MET DST

    Hi all,
    I am trying to coordinate the compilation, adaptation and licensing of
    various language resources (corpora, treebanks, ontologies) for
    non-commercial use in teaching computational linguistics and Natural
    Language Processing programming techniques in Romance languages, using
    the Natural Language ToolKit. NLTK (http://nltk.sf.net) is a
    Python-based platform that already ships, alongside its processing
    modules and for didactic purposes, sample data for English from the
    Brown corpus and the Penn Treebank, among other sources. We will soon
    have some Spanish and Catalan datasets, interfaces and tutorial
    translations available, but it would be great to also have Portuguese,
    French, Italian, and so on. There is a gap in these teaching resources
    for languages other than English, and this initiative can help fill it.
    If anyone is interested in providing and licensing corpora and other
    resources (formatted according to internationally and scientifically
    accepted standards), please contact me at CRodriguezP@gmail.com.

    Thanks,

    Carlos Rodríguez
    -----------------
    IIMAS-National Autonomous University (Mexico)


