Re: [Corpora-List] spanish tokenizer

From: Marco Baroni (baroni@sslmit.unibo.it)
Date: Mon Oct 16 2006 - 16:59:20 MET DST

    The FreeLing suite includes an open-source Spanish tokenizer implemented in
    C++:

    http://garraf.epsevg.upc.es/freeling/index.php

    Regards,

    Marco

    Maria Esteva wrote:
    > Dear all,
    >
    > I am a PhD student in the School of Information, University of Texas at
    > Austin. For my dissertation, I will text mine a large set of corporate
    > electronic records in Spanish. For this, I need to find an open-source
    > Spanish tokenizer, if possible in C++, although other languages would be
    > fine as well. I am familiar with the Lucene tool set, so if you know
    > of another source where I can find such a tool, I would appreciate your
    > help.
    >
    > Thanks in advance,
    >
    > Maria Esteva
    >


