Re: [Corpora-List] Natural Language Toolkit: NLTK-Lite version 0.6.5 released

From: Markus Heller (markus@relix.de)
Date: Tue Jul 11 2006 - 01:53:08 MET DST

    Dear Corpora Community,

    I recently noticed that the tokenizer in the NLTK package requires a good
    regex. Does anybody have a reasonable regex for this package that produces
    decent tokens from modern texts, preferably German texts? I have tried the
    ones on the tutorial pages, but it seems that an ordinary user of the package
    is expected to develop their own regex for tokenizing. Are there good (free)
    tokenizer regexes available for this package?
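
    To make the request concrete, here is a minimal sketch of the kind of
    pattern meant. It uses only Python 3's standard `re` module rather than any
    NLTK call, and the pattern, the `tokenize` helper and the sample sentences
    are illustrative assumptions, not something taken from the tutorial pages.
    It handles umlauts, hyphenated compounds, dotted abbreviations such as
    "z.B." and numbers such as "3,50", but not single-word abbreviations such
    as "usw.", which come out as two tokens.

```python
import re

# Hypothetical pattern for German-like text; alternatives are tried in order.
TOKEN_PATTERN = re.compile(r"""
    \d+(?:[.,]\d+)*             # numbers: 3,50  1.000  11.07.2006
  | (?:[^\W\d_]\.){2,}          # dotted abbreviations: z.B.  u.a.
  | [^\W\d_]+(?:-[^\W\d_]+)*    # words, incl. umlauts and hyphenated compounds
  | [^\w\s]                     # any other single non-space character
""", re.VERBOSE)


def tokenize(text):
    """Return the list of tokens found in text (whitespace is dropped)."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("Das kostet z.B. 3,50 Euro in Baden-Württemberg."))
```

    A pattern of this shape would presumably still need to be adapted before
    being handed to the package's own tokenizer.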

    Thanks in advance,
    Markus


