Re: [Corpora-List] Natural Language Toolkit: NLTK-Lite version 0.6.5 released

From: Markus Heller (markus@relix.de)
Date: Tue Jul 11 2006 - 01:53:08 MET DST

Next message: Hamish Cunningham: "Re: [Corpora-List] Natural Language Toolkit: NLTK-Lite version 0.6.5 released"

Previous message: Sandra Kübler: "[Corpora-List] call for papers TLT 2006"
In reply to: Steven Bird: "[Corpora-List] Natural Language Toolkit: NLTK-Lite version 0.6.5 released"
Next in thread: Hamish Cunningham: "Re: [Corpora-List] Natural Language Toolkit: NLTK-Lite version 0.6.5 released"
Reply: Hamish Cunningham: "Re: [Corpora-List] Natural Language Toolkit: NLTK-Lite version 0.6.5 released"

Dear Corpora Community,

I recently noticed that the tokenizer in the NLTK package depends on a good
regular expression. Does anybody have a reasonable regex for this package that
produces decent tokens from modern texts, preferably German ones? I have tried
the ones on the tutorial pages, but it seems that an ordinary user of the
package is expected to develop his own regex for tokenizing. Are there any good
(free) tokenizer regexes around for this package?
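
To make the question concrete, here is a minimal sketch of the kind of regex-based tokenizer meant above. It uses only Python's standard `re` module, not NLTK's own tokenizer interface. The pattern and the names `TOKEN_RE` and `tokenize` are illustrative assumptions, not taken from the tutorial, and the pattern is certainly not adequate for real German text (abbreviations, dates, URLs and so on are not handled).

```python
import re

# Hypothetical pattern for illustration only; not from the NLTK tutorial.
# Alternatives are tried left to right, so numbers are matched before words.
TOKEN_RE = re.compile(r"""
    \d+(?:[.,]\d+)*      # numbers with separators, e.g. 3,50 or 1.000.000
  | \w+(?:[-']\w+)*      # words, including hyphenated compounds like E-Mail
  | [^\w\s]              # any other single non-space character (punctuation)
""", re.VERBOSE)

def tokenize(text):
    """Return the list of tokens matched by TOKEN_RE, in order."""
    return TOKEN_RE.findall(text)

print(tokenize("Die E-Mail kostet 3,50 Euro."))
```

In Python 3, `\w` matches Unicode letters by default, so umlauts stay inside the word. A pattern along these lines could presumably be handed to a regex-driven tokenizer, but how to do that in NLTK-Lite 0.6.5 is exactly what I am unsure about.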

Thanks in advance,
Markus


This archive was generated by hypermail 2b29 : Tue Jul 11 2006 - 02:13:46 MET DST