Dear Corpora Community,
I recently noticed that the regular-expression tokenizer in the NLTK package
is only as good as the regex it is given.
Does anybody have a reasonable regex for this package that produces
decent tokens from modern texts, preferably German texts? I have tried
the ones on the tutorial pages, but it seems that an ordinary user of the
package is expected to develop their own regex for tokenizing. Are there
good (free) tokenizer regexes available for this package?
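
To make the question concrete, here is a minimal sketch of the kind of pattern I have in mind. It uses only Python's standard re module. The pattern, the short abbreviation list, and the names GERMAN_TOKEN_PATTERN and tokenize are my own illustrative assumptions, not anything taken from NLTK or its tutorial, and the abbreviation list is nowhere near complete.

```python
import re

# Illustrative pattern only (an assumption, not an NLTK default).
# Alternatives are tried left to right, so the more specific ones come first.
GERMAN_TOKEN_PATTERN = re.compile(r"""
    (?:z\.B\.|d\.h\.|u\.a\.|usw\.|bzw\.|ca\.|Dr\.|Nr\.)  # a few common abbreviations
  | \d+(?:[.,]\d+)*       # numbers such as 3,50 or 1.000,50
  | \w+(?:[-'’]\w+)*      # words, including hyphenated compounds like E-Mail-Shop
  | [^\w\s]               # any other single non-space character (punctuation)
""", re.VERBOSE)


def tokenize(text):
    """Return the list of tokens found in text, skipping whitespace."""
    # All groups are non-capturing, so findall returns the full matches.
    return GERMAN_TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("Das kostet ca. 3,50 Euro, z.B. im E-Mail-Shop."))
```

The ordering is the main design choice: abbreviations and numbers are tried before generic words, so that "z.B." and "3,50" are not split at the period or comma. Something more robust, especially for abbreviations, is what I am hoping someone has already written.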
Thanks in advance,
Markus