Re: [Corpora-List] Natural Language Toolkit: NLTK-Lite version 0.6.5 released

From: Hamish Cunningham (hamish@dcs.shef.ac.uk)
Date: Tue Jul 11 2006 - 12:41:26 MET DST

    Markus,

    You might try the unicode-based tokeniser included with GATE
    (http://gate.ac.uk), or ask on the user list for a German specialisation of
    it.

    Best

    -- 
    Hamish
    http://www.dcs.shef.ac.uk/~hamish/
    

    Markus Heller wrote:
    > Dear Corpora Community,
    >
    > I recently saw that the tokenizer from the nltk package requires a good regex.
    > Does anybody have a reasonable regex for this package which can produce
    > decent tokens from modern texts, preferably German texts? I have tried out
    > the ones on the tutorial pages but I see a common package user is required to
    > develop his own regex for tokenizing purposes. Are there good (free)
    > tokenizer regexes around for this package?
    >
    > Thanks in advance,
    > Markus
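For readers wondering what such a regex might look like, here is a minimal sketch using only Python's standard `re` module. The pattern and the names `TOKEN_RE` and `tokenize` are illustrative assumptions; they are not taken from NLTK, GATE, or the messages in this thread. It keeps decimal numbers and hyphenated compounds together and splits off punctuation, but it does not handle abbreviations such as "z.B.", which a serious German tokenizer would need to treat specially.

```python
import re

# Hypothetical pattern for illustration only; not the regex shipped with
# NLTK or GATE. Alternatives are tried left to right at each position.
TOKEN_RE = re.compile(
    r"""
    \d+(?:[.,]\d+)*      # numbers, including 3,5 or 1.000.000
    | \w+(?:[-']\w+)*    # words, including hyphenated compounds
    | [^\w\s]            # any single punctuation character
    """,
    re.VERBOSE | re.UNICODE,
)


def tokenize(text):
    """Return the list of tokens found in text, in order."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize("Er zahlt 3,5 Euro für Baden-Württemberg."))
```

Because `\w` is Unicode-aware for `str` patterns in Python 3, umlauts and "ß" are treated as word characters without extra work. Abbreviations, URLs, and clitics would each need further alternatives placed ahead of the generic word rule.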


