[Corpora-List] Announcement: jTokeniser v2.0 released

From: Andy Roberts (andyr@comp.leeds.ac.uk)
Date: Sun Jul 16 2006 - 22:51:19 MET DST


    Dear Corpora List readers,

    I'm happy to announce that I've just released a new version of the
    jTokeniser library.

    As some may recall, jTokeniser comprises six tokenisers, ranging from
    basic to powerful, all accessible through a very simple Java API.
    Tokenisers include:

    * WhiteSpaceTokeniser
    * StringTokeniser (based on specified delimiters)
    * RegexTokeniser (regular expression defines a token)
    * RegexSeparatorTokeniser (define what is *not* a token)
    * BreakIteratorTokeniser (sophisticated locale-specific tokeniser)
    * SentenceTokeniser (sentence segmentation)
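    jTokeniser's own API is not shown in this announcement, so the sketch
    below does not use it. Instead it illustrates, with standard JDK classes
    only, the strategy each of the six tokenisers names: whitespace splitting,
    delimiter splitting (java.util.StringTokenizer), regex-as-token
    (Matcher.find), regex-as-separator (Pattern.split), and locale-aware word
    and sentence boundaries via java.text.BreakIterator, which the name
    BreakIteratorTokeniser suggests it builds on (an assumption). The class
    TokeniserSketch and all its method names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of six tokenisation strategies using only JDK
// classes. This is NOT jTokeniser's API.
public class TokeniserSketch {

    // Whitespace: split on runs of whitespace, ignoring leading/trailing space.
    public static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Delimiter-based: every character in `delims` separates tokens.
    public static List<String> delimited(String text, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delims);
        while (st.hasMoreTokens()) tokens.add(st.nextToken());
        return tokens;
    }

    // Regex-as-token: each match of the pattern is one token.
    public static List<String> regexTokens(String text, String tokenRegex) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(tokenRegex).matcher(text);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }

    // Regex-as-separator: the pattern describes what is NOT a token.
    public static List<String> regexSeparated(String text, String separatorRegex) {
        List<String> tokens = new ArrayList<>();
        for (String t : Pattern.compile(separatorRegex).split(text)) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Locale-aware word boundaries; keeps only segments containing a letter
    // or digit, so spaces and punctuation are dropped.
    public static List<String> words(String text, Locale locale) {
        return segments(BreakIterator.getWordInstance(locale), text, true);
    }

    // Locale-aware sentence segmentation (trailing whitespace trimmed).
    public static List<String> sentences(String text, Locale locale) {
        return segments(BreakIterator.getSentenceInstance(locale), text, false);
    }

    // Walks consecutive boundary pairs and collects the non-blank segments.
    private static List<String> segments(BreakIterator bi, String text, boolean wordsOnly) {
        List<String> out = new ArrayList<>();
        bi.setText(text);
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            String seg = text.substring(start, end).trim();
            if (seg.isEmpty()) continue;
            if (wordsOnly && seg.codePoints().noneMatch(Character::isLetterOrDigit)) continue;
            out.add(seg);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The cat sat. It was happy! Was it?";
        System.out.println(whitespace(text));
        System.out.println(delimited(text, " .!?"));
        System.out.println(regexTokens(text, "[\\w']+"));
        System.out.println(regexSeparated(text, "[\\s.!?]+"));
        System.out.println(words(text, Locale.ENGLISH));
        System.out.println(sentences(text, Locale.ENGLISH));
    }
}
```

    The regex-as-token and regex-as-separator variants are two views of the
    same idea: the first describes what a token looks like, the second what
    lies between tokens, and which is easier to write depends on the text.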

    jTokeniser v2.0 makes no changes to the core tokenisers themselves, but
    adds a nice GUI front-end to the library to allow users to experiment
    with the tokenisers interactively.

    This should appeal to those who lack the Java programming experience to
    use the library in its intended form. It also makes the library well
    suited to teaching, for example in an NLP course.

    For all information about downloading, installing, running and using
    jTokeniser v2.0, please visit the project website (screenshots included):

    http://www.andy-roberts.net/software/jTokeniser

    Any comments or feature suggestions are welcome.

    Regards,
    Andy Roberts
