Re: [Corpora-List] Re: Looking for Automatic POS Tagging Software - a summary of responses

From: Lam Yuen Wing, Peter (ywlam@kcrc.com)
Date: Sat Feb 18 2006 - 07:30:09 MET

    Dear all,
     
    About six weeks ago, I asked for pointers on user-friendly POS taggers
    that run under Windows and can both tag and subcategorise words, e.g.
    tag adjectives and subcategorise them into predicates, attributes,
    superlatives, participles, etc. I am grateful to the following members,
    who took the time to send me valuable advice. The following is a
    summary of their responses:
     
    Ted Pedersen tpederse@d.umn.edu
    Ted suggested trying GATE http://gate.ac.uk/, which includes a POS
    tagger, and "is fairly easy to install and use (it is
    Java based and runs on Windows, Linux, etc...)".
     
    Alex Fang acfang@cityu.edu.hk
    Alex recommended AUTASYS, which runs under Windows. For more
    information, please visit
    http://www.phon.ucl.ac.uk/home/alex/project/tagging/tagging.htm.
    AUTASYS provides subcategorisations and offers a choice of two tag
    sets: ICE and LOB. In addition, it has a lemmatisation module. It is
    available for academic purposes only, at a one-off payment of 500
    pounds sterling for a single-user licence or 1,000 pounds for a one-year
    site licence. AUTASYS tags 1.8 million words per minute, with an
    estimated accuracy of 95%. Output can be in horizontal (passage style)
    or vertical format.
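
As an illustration of the two output layouts mentioned above, here is a minimal Python sketch that converts a horizontal (passage style) line into a vertical, one-token-per-line layout. It assumes each token is written as word_TAG and that tokens are separated by spaces. The underscore separator, the sample tags and the function name horizontal_to_vertical are assumptions made for illustration only; they are not taken from the AUTASYS documentation.

```python
def horizontal_to_vertical(line, sep="_"):
    """Turn 'word_TAG word_TAG ...' into one 'word<TAB>TAG' row per token."""
    rows = []
    for token in line.split():
        if sep in token:
            # Split on the LAST separator so words that contain it stay intact.
            word, _, tag = token.rpartition(sep)
        else:
            # No tag attached: keep the word and mark the tag as unknown.
            word, tag = token, "UNKNOWN"
        rows.append(f"{word}\t{tag}")
    return "\n".join(rows)


# Illustrative input only; the tags are placeholders, not real tagger output.
sample = "The_AT cat_NN sat_VBD ._."
print(horizontal_to_vertical(sample))
```

The vertical layout is usually the easier one to load into a spreadsheet or a concordancer, while the horizontal one is easier to read as running text.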

    Neil Millar kansaineil@hotmail.com
    Neil suggested trying Brill's Tagger, which is available free at
    http://www.cs.jhu.edu/~brill/RBT1_14.tar.Z. The tagger runs on Windows
    and is "easy to use".
     
    Eric Atwell eric@comp.leeds.ac.uk
    Eric said the CLAWS system can be used over the WWW via the UCREL
    website, http://www.comp.lancs.ac.uk/computing/research/ucrel/claws/,
    so it does not have to be run on UNIX.
    There is a free trial service offering access to the latest version of
    the tagger, CLAWS4:
    http://www.comp.lancs.ac.uk/computing/research/ucrel/claws/trial.html
     
    Paul Rayson rayson@exchange.lancs.ac.uk
    Paul advised that there are beta versions of CLAWS for Windows and
    Linux, with one for Mac OS X to follow shortly. Trial versions may be
    available on request.
     
    Oliver Mason o.mason@bham.ac.uk
    Oliver suggested trying Qtag
    (http://www.english.bham.ac.uk/staff/omason/software/qtag.html), which
    is written in Java and thus runs on Windows.
     
    SVMTool team jgimenez@lsi.upc.edu
    The SVMTool team said that at the TALP Research Center (Barcelona) they
    have developed a general sequential tagger and applied it to the problem
    of PoS tagging. It may be freely downloaded at:
    http://www.lsi.upc.edu/~nlp/SVMTool/.
     
    Models for English, Spanish and Catalan are available. Given annotated
    data, it can also be trained for any language and any sequential
    tagging problem (PoS tagging, NERC, chunking, etc.). The C++ version
    exhibits a tagging speed of 10,000 words per second.
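
For comparison with the AUTASYS figure quoted earlier (1.8 million words per minute), the two reported speeds can be put on the same scale. The short Python sketch below only converts units on the numbers as reported by the respondents. Hardware and test conditions were not stated, so this is not a like-for-like benchmark, and the variable names are made up for illustration.

```python
# Speeds exactly as reported in the responses (test conditions unknown).
AUTASYS_WORDS_PER_MIN = 1_800_000
SVMTOOL_WORDS_PER_SEC = 10_000

# Convert each figure into the other's unit.
autasys_words_per_sec = AUTASYS_WORDS_PER_MIN // 60
svmtool_words_per_min = SVMTOOL_WORDS_PER_SEC * 60

print(f"AUTASYS: {autasys_words_per_sec:,} words/sec")
print(f"SVMTool (C++): {svmtool_words_per_min:,} words/min")
```

On these reported figures, 1.8 million words per minute corresponds to 30,000 words per second, and 10,000 words per second corresponds to 600,000 words per minute.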
     
    Atanas Chanev artanisz@mail.bg
    Atanas suggested trying the T'n'T tagger (by Thorsten Brants), which is
    freely available upon registration at
    http://www.coli.uni-saarland.de/~thorsten/tnt/. Atanas said: "There
    is a version for Windows and it has the most user friendly interface
    among the taggers I have used. It is one of the currently most accurate
    taggers".
     
    A package of taggers that work under Linux can be found at
    http://acopost.sourceforge.net/ (follow the sourceforge link). Most of
    the Linux applications should work under Cygwin, a Linux-like
    environment for Windows, which can be downloaded from the Internet.
     
    Another tagger is SVMTool (Jesús Giménez and Lluís Màrquez). Its
    accuracy is similar to that of T'n'T for small amounts of training
    data. There are C++ and Perl versions; Perl itself can be downloaded
    for free from www.activestate.com.
     
    Svetlana Sheremetyeva linklana@yahoo.com
    Svetlana pointed to her own FLAT (Flexible Language Acquisition Tool),
    which is "extremely user friendly and can be tuned to any features". A
    description can be found at http://lanaconsult.com.
     
    Gerard Peregrin GerardPer@aol.com
    Gerard recommended trying the software at
    http://www-nlp.stanford.edu/software/lex-parser.shtml, which is
    written in Java.
     
    Vlad Gojol gojol@rnc.ro
    Vlad suggested GojolParser, which is "a deep structure morpho-syntactic
    analyzer".
     
    Best
    Peter Lam
    PhD Student
    The Hong Kong Polytechnic University


