Re: [Corpora-List] POS Tagger for German / Java

From: Niels Ott (niels@drni.de)
Date: Sun Jan 14 2007 - 17:56:30 MET

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1

    Dear Michael,

    Michael Sonntag wrote:
    > 3. I also used qtag. But it only comes with a data base (lexicon
    > and matrix) that is too small for my task.

    I used Qtag for some testing and I found that the quality of its output
    depends on the training data. (I assume this is true for most taggers.)

    If you have a large tagged corpus, try training Qtag on it. If you
    plan to use corpora or treebanks in TigerXML, I can provide an XSLT
    style sheet that converts them into vertical training data for Qtag.
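
    For readers who would rather do the conversion in Java than in XSLT,
    here is a minimal sketch of the same idea. It is not the style sheet
    mentioned above. It assumes that each TigerXML sentence is an <s>
    element whose terminals are <t> elements with "word" and "pos"
    attributes, and that the vertical format is one "word<TAB>tag" line
    per token with an empty line between sentences. The exact layout Qtag
    expects for training should be checked against its documentation. The
    class name TigerToVertical is made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: flattens TigerXML terminals into vertical text.
final class TigerToVertical {

    /** Returns one "word<TAB>pos" line per terminal and an empty line after each sentence. */
    static String convert(InputStream tigerXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(tigerXml);
        StringBuilder out = new StringBuilder();
        NodeList sentences = doc.getElementsByTagName("s");
        for (int i = 0; i < sentences.getLength(); i++) {
            Element sentence = (Element) sentences.item(i);
            // Terminals are visited in document order, i.e. in token order.
            NodeList terminals = sentence.getElementsByTagName("t");
            for (int j = 0; j < terminals.getLength(); j++) {
                Element t = (Element) terminals.item(j);
                out.append(t.getAttribute("word"))
                   .append('\t')
                   .append(t.getAttribute("pos"))
                   .append('\n');
            }
            out.append('\n'); // sentence boundary (assumed convention)
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Tiny hand-written sample, not taken from a real treebank.
        String sample = "<corpus><body><s id=\"s1\"><graph><terminals>"
                + "<t id=\"s1_1\" word=\"Das\" pos=\"ART\"/>"
                + "<t id=\"s1_2\" word=\"Haus\" pos=\"NN\"/>"
                + "</terminals><nonterminals/></graph></s></body></corpus>";
        System.out.print(convert(
                new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8))));
    }
}
```

    For a large treebank a streaming parser (SAX or StAX) would be a
    better fit than DOM, which loads the whole file into memory.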

    > So, is there any POS tagger out there that is easy to use and up
    > for the task?

    TreeTagger (TT) seems to be a well-regarded tagger. However, I found
    that it has problems processing Unicode. Since you seem to need it to
    work with your Java program, your wrapper should make sure that it
    feeds TT ISO-8859-1 input only.
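
    A wrapper along these lines might look like the sketch below. It
    assumes the tagger reads one token per line from standard input and
    writes its result to standard output. The command line is passed in
    by the caller, because the binary name, parameter file and options
    depend on the local installation. Latin1Feeder and its method names
    are invented for this example. Tokens containing characters outside
    ISO-8859-1 can be detected with isLatin1Safe; if they are encoded
    anyway, Java replaces the offending characters with '?'.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper that only ever hands ISO-8859-1 bytes to the tagger.
final class Latin1Feeder {

    /** True if every character of the token exists in ISO-8859-1. */
    static boolean isLatin1Safe(String token) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(token);
    }

    /** Encodes the tokens as ISO-8859-1, one token per line. */
    static byte[] toLatin1Lines(List<String> tokens) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String token : tokens) {
            // Unmappable characters are replaced with '?' by getBytes.
            byte[] bytes = token.getBytes(StandardCharsets.ISO_8859_1);
            out.write(bytes, 0, bytes.length);
            out.write('\n');
        }
        return out.toByteArray();
    }

    /**
     * Runs an external tagger on the tokens and returns its output.
     * The command is supplied by the caller; nothing about the tagger's
     * options is assumed here.
     */
    static String runTagger(List<String> command, List<String> tokens)
            throws IOException, InterruptedException {
        Path input = Files.createTempFile("tagger-input", ".txt");
        try {
            Files.write(input, toLatin1Lines(tokens));
            Process process = new ProcessBuilder(command)
                    .redirectInput(input.toFile())                  // tagger reads the Latin-1 file
                    .redirectError(ProcessBuilder.Redirect.INHERIT) // show tagger messages as-is
                    .start();
            byte[] raw = process.getInputStream().readAllBytes();
            process.waitFor();
            // Decode the tagger's output with the same encoding it was fed.
            return new String(raw, StandardCharsets.ISO_8859_1);
        } finally {
            Files.deleteIfExists(input);
        }
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Stra\u00dfe", "\u20ac");
        for (String token : tokens) {
            System.out.println(token + " -> Latin-1 safe: " + isLatin1Safe(token));
        }
    }
}
```

    Writing the input to a temporary file instead of piping it avoids a
    deadlock that can occur when the program writes a large input to the
    tagger before reading any of its output.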

    Regards,

      Niels Ott

    P.S.: You will get this message twice, as I forgot to include the
    corpora list in the recipient list.

    - --
    Niels Ott - Computational Linguist (B.A.) - http://www.drni.de/niels/
    "Paper or plastic?" "Not (not paper and not plastic)." (Augustus
    DeMorgan in a grocery store ;-)
    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.2.2 (GNU/Linux)

    iD8DBQFFqmC+bosnVosUgx0RApBTAKCEPNQoHhTvhiu/GW36DBYfV9sioACfV9wD
    blU9XV55J1f4IbYUtT7pY4Y=
    =BSbA
    -----END PGP SIGNATURE-----


