Re: [Corpora-List] Annotation Tool for German corpora/NE recognition task

From: Christopher Walker (chwalker@ldc.upenn.edu)
Date: Tue Oct 17 2006 - 19:24:31 MET DST

    Hi,

    | > What I am really missing is:
    | > - a good tool to annotate some documents quickly, i.e. with
    | > information about : toponym, first and surname, and other
    | > NE's. This, to get an idea
    | > (prec.+recall) about the quality of my models.

    The LDC ACE toolkit may also satisfy your needs:

      http://projects.ldc.upenn.edu/ace/tools/2005Toolkit.html

    It is highly customized to ACE annotation, but can be modified
    via an XML config file to suit a number of similar needs --
    including NE annotation, with co-reference. We find this tool
    to be easier to use (at scale for raw, untagged data) than either
    Callisto or WordFreak, but I am not familiar with GATE.

    Also, the output is in .ag.xml format, but the package includes
    conversion scripts to the latest ACE Pilot Format (.apf.xml).
    These would need to be adapted to the new tagset, but would
    otherwise work. If you're interested in the infrastructure,
    I have a few Perl scripts that generate tabular output as
    well.
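
    For the precision and recall check mentioned in the quoted
    question, a minimal sketch in Python might look like the
    following. It assumes the gold and predicted annotations have
    already been reduced to (start, end, label) tuples and counts
    only exact matches; the span layout, the labels, and the
    function name are illustrative and not part of the ACE toolkit
    or its output formats.

```python
from typing import Set, Tuple

# A span is (start_offset, end_offset, label). This layout is an assumption
# made for illustration; it is not the .ag.xml or .apf.xml format.
Span = Tuple[int, int, str]


def precision_recall(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float]:
    """Exact-match precision and recall over labelled spans."""
    true_positives = len(gold & predicted)
    # Guard against empty sets so an empty document does not divide by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hand-annotated spans versus model output (made-up example data).
gold = {(0, 6, "PER"), (10, 16, "LOC"), (20, 27, "LOC"), (30, 35, "PER")}
predicted = {(0, 6, "PER"), (10, 16, "LOC"), (20, 27, "PER")}

precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.67 recall=0.50
```

    A span with the right offsets but the wrong label counts as
    both a false positive and a false negative here, which is the
    strict reading; a more lenient scorer would give partial credit.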

    -Christopher.

    ---------------------------------------
    Christopher R. Walker, Project Manager
    Automatic Content Extraction (ACE) &
    Less-Commonly Taught Languages (LCTL)
    LDC Annotation Lab
    chwalker@ldc.upenn.edu
    215.898.0946
    ---------------------------------------


