[Corpora-List] WordSmith and ANC

From: Nancy Ide (ide@cs.vassar.edu)
Date: Thu Aug 24 2006 - 03:51:56 MET DST


    On Jul 20, 2006, at 2:06 PM, Linda Bawcom wrote:
    > 1) Can the ANC be used with WordSmith? Only a program called "Gate"
    > is listed on the web site and I don't understand enough about XML,
    > filenames, or markups to know if the information given means it
    > can be used with WordSmith. The ANC publisher has not gotten back
    > to me (just so you know I've done my homework!).

    Sorry that this response is so long in coming. I assume by the "ANC
    publisher" you mean LDC, which would not have the answer to this
    question. Please send inquiries to anc@cs.vassar.edu, not to LDC.

    The answer is "yes" concerning WordSmith. Once you have the ANC,
    download the ANCTool (go to
    http://americannationalcorpus.org/tools/anctool.html; the link to it
    is on the main ANC web page) and run it,
    at which point you can choose the parts of the corpus you want to use
    as well as the output format. One of the options in the ANCTool is
    for the data to be output in a format for input to WordSmith (see
    "The WordSmith tab" on the tool web page). Other options in the
    downloadable version of the tool are MonoConc input format and XCES.

    We have a new version of the tool that provides other formats as
    well as a mechanism for specifying your own output format, is "schema
    aware" to enable greater control over the output, and provides
    several options for handling overlapping hierarchies. This new
    version, together with another 20 million words of data, annotations
    for several syntactic analyses, and some manually produced WordNet
    annotations, will be made available if and when the ANC project finds
    the funding to enable us to resume activity.

    =======================================================
    Nancy Ide

    Professor and Chair
    Department of Computer Science
    Vassar College
    Poughkeepsie, New York 12604-0520
    USA

    tel: (+1 845) 437 5988
    fax: (+1 845) 437 7498
    email: ide@cs.vassar.edu
    http://www.cs.vassar.edu/~ide
    =======================================================


