Re: [Corpora-List] New LDC Corpora

From: Joel Tetreault (tetreaul@cs.rochester.edu)
Date: Tue Sep 16 2003 - 17:47:11 MET DST

    hi, we'll take both. thanks, Joel

    On Tue, 16 Sep 2003, ldc@ldc.upenn.edu wrote:

    >
    >
    > LDC2003T11
    > * ACE-2 Version 1.0 *
    >
    > LDC2003T13
    > * Message Understanding Conference (MUC) 6 *
    >
    > The Linguistic Data Consortium (LDC) is pleased to announce the
    > availability of two new corpora.
    >
    > *
    >
    > ACE-2 Version 1.0 supports the Automatic Content Extraction (ACE)
    > program, whose objective is to develop extraction technology to support
    > automatic processing of source language data. This includes
    > classification, filtering, and selection based on the language content
    > of the source data, i.e., based on the meaning conveyed by the data.
    > Thus, the ACE program requires the development of technologies that
    > automatically detect and characterize this meaning. The ACE research
    > objectives are viewed as the detection and characterization of Entities,
    > Relations, and Events.
    >
    > Annotations for the ACE-2 corpus concern two research tasks: Entity
    > Detection and Tracking (EDT) and Relation Detection and Characterization
    > (RDC). ACE-2 contains two sets of data: training and devtest. Each of
    > these sets is further divided by source: broadcast news, newspaper, and
    > newswire. There are 179,007 words of source data in 519 files.
    >
    > For further information about this corpus, including a link to online
    > documentation and the NIST ACE program site, please visit:
    >
    > http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T11
    >
    >
    > Institutions that have membership in the LDC during the 2003
    > Membership Year will be able to receive this corpus free of charge.
    > Nonmembers may license this publication for US$500.
    >
    >
    > *
    >
    > In the 1990s, the MUC evaluations funded the development of metrics and
    > statistical algorithms to support government evaluations of emerging
    > information extraction technologies. The Message Understanding
    > Conference (MUC) 6 corpus contains 318 annotated Wall Street Journal
    > articles, scoring software, and corresponding documentation used in the
    > MUC 6 evaluation. Both the MUC 6 Additional News Text (LDC96T10) corpus
    > and the MUC 6 corpus are necessary in order to replicate the evaluation.
    >
    > All the materials have been published as received from the corpus
    > authors. No quality control has been conducted at the LDC; however, the
    > text files have been uncompressed.
    >
    > For further information, including online documentation and a link to
    > NIST's MUC pages, please visit:
    >
    > http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T13
    >
    > Institutions that have membership in the LDC during the 2003
    > Membership Year will be able to receive this corpus free of charge.
    > Nonmembers may license this publication for US$100.
    >
    >
    > *
    >
    >
    > MUC VI Text Collection (LDC96T10) has been renamed MUC 6 Additional News
    > Text. The new title more accurately reflects the corpus data as it
    > consists only of additional training materials for the MUC 6 evaluation.
    >
    >
    >
    > If you need additional information before placing your order, or
    > would like to inquire about membership in the LDC, please send email to
    > or call (215) 573-1275.
    >
    >
    > ---------------------------------------------------------------------
    > Linguistic Data Consortium Phone: (215) 573-1275
    > 3600 Market Street Fax: (215) 573-2175
    > Suite 810 email: ldc@ldc.upenn.edu
    > Philadelphia, PA 19104-2653 www: http://www.ldc.upenn.ed
    >
    >


