[Corpora-List] New LDC Publications

From: Linguistic Data Consortium (ldc@ldc.upenn.edu)
Date: Mon Feb 27 2006 - 20:32:37 MET

    LDC2006T06
    ACE 2005 Multilingual Training Corpus
    <http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T06>

    LDC2006S29
    Levantine Arabic QT Training Data Set 5, Speech
    <http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006S29>

    LDC2006T07
    Levantine Arabic QT Training Data Set 5, Transcripts
    <http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T07>

    The Linguistic Data Consortium (LDC) is pleased to announce the
    availability of three new publications.

    ------------------------------------------------------------------------

    New LDC Publications

    (1) ACE 2005 Multilingual Training Corpus
    <http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T06>
    contains the complete set of English, Arabic and Chinese training data
    for the 2005 Automatic Content Extraction (ACE) technology evaluation.
    The corpus consists of data of various types annotated for entities,
    relations and events. It was created by the Linguistic Data Consortium
    with support from the ACE Program and additional assistance from LDC.
    The objective of the ACE program is to develop automatic content
    extraction technology to support automatic processing of human language
    in text form.

    In November 2005, sites were evaluated on system performance in five
    primary areas: the recognition of entities, values, temporal
    expressions, relations, and events. Entity, relation and event mention
    detection were also offered as diagnostic tasks. All tasks except the
    event tasks were performed in three languages: English, Chinese and
    Arabic. Event tasks were evaluated in English and Chinese only. The
    current publication comprises the official training data for these
    evaluation tasks.

    A complete description of the ACE 2005 Evaluation can be found on the
    ACE Program website maintained by the National Institute of Standards
    and Technology (NIST) <http://www.nist.gov/speech/tests/ace/>.

    For more information about linguistic resources for the ACE Program,
    including annotation guidelines, task definitions, free annotation tools
    and other documentation, please visit LDC's ACE website
    <http://projects.ldc.upenn.edu/ace/>.

    (2) Levantine Arabic QT Training Data Set 5, Speech
    <http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006S29>
    and (3) Levantine Arabic QT Training Data Set 5, Transcripts
    <http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T07>
    cover 1660 calls totaling approximately 250 hours of telephone
    conversation in Levantine Arabic collected between 2003 and 2005. These
    publications combine four earlier training data sets: LDC2004E21 and
    LDC2004E22, LDC2004E65 and LDC2004E66, LDC2005S07 and LDC2005T03, and
    LDC2005S14 (Speech and Transcripts). The participants represent a range
    of Levantine Arabic dialects. More than half of the speakers are
    Lebanese; the remaining speakers include Jordanian, Palestinian and
    Syrian participants.

    ------------------------------------------------------------------------

    If you need further information, or would like to inquire about
    membership in the LDC, please email ldc@ldc.upenn.edu or call +1 215 573
    1275.

    --------------------------------------------------------------------

    Linguistic Data Consortium
    University of Pennsylvania
    3600 Market St., Suite 810
    Philadelphia, PA 19104 USA
    Phone: (215) 573-1275
    Fax: (215) 573-2175
    ldc@ldc.upenn.edu
    http://www.ldc.upenn.edu



    This archive was generated by hypermail 2b29 : Mon Feb 27 2006 - 20:55:42 MET