Re: [Corpora-List] Needed: Corpora of radio news segments.

From: Linguistic Data Consortium (ldc@ldc.upenn.edu)
Date: Wed Nov 09 2005 - 18:36:08 MET


    Hi John,

    You might wish to consider the following HUB4 and TDT resources
    distributed by the LDC. These data sets contain substantial quantities
    of recent broadcast news in several languages, segmented into individual
    stories and time-aligned with verbatim transcripts.

    LDC97S66    1996 English Broadcast News Dev and Eval (Hub-4)
    LDC97S44    1996 English Broadcast News Speech (Hub-4)
    LDC97T22    1996 English Broadcast News Transcripts (Hub-4)
    LDC98S71    1997 English Broadcast News Speech (Hub-4)
    LDC98T28    1997 English Broadcast News Transcripts (Hub-4)

    LDC2002S11  1997 HUB4 English Evaluation Speech and Transcripts
    LDC98S73    1997 Mandarin Broadcast News Speech (Hub-4NE)
    LDC98T24    1997 Mandarin Broadcast News Transcripts (Hub-4NE)
    LDC98S74    1997 Spanish Broadcast News Speech (Hub-4NE)
    LDC98T29    1997 Spanish Broadcast News Transcripts (Hub-4NE)
    LDC2000S86  1998 HUB-4 Broadcast News Evaluation English Test Material

    LDC2000S92  TDT2 Careful Transcription Audio
    LDC2000T44  TDT2 Careful Transcription Text
    LDC99S84    TDT2 English Audio
    LDC2001S93  TDT2 Mandarin Audio Corpus
    LDC2001T57  TDT2 Multilanguage Text Version 4.0
    LDC2001S94  TDT3 English Audio
    LDC2001S95  TDT3 Mandarin Audio
    LDC2001T58  TDT3 Multilanguage Text Version 2.0
    LDC2005S11  TDT4 Multilingual Broadcast News Speech Corpus
    LDC2005T16  TDT4 Multilingual Text and Annotations

    You can view our entire online catalog at:

    http://www.ldc.upenn.edu/Catalog/

    Kind regards,

    Ilya

    Bryar Family wrote:

    >Hello:
    >
    >I'm developing a project for rapid identification and categorization of
    >audio news clips, with a "target communities" focus. Are there any public
    >corpora available that consist of individual audio news stories of recent
    >vintage (the last 5-10 years)?
    >
    >I'd also be interested in corresponding with any members of the list who are
    >developing content categorization strategies for such audio content. For
    >example, if there are any members of the list who are involved with the
    >NewsML project, I'd like to hear from them.
    >
    >John V "Jack" Bryar
    >Managing Partner and acting CTO,
    >MilkBottleNews Partners
    >Direct: 802-843-6033
    >jack@milkbottlenews.com

    -- 
    

    Ilya Ahtaridis
    Membership Coordinator
    --------------------------------------------------------------------
    Linguistic Data Consortium        Phone: (215) 573-1275
    University of Pennsylvania        Fax: (215) 573-2175
    3600 Market St., Suite 810        ldc@ldc.upenn.edu
    Philadelphia, PA 19104            http://www.ldc.upenn.edu


