New Release from the LDC

LDC Office (ldc@unagi.cis.upenn.edu)
Fri, 17 May 1996 16:39:53 EDT

Announcing a NEW RELEASE from the
LINGUISTIC DATA CONSORTIUM

Radio Broadcast News
Continuous Speech Recognition Corpus (Hub-4)

This set of CD-ROMs contains all of the speech data provided to sites
participating in the DARPA CSR November 1995 Hub-4 (Radio) Broadcast
News tests. The data consists of digitized waveforms of MarketPlace
(tm) business news radio shows provided by KUSC through an agreement
with the Linguistic Data Consortium, and detailed transcriptions of
those broadcasts. The software NIST used to process and score the
output of the test systems is also included.

The data is organized as follows:

CD26-1: Training Data - Ten complete half-hour broadcasts with
minimally-verified transcripts. The transcripts are time aligned with
the waveforms at the story-boundary level.

CD26-2: Development-Test Data - Six complete half-hour broadcasts with
verified transcripts. The transcripts are time aligned with the
waveforms at the story- and turn-boundary level. Index files are
included which specify how the data may be partitioned into two test
sets.

CD26-6: Evaluation-Test Data - Five complete half-hour broadcasts with
verified/adjudicated transcripts. The transcripts are time aligned
with the waveforms at the story-, turn-, and music-boundary level. An
index file has been included which specifies how the data was
partitioned into the test set used in the CSR 1995 Hub-4 tests.

Institutions that are members of the LDC for the 1996 Membership Year
can receive a copy of the Radio Broadcast News corpus at no additional
charge, in the same manner as all other text and speech corpora
published by the LDC.

Nonmembers can receive a copy of this corpus, for research purposes
only, for a fee of $2500. If you would like to order a copy of this
corpus, please email your request to ldc@unagi.cis.upenn.edu. If you
need additional information before placing your order, or would like
to inquire about membership in the LDC, please send email or call
(215) 898-0464.

Further information about the LDC and its available corpora can be
accessed on the Linguistic Data Consortium WWW Home Page at URL
http://www.cis.upenn.edu/~ldc. Information is also available via ftp
at ftp.cis.upenn.edu under pub/ldc; for ftp access, please use
"anonymous" as your login name, and give your email address when asked
for a password.