[Corpora-List] New Corpus from the LDC

From: Linguistic Data Consortium (ldc@ldc.upenn.edu)
Date: Fri Dec 05 2003 - 22:16:05 MET

* LDC2003T15 *
* SLX Corpus of Classic Sociolinguistic Interviews *

The Linguistic Data Consortium (LDC) is pleased to announce the
availability of the SLX Corpus of Classic Sociolinguistic Interviews.

The SLX Corpus of Classic Sociolinguistic Interviews contains 8
sociolinguistic interviews with a total of 9 speakers. William Labov
and one of his students conducted the interviews in the 1960s and 1970s.
These interviews represent solutions to the problems of achieving
cross-cultural contact, reducing the effect of the Observer's Paradox
and approximating the vernacular of everyday life.

The corpus includes the complete interview recordings plus time-aligned
verbatim transcripts for each speaker. Also included in the publication
is a sociolinguistic variable survey that represents an overview of the
intra- and inter-speaker variation attested in the corpus, highlighting
a broad range of phonological, phonetic, grammatical, lexical and
stylistic variables. Finally, the publication includes a number of
annotation tools that allow users to listen to each interview while
browsing the corresponding transcripts, and to display and hear each
token identified in the variable survey.

The SLX Corpus was developed as part of the Data and Annotations for
Sociolinguistics (DASL) Project
<http://www.ldc.upenn.edu/Projects/DASL>, an investigation of best
practices in the use of digital speech corpora for the study of language
variation. The recordings demonstrate successful interviewing
techniques, the sound quality is high, and the digitization,
segmentation and transcription of the data represent best practice in
these areas. The variable survey highlights over 150 sociolinguistic
variables attested in the corpus and suggests avenues for further
research. Most importantly, the SLX Corpus provides both an example of a
digital speech corpus developed specifically to support sociolinguistic
research, and a stable benchmark for training in sociolinguistic data
collection, digitization, segmentation, transcription, analysis and
publication.

The SLX Corpus contains 17 speech files (22,050 Hz, 16-bit,
single-channel, MS WAV (RIFF) format), for a total of 575 minutes
(~1.5 GB). The data is distributed on DVD-ROM.
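
The stated size can be cross-checked from the audio format. The sketch
below is only a back-of-the-envelope calculation: it assumes the files
hold uncompressed PCM audio, ignores the small RIFF header in each file,
and treats the 575 minutes as exact. The function name pcm_size_bytes is
illustrative and is not part of the corpus or its tools.

```python
# Rough size estimate for uncompressed PCM audio.
# Assumptions (not stated in the announcement): plain PCM encoding,
# WAV header overhead ignored, total duration taken as exactly 575 min.

SAMPLE_RATE_HZ = 22050   # samples per second
BYTES_PER_SAMPLE = 2     # 16-bit samples
CHANNELS = 1             # single-channel (mono)
TOTAL_MINUTES = 575      # total duration across all 17 files


def pcm_size_bytes(sample_rate_hz, bytes_per_sample, channels, minutes):
    """Return the number of audio data bytes for the given format and duration."""
    return sample_rate_hz * bytes_per_sample * channels * minutes * 60


size = pcm_size_bytes(SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS, TOTAL_MINUTES)
print(f"{size} bytes, about {size / 1e9:.2f} GB")
```

Under these assumptions the audio data comes to 1,521,450,000 bytes,
which agrees with the approximate figure of 1.5 GB given above.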

For further information, including online documentation, please visit:

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T15

The cost of the first 100 copies of this publication (not including the
copies distributed to LDC members) is covered by NSF Grant Number
BCS-998009, so these copies are free of charge. After the first 100
copies are distributed, additional copies will be available for the
production cost of $100 per disc.

If you need additional information before placing your order, or would
like to inquire about membership in the LDC, please send email to
<ldc@ldc.upenn.edu> or call 1 (215) 573-1275.

-----------------------------------------------------------------------
Linguistic Data Consortium            Phone: 1 (215) 573-1275
University of Pennsylvania            Fax:   1 (215) 573-2175
3600 Market St., Suite 810            email: ldc@ldc.upenn.edu
Philadelphia, PA 19104-2653           www:   http://www.ldc.upenn.edu
