[Corpora-List] New Corpus from the LDC

From: Linguistic Data Consortium (ldc@ldc.upenn.edu)
Date: Fri Dec 05 2003 - 22:16:05 MET

    * LDC2003T15 *
    * SLX Corpus of Classic Sociolinguistic Interviews *

    The Linguistic Data Consortium (LDC) is pleased to announce the
    availability of the SLX Corpus of Classic Sociolinguistic Interviews.

    The SLX Corpus of Classic Sociolinguistic Interviews contains 8
    sociolinguistic interviews with a total of 9 speakers. William Labov
    and one of his students conducted the interviews in the 1960s and 1970s.
    These interviews represent solutions to the problems of achieving
    cross-cultural contact, reducing the effect of the Observer's Paradox
    and approximating the vernacular of everyday life.

    The corpus includes the complete interview recordings plus time-aligned
    verbatim transcripts for each speaker. Also included in the publication
    is a sociolinguistic variable survey that represents an overview of the
    intra- and inter-speaker variation attested in the corpus, highlighting
    a broad range of phonological, phonetic, grammatical, lexical and
    stylistic variables. Finally, the publication includes a number of
    annotation tools that allow users to listen to each interview while
    browsing the corresponding transcripts, and to display and hear each
    token identified in the variable survey.

    The SLX Corpus was developed as part of the Data and Annotations for
    Sociolinguistics (DASL) Project
    <http://www.ldc.upenn.edu/Projects/DASL>, an investigation of best
    practices in the use of digital speech corpora for the study of language
    variation. The recordings demonstrate successful interviewing
    techniques, the sound quality is high, and the digitization,
    segmentation and transcription of the data represent best practice in
    these areas. The variable survey highlights over 150 sociolinguistic
    variables attested in the corpus and suggests avenues for further
    research. Most importantly, the SLX Corpus provides both an example of a
    digital speech corpus developed specifically to support sociolinguistic
    research, and a stable benchmark for training in sociolinguistic data
    collection, digitization, segmentation, transcription, analysis and
    publication.

    The SLX Corpus contains 17 speech files (22,050 Hz, 16-bit,
    single-channel, in MS WAV (RIFF) format) totaling 575 minutes
    (~1.5 GB). The data is distributed on DVD-ROM.
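
    As a rough cross-check of the figures above, the stated sample rate,
    bit depth, channel count and duration can be multiplied out to an
    approximate data size. The sketch below is illustrative only: the
    function names are hypothetical, WAV header overhead is ignored, and
    the format check assumes uncompressed PCM files readable by Python's
    standard wave module.

```python
import wave

# Figures taken from the announcement.
SAMPLE_RATE_HZ = 22050
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # single-channel
TOTAL_MINUTES = 575


def pcm_size_bytes(minutes, rate=SAMPLE_RATE_HZ,
                   sample_bytes=BYTES_PER_SAMPLE, channels=CHANNELS):
    """Estimate raw PCM payload size, ignoring WAV header overhead."""
    return int(minutes * 60 * rate * sample_bytes * channels)


def matches_announced_format(path):
    """Return True if a WAV file has the announced rate, width and channels.

    Hypothetical helper; assumes an uncompressed PCM WAV file.
    """
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == SAMPLE_RATE_HZ
                and wav.getsampwidth() == BYTES_PER_SAMPLE
                and wav.getnchannels() == CHANNELS)


if __name__ == "__main__":
    total = pcm_size_bytes(TOTAL_MINUTES)
    print(f"{total} bytes, about {total / 1e9:.2f} GB")
```

    Under these assumptions the estimate comes to 1,521,450,000 bytes,
    which is consistent with the ~1.5 GB quoted above.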

    For further information, including online documentation, please visit:

    http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T15

    The cost of the first 100 copies of this publication (not including the
    copies distributed to LDC members) is covered by NSF Grant Number
    BCS-998009, so these copies are free of charge. After these first 100
    copies are distributed, additional copies will be available for the
    production cost of $100 per disc.

                                                                            
                                    *

    If you need additional information before placing your order, or would
    like to inquire about membership in the LDC, please send email to
    <ldc@ldc.upenn.edu> or call 1 (215) 573-1275.

                                                                             
                                    *

    -----------------------------------------------------------------------
    Linguistic Data Consortium      Phone: 1 (215) 573-1275
    University of Pennsylvania      Fax:   1 (215) 573-2175
    3600 Market St., Suite 810      email: ldc@ldc.upenn.edu
    Philadelphia, PA 19104-2653     www:   http://www.ldc.upenn.edu


