[Corpora-List] ELRA - Language Resources Catalogue - Update - NEMLAR resources

From: ELDA (info@elda.org)
Date: Fri Aug 11 2006 - 12:34:52 MET DST

    Our apologies if you have received multiple copies of this announcement

    *******************************************************************
    ELRA - Language Resources Catalogue - Update
    *******************************************************************
    We are happy to announce the following Arabic resources, produced within
    the NEMLAR project (www.nemlar.org). All 3 resources are owned and
    copyrighted by the NEMLAR Consortium. They are available in our catalogue.
    To view all the Language Resources available, you can visit our on-line
    catalogue: http://www.elra.info or
    http://www.elda.org

    *** ELRA-W0042 NEMLAR Written Corpus ***
    This corpus consists of about 500,000 words of Arabic text from 13
    different categories. The text is provided in 4 different versions:
    · Raw text
    · Fully vowelized text
    · Text with Arabic lexical analysis
    · Text with Arabic POS-tags

    The database is distributed on 1 ISO 9660 CD-ROM volume.

    For more information, see
    http://catalog.elda.org:8080/product_info.php?products_id=873&osCsid=2eb47737dba8e4365c4972784a235948

    *** ELRA-S0219 NEMLAR Broadcast News Speech Corpus ***
    The data, provided by ELDA, consists of about 40 hours of Arabic speech
    (mainly Standard Arabic from a number of broadcast companies).
    Transcriptions follow the Transcriber conventions as used by ELDA and focus
    on the orthographic, named-entity and speaker/turn segmentation levels. No
    phonetic transcription/segmentation is planned.

    The database is distributed on 1 ISO 9660 DVD-ROM volume.

    For more information, see
    http://catalog.elda.org:8080/product_info.php?products_id=874&osCsid=2eb47737dba8e4365c4972784a235948

    *** ELRA-S0220 NEMLAR Speech Synthesis Corpus ***
    The NEMLAR Speech Synthesis Corpus contains the recordings of 2 native
    Egyptian speakers (male and female, 35 years old) recorded in a studio over
    2 channels (voice + laryngograph). The data collection and transcription
    were performed by RDI (Egypt).

    Speech samples are stored at 96 kHz, 24 bits per sample, as signed integers
    with the least significant byte first (“lohi” or Intel format).
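
    As an illustration of the sample format described above, the following
    Python sketch decodes signed 24-bit little-endian samples into integers.
    It assumes headerless raw sample data; the function name decode_s24le is
    hypothetical, and the actual file layout on the DVDs (headers, channel
    interleaving) is not specified in this announcement.

```python
def decode_s24le(raw: bytes) -> list[int]:
    """Decode signed 24-bit little-endian ("lohi") PCM samples.

    Assumption: `raw` holds only sample data (no file header), three bytes
    per sample, least significant byte first.
    """
    if len(raw) % 3 != 0:
        raise ValueError("byte length is not a multiple of 3")
    return [
        # int.from_bytes handles both the byte order and the sign extension.
        int.from_bytes(raw[i:i + 3], byteorder="little", signed=True)
        for i in range(0, len(raw), 3)
    ]


# Three example samples: smallest positive step, -1, and the minimum value.
example = bytes([0x01, 0x00, 0x00, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x80])
samples = decode_s24le(example)
```

    For the example bytes, samples is [1, -1, -8388608]; the valid range for
    signed 24-bit samples is -8388608 to 8388607.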

    The speaker read 2,032 prompted sentences covering approx. 42,000 words in
    three categories: transcribed speech (20%), written text (50%), and
    constructed phrases (30%).

    The database is provided with orthographic, prosodic and phonetic
    transcriptions in SAMPA. All transcriptions were segmented at the
    utterance (sentence/command word) level, annotated at the word level and
    checked manually. A pronunciation lexicon including 3,589 headwords with
    phonetics in SAMPA is also available.

    The database is distributed on 3 ISO 9660 DVD-ROM volumes.

    For more information, see
    http://catalog.elda.org:8080/product_info.php?products_id=875&osCsid=2eb47737dba8e4365c4972784a235948

    For more information on the catalogue, please contact Valérie Mapelli
    (mapelli@elda.org).


