[Corpora-List] ELRA - Language Resources Catalogue - Update

From: ELDA (info@elda.org)
Date: Thu Mar 16 2006 - 12:26:08 MET


    Our apologies if you have received multiple copies of this announcement
     
    *******************************************************************
    ELRA - Language Resources Catalogue - Update

    *******************************************************************

    We are happy to announce that new Text and Speech Language Resources are
    now available in our catalogue.
    To view all available Language Resources, visit our online catalogue:
    http://catalog.elda.org/index.php?language=en

    L0058: British English Source Lexicon (BESL) version 2.2
    BESL consists of over 230,000 lemmas, over 350,000 word forms, 60,000
    proper nouns, 3,000 abbreviations, and 58,000 multi-word compound nouns.
    Each headword is provided with a full listing of all inflected forms and
    other morphological variants. Every word form is marked for part of
    speech (using Penn Treebank notation). Most single-word forms include an
    IPA pronunciation. BESL covers both British and American English, as
    well as other spelling variants, with cross-references between
    corresponding forms. BESL is provided in XML.
    For more information, see
    http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=834&osCsid=7baa5c04a4e133faf8f27719b14eb3bf

    L0059: Offensive Word Filter 1
    This list features 4,500 words and expressions in UK and US English
    usage, with a grading system that describes the vocabulary type and
    offensive strength of each term, plus collocational information to help
    identify the terms in context. The list is provided in tab-delimited
    ASCII.
    For more information, see
    http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=835&osCsid=7baa5c04a4e133faf8f27719b14eb3bf

    L0060: Offensive Word Filter 2
    This list features 2,000 words and expressions in UK and US English
    usage, classified into 13 categories, with a grading system that
    describes the vocabulary type and offensive strength of each term, plus
    collocational information to help identify the terms in context. The
    list is provided as an Excel spreadsheet.
    For more information, see
    http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=836&osCsid=7baa5c04a4e133faf8f27719b14eb3bf

    L0061: The Oxford Spanish Dictionary
    This dictionary contains 300,000 words and phrases and 500,000
    translations, covering 24 regional varieties of Spanish. It includes
    thousands of authentic example sentences, carefully selected to
    illustrate the full range of meanings and typical contexts. The
    dictionary is provided in XML or SGML.
    For more information, see
    http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=837&osCsid=7baa5c04a4e133faf8f27719b14eb3bf

    L0062: French Source Lexicon
    This source lexicon contains morphological and phonetic data for French.
    It consists of over 90,000 headwords/lemmas, 400,000 wordforms, 1,000
    abbreviations, and 35,000 proper nouns. Each headword lemma is provided
    with a full listing of its possible syntactic forms and spelling
    variants, along with information on their relationship to the headword
    form. An IPA pronunciation is also given for every form, and headwords
    are labelled with the domains in which they are used, e.g. Computing,
    Engineering, Zoology. The lexicon is provided in SGML.
    For more information, see
    http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=838&osCsid=7baa5c04a4e133faf8f27719b14eb3bf

    L0063: Spanish Source Lexicon
    This source lexicon contains morphological and phonetic data for
    Spanish. It consists of over 575,000 wordforms, 1,000 abbreviations, and
    25,000 proper nouns. Each headword lemma is provided with a full listing
    of its possible syntactic forms and spelling variants, along with
    information on their relationship to the headword form. An IPA
    pronunciation is also given for every form, and headwords are labelled
    with the domains in which they are used, e.g. Computing, Engineering,
    Zoology. The lexicon is provided in SGML.
    For more information, see
    http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=839&osCsid=7baa5c04a4e133faf8f27719b14eb3bf

    L0064: Italian Source Lexicon
    This source lexicon contains morphological and phonetic data for
    Italian. It consists of over 115,000 headwords/lemmas and 925,000
    wordforms. Each headword lemma is provided with a full listing of its
    possible syntactic forms and spelling variants, along with information
    on their relationship to the headword form. An IPA pronunciation is
    also given for every form, and headwords are labelled with the domains
    in which they are used, e.g. Computing, Engineering, Zoology. The
    lexicon is provided in SGML.
    For more information, see
    http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=840&osCsid=7baa5c04a4e133faf8f27719b14eb3bf

    T0368: Multilingual Wordbank
    The Multilingual Wordbank consists of word translation glossaries
    designed for the travel/handy-reference market. It covers 17,500 core
    terms translated from English into French, German, Italian, Spanish,
    and Portuguese, plus full coverage of local variations in American
    English, Latin American Spanish, and Brazilian Portuguese. Every word is
    given a frequency ranking, which can be used as a guide to user levels.
    All translations also carry the appropriate part-of-speech and gender
    information. The Wordbank is provided in tab-delimited text.
    For more information, see
    http://catalog.elda.org:8080/product_info.php?cPath=24&products_id=841&osCsid=7baa5c04a4e133faf8f27719b14eb3bf
     
    T0369: Multilingual Phrasebank
    The Phrasebank consists of 3,000 base phrases per language, organized
    under nine topics, many of which are further subdivided. It is presented
    in a compressed format: substitutable elements are bracketed, and one or
    several alternatives are included within the entry, which avoids wasting
    storage space on repeated common material. The compression is extended
    further by references to "template" sets of common terms, e.g. Days of
    the Week, Parts of the Body, allowing a base phrase to be combined with
    up to 100 different terms. Nine languages are covered (including
    regional variants): UK English, US English, French, German, Italian,
    European Spanish, Latin American Spanish, European Portuguese, and
    Brazilian Portuguese. Phrases are provided in tab-delimited text and
    template lists in Excel spreadsheets.
    For more information, see
    http://catalog.elda.org:8080/product_info.php?cPath=24&products_id=842&osCsid=7baa5c04a4e133faf8f27719b14eb3bf
       
    T0370: Dictionary of Law
    Over 4,000 entries define and explain the major terms, concepts, and
    processes of the English legal system and its organization. It features
    authoritative, up-to-date articles written by practising and academic
    lawyers. New entries cover the Woolf reforms and human rights law, as
    well as family law, central and local government, and international
    law. The dictionary is provided in XML.
    For more information, see
    http://catalog.elda.org:8080/product_info.php?cPath=24&products_id=843&osCsid=7baa5c04a4e133faf8f27719b14eb3bf

    T0371: Dictionary of Medicine
    Over 10,000 clear and concise entries cover all major medical and
    surgical specialities. The dictionary reflects recent developments in
    the medical field, covering new drugs in clinical use, as well as new
    advances in genetics, infertility treatment, cancer, organ
    transplantation, and BSE. The dictionary is provided in XML.
    For more information, see
    http://catalog.elda.org:8080/product_info.php?cPath=24&products_id=844&osCsid=7baa5c04a4e133faf8f27719b14eb3bf

    S0209: Oxford English phonetics files
    Derived from a range of Oxford Dictionaries, these files list word forms
    together with their IPA pronunciations. The files contain 250,000
    words. Pronunciation is based on standard British English. Word forms
    include dictionary lemmas and their inflections or other morphological
    variants, plus a wide range of proper names and encyclopedic material.
    The data also includes a large number of common multi-word phrases and
    compound nouns. The files are provided in XML.
    For more information, see
    http://catalog.elda.org:8080/product_info.php?cPath=37_41&products_id=845&osCsid=7baa5c04a4e133faf8f27719b14eb3bf

    S0210: Shorter Oxford English Dictionary - Audio Files
    These are recordings of the headwords of the Shorter Oxford English
    Dictionary, pronounced in British English. The set consists of over
    95,000 sound files, provided as 11 kHz, 8-bit WAV.
    For more information, see
    http://catalog.elda.org:8080/product_info.php?products_id=846&osCsid=f929035bd1601c2221f5beeb5144689c

    W0041: Corpus of Contemporaneous Spanish Novels
    This corpus consists of 11 novels written in Castilian Spanish by the
    contemporary author Inmaculada Ferrer-Vidal Turull.
    For more information, see
    http://catalog.elda.org:8080/product_info.php?products_id=847&osCsid=f929035bd1601c2221f5beeb5144689c

    For more information on the catalogue, please contact Valérie Mapelli
    mailto:mapelli@elda.org


