Re: [Corpora-List] Re: Minor(ity) Language

From: Briony Williams (b.williams@bangor.ac.uk)
Date: Thu Mar 09 2006 - 12:47:12 MET


    Mike Maxwell wrote:
    > Reminds me of a project we (mostly Bill Poser and myself) did at the LDC
    > a few years back, in which we tried to quantify the resources available
    > for languages with at least a million speakers (of which the Ethnologue
    > reports something like 330). We looked on the web for things like 100k
    > words of monolingual and bilingual text, bilingual lexicons,
    > morphological parsers (where relevant), etc. We did _not_ try to
    > quantify more high-end things, such as syntactic parsers or MT programs
    > (although we recorded them if we found them). Everything was
    > text-based: we did not look at speech resources.

    This sounds similar to the BLARK concept ("Basic Language Resource Kit"),
    which was proposed by Steven Krauwer and developed by ELSNET and ELRA. See
    http://www.elda.org/blark. To quote: "in the framework of the ENABLER thematic
    network ... ELDA elaborated a report defining a (minimal) set of LRs to be
    made available for as many languages as possible and mapping the actual gaps
    that should be filled in so as to meet the needs of the HLT field."

    That website also contains "BLARK matrices", one per language, to be filled
    in similarly to the LDC project described above.

    However, there are differences:

    1) BLARK also covers speech resources, not just text resources.
    2) BLARK does not set a minimum number of speakers for a language, so it
    can cover lesser-used languages as well.
    3) BLARK also includes "high-end" modules (e.g. syntactic parsers and
    sentence generation).
    4) The BLARK matrix can be filled in with more detail than a simple
    "yes/no": entries can be marked "irrelevant", "important", "very important"
    or "essential".

    The website asks researchers to fill in details for any languages they know
    about (all languages, not only European ones). This is a much-needed project
    and deserves encouragement.

    Best regards

    Briony Williams



    This archive was generated by hypermail 2b29 : Thu Mar 09 2006 - 12:47:15 MET