Re: [Corpora-List] Guidance Needed for Corpus Building

From: Gregor Erbach (gor@acm.org)
Date: Wed Apr 13 2005 - 07:59:04 MET DST

    Daniel,
    A relevant initiative is OLAC, the Open Language Archives Community, "an
    international partnership of institutions and individuals who are creating a
    worldwide virtual library of language resources by: (i) developing consensus on
    best current practice for the digital archiving of language resources, and (ii)
    developing a network of interoperating repositories and services for housing
    and accessing such resources". The website is located at
    http://www.language-archives.org/

    regards,

       Gregor

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Dr. Gregor Erbach http://purl.org/net/gregor/
    DFKI GmbH, Language Technology Lab http://www.dfki.de/
    Tel. +49 (681) 302-5354 mailto:erbach@dfki.de

    Quoting Daniel Yacob <corpora@geez.org>:

    > Greetings,
    >
    > I'm at the very starting point of compiling an Amharic corpus
    > comprised of a large number of files and word lists in my
    > possession. I'm investigating starting my own project vs
    > joining an existing effort.
    >
    > I have found lots of information from the LinguistList site
    > and in particular "David Lee's Bookmarks for Corpus-based
    > Linguists". However, it is a lot of information to sort through, and
    > I cannot easily evaluate the usefulness of some resources.
    > For example, the "XML Corpus Encoding Standard" looks promising,
    > but its documentation has not changed in nearly 3 years. Is it
    > widely used or a dead project? The Linguistic Data
    > Consortium appears to have the right goals but also seems to be
    > subscription-based. I want to keep the data freely available.
    >
    > I would be grateful if people here could send recommendations
    > for tools to use and references for groups active in
    > developing free corpus materials.
    >
    > thank you,
    >
    > Daniel
    >
    >


