[Corpora-List] Guidance Needed for Corpus Building

From: Daniel Yacob (corpora@geez.org)
Date: Wed Apr 13 2005 - 09:59:51 MET DST


    Greetings,

    I'm at the very beginning of compiling an Amharic corpus
    from a large number of files and word lists in my
    possession. I'm weighing whether to start my own project or
    join an existing effort.

    I have found lots of information on the LinguistList site,
    in particular "David Lee's Bookmarks for Corpus-based
    Linguists". However, it is a lot of information to sort
    through, and I cannot judge the usefulness of some resources
    well. For example, the "XML Corpus Encoding Standard" looks
    promising, but its documentation has not changed in nearly
    3 years. Is it widely used or a dead project? The Linguistic
    Data Consortium appears to have the right goals but is
    subscription based, and I want to keep the data freely
    available.

    I would be grateful if people here could recommend tools
    to use and point me to groups that are actively
    developing free corpus materials.

    Thank you,

    Daniel


