Re: [Corpora-List] Enquiry about Indonesian corpus

From: Mike Maxwell (maxwell@ldc.upenn.edu)
Date: Tue Mar 16 2004 - 20:23:29 MET

    Jelita Asian wrote:
    > ...A person from
    > Linguistic Data Consortium recommend me to contact you to get hold of
    > some Indonesian corpus. Do you have any Indonesian corpus with you?
    > If not, do you know who we can contact to get hold of it?

    I'm not sure who you contacted here at the LDC. About a year ago, we looked
    into what was available on-line for Indonesian (Bahasa Indonesia), without
    actually creating a corpus. It turns out there is a huge amount of news
    text, which you can easily download and turn into a news corpus, if that is
    the type of corpus you want. Judging by what we've seen in other languages,
    you can doubtless find other genres on-line too. (We were specifically
    searching for news.)

    Tempo Interactive might be a source of parallel bilingual text.
    Caution: when we looked at it, it was not apparent whether its English
    and Indonesian articles were actually parallel, which is why I say "might".

    There are also several on-line dictionaries and a couple of morphological
    parsers, although from what little I know of Indonesian, there shouldn't be
    too much morphology to worry about.

    In summary, if you don't find that anyone else has compiled a corpus, you
    could put one together yourselves without too much effort. You might even
    find a "market" for it.

        Mike Maxwell
        Linguistic Data Consortium
        maxwell@ldc.upenn.edu


