RE: [Corpora-List] starting a machine translation project

From: zhang min (mzhang@i2r.a-star.edu.sg)
Date: Wed Sep 13 2006 - 10:35:00 MET DST

    Yes, you have to provide an English-Indonesian bilingual corpus for SMT
    training. As far as I know, no such parallel corpus exists yet, and
    constructing one is quite a tough job.
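
For anyone unsure what such a corpus looks like in practice: a common convention, which as far as I know Moses-style training follows, is two plain-text files with one sentence per line, where line i of the English file is the translation of line i of the Indonesian file. The Python sketch below only illustrates that layout and a simple sanity filter. The function names, the thresholds, and the sample sentences are my own assumptions, not part of any toolkit.

```python
from typing import Iterable, List, Tuple


def pair_sentences(en_lines: Iterable[str],
                   id_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair line i of the English side with line i of the Indonesian side."""
    en = [line.strip() for line in en_lines]
    ind = [line.strip() for line in id_lines]
    # A sentence-aligned corpus must have the same number of lines on both sides.
    if len(en) != len(ind):
        raise ValueError(f"line count mismatch: {len(en)} vs {len(ind)}")
    return list(zip(en, ind))


def filter_pairs(pairs: List[Tuple[str, str]],
                 max_ratio: float = 3.0,   # assumed threshold, tune as needed
                 max_len: int = 80) -> List[Tuple[str, str]]:
    """Drop pairs that are empty, too long, or very unbalanced in length."""
    kept = []
    for en, ind in pairs:
        n_en, n_id = len(en.split()), len(ind.split())
        if n_en == 0 or n_id == 0:
            continue  # one side is empty
        if n_en > max_len or n_id > max_len:
            continue  # very long sentences are hard to align
        if max(n_en, n_id) / min(n_en, n_id) > max_ratio:
            continue  # lengths differ too much; probably misaligned
        kept.append((en, ind))
    return kept


if __name__ == "__main__":
    # Hypothetical sample data standing in for two corpus files.
    english = ["I eat rice .", "", "Hello"]
    indonesian = ["Saya makan nasi .", "Kosong", "a b c d e"]
    for en, ind in filter_pairs(pair_sentences(english, indonesian)):
        print(en, "|||", ind)
```

In real use the two lists would be read from files, for example with `open(path, encoding="utf-8")`. The filter is deliberately crude; it only guards against the most obvious misalignments.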

    Does anyone know where we can get an English-Indonesian bilingual corpus?

    Cheers,

    Zhang Min

    -----Original Message-----
    From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    Behalf Of Nano Surbakti
    Sent: 13 September 2006 16:26
    To: CORPORA@UIB.NO
    Subject: [Corpora-List] starting a machine translation project

    Hi,

    We want to start an English-Indonesian MT project. We found that
    there is an open-source MT toolkit, "Moses", at http://www.statmt.org

    I don't know much about machine translation. From some articles I've
    been reading, it looks like the statistical translation method is
    rather easy to set up, yet produces reasonable results.

    I have some newbie questions:
    - Our main purpose is to make an open-source English-to-Indonesian MT
    system. Can we use Moses for this purpose, or is Moses specific to
    Foreign-to-English translation only?
    - AFAIK, we have to provide a bilingual corpus for the statistical
    training. Some articles mention "phrase translation". Do we
    need to provide some kind of phrase table, or is it generated
    automatically by a special program?
    - If we can't use Moses, do you have some guidance for us, such as
    pointers to other open-source toolkits?
    - As a rough estimate, how many months is it going to take to develop
    an "early version" of an English-to-ForeignLanguage MT system?

    Regards,

    --
    Nano Surbakti
    (sorry if you receive this message twice)
