Re: [Corpora-List] lexicographic tools for parallel/comparable corpora

From: mickel.gronroos@masterin.com
Date: Tue Feb 06 2007 - 19:03:27 MET


    Joerg Tiedemann <tiedeman@let.rug.nl> wrote:

    > I'm looking for information about tools for the lexicographic use of
    > parallel and comparable corpora.

    The Finnish translation technology company Masterin has a bilingual term extractor that builds a raw bilingual translation lexicon from translation memory databases (which are, in effect, parallel corpora). (Shameless plug: the term extraction module will be available in the forthcoming Masterin 2007 translation tool.)

    Masterin's solution is language-aware and supports English, Swedish and Finnish (any pair, in either direction). This makes it possible to combine rule-based methods with more traditional statistical approaches, which in our experience gives very good results. The tool is already being used to extract domain-specific translation lexicons, and it does so efficiently, I might add.
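The post does not describe how Masterin's extractor works. For readers unfamiliar with the "traditional statistical approaches" mentioned above, here is a minimal sketch of one common, generic technique: scoring source/target word pairs by how often they co-occur in aligned translation units, using the Dice coefficient. This is not Masterin's method; the function name `dice_lexicon`, the whitespace tokenization and the thresholds are hypothetical choices made for illustration only.

```python
from collections import Counter
from itertools import product


def dice_lexicon(segment_pairs, min_score=0.5, min_count=2):
    """Hypothetical sketch: rank (source, target) word pairs from aligned
    translation units by the Dice coefficient 2*c(s,t) / (c(s) + c(t)).

    segment_pairs: iterable of (source_text, target_text) strings, e.g.
    exported from a translation memory. Tokenization is naive whitespace
    splitting, an assumption made for brevity.
    """
    src_freq = Counter()   # number of units containing each source word
    tgt_freq = Counter()   # number of units containing each target word
    joint = Counter()      # number of units containing both words of a pair

    for src, tgt in segment_pairs:
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_freq.update(s_words)
        tgt_freq.update(t_words)
        joint.update(product(s_words, t_words))

    candidates = []
    for (s, t), c in joint.items():
        if c < min_count:  # skip pairs seen together too rarely
            continue
        score = 2 * c / (src_freq[s] + tgt_freq[t])
        if score >= min_score:
            candidates.append((s, t, round(score, 3)))

    # Highest score first; ties broken alphabetically for stable output.
    candidates.sort(key=lambda x: (-x[2], x[0], x[1]))
    return candidates


if __name__ == "__main__":
    tm = [
        ("the house", "huset"),
        ("the house is red", "huset är rött"),
        ("the car is red", "bilen är röd"),
    ]
    for entry in dice_lexicon(tm):
        print(entry)
```

On this toy English-Swedish data the sketch ranks ("house", "huset") at the top, but it also ranks the spurious pair ("red", "är") just as high, which illustrates why purely statistical co-occurrence benefits from the kind of language-aware filtering the post describes.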

    I'll be glad to do a test run for you, should you have any parallel data in the languages covered by Masterin. (Maybe some English-Swedish stuff?) Please feel free to contact me directly.

    Best regards,

    Mickel Grönroos

    --
    Mickel Grönroos
    Chief Language Officer, Masterin
    Tekniikantie 14, FIN-02150 Espoo, Finland, www.masterin.com
    