Corpora: Lexicon development for MT

Adam Kilgarriff (Adam.Kilgarriff@itri.brighton.ac.uk)
Thu, 10 Sep 1998 11:19:32 +0100

==========================
Lexicon development for MT
==========================

I am trying to establish what techniques have been used for lexicon
development in MT to enable the MT system to select the correct
translation for an ambiguous word.

Hutchins and Somers (Intro to MT, 1992) say little on the topic but
gives a striking example: the SYSTRAN English-French lexicon
responsible for word choice contains 400 hand-crafted rules governing
the one English word, "oil", and when it should be translated as
"huile", when "pétrole" (p. 179). Is this still the state
of the art, or has NLP done anything to help?
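
For anyone unfamiliar with rules of this kind, here is a minimal sketch
in Python of what a hand-crafted word-choice rule might look like. The
cue words, the default, and the function name translate_oil are all
invented for illustration; none of them is taken from SYSTRAN or from
Hutchins and Somers, and a real rule set would presumably test
syntactic and semantic context rather than bare neighbouring words.

```python
# Hypothetical illustration only: these cue words are invented for the
# example and are not taken from SYSTRAN or from Hutchins and Somers.
RULES = [
    # (context cue words, French translation of "oil")
    ({"olive", "cooking", "salad", "vegetable"}, "huile"),
    ({"crude", "barrel", "refinery", "drilling"}, "pétrole"),
]
DEFAULT = "huile"  # assumed fallback when no rule fires


def translate_oil(sentence):
    """Return the French word for 'oil' chosen by the first matching rule."""
    # Naive whitespace tokenisation; punctuation is not stripped.
    words = set(sentence.lower().split())
    for cues, translation in RULES:
        if words & cues:  # any cue word present in the sentence
            return translation
    return DEFAULT
```

Even this toy version suggests why the rule count grows: every new
context that distinguishes the two senses needs its own hand-written
entry.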

I am aware of the widespread use of templates, of the use (e.g. at New
Mexico) of inheritance, and of sophisticated techniques based on
parallel corpora for extracting translation equivalents for
terminology, but these are likely to help with this particular
problem only incidentally. Much Word Sense Disambiguation work is in
principle relevant but, with the honourable exception of Dagan and
Itai (CL 20 (4), 1994), it is not clear whether any of it can be
tailored to the specific needs of an MT system (and I do not believe
any of it has been).

I'd very much like to hear about semi-automatic and "lexicographer's
workbench" approaches as well as fully automatic techniques.
Responses are particularly welcome (even if they are laments) from
people who have working MT systems.

I'll collate responses and post a summary in due course,

Adam Kilgarriff

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Adam Kilgarriff
Senior Research Fellow tel: (44) 1273 642919
Information Technology Research Institute (44) 1273 642900
University of Brighton fax: (44) 1273 642908
Lewes Road
Brighton BN2 4GJ email: Adam.Kilgarriff@itri.bton.ac.uk
UK http://www.itri.bton.ac.uk/~Adam.Kilgarriff
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%