[Corpora-List] CFP: HLT-NAACL 2006 Workshop on Statistical Machine Translation

From: Christof Monz (christof@dcs.qmul.ac.uk)
Date: Thu Dec 08 2005 - 19:15:03 MET

                              First Call for Papers

                             NAACL 2006 WORKSHOP ON
                        STATISTICAL MACHINE TRANSLATION

                             June 8 or June 9, 2006

                          http://www.statmt.org/wmt06/

    Translating documents from foreign languages into English (or between
    any two languages) by computer is one of the oldest goals in
    computational linguistics. Now, armed with vast amounts of digitally
    available translated text and powerful computers, we are witnessing
    significant progress toward achieving that goal. Statistical methods
    allow the analysis of parallel text corpora and the automatic
    construction of machine translation systems. Already, for some
    language pairs such as Chinese-English or Arabic-English, statistical
    machine translation (SMT) systems built at research labs outperform
    commercial systems.

    The focus of this workshop is the use of parallel corpora for machine
    translation. It can be seen as an attempt to repeat the success of the
    2005 ACL Workshop on Parallel Text, which featured a track on
    statistical machine translation and a shared task on building machine
    translation systems.

    Recent experimentation has shown that the performance of SMT systems
    varies greatly with the source and target language. In this workshop
    we would like to encourage researchers to investigate ways to improve
    the performance of SMT systems for diverse languages, including
    morphologically complex languages (e.g., Finnish) and languages with
    partially free word order (e.g., German). Besides experimental work and
    system building, we also encourage linguistic analysis of the problems
    facing the current state of the art in statistical machine translation,
    as showcased by the shared task of the ACL 2005 Workshop on Parallel
    Text.

    Topics of interest include, but are not limited to:

         * word-based, chunk-based, phrase-based, syntax-based SMT
         * using comparable corpora for SMT
         * using morphological and POS information for SMT
         * integration of rule-based MT and statistical MT
         * decoding
         * error analysis
         * statistical MT for resource-poor languages
         * domain robustness/adaptation for MT
         * evaluation of translation quality

    SHARED TASK

    In addition to submissions on the topics listed above, this workshop
    features a shared task and we encourage participants to evaluate their
    approaches on that task. The shared task is to evaluate your approach
    to machine translation (see the list of topics of interest above) on
    the Europarl corpus.

    We provide training data for three European language pairs, and a
    common framework (including a language model and a baseline
    system). The task is to improve methods to build a phrase translation
    table (e.g. by better word alignment, phrase extraction, phrase
    scoring), augment the system otherwise (e.g. by preprocessing), or
    build entirely new translation systems.
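
    As an illustration of the phrase scoring step mentioned above, here is
    a minimal sketch that turns a list of extracted phrase pairs into
    relative-frequency scores p(target | source). This is an assumption made
    for illustration only: the call does not say how the baseline system
    scores phrases, and the function name and toy German-English pairs are
    hypothetical.

```python
from collections import Counter, defaultdict


def score_phrase_pairs(phrase_pairs):
    """Estimate p(target | source) by relative frequency.

    phrase_pairs: iterable of (source_phrase, target_phrase) tuples, one
    entry per extraction from the word-aligned training data.
    Returns a nested dict: table[source][target] = probability.
    """
    pairs = list(phrase_pairs)  # materialise so we can count twice
    pair_counts = Counter(pairs)
    source_counts = Counter(source for source, _ in pairs)

    table = defaultdict(dict)
    for (source, target), count in pair_counts.items():
        table[source][target] = count / source_counts[source]
    return table


# Toy, hand-made phrase pairs (hypothetical data, not from Europarl).
extracted = [
    ("das haus", "the house"),
    ("das haus", "the house"),
    ("das haus", "the building"),
    ("haus", "house"),
]
phrase_table = score_phrase_pairs(extracted)
```

    Better word alignment or phrase extraction changes which pairs reach
    such a function; a different scoring method would replace the
    relative-frequency estimate itself.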

    Each participating system is used to translate a test set of unseen
    sentences in the source language. Translation quality is measured by
    the BLEU score, which measures n-gram overlap with a reference
    translation, and by manual evaluation. Participants agree to contribute
    about eight hours of work to the manual evaluation.
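
    To make the overlap measure concrete, the following is a minimal sketch
    of corpus-level BLEU with a single reference per sentence: clipped
    n-gram precisions up to 4-grams, combined by a geometric mean and
    multiplied by a brevity penalty. It assumes pre-tokenized input and no
    smoothing; the official scoring script for the shared task may differ
    in tokenization and other details, and the function names here are
    hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU, one reference per hypothesis, no smoothing.

    hypotheses, references: parallel lists of token lists.
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-grams per order
    hyp_len = 0
    ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp, n)
            ref_counts = ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(
                min(count, ref_counts[gram]) for gram, count in hyp_counts.items()
            )
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty punishes output shorter than the reference.
    if hyp_len > ref_len:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


reference = "the cat sat on the mat".split()
score_identical = corpus_bleu([reference], [reference])
score_truncated = corpus_bleu([reference[:5]], [reference])
score_disjoint = corpus_bleu(["a b c d e f".split()], [reference])
```

    Because the score is computed from corpus-wide counts rather than
    averaged over sentences, it is meaningful on a whole test set but
    coarse on individual sentences, which is one reason the manual
    evaluation is run alongside it.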

    To have a common framework that allows for comparable results, and
    also to lower the barrier to entry, we provide

         * a fixed training set
         * a fixed language model
         * a fixed baseline system

    More information on the shared task can be found at:
    http://www.statmt.org/wmt06/shared-task/

    SUBMISSION INFORMATION

    Submissions will consist of regular full papers of max. 8 pages,
    formatted following the NAACL 2006 guidelines. Authors of regular full
    papers will be required to indicate a track for their submission. In
    addition, teams participating in the shared task will be invited to
    submit short papers (max. 4 pages) describing their systems. Both
    submission and review processes will be handled electronically.

    IMPORTANT DATES

    ------------------------------------
    Regular papers:

    Submissions: March 17
    Notification: April 7
    ------------------------------------
    Shared Task:

    Results submissions: March 31
    Short paper submissions: April 7
    Notification: April 14
    ------------------------------------

    Camera-ready papers: April 21

    ------------------------------------

    ORGANIZERS

    Philipp Koehn (University of Edinburgh)
    Christof Monz (Queen Mary, University of London)

    CONTACT

    For questions, comments, etc. please send email to
    naacl.wmt06@dcs.qmul.ac.uk

    PROGRAM COMMITTEE

    Bill Byrne (University of Cambridge)
    Chris Callison-Burch (University of Edinburgh)
    Francisco Casacuberta (University of Valencia)
    David Chiang (University of Maryland)
    Stephen Clark (Oxford University)
    Marcello Federico (ITC-IRST)
    George Foster (Canada National Research Council)
    Alexander Fraser (ISI/University of Southern California)
    Jan Hajic (Charles University)
    Kevin Knight (ISI/University of Southern California)
    Greg Kondrak (University of Alberta)
    Shankar Kumar (Google)
    Philippe Langlais (University of Montreal)
    Daniel Marcu (ISI/University of Southern California)
    Dan Melamed (New York University)
    Franz-Josef Och (Google)
    Miles Osborne (University of Edinburgh)
    Philip Resnik (University of Maryland)
    Libin Shen (University of Pennsylvania)
    Wade Shen (MIT-Lincoln Labs)
    Michel Simard (Canada National Research Council)
    Eiichiro Sumita (ATR Spoken Language Translation Research Laboratories)
    Joerg Tiedemann (University of Groningen)
    Christoph Tillmann (IBM)
    Taro Watanabe (ATR Spoken Language Translation Research Laboratories)
    Richard Zens (RWTH Aachen)


