[Corpora-List] CFP: ACL 2005 Workshop on Parallel Texts and MT

From: Christof Monz (christof@umiacs.umd.edu)
Date: Thu Mar 03 2005 - 22:34:14 MET


                             CALL FOR PAPERS

                BUILDING AND USING PARALLEL TEXTS: DATA-DRIVEN
                      MACHINE TRANSLATION AND BEYOND

                    Workshop at the Annual Meeting of
           the Association for Computational Linguistics (ACL 2005)

                           Ann Arbor, Michigan
                            June 29-30, 2005

                      http://www.statmt.org/wpt05/

    The goal of this workshop is to provide a forum for researchers
    working on problems related to the creation and use of parallel
    text. Recent events have demonstrated once again the importance of
    inter-language communication across a broad range of languages. This
    reinforces the need for advances in machine translation (MT) and
    multi-lingual processing tools, especially for languages with scarce
    resources.

    This is a two-day workshop featuring two tracks:

        1. Building and Using Parallel Texts for Languages
           with Scarce Resources (day 1)

        2. Exploiting Parallel Texts for Statistical Machine
           Translation (day 2)

    Each track features a shared task that allows participants to
    compare their results on a common benchmark. Although participation
    is not required, we encourage authors to take part in the shared
    tasks for benchmarking purposes.

    TRACK DESCRIPTIONS

    1. BUILDING AND USING PARALLEL TEXTS FOR LANGUAGES WITH SCARCE RESOURCES

    The aim of this track is to bring together researchers involved in the
    study of creating and using parallel corpora for minority
    languages. The track will therefore center on issues related to the
    manual or automatic collection of parallel corpora, studies of the
    "import" of knowledge from a well-studied language via parallel
    alignments, and evaluations of the quality of collected corpora or of
    the tools derived from these corpora.

    We invite submissions of papers addressing any of the following issues:

        * Construction of parallel corpora, including the automatic
          identification and harvesting of parallel corpora from the Web
        * Tools for processing parallel corpora, including automatic
          sentence alignment, word alignment, phrase alignment, detection of
          omissions and gaps in translations, and others
        * Methods to evaluate the quality of parallel corpora and word
          alignments
        * Using parallel corpora for the derivation of language processing
          tools in new languages
        * Using parallel corpora for automatic corpus annotation (e.g. word
          sense disambiguation)
        * Using parallel corpora for cross-language information retrieval and
          extraction
        * The quality of language resources and systems that can be constructed
          with small amounts of parallel text, and how these scale up with the
          amount of text available
        * The role of external knowledge sources (e.g. bilingual dictionaries)
          in building resources and systems relying on parallel texts
        * Machine learning techniques for building and exploiting parallel texts
          (e.g. using small amounts of human-aligned parallel text to bootstrap
          large aligned corpora; active selection of data based on usefulness
          for different tasks)

    While we invite submissions addressing any of the above topics, or related
    issues, we particularly welcome work involving parallel corpora addressing
    languages with scarce resources.

    Shared task

    In addition to regular paper presentations, the track will also include
    a shared task for the evaluation of various word alignment techniques.
    Word alignment represents an important step in exploiting parallel corpora,
    and yet there is no common evaluation framework for such systems. The
    shared task follows on the success of the word alignment task that took
    place as part of the NAACL 2003 workshop on parallel text. This year's
    edition will be
    distinct in that it will focus on Inuktitut-English and Romanian-English
    alignment. This fits into the theme of our track, since neither Inuktitut
    nor Romanian is a widely studied language, and there are relatively few
    online resources and tools available.

    Teams that participate in the alignment exercise will be provided with
    training data for each language pair, and with development data taken
    from the gold standard data, in order to build their systems. Thereafter
    they will be given the unaligned gold standard data and asked to submit
    their proposed alignments within a short time frame. There will be two
    tracks for each language pair: one for teams that augment the training
    data with additional resources, and another for those that use only the
    training data. The resulting alignments will be evaluated against the
    gold standard data prior to the workshop. Short papers describing
    systems participating in this shared task and all evaluation methodologies
    employed will constitute a separate section in the workshop proceedings.
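
    As a rough illustration of what evaluating proposed alignments against
    gold standard data can look like, the sketch below computes precision,
    recall, F-measure and alignment error rate (AER), measures commonly
    used for word alignment. This call does not specify which metrics the
    task will use; the function name, the representation of a link as a
    (source index, target index) pair, and the split of gold links into
    "sure" and "probable" sets are all assumptions made for this example.

```python
from typing import Dict, Iterable, Optional, Tuple

# Assumed representation: one alignment link = (source word index, target word index).
Link = Tuple[int, int]


def alignment_scores(
    proposed: Iterable[Link],
    sure: Iterable[Link],
    probable: Optional[Iterable[Link]] = None,
) -> Dict[str, float]:
    """Score proposed links against gold links (hypothetical helper).

    `sure` holds unambiguous gold links; `probable` holds optional
    additional links that are acceptable but not required. If `probable`
    is omitted, every gold link is treated as sure.
    """
    a = set(proposed)
    s = set(sure)
    # By convention, the probable set includes the sure set.
    p = (s | set(probable)) if probable is not None else s

    precision = len(a & p) / len(a) if a else 0.0
    recall = len(a & s) / len(s) if s else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0

    # AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|)
    total = len(a) + len(s)
    aer = 1.0 - (len(a & s) + len(a & p)) / total if total else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "aer": aer,
    }


if __name__ == "__main__":
    # Toy sentence pair: three proposed links, two sure and one probable gold link.
    scores = alignment_scores(
        proposed={(0, 0), (1, 1), (2, 3)},
        sure={(0, 0), (1, 1)},
        probable={(2, 2)},
    )
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
```

    In practice such scores would be accumulated over all sentence pairs in
    the test set rather than computed for a single pair.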

    A more detailed description of the task, the training, development,
    and test data, and a number of other related resources will be made
    available from
    http://www.cs.unt.edu/~rada/wpt05

    2. EXPLOITING PARALLEL TEXTS FOR STATISTICAL MACHINE TRANSLATION

    This track focuses on the use of parallel corpora for machine
    translation.

    Translating documents from foreign languages into English (or between
    any two languages) by computer is one of the oldest goals in
    computational linguistics. Now, armed with vast amounts of digitally
    available translated text and powerful computers, we are witnessing
    significant progress toward achieving that goal. Statistical methods
    allow the analysis of parallel text corpora and the automatic
    construction of machine translation systems. Already, for some
    language pairs such as Chinese-English or Arabic-English, statistical
    machine translation (SMT) systems built at research labs outperform
    commercial systems.

    Recent experimentation has shown that the performance of SMT systems
    varies greatly with the source language. In this workshop we would
    like to encourage researchers to investigate ways to improve the
    performance of SMT systems for diverse languages, including
    morphologically complex languages (e.g., Finnish) and languages with
    partially free word order (e.g., German). These issues lie on the border
    of linguistic analysis and statistical modeling, and the ACL
    conference is the most appropriate forum to investigate them, as ACL
    has a long tradition of hosting high-quality research in both areas.

    Topics of interest include, but are not limited to:

          * word-based, chunk-based, phrase-based, syntax-based SMT
          * using comparable corpora for SMT
          * using morphological and POS information for SMT
          * integration of rule-based MT and statistical MT
          * decoding
          * error analysis

    In addition to submissions on the topics listed above, this track of
    the workshop features a shared task, and we encourage participants to
    evaluate their approaches on that task. The shared task is to evaluate
    your approach to machine translation (see the topics of interest
    listed above) on the Europarl corpus.

    A more detailed description of the shared task, the test and training
    corpora, a freely available MT system, and a number of other resources
    are available from

    http://www.statmt.org/wpt05/mt-shared-task/

    SUBMISSION INFORMATION

    Submissions will consist of regular full papers of max. 8 pages,
    formatted following the ACL 2005 guidelines. Authors of regular
    full papers will be required to indicate a track for their submission.
    In addition, teams participating in the shared tasks will be invited
    to submit short papers (max. 4 pages) describing their systems.
    Both submission and review processes will be handled electronically.

    IMPORTANT DATES

    Regular paper submissions                 April 10
    Shared task results submissions           April 10
    Shared task short paper submissions       April 17
    Notification (short and regular papers)   May 4
    Camera-ready papers                       May 15

    ORGANIZERS

    Philipp Koehn (University of Edinburgh)
    Joel Martin (National Research Council of Canada)
    Rada Mihalcea (University of North Texas)
    Christof Monz (University of Maryland)
    Ted Pedersen (University of Minnesota, Duluth)

    CONTACT

    For questions, comments, etc. please send email to
    wpt05@umiacs.umd.edu

    PROGRAM COMMITTEE

    Lars Ahrenberg (Linkoping University)
    Bill Byrne (University of Cambridge)
    Chris Callison-Burch (University of Edinburgh)
    Nicoletta Calzolari (University of Pisa)
    Francisco Casacuberta (University of Valencia)
    David Chiang (University of Maryland)
    Mona Diab (Columbia University)
    George Foster (Canada National Research Council)
    Alexander Fraser (ISI/University of Southern California)
    Pascale Fung (Hong Kong University of Science and Technology)
    Rob Gaizauskas (University of Sheffield)
    Ulrich German (University of Toronto)
    Dan Gildea (University of Rochester)
    Jan Hajic (Charles University)
    Andrew Hardie (University of Lancaster)
    Rebecca Hwa (University of Pittsburgh)
    Nancy Ide (Vassar College)
    Kevin Knight (ISI/University of Southern California)
    Greg Kondrak (University of Alberta)
    Shankar Kumar (Johns Hopkins University)
    Philippe Langlais (University of Montreal)
    Alon Lavie (Carnegie Mellon University)
    Lori Levin (Carnegie Mellon University)
    Daniel Marcu (ISI/University of Southern California)
    Tony McEnery (University of Lancaster)
    Bridget McInnes (University of Minnesota)
    Magnus Merkel (Linkoping University)
    Bob Moore (Microsoft Research)
    Maria das Gracas Volpe Nunes (University of Sao Paulo)
    Franz-Josef Och (Google)
    Kemal Oflazer (Sabanci University)
    Miles Osborne (University of Edinburgh)
    Andrei Popescu-Belis (University of Geneva)
    Katharina Probst (CMU)
    Amruta Purandare (University of Pittsburgh)
    Florence Reeder (MITRE)
    Philip Resnik (University of Maryland)
    Antonio Ribeiro (European Commission Joint Research Council)
    Michel Simard (Xerox)
    Kevin Scannell (St. Louis University)
    Libin Shen (University of Pennsylvania)
    Eiichiro Sumita (ATR Spoken Language Translation Research Lab)
    Joerg Tiedemann (University of Groningen)
    Christoph Tillmann (IBM)
    Dan Tufis (Research Institute for AI of the Romanian Academy)
    Jean Veronis (Universite de Provence)
    Michelle Vanni (Army Research Lab)
    Stephan Vogel (Carnegie Mellon University)
    Clare Voss (Army Research Lab)
    Taro Watanabe (ATR Spoken Language Translation Research Laboratories)
    Dekai Wu (Hong Kong University of Science and Technology)


