[Corpora-List] Call for contributions: NIPS 2006 Workshop on MACHINE LEARNING FOR MULTILINGUAL INFORMATION ACCESS

From: George Foster (foster@iro.umontreal.ca)
Date: Mon Oct 02 2006 - 21:55:19 MET DST


    Call for contributions

    NIPS 2006 Workshop

    MACHINE LEARNING FOR MULTILINGUAL INFORMATION ACCESS
    ====================================================

    http://ilt.iit.nrc.ca/MLIA/

    Description:
    ------------
    In many different settings, accessing information available in different
    languages is a challenge.

    In Europe, the wide variety of languages is clearly a bottleneck for
    efficient circulation and access to information. More than half of EU
    citizens cannot hold a conversation in a language other than their
    mother tongue. Even in an officially bilingual country like Canada,
    fewer than one in five people are considered to have a good enough
    command of both official languages (2001 census data).

    The traditional paradigm for addressing this issue is to perform human
    translation on a massive scale, and rely on monolingual information
    access technology. Although this model has worked reasonably well in the
    past, the rapid increase in the amount of information produced (and, in
    Europe, in the number of languages covered) raises questions as to its
    sustainability. Machine Learning has the potential to help develop and
    deploy technology that provides:

        1. access to information across different languages,
        2. usable translation from one language to another.

    We are interested in Machine Learning techniques that address, for
    example, the following problems:

       * Word alignment
       * Machine translation
       * Multilingual lexicon and terminology extraction
       * Cross-lingual information retrieval
       * Cross-lingual categorisation

    Goals of the workshop:
    ----------------------

    Multilingual applications are emerging as a promising application area
    for some Machine Learning techniques, for example the use of Kernel CCA
    for cross-language applications, or large-margin approaches to word
    alignment. This new trend converges with the well-established interest
    of the Natural Language Processing community in learning approaches.

    The purpose of this workshop is to provide a forum for discussion of
    current developments at the intersection between multilingual processing
    and machine learning. This includes developing new techniques to address
    various multilingual information access problems (e.g. translation), but
    also scaling up existing techniques to the available NLP data,
    developing tools for cross-language information retrieval, etc.

    We will promote discussions of some inter-related key issues in applying
    Machine Learning to Multilingual problems:

    * SCALING UP:
       - Applying ML to 100-million-word corpora (e.g. SMT)
       - Deploying ML solutions on new language pairs

    * SCARCE RESOURCES:
       - Languages or domains with limited bilingual corpora
       - Bootstrapping limited resources

    * EVALUATION:
       - Design of better performance measures
       - Optimisation of application-specific measures
       - Learning human evaluation

    * PRIOR LINGUISTIC KNOWLEDGE:
       - Modelling and using linguistic knowledge in ML
       - The continuum between all-data (SMT) and all prior knowledge
         (handcrafted rules)

    Submission instructions:
    ------------------------

    Researchers interested in presenting their work at the workshop should
    send an email to: mlia@nrc-cnrc.gc.ca
    (preferably plain text) with the following information:

    - Title
    - Author(s)
    - Abstract (around 1 page)

    Schedule:
    ---------
    Submission deadline: 29 October 2006
    Notification: 6 November 2006
    Workshop date: 8 or 9 December 2006

    Co-organisers:
    --------------
    Cyril Goutte, National Research Council Canada (contact)
    Nicola Cancedda, Xerox Research Centre Europe
    Marc Dymetman, Xerox Research Centre Europe
    George Foster, National Research Council Canada

    Workshop format:
    ----------------
    We intend to devote a good part of the workshop to panel discussions that
    will address relevant topics in multilingual information access (MIA),
    as well as invited talks presenting some important MIA problems and
    associated challenges for Machine Learning. For each half day, we will
    start with either a keynote or a short tutorial, continue with a few
    shorter technical presentations, and end with a panel discussion (topics
    to be decided depending on the confirmed list of speakers).

    Invited speakers:
    -----------------

    - Dan Melamed (Courant Institute, NYU)
    - John Shawe-Taylor (ECS, U. of Southampton, UK), tbc
    - Ralf Steinberger (JRC, Ispra, Italy)
    - Wray Buntine (HIIT, Helsinki, Finland), tbc

    Related work:
    -------------
    Past NIPS workshops have addressed related topics such as learning with
    structured data, or the use of Machine Learning for Natural Language
    Processing. There is also some ongoing interest within the European
    network of excellence Pascal, as exemplified by the recent workshop on
    intelligent information access. However, none of these specifically
    targets multilingual aspects. We believe there is sufficient interest and
    genuine need in this particular area to justify a specific focus on
    multilingual information access. The newly started European project
    SMART (Statistical Multilingual Analysis for Retrieval and Translation)
    is specifically targeting advanced machine learning techniques for
    multilingual applications.


