[Corpora-List] Call for Papers: ELECTRA 2005

From: ddg@di.ubi.pt
Date: Sat Mar 26 2005 - 15:37:39 MET


    [Apologies for Multiple Postings]

    ============================CALL FOR PAPERS============================

         ELECTRA Workshop on Methodologies and Evaluation of Lexical
               Cohesion Techniques in Real-world Applications
                           (Beyond Bag of Words)

               In association with the 28th Annual International
                ACM SIGIR Conference on Research and Development
                     in Information Retrieval (SIGIR 2005)

                       Sponsored by Yahoo! Research Labs

                        Pestana Bahia, Salvador, Brazil

                                August 19, 2005

                http://research.yahoo.com/workshops/electra2005/

    ============================CALL FOR PAPERS============================

    CONTENTS:

    [1] Description
    [2] Target Audience
    [3] Areas of Interest
    [4] Important Dates
    [5] Paper Submission
    [6] Organising Committee
    [7] Program Committee
    [8] Contact

    ----------------
    [1] Description:
    ----------------

    Lexical cohesion can be subdivided into two distinct areas: (1) lexical
    associations, which embody a wide spectrum of language phenomena such as
    named entities, multiword units, collocations and word co-occurrences;
    and (2) lexical relations, which provide evidence of the semantic and
    discourse structure of text through relations between terms over long
    distances.

    The central goal of this workshop is to bring together researchers in NLP
    and IR to discuss the use of lexical cohesion in text applications such
    as document and passage retrieval, question answering, topic segmentation
    and text summarization. Although both communities work with the same
    material (human language), collaboration between them has so far been
    limited.

    In this workshop we are interested in identifying the successes and
    failures of integrating lexical cohesion into real-world IR applications.
    On the one hand, lexical cohesion has received much attention in
    Information Retrieval research over its history of more than 30 years,
    but so far with mixed results. On the other hand, the Natural Language
    Processing community has devoted considerable research to this subject,
    both in theory and in practice, but with limited evaluation in real-world
    applications. Both communities have now reached a point where they should
    meet to discuss these shared issues, and this is the objective of this
    workshop.

    In particular, we will address two questions that are of great importance
    for real-world IR applications.

    1) Efficient methodologies for Lexical Cohesion identification

    Lexical cohesion has received attention in IR research since its outset,
    notably through (a) the identification and use of multiword units for
    indexing and search, and (b) the extraction of long-distance lexical
    relations for tasks such as passage retrieval, topic segmentation or text
    summarization.

    On the one hand, the interest in multiword units (or phrases) can be
    partly attributed to the fact that phrases typically have higher
    information content and specificity than single words, and can therefore
    represent the concepts expressed in text more accurately.

    On the other hand, interest in long-distance lexical relations in IR
    research has been motivated by the recognition that most IR models are
    limited by their assumption of term independence. As a consequence, a
    number of techniques, such as passage retrieval and query expansion, have
    been developed to improve on term-independence models.

    The choice of methodologies and techniques for these tasks has always
    been constrained by efficiency, which is critical for real-world IR
    applications: such applications are bound by variables such as
    processing time and memory space, while identifying and extracting
    lexical associations and lexical relations is computationally intensive.
    In recent years, new algorithms and technologies have been proposed to
    bring lexical cohesion techniques to large-scale applications, replacing
    implementations that were previously intractable.

    Previous workshops on lexical cohesion have mainly focused on extraction
    without efficiency constraints. In this workshop, we would like to
    compare the factors that influence the scalability of lexical cohesion
    processing in real-world applications, namely data structures,
    algorithms, and parallel, distributed or grid computing. We are also
    interested in new methodologies for lexical cohesion whose ability to
    scale to real-world applications is supported by complexity measurements.

    2) Evaluation of the benefits of Lexical Cohesion in IR applications

    Contiguous lexical associations have often been used in experimental IR
    systems, and different techniques have been studied for this purpose:
    (a) statistical methods based on co-occurrence statistics or n-gram
    language modeling; (b) hybrid techniques combining simple statistics with
    shallow linguistic techniques such as part-of-speech tagging and
    noun-phrase chunking; and (c) knowledge-based techniques. However, the
    contribution of phrase matching has not been systematically quantified.
    Moreover, such techniques are difficult to evaluate in IR applications,
    as the number of environment variables is very large and each system
    combines a variety of indexing and matching techniques. A more focused
    and systematic approach to analyzing and evaluating the uses of lexical
    associations in IR is therefore needed. This workshop will provide a
    framework for such analysis and will present for discussion a number of
    challenging questions regarding the use of lexical associations in text.
    In particular, we will ask questions such as: How should multiword units
    be incorporated into IR models designed for single terms? What weighting
    models can be used for them? How should they be matched against their
    lexical-syntactic variants in text? How should we handle non-contiguous
    lexical associations? How can we avoid over-weighting a phrase occurrence
    in a document when it matches more than one phrase in the query? These
    are only a few of the questions in a large field of research that is
    full of unsolved problems.

    Beyond contiguous lexical units, relations between non-contiguous lexical
    units are important building blocks of text, forming its lexical
    cohesion. Indeed, the complete meaning of a word in text can only be
    realized when it is interpreted in combination with the surrounding
    words, with which it forms lexical cohesive ties. These lexical relations
    have been used for a number of IR tasks, for example query expansion,
    passage retrieval, topic segmentation and text summarization. However,
    most techniques do not use deep semantic or discourse structure
    information to identify such relations, relying instead on statistical
    evidence, i.e. their co-occurrence patterns. In fact, very little work
    has explored the use of NLP techniques such as lexical chaining or
    discourse analysis, which exploit the semantic and discourse structure
    of text, to improve the performance of IR applications. One of the main
    objections to such techniques has been the claim that they are more
    computationally demanding than statistical co-occurrence techniques.
    However, as the NLP community develops more efficient algorithms, it
    will be worth exploring further the use of such techniques in IR
    applications.

    We would therefore like to gather researchers who use lexical relations
    in different subfields of IR to address non-trivial questions such as:
    What types of lexical relations prove useful for different IR tasks? What
    statistical models are most effective for identifying lexical relations
    for different IR tasks? Can linguistic techniques for identifying lexical
    relations in text, such as lexical chaining or discourse analysis, be
    useful for any IR task? How can contiguous or non-contiguous lexical
    cohesive relations be identified in text? How can we reliably evaluate
    and compare these techniques?

    --------------------
    [2] Target Audience:
    --------------------

    This workshop is intended to bring together IR and NLP researchers
    working in all areas of information retrieval who use lexical
    associations in information retrieval applications. The objective is to
    discuss what has been achieved in this area, to identify common themes
    across different approaches, and to discuss future research directions.

    ----------------------
    [3] Areas of Interest:
    ----------------------

    Papers are invited on, but not limited to, the following topics:

    * Efficient Techniques for Lexical Cohesion identification
    * Scalable Algorithms for Lexical Cohesion identification
    * Lexical Associations and Lexical Relations Resources
    * Document Representation and Lexical Associations
    * Document Ranking and Lexical Associations
    * Single-Term and Phrase Information Retrieval
    * Passage Retrieval and Lexical Cohesion
    * Query Expansion and Lexical Associations
    * Local and Global Context Analysis
    * Ontology-based Query Expansion
    * Question Answering and Lexical Relations
    * Web Search and Lexical Cohesion
    * Topic Segmentation and Lexical Cohesion
    * Text Summarization and Lexical Cohesion
    * Evaluation Standards and Benchmarks
    * Qualitative and Quantitative Evaluations

    Papers can cover one or more of these areas.

    --------------------
    [4] Important Dates:
    --------------------

    Paper submission deadline: May 15th, 2005
    Notification: June 15th, 2005
    Camera ready papers: July 1st, 2005
    Workshop: August 19th, 2005

    ---------------------
    [5] Paper Submission:
    ---------------------

    Papers should follow the SIGIR 2005 instructions
    (http://www.dcc.ufmg.br/eventos/sigir2005/). Papers should be submitted
    electronically, in PDF format only, to Rosie Jones
    [jonesr@yahoo-inc.com]. PostScript files can be converted to PDF at
    http://www.ps2pdf.com/. The subject line should be
    "SIGIR 2005 ELECTRA WORKSHOP PAPER SUBMISSION".

    Because reviewing is blind, no author information should be included in
    the paper (i.e. author names or references that could identify the
    authors). An identification page must be sent in a separate email with
    the subject line "SIGIR 2005 ELECTRA WORKSHOP ID PAGE" and must include
    the title, author(s), keywords, number of pages, and the name and email
    address of the contact author.

    Late submissions will not be accepted. Notification of receipt will
    be emailed to the contact author shortly after receipt.

    -------------------------
    [6] Organising Committee:
    -------------------------

    Rosie Jones (Yahoo! Inc, United States of America)
    Olga Vechtomova (University of Waterloo, Canada)
    Gaël Harry Dias (University of Beira Interior, Portugal)

    ----------------------
    [7] Program Committee:
    ----------------------

    Brigitte Grau - (LIMSI, France)
    Bruce Croft - (University of Massachusetts, USA)
    Charlie Clarke - (University of Waterloo, Canada)
    Diana Inkpen - (University of Ottawa, Canada)
    Dunja Mladenic - (Jožef Stefan Institute, Slovenia)
    Patrick Pantel - (University of Southern California, USA)
    Egidio Terra - (Pontifícia Univ. Católica do Rio Grande do Sul, Brazil)
    Gabriel Lopes - (New University of Lisbon, Portugal)
    Graeme Hirst - (University of Toronto, Canada)
    Hal Daume - (University of Southern California, USA)
    Helena Ahonen-Myka - (University of Helsinki, Finland)
    Murat Karamuftuoglu - (Bilkent University, Turkey)
    Nicola Stokes - (University College Dublin, Ireland)
    Peter Turney - (National Research Council Canada, Canada)
    Rafael Muñoz - (University of Alicante, Spain)

    ------------
    [8] Contact:
    ------------

    Rosie Jones
    Yahoo! Overture Matching Sciences
    Yahoo! Inc
    74 N. Pasadena Ave, 3F
    Pasadena, CA 91103
    United States of America
    email: jonesr@yahoo-inc.com
