[Corpora-List] 3rd WEB AS CORPUS WORKSHOP (WAC3): Call for papers

From: Cédrick Fairon (cedrick.fairon@uclouvain.be)
Date: Wed Mar 14 2007 - 23:53:19 MET


    ------------------------------------------------------------------------
                           CALL FOR PAPERS
    ------------------------------------------------------------------------

                     3rd WEB AS CORPUS WORKSHOP (WAC3)
                         incorporating CLEANEVAL

                             An ACL-SIGWAC event*

    ------------------------------------------------------------------------
                            Sept. 15-16, 2007
               University of Louvain, Louvain-la-Neuve, Belgium

                   <http://cental.fltr.ucl.ac.be/wac3>
    ------------------------------------------------------------------------

    More and more people are using Web data for linguistic and NLP research.
    The workshop provides a venue for exploring how we can use it
    effectively and what we will find if we do.

    We invite submissions which:

    * describe Web corpus collection projects, or modules for one part
        of the process (crawling, filtering, language-id, tokenising,
        lemmatising, POS-tagging, indexing, ...)

    * explore characteristics of Web data from a linguistics/NLP
        perspective, including registers, domains and frequency distributions

    * use crawled Web data for NLP purposes (with emphasis on the data
        rather than the use)

    -- Cleaneval --

    Anyone using web data needs to clean it, to get rid of unwanted material
    including, for example, HTML markup, navigation bars, advertisements.
    To date there has been no sharing of resources or expertise and the
    cleaning has often been done minimally. Cleaneval is an exercise to
    promote sharing and to improve our understanding of the issues. It will
    take the now-familiar form of an open competition and shared task. More
    info at Cleaneval <http://cleaneval.sigwac.org.uk>.

    -- Invited speaker: Kevin Scannell --

    Kevin Scannell, of Saint Louis University, Missouri, USA, has been working
    with scholars of a range of smaller languages to develop web corpora for
    those languages. The website <http://borel.slu.edu/crubadan/stadas.html>
    currently lists 135 corpora/languages.

    -- Previous WAC workshops --

    WAC1 at Corpus Linguistics conference, Birmingham, UK, July 2005:
    <http://sslmit.unibo.it/~baroni/web_as_corpus_cl05.html>.

    WAC2 at EACL, Trento, Italy, April 2006:
    <http://sslmit.unibo.it/~baroni/web_as_corpus_eacl06.html>.

    -- Submission --

    Regular submissions: papers (6-10 pages), demos (max. 2 pages) and
    posters (max. 2 pages), all to be written in English.

    Template files (.doc and LaTeX) are available on the WAC3 website.
    Proceedings will be published in "Cahiers du Cental" at the
    Louvain University Press: http://cental.fltr.ucl.ac.be/cahiers

    For CLEANEVAL submissions see the Cleaneval website:
    <http://cleaneval.sigwac.org.uk>.

    Deadline: 1 May 2007

    -- Venue --

    Université catholique de Louvain <http://www.uclouvain.be/en-index.html>,
    in the elegant new city of Louvain-la-Neuve
    <http://www.eupedia.com/belgium/louvain-la-neuve.shtml> (Belgium).
    Large computer rooms will be available for demo sessions.

    -- Points of contact --

    Workshop Co-chairs

    Cédrick Fairon, UCLouvain, Cental, fairon@tedm.ucl.ac.be
    Prof. Gilles-Maurice de Schryver, Universiteit Gent

    Cleaneval committee

    Marco Baroni, U Trento; Secretary, SIGWAC
    Tony Hartley, U Leeds
    Adam Kilgarriff, Lexical Computing Ltd; Chair, SIGWAC
    Serge Sharoff, U Leeds

    Local organisation team

    Bernadette Dehottay, UCLouvain, Cental, dehottay@tedm.ucl.ac.be
    Julia Medori, CENTAL, UCLouvain
    Laurent Kevers, CENTAL, UCLouvain
    Hubert Naets, CENTAL, UCLouvain
    Isabelle Lecroart, CENTAL, UCLouvain
    Claude Devis, CENTAL, UCLouvain

    Contact us:
    Bernadette Dehottay
    Université catholique de Louvain
    Centre for Natural Language Processing (CENTAL)
    Place Blaise Pascal, 1
    1348 Louvain-la-Neuve
    Tel. +32 10 47 37 88
    Fax. +32 10 47 26 06
    dehottay@tedm.ucl.ac.be

    Cédrick Fairon
    cedrick.fairon@uclouvain.be

    Director of CENTAL
    Centre de traitement automatique du langage
    Université catholique de Louvain
    Place Blaise Pascal, 1
    1348 Louvain-la-Neuve
    Belgium
    tel: +32 10 47 37 88
    fax: +32 10 47 26 06

    http://cental.fltr.ucl.ac.be
    http://glossa.fltr.ucl.ac.be

      


