[Corpora-List] Call for Abstracts: Towards a reference corpus of web genres

From: santinim@inwind.it
Date: Thu Nov 23 2006 - 09:20:21 MET


    Apologies for cross-postings.

    ==========================================

    Call for Abstracts -
    Corpus Linguistics - Colloquium -
    "Towards a reference corpus of web genres"
    ==========================================

    COLLOQUIUM DESCRIPTION AND OBJECTIVES

    Genres of spoken and written texts are being intensively studied from various angles, e.g., communication studies, discourse analysis, computational linguistics, without arriving at a generally accepted definition. The web is new, so it is not clear how to apply traditional notions of genre to web pages. In this colloquium we would like to collect submissions that study characteristics of web genres with respect to traditional paper genres represented in electronic corpora like the BNC.

    Web documents are often characterised by a high level of genre hybridism, by a fragmentation of textuality across several documents, and by the impact of technical features such as hyperlinking, posting facilities and multi-authoring. The web is a huge reservoir of documents that can be easily mined for building all sorts of corpora, with many collections being built according to subjective criteria for corpus composition, genre annotation, genre representativeness and genre granularity. In this colloquium we would like to invite submissions contributing to a reference corpus of web genres. The main goal of the colloquium is to draw up an initial list of characteristics and requirements for building, annotating and evaluating reference corpora of web genres. For instance:

    * To what extent should genre hybridism and authorial creativity be represented in a genre collection? These two phenomena appear to be very common on the web.

    * To what extent is it possible to include "emerging genres", i.e., genres still in a transitional phase in genre evolution? The web is currently thriving with emerging genres.

    * How many granularities of the unit of analysis should be included? Only genres representing web sites? Only genres representing web pages? Both?

    * What "format" should be used to store these units in a collection (e.g., a database-like form, DOM trees, a net of graphs, in HTML format, in a text-only version, with or without embedded images, removing boilerplate components)?

    * What level of genre granularity and similarity should be applied in the reference corpus? Genre collections often show different levels of granularity, including genres and super-genres. Should similar genres, such as "tutorial" and "how-to", be accounted for separately?

    TOPICS:

    The topics of interest include but are not limited to:

    - Text theory for the development of web corpora
    - Modelling corpora of web genres
    - Innovative genre classification schemes accounting for multi-genre and no-genre web documents
    - Modelling genre annotation schemes for web documents (metadata organization)
    - Assembling a list of web genres for a reference corpus
    - Creating comparable corpora of web genres
    - Automatic genre classification vs. human genre classification
    - How to evaluate the corpus: using statistical measures, relying on corpus linguists, librarians, or web users?

    PARTICIPATION

    The aim of this colloquium, the first ever organized on this topic, is to bring together researchers from different communities such as corpus linguistics, genre analysis, digital genre community, computational linguistics, and information retrieval in order to promote the discussion and development of new ideas and methods to create new corpora for language studies and as evaluation resources.

    Please submit abstracts to: webgenres@googlemail.com

    Abstract submissions should include:
         * Presenter contact information (mailing address, phone, e-mail & fax)
         * A paper proposal (250 words max)
         * An abstract for the program (50 words max)
         
    The deadline for submissions is Dec 15, 2006
    Notification of acceptance will be sent out by Jan 11, 2007

    Colloquium Organization:

    Marina Santini (University of Brighton, UK)
    Serge Sharoff (University of Leeds, UK)

    Program Committee:

    Marco Baroni (University of Bologna, Italy)
    Stefan Gries (University of California, USA)
    Adam Kilgarriff (Lexmasterclass, UK)
    Alexander Mehler (Bielefeld University, Germany)
    Sven Meyer zu Eissen (University of Weimar, Germany)
    John Paolillo (Indiana University, USA)
    Paul Rayson (UCREL, Lancaster Uni, UK)
    Georg Rehm (University of Tuebingen, Germany)
    Marina Santini (University of Brighton, UK)
    Serge Sharoff (University of Leeds, UK)
    Benno Stein (University of Weimar, Germany)

    Contacts:
    ========
    Main contact: Serge Sharoff (s.sharoff@leeds.ac.uk)
    Other contact: Marina Santini (Marina.Santini@itri.brighton.ac.uk)





    This archive was generated by hypermail 2b29 : Thu Nov 23 2006 - 09:18:07 MET