Re: [Corpora-List] SMS corpus

From: Cédrick Fairon (cedrick.fairon@uclouvain.be)
Date: Fri Sep 01 2006 - 16:13:25 MET DST

    Dear Alexander,

    The Centre for Natural Language Processing at the University of
    Louvain (http://cental.fltr.ucl.ac.be) has collected a corpus of
    75,000 French SMS messages (from more than 2,400 authors, aged 12 to
    65). Details about the project are available online:
    http://www.smspourlascience.be

    A subset of this corpus (30,000 SMS) has been published on CD-ROM by
    Louvain University Press and is available from
    http://www.i6doc.com/doc/sms (the licence covers non-profit
    organisations only; others may contact us).

    Two noteworthy features of the corpus:
    - it contains information about the authors' profiles (sex, age,
    occupation, mother tongue, second language, place of residence, etc.).
    These profiles are linked to the messages, so that you can select a
    subset of the corpus matching given sociolinguistic criteria;
    - each message is linked to a "transcribed" version in "standard"
    French, so that you can search for a word and get all the variants
    present in the corpus.
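
    To make these two features concrete, here is a minimal Python sketch
    of how such linked data could be queried. It is only an illustration:
    the class names, field names, helper functions and the three toy
    messages are invented for this example and do not reflect the actual
    format of the CD-ROM. It also assumes the link between a message and
    its transcription is made at the message level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Author:
    # Hypothetical profile fields; the real corpus records more of them.
    author_id: int
    sex: str
    age: int
    mother_tongue: str


@dataclass(frozen=True)
class Message:
    author_id: int     # link from the message to its author's profile
    original: str      # the SMS as it was typed
    transcribed: str   # the "standard" French transcription


# Invented toy records, NOT taken from the corpus.
AUTHORS = {
    1: Author(1, "F", 17, "French"),
    2: Author(2, "M", 42, "French"),
}
MESSAGES = [
    Message(1, "a 2m1", "à demain"),
    Message(2, "a dem1", "à demain"),
    Message(1, "slt", "salut"),
]


def select_by_profile(messages, authors, predicate):
    """Keep the messages whose author's profile satisfies `predicate`."""
    return [m for m in messages if predicate(authors[m.author_id])]


def variants_of(messages, word):
    """Return the original forms of messages whose transcription contains `word`."""
    word = word.lower()
    return sorted(
        {m.original for m in messages if word in m.transcribed.lower().split()}
    )


if __name__ == "__main__":
    under_18 = select_by_profile(MESSAGES, AUTHORS, lambda a: a.age < 18)
    print([m.original for m in under_18])
    print(variants_of(MESSAGES, "demain"))
```

    The first call selects a sociolinguistic subset (here, authors under
    18); the second searches the transcriptions for a standard word and
    returns the original messages in which it occurs.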

    Full details are given in C. Fairon, S. Paumier (2006). "A translated
    corpus of 30,000 French SMS". In Proceedings of LREC 2006. Genova.

    Best Regards,

    Cédrick

    On 1 Sept. 2006, at 15:00, Alexander Osherenko wrote:

    > Hello,
    >
    > has anybody heard of a text corpus with SMS messages? Actually it
    > should be emotional, but at first it doesn't matter much.
    >
    > Best
    >
    > Alexander
    >

    Cédrick Fairon
    cedrick.fairon@uclouvain.be

    Director of CENTAL
    Centre de traitement automatique du langage
    Université catholique de Louvain
    Place Blaise Pascal, 1
    1348 Louvain-la-Neuve
    Belgique
    tel: +32 10 47 37 88
    fax: +32 10 47 26 06

    http://cental.fltr.ucl.ac.be
    http://glossa.fltr.ucl.ac.be


