Re: [Corpora-List] SMS corpus

From: Min-Yen Kan (knmnyn@gmail.com)
Date: Fri Sep 01 2006 - 17:06:11 MET DST


    Hi all:

    I think Emmanuel Prochasson already mentioned the corpus that we have
    collected at NUS. It is a medium-sized corpus with about 10K messages
    sent by students in Singapore. We are still in the process of
    enlarging the corpus, but would also like to hear what corpus
    researchers are looking to find with such corpora. For example, would
    a collection of more messages from a few individuals be of more use
    than a collection with few messages from a wider variety of
    contributors?

    Most of the messages that we have collected are self-selected by
    university students to be made public in the corpus, so we believe
    that there is likely a bias towards messages that are less personal
    than what actually occurs in real life. So you may have less luck
    finding emotional messages in our corpus.

    Have you thought of supplementing your corpus studies with chat
    language? My past student was looking at some chat logs from
    commercial sites to supplement his studies and corpus collection.

    The SMS corpus is available here (as stated by Emmanuel):

    http://www.comp.nus.edu.sg/~rpnlpir/downloads/corpora/smsCorpus/

    Min-Yen Kan
    Assistant Professor
    Web / IR / NLP Group (WING), School of Computing
    National University of Singapore

    On 9/1/06, Alexander Osherenko <osherenko@gmx.de> wrote:
    > Hello,
    >
    > has anybody heard of a text corpus with SMS messages? Actually it should
    > be emotional, but at first it doesn't matter much.
    >
    > Best
    >
    > Alexander
    >
    >


