Dear researchers:
We are pleased to make publicly available a small corpus of short
message service (SMS) messages.
**** National University of Singapore Short Message Service Corpus ****
These messages were collected and used in a final year undergraduate project
analyzing the efficiency of SMS input. The corpus contains messages mostly
in English. The message contributors were mainly university students in
Singapore.
Over 10,000 messages were collected, representing over 100 different users.
The corpus is made available under a modified Open Directory Project
license. Please see the corpus webpage for more details. More
comprehensive documentation on the (ongoing) project will be made available
as time and demand allow.
http://www.comp.nus.edu.sg/~rpnlpir/downloads/corpora/smsCorpus/
We hope the community will find this corpus useful as a small benchmark for
gauging the efficiency of SMS message entry as well as for SMS / chat log
language analysis. These messages are provided as an XML file that
validates against a document-internal DTD.
Regards,
Min-Yen KAN
Assistant Professor
Department of Computer Science, School of Computing
National University of Singapore, Singapore 117543
Office: S15-05-05
Tel: ++ (65) 6874-1885
Fax: ++ (65) 6779-4580
kanmy@comp.nus.edu.sg
http://www.comp.nus.edu.sg/~kanmy