> I am interested in error tagging and I am looking for corpora which are (or are being) error tagged. Do you know of any? And do you know of any available error tagset?
One more recent effort I know of is the SST Corpus, a 1-million-word corpus
of transcribed English speech by Japanese learners of English. Various errors
are tagged, although I can't find any online account of the full tagset. There
are a couple of papers in English on the corpus, notably:
Tono, Y., Kaneko, T., Isahara, H., Saiga, T. and Izumi, E. (2001) The Standard
Speaking Test (SST) Corpus: A 1 million-word spoken corpus of Japanese
learners of English and its implications for L2 lexicography. In Lee, S. (ed.)
ASIALEX 2001 Proceedings: Asian Bilingualism and the Dictionary. The Second
Asialex International Congress, August 8-10, 2001, Yonsei University, Korea,
pp. 257-262.
There is a web page with some documentation and a copy of this paper at:
http://leo.meikai.ac.jp/~tono/sst/
There was also a paper at this year's ACL:
Emi Izumi, Kiyotaka Uchimoto, Toyomi Saiga, Thepchai Supnithi and Hitoshi
Isahara (2003) Automatic error detection in the Japanese learners' English
spoken data. In Companion Volume to the Proceedings of the 41st Annual Meeting
of the Association for Computational Linguistics (ACL '03), pp. 145-148.
which is also available online at:
http://acl.ldc.upenn.edu/acl2003/posterdemo/pdf/Izumi.pdf
Tim
*-----------------------------------*
Timothy Baldwin
Senior research engineer
Multiword Expression project
CSLI LinGO Lab
Contact details:
Email: tbaldwin@csli.stanford.edu
Tel: (+1)-650-723-0515
Fax: (+1)-650-723-2166
*-----------------------------------*