Re: [Corpora-List] POS-tagging for spoken English and learner English

From: Xiaotian Guo (garlickfred@gmail.com)
Date: Fri Jul 22 2005 - 10:51:05 MET DST

    Hi, Adam and colleagues

    I agree with Paul that "For learner data … POS tagging accuracy
    depends on how advanced the learners are".

    I have tried to have two corpora POS tagged: a native speaker
    corpus, LOCNESS, and a learner corpus that I call COLEC. The tagging
    worked perfectly well on LOCNESS. Unfortunately, I was let down by
    the inaccuracy of the tagging of COLEC, which was due to the special
    features of the learners' errors. I am not a computer person, but I
    speculate that when a tagging system is devised, it is based on the
    syntax rules most native speakers abide by. However, non-native
    speakers, especially those at an intermediate level or below, do not
    produce language the way native speakers do. You can hardly imagine
    how messy learner English can be. That causes a huge problem for POS
    tagging of a learner corpus and would very likely defeat the whole
    tagging system. Granger discusses this point in her article in:

    Granger S., Hung J. and Petch-Tyson S. (eds) 2002. Computer Corpora,
    Second Language Acquisition and Foreign Language Teaching. Amsterdam:
    John Benjamins Publishing Company.

    Of course, this does not mean there are no solutions. If people try
    hard enough, they may achieve a better accuracy rate. As far as I
    can see (pardon me if I am talking nonsense), the tagging system
    should at least not be based solely on native speaker syntax rules.
    Perhaps the tagging system should be trained on adequate learner
    English data? But the problem is that it is hard to find a set of
    syntax rules for learner English. Anyway, I will keep my fingers
    crossed for those who are working on this part of tagging system
    design.
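
    To make the "train on learner data" idea concrete, here is a
    minimal sketch in Python. It is not any real tagger: it simply
    gives each word the tag it most often carries in the training data
    (a word-frequency approach, not syntax rules). The sentences, the
    tag names and the function names are all invented for illustration
    and are not taken from LOCNESS or COLEC.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_sentences):
    """Learn the most frequent tag for each lowercased word."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag_words(model, words, default_tag="NOUN"):
    """Tag each word; unseen words fall back to default_tag."""
    return [(w, model.get(w.lower(), default_tag)) for w in words]


def accuracy(model, gold_sentences):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sentence in gold_sentences:
        predicted = tag_words(model, [w for w, _ in sentence])
        for (_, pred_tag), (_, gold_tag) in zip(predicted, sentence):
            correct += int(pred_tag == gold_tag)
            total += 1
    return correct / total


# Invented toy data: native-like sentences and learner-like sentences
# containing a misspelling ("studing"), an over-regularised verb ("goed")
# and a non-native adverb position ("very like").
native_train = [
    [("I", "PRON"), ("like", "VERB"), ("studying", "VERB"), ("English", "NOUN")],
    [("She", "PRON"), ("goes", "VERB"), ("to", "ADP"), ("school", "NOUN")],
    [("They", "PRON"), ("really", "ADV"), ("like", "VERB"), ("music", "NOUN")],
]
learner_train = [
    [("I", "PRON"), ("very", "ADV"), ("like", "VERB"), ("studing", "VERB"),
     ("English", "NOUN")],
    [("She", "PRON"), ("goed", "VERB"), ("to", "ADP"), ("school", "NOUN")],
]
learner_test = [
    [("They", "PRON"), ("very", "ADV"), ("like", "VERB"), ("studing", "VERB"),
     ("music", "NOUN")],
    [("I", "PRON"), ("goed", "VERB"), ("to", "ADP"), ("school", "NOUN")],
]

native_model = train_unigram_tagger(native_train)
mixed_model = train_unigram_tagger(native_train + learner_train)

native_acc = accuracy(native_model, learner_test)
mixed_acc = accuracy(mixed_model, learner_test)

print(f"trained on native data only:  {native_acc:.2f}")
print(f"trained on native + learner:  {mixed_acc:.2f}")
```

    The toy test set deliberately reuses the learner forms seen in
    training, so the numbers say nothing about real corpora. The only
    point is that forms a native-only model has never seen fall back to
    a default tag, whereas a model that has seen tagged learner data
    can label them.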

    All the best

    Xiaotian Guo
    PhD Candidate
    The Department of English
    The University of Birmingham


