Re: [Corpora-List] POS-tagging for spoken English and learner English

From: Jean Veronis (Jean.Veronis@up.univ-mrs.fr)
Date: Mon Jul 25 2005 - 12:02:48 MET DST


    Adam Kilgarriff wrote:

    > Do you have recent experiences of using available taggers on either of
    >these kinds of data?
    >
    > Reports including accuracy figures would be particularly useful.
    >
    >

    We have recently tagged a 300,000-word corpus of spoken French. The strategy
    and evaluation are reported here:

    Campione, E., Véronis, J., & Deulofeu, J. (2005). 3. The French corpus.
    In Cresti, E. & Moneglia, M. (Eds.), /C-ORAL-ROM, Integrated Reference
    Corpora for Spoken Romance Languages,/ (pp. 111-133). Amsterdam: John
    Benjamins.

    [Draft on-line:
    http://www.up.univ-mrs.fr/veronis/pdf/2005-Coralrom-book.pdf]

    The pleasant surprise is that we achieved results as good as those we get on
    written corpora (ca. 98% precision). This is probably because two effects
    offset each other: on the one hand, spoken corpora are harder to tag because
    of disfluencies (repetitions, repairs, etc.); on the other hand, their
    lexicon is much smaller and their sentence complexity much lower.

    Best wishes

    --j
      http://aixtal.blogspot.com

     


