Re: [Corpora-List] Encoding of apostrophes and quotes

From: John F. Sowa (sowa@bestweb.net)
Date: Tue Jul 04 2006 - 22:12:54 MET DST

    All this variability in how people use apostrophes and
    punctuation of any kind proves one very important point:
    no matter how systematic, expressive, and logical any
    system of encoding or tagging may be, people are going
    to do whatever they damn well please.

    Anybody who has ever tried to parse ordinary NL prose --
    even supposedly well-edited prose -- knows that punctuation
    is highly unreliable. It's useful to consider it, but
    only as one among many possibly contradictory sources of
    information about the structure of a text.

    Tagging a text correctly (according to some set of rules)
    is harder than punctuating it correctly. If people aren't
    very good at punctuation, I seriously doubt that they'll
    be any better at tagging.

    John Sowa


