Re: [Corpora-List] Encoding of apostrophes and quotes

From: Geoffrey Sampson (grs2@sussex.ac.uk)
Date: Mon Jul 03 2006 - 11:26:55 MET DST

    Elision and indication of possession are not really separate uses for
    the apostrophe. I have always understood, and it sounds plausible, that
    the reason we write "John's" as the genitive of "John" is that in
    centuries past, when less was known than today about language history,
    people mistakenly believed that the genitive form "John's" had arisen as
    a reduction of "John his" (and it was sometimes written out like that in
    full). -- No, I don't know how they explained "Mary's" either.

    The question of tokenization and encoding seems to me not to be an issue
    for which there is one "right answer"; surely it is a matter for
    different researchers to answer differently in terms of their particular
    needs. So far as I am aware the apostrophe and single right inverted
    comma are _never_ distinguished graphically, so it seems quite reasonable
    to me for Unicode to assign them the same code. They are logically
    distinct, but it isn't Unicode's job to delve into the logic of written
    symbols -- I don't think it would be practical to require that.
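
To make the point concrete for anyone tokenizing a corpus, here is a minimal Python sketch. It shows that the one code point U+2019 carries both roles, and applies a purely illustrative heuristic: treat U+2019 as an apostrophe only when it sits between two word characters. The function name and the heuristic are assumptions for illustration, not a recommended standard; a word-final apostrophe (as in "the dogs' bowls") remains indistinguishable from a closing quote and is left untouched.

```python
import re
import unicodedata

RIGHT_SINGLE_QUOTE = "\u2019"  # Unicode name: RIGHT SINGLE QUOTATION MARK
ASCII_APOSTROPHE = "\u0027"    # Unicode name: APOSTROPHE

# Hypothetical heuristic: U+2019 flanked by word characters on both sides
# is taken to be an apostrophe (elision or genitive), not a closing quote.
_WORD_INTERNAL = re.compile(r"(?<=\w)" + RIGHT_SINGLE_QUOTE + r"(?=\w)")


def fold_word_internal_apostrophes(text: str) -> str:
    """Replace word-internal U+2019 with U+0027; leave all other marks alone."""
    return _WORD_INTERNAL.sub(ASCII_APOSTROPHE, text)


if __name__ == "__main__":
    sample = "\u2018John\u2019s book\u2019"
    print(unicodedata.name(RIGHT_SINGLE_QUOTE))
    print(fold_word_internal_apostrophes(sample))
```

Whether to fold the two uses together, or to try to separate them at all, depends on what a given project needs from its tokens.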

    Geoffrey Sampson

     
    ............................................................
         Prof. Geoffrey Sampson MA PhD MBCS CITP ILTM

         author of "The 'Language Instinct' Debate"

         Department of Informatics, University of Sussex
         Falmer, Brighton BN1 9QH, England

         www.grsampson.net +44 1273 678525
    ............................................................


