Re: [Corpora-List] Encoding of apostrophes and quotes

From: Niels Ott (niels@drni.de)
Date: Wed Jul 05 2006 - 10:10:32 MET DST


    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1

    E Tonkin wrote:
    > Throughout this drama, people have been ordering Beck's Bier with fine
    > disregard of any neue deutsche Rechtschreibung!

    Even worse, there exist plural forms that cannot even be motivated by
    an English-style standard: what about Ampel'n (traffic lights)?

    > Of some relevance to this discussion, though I don't know how accurate it
    > is, is the note on Wikipedia suggesting that a common side-effect of
    > Apostrophitis is the use of a diacritical mark in place of the apostrophe
    > itself.

    Plus, on the web, people use the diacritics from the windows-1252
    character set while declaring iso-8859-1.
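
    A minimal sketch of how one might work around this mislabelling when
    reading web pages into a corpus, assuming Python. The function name
    decode_web_bytes and the sample bytes are made up for illustration;
    the idea is only that bytes 0x80-0x9F are control codes in iso-8859-1
    but printable characters (curly quotes among them) in windows-1252,
    so a page declared as iso-8859-1 is decoded as windows-1252 instead.

```python
def decode_web_bytes(raw: bytes, declared: str = "iso-8859-1") -> str:
    """Decode page bytes, treating a declared iso-8859-1 as windows-1252.

    Hypothetical helper: the name and the list of labels are assumptions.
    """
    label = declared.strip().lower()
    if label in {"iso-8859-1", "iso8859-1", "latin-1", "latin1"}:
        # In iso-8859-1 the bytes 0x80-0x9F are C1 control characters;
        # in windows-1252 most of them are printable (e.g. 0x92 is U+2019).
        # Bytes that windows-1252 leaves undefined become U+FFFD.
        return raw.decode("cp1252", errors="replace")
    return raw.decode(label, errors="replace")


# Made-up sample: a windows-1252 right single quote used as an apostrophe.
sample = b"Beck\x92s Bier"

as_declared = sample.decode("iso-8859-1")  # keeps an invisible control char
as_repaired = decode_web_bytes(sample)     # yields a right single quote

print(repr(as_declared))
print(repr(as_repaired))
```

    Decoding strictly as declared leaves a control character where the
    author meant an apostrophe, which then silently breaks any search for
    the word.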

    These are things that concern people who are creating corpora from
    the WWW. Spelling can be very "generic" out there.

    Maybe this should be considered in corpus exploration software by
    having options for fuzzy matching. (If one can't correct the errors,
    one can possibly work around them as a corpus user.)
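
    One possible shape for such an option, sketched in Python under my
    own assumptions: the query is turned into a regular expression in
    which any apostrophe-like character matches any other. The names
    APOSTROPHE_LIKE and fuzzy_pattern are invented here, and the list of
    look-alike characters is a guess, not an exhaustive inventory.

```python
import re

# Characters commonly typed where an apostrophe is meant (assumed list):
# ASCII apostrophe, right/left single quotes, acute accent, grave accent,
# modifier letter apostrophe.
APOSTROPHE_LIKE = "'\u2019\u2018\u00b4`\u02bc"


def fuzzy_pattern(query: str) -> re.Pattern:
    """Build a regex in which apostrophe look-alikes are interchangeable."""
    char_class = "[" + re.escape(APOSTROPHE_LIKE) + "]"
    parts = []
    for ch in query:
        if ch in APOSTROPHE_LIKE:
            parts.append(char_class)      # match any look-alike here
        else:
            parts.append(re.escape(ch))   # match everything else literally
    return re.compile("".join(parts))


pattern = fuzzy_pattern("Beck's")
variants = ["Beck's", "Beck\u2019s", "Beck\u00b4s", "Beck`s"]
hits = [v for v in variants if pattern.fullmatch(v)]
print(len(hits) == len(variants))
```

    The corpus itself stays untouched; only the query is loosened, so the
    original (mis)spellings remain available for anyone who wants to
    study them.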

    Best,

       Niels

    - --
    Me & Myself & All The Rest: http://www.drni.de/
    Auf dem Baum, da sitzt ein Specht, der Baum ist hoch, dem Specht ist
    schlecht.
    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.2.2 (GNU/Linux)

    iD8DBQFEq3P3bosnVosUgx0RAmomAJ9GifNAhqIyRFmOl8sd6K+rvTlm/gCgmLqd
    +Ikh7Esf7I7mxnX2F9fwZfA=
    =uwUF
    -----END PGP SIGNATURE-----


