RE: [Corpora-List] ANC, FROWN, Fuzzy Logic

From: John Goldsmith (goldsmith@uchicago.edu)
Date: Mon Jul 24 2006 - 21:05:49 MET DST

    Daoud Clarke wrote:
    >It would be extremely interesting however to see whether the use of
    >linguistic theories can help provide better text compression. To my
    >awareness this has not been looked into.

    Several researchers have used the reduction in total description length
    that results from morphological analysis to justify the existence of
    morphology (including me: see my paper in Computational Linguistics in 2001,
    and our website at linguistica.uchicago.edu). At a crude level, the point is
    clear. Treating jumps, jumped, jumping, laughs, laughed, laughing all as
    separate and unrelated words in the lexicon of English is redundant, and it
    leads to a longer description of an English corpus than a lexicon that has a
    list of stems, a list of affixes, and some machinery that explicitly
    indicates how they may be composed in the language in question. The devil is
    in the details, and there has been a lot of work in this area over the last
    half dozen years.
    John Goldsmith
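
    The crude comparison in the message above can be sketched in a few lines
    of Python. This is only an illustration under loud assumptions, not the
    method of the 2001 paper or of Linguistica: every letter is charged a
    uniform log2(26) bits, the "machinery" is a single suffix set shared by
    all stems, word delimiters and the cost of encoding the corpus itself are
    ignored, and the names flat_cost, morph_cost and generate are made up for
    this example.

```python
import math

# The six forms from the message, listed as unrelated lexicon entries.
WORDS = ["jumps", "jumped", "jumping", "laughs", "laughed", "laughing"]

# A hand-made analysis (assumption): two stems sharing one suffix set.
STEMS = ["jump", "laugh"]
SUFFIXES = ["s", "ed", "ing"]

# Assumption: each letter costs the same, log2(26) bits.
BITS_PER_LETTER = math.log2(26)


def flat_cost(words):
    """Bits needed to spell out every word form separately."""
    return sum(len(w) for w in words) * BITS_PER_LETTER


def morph_cost(stems, suffixes):
    """Bits for a stem list, a suffix list, and one shared suffix set.

    The suffix set holds one pointer per suffix, each costing
    log2(number of suffixes) bits. With a single suffix set, a stem
    needs no bits to say which set it combines with.
    """
    letters = sum(len(s) for s in stems) + sum(len(s) for s in suffixes)
    pointer_bits = len(suffixes) * math.log2(len(suffixes))
    return letters * BITS_PER_LETTER + pointer_bits


def generate(stems, suffixes):
    """All word forms licensed by combining every stem with every suffix."""
    return {stem + suffix for stem in stems for suffix in suffixes}


if __name__ == "__main__":
    # The analysis must regenerate exactly the original word list.
    assert generate(STEMS, SUFFIXES) == set(WORDS)
    print("flat lexicon bits: ", round(flat_cost(WORDS), 1))
    print("stems + affixes:   ", round(morph_cost(STEMS, SUFFIXES), 1))
```

    The flat list spells out 39 letters, while the stem and suffix lists
    spell out 15, so the analyzed lexicon is shorter even after paying for
    the suffix pointers. A realistic model also has to charge for encoding
    the corpus given the lexicon, which is where the details get hard.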


