Re: [Corpora-List] ANC, FROWN, Fuzzy Logic

From: Mark P. Line (mark@polymathix.com)
Date: Mon Jul 24 2006 - 17:25:40 MET DST

    Daoud Clarke wrote:
    >
    > As far as I understand it, fuzzy logic isn't about uncertainty in
    > qualities, it is about degrees of qualities, or vagueness.
    >
    > <snip>

    All this bit about fuzzy sets and Bayesian inference was very well put,
    and I find nothing to disagree with.

    > I think what the reference to Greg Chaitin's work was getting at is
    > perhaps related to the following. In practice we are always
    > faced with a finite corpus, whereas the theoretical corpora generated
    > by rules are infinite. We can view our finite corpus as a sample from
    > some hypothetical infinite corpus. The question is, what theory gives
    > us the best estimate of this infinite corpus, given the finite sample?
    > Using our finite corpus we can form theories about the infinite corpus,
    > which may or may not incorporate our linguistic knowledge of the
    > language in question. From an information theoretic perspective, the
    > best theory would be the one that enabled us to express the finite
    > corpus using the least amount of information -- the one that best
    > compressed the information in the corpus.
    >
    > Of course theories become large and unwieldy, so we may prefer the
    > minimum description length principle: the best theory for a sequence of
    > data is the one that minimises the size of the theory plus the size of
    > the data described using the theory.
    >
    > Some of this has been put into practice by Bill Teahan, who applies
    > text compression techniques to NLP applications. It would be extremely
    > interesting however to see whether the use of linguistic theories can
    > help provide better text compression. To my knowledge this has not been
    > looked into.
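
The minimum description length principle quoted above can be made concrete
with a toy sketch. Nothing below comes from the thread itself: the two
character-level "theories", the function names, and the 16-bit cost per stored
probability (PARAM_BITS) are all assumptions chosen for illustration, and the
cost of stating the alphabet is ignored because both theories would pay it
equally. The score is simply bits for the theory plus bits for the corpus
encoded under that theory.

```python
import math
from collections import Counter

# Assumed cost, in bits, of writing down one probability in a theory.
PARAM_BITS = 16


def data_bits(text, probs):
    """Bits needed to encode text under the model: -sum(log2 p(ch))."""
    return -sum(math.log2(probs[ch]) for ch in text)


def uniform_theory(text):
    """Theory with no free parameters: every observed character is equally likely."""
    alphabet = set(text)
    probs = {ch: 1.0 / len(alphabet) for ch in alphabet}
    return 0.0, probs


def unigram_theory(text):
    """Theory that stores one relative frequency per character."""
    counts = Counter(text)
    total = len(text)
    probs = {ch: count / total for ch, count in counts.items()}
    # The last probability is implied by the others, so k - 1 are stored.
    theory_bits = float(PARAM_BITS * (len(counts) - 1))
    return theory_bits, probs


def description_length(text, theory):
    """Two-part score: size of the theory plus size of the data given the theory."""
    theory_bits, probs = theory(text)
    return theory_bits + data_bits(text, probs)


if __name__ == "__main__":
    for corpus in ("aaab", "aaab" * 100):
        for theory in (uniform_theory, unigram_theory):
            bits = description_length(corpus, theory)
            print(len(corpus), theory.__name__, round(bits, 1))
```

On the four-character corpus the parameter-free theory has the shorter total,
because the unigram theory cannot recoup the cost of its stored probability. On
the 400-character corpus the unigram theory wins. That trade-off between the
size of the theory and the size of the encoded data is what the principle is
meant to capture.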

    I'd just point out that theory evaluation metrics based on description
    length are useful only for some purposes, and that one need not use them
    unless one's purposes actually call for that kind of evaluation. (There
    are no "universal" theory evaluation metrics, because the space of
    purposes to which a theory can be put is infinite. I see the assumption
    that such metrics exist as one of the root Cartesian flaws.)

    A model that also predicted neuropsychological phenomena during speech
    would be more useful in my book than one that only produced a formal
    grammatical abstraction of utterances.

    A model that also captured phenomena of language evolution over a social
    network would be more useful in my book than one that only feeds a
    treebank.

    -- Mark

    Mark P. Line
    Polymathix
    San Antonio, TX


