Re: [Corpora-List] ANC, FROWN, Fuzzy Logic

From: John F. Sowa (sowa@bestweb.net)
Date: Wed Jul 26 2006 - 18:22:26 MET DST

    Ken, Rob, Jim, and Mark,

    I mentioned the point about compression only to tie the
    discussion to Chaitin's work. If a corpus is truly random,
    no compression is possible. But if any compression is
    possible, then there must exist a description more compact
    than a complete enumeration of everything in the corpus.
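
    As a rough illustration of that point (a sketch, not anything from
    Chaitin's own work), the snippet below compares how well a
    general-purpose compressor does on random bytes and on highly
    repetitive text. The two "corpora" and the helper name
    compression_ratio are made up for the example, and zlib gives only
    an upper bound on the shortest possible description, not the
    algorithmic complexity itself.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower = more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)


# Hypothetical stand-in for a "truly random" corpus: bytes from the OS entropy source.
random_corpus = os.urandom(100_000)

# Hypothetical stand-in for a highly regular corpus: one sentence repeated many times.
regular_corpus = b"the cat sat on the mat. " * 4_000

random_ratio = compression_ratio(random_corpus)
regular_ratio = compression_ratio(regular_corpus)

print(f"random corpus ratio:  {random_ratio:.3f}")
print(f"regular corpus ratio: {regular_ratio:.3f}")
```

    The random bytes should come out at about their original size,
    while the repetitive text shrinks to a small fraction of it. Real
    language data would fall somewhere between the two extremes.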

    RF> On the contrary, the evidence indicates to me that any
    > compression of NL data must be "incomplete" (and each incomplete
    > compression involves a loss of information which can only be
    > prevented by retaining the whole corpus anyway.)
    >
    > We've been running around for 50 years or more finding incomplete
    > compressions. You would think we'd get the hint.

    I don't know what hint you're suggesting. That no rule-based
    system can ever be complete? I think that's obvious. That
    an incomplete compression is useless? I would very strongly
    disagree with any such claim.

    JLD> No linguist, however poor, would deny the importance of
    > having good generalizations about a particular language, corpus,
    > etc. And no decent linguist, however good, would (or certainly:
    > should) deny that their analysis of a particular language, corpus,
    > etc. could be bettered.

    That is the point I was trying to emphasize. Although I agree with
    Rob that having access to corpus data is valuable during language
    analysis, it should be possible to do a large part of the analysis
    by means of some more compact method.

    The goal of linguistics is to characterize that method, but I'll
    avoid any claim that the method must be based on logic, rules,
    neurons, or statistics.

    MPL> For science to work, theories and other models don't have
    > to be things that are "true". They just have to be things that
    > are _useful_ -- and that implies a purpose against which any
    > scientific model must be evaluated. (Bas van Fraassen)

    I agree to a large extent, but I would emphasize the distinction
    between engineering and pure science. The question of "truth" --
    i.e., a correspondence with some reality that exists independently
    of what we may think about it -- is science, but the question of
    usefulness is engineering. Both are important, but we should be
    clear about which goals we are pursuing in any particular project.

    For example, the evidence seems to show that Chomsky's distinction
    between performance and competence was a dead end for science, but
    there may still be valid engineering uses for much of the rule-based
    technology that was inspired by Chomsky's work.

    KL> When I generate, I feel very much as if my use of a particular
    > word may change from one draft of a paper to the next, i.e.,
    > my whole semantic network of associations changes from day to day.

    I agree. I like Alan Cruse's word "microsense" for the subtle
    variations. Below is a famous quotation from Steiner. But I don't
    believe that we need complete corpora. When we're talking with
    someone, we can just ask a question if we're not sure about his or
    her meaning. And in many cases, the speaker isn't sure either
    (note St. Augustine's point about time -- he knows what it is
    until somebody asks him).

    John Sowa
    ______________________________________________________________________

    From Steiner, George (1975) After Babel: Aspects of Language and
    Translation, Oxford University Press, Oxford, third edition 1998.

    No two historical epochs, no two social classes, no two localities use
    words and syntax to signify exactly the same things, to send identical
    signals of valuation and inference. Neither do two human beings. Each
    living person draws, deliberately or in immediate habit, on two sources
    of linguistic supply: the current vulgate corresponding to his level of
    literacy, and a private thesaurus. The latter is inextricably a part of
    his subconscious, of his memories, so far as they may be verbalized, and
    of the singular, irreducibly specific ensemble of his somatic and
    psychological identity. Part of the answer as to whether there can be
    'private language' is that aspects of every language act are unique and
    individual. They form what linguists call an 'idiolect'. Each
    communicatory gesture has a private residue. The 'personal lexicon' in
    every one of us inevitably qualifies the definitions, connotations,
    semantic moves current in public discourse. The concept of a normal or
    standard idiom is a statistically-based fiction (though it may, as we
    shall see, have real existence in machine translation). The language of
    a community, however uniform its social contour, is an inexhaustibly
    multiple aggregate of speech-atoms, of finally irreducible personal
    meanings.... Thus a human being performs an act of translation, in the
    full sense of the word, when receiving a speech-message from any other
    human being. (pp. 47-48)


