Re: [Corpora-List] ANC, FROWN, Fuzzy Logic

From: John F. Sowa (sowa@bestweb.net)
Date: Wed Jul 26 2006 - 18:22:26 MET DST

Next message: Mark P. Line: "Re: [Corpora-List] ANC, FROWN, Fuzzy Logic"

Previous message: Mark P. Line: "Re: [Corpora-List] Re: ANC, FROWN, Fuzzy Logic"
In reply to: Ken Litkowski: "Re: [Corpora-List] ANC, FROWN, Fuzzy Logic"
Next in thread: Mark P. Line: "Re: [Corpora-List] ANC, FROWN, Fuzzy Logic"
Next in thread: Rob Freeman: "Re: [Corpora-List] ANC, FROWN, Fuzzy Logic"
Reply: Mark P. Line: "Re: [Corpora-List] ANC, FROWN, Fuzzy Logic"
Reply: Rob Freeman: "Re: [Corpora-List] ANC, FROWN, Fuzzy Logic"

Ken, Rob, Jim, and Mark,

I mentioned the point about compression only to tie the
discussion to Chaitin's work. If a corpus is truly random,
no compression is possible. But if any compression is
possible, then there must exist a description of the corpus
more compact than a complete enumeration of everything in it.
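
As a rough illustration of that point, here is a minimal sketch in
Python. It is an assumption-laden toy, not anything from Chaitin:
zlib is only a practical stand-in for an ideal compressor, the
sample sentence is made up, and the name compression_ratio is
hypothetical. A failure of zlib to compress something does not
prove that it is random; success only shows that some shorter
description exists.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower = more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)


# Bytes from the OS random source: zlib should find no regularity to exploit.
random_bytes = os.urandom(100_000)

# A made-up, highly repetitive "corpus": one sentence repeated many times.
patterned = b"the cat sat on the mat. " * 4000

print(f"random bytes:   {compression_ratio(random_bytes):.3f}")
print(f"patterned text: {compression_ratio(patterned):.3f}")
```

The ratio for the random bytes should come out at about 1 (or
slightly above, because of format overhead), while the repetitive
text should shrink to a small fraction of its original size.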

RF> On the contrary, the evidence indicates to me that any
> compression of NL data must be "incomplete" (and each incomplete
> compression involves a loss of information which can only be
> prevented by retaining the whole corpus anyway.)
>
> We've been running around for 50 years or more finding incomplete
> compressions. You would think we'd get the hint.

I don't know what hint you're suggesting. That no rule-based
system can ever be complete? I think that's obvious. That
an incomplete compression is useless? I would very strongly
disagree with any such claim.

JLD> No linguist, however poor, would deny the importance of
> having good generalizations about a particular language, corpus,
> etc. And no decent linguist, however good, would (or certainly:
> should) deny that their analysis of a particular language, corpus,
> etc. could be bettered.

That is the point I was trying to emphasize. Although I agree with
Rob that having access to corpus data is valuable during language
analysis, it should be possible to do a large part of the analysis
by means of some more compact method.

The goal of linguistics is to characterize that method, but I'll
avoid any claim that the method must be based on logic, rules,
neurons, or statistics.

MPL> For science to work, theories and other models don't have
> to be things that are "true". They just have to be things that
> are _useful_ -- and that implies a purpose against which any
> scientific model must be evaluated. (Bas van Fraassen)

I agree to a large extent, but I would emphasize the distinction
between engineering and pure science. The question of "truth" --
i.e., a correspondence with some reality that exists independently
of what we may think about it -- belongs to science, but the question
of usefulness belongs to engineering. Both are important, but we
should be clear about which goal we are pursuing in any particular
project.

For example, the evidence seems to show that Chomsky's distinction
between performance and competence was a dead end for science, but
there may still be valid engineering uses for much of the rule-based
technology that was inspired by Chomsky's work.

KL> When I generate, I feel very much as if my use of a particular
> word may change from one draft of a paper to the next, i.e.,
> my whole semantic network of associations changes from day to day.

I agree. I like Alan Cruse's word "microsense" for the subtle
variations, and a famous quotation from Steiner on the same point
appears below. But I don't believe that we need complete corpora.
When we're talking with someone, we can just ask a question if we're
not sure about his or her meaning. And in many cases, the speaker
isn't sure either (note St. Augustine's point about time -- he knows
what it is until somebody asks him).

John Sowa
______________________________________________________________________

From Steiner, George (1975) After Babel: Aspects of Language and
Translation, Oxford University Press, Oxford, third edition 1998.

No two historical epochs, no two social classes, no two localities use
words and syntax to signify exactly the same things, to send identical
signals of valuation and inference. Neither do two human beings. Each
living person draws, deliberately or in immediate habit, on two sources
of linguistic supply: the current vulgate corresponding to his level of
literacy, and a private thesaurus. The latter is inextricably a part of
his subconscious, of his memories, so far as they may be verbalized, and
of the singular, irreducibly specific ensemble of his somatic and
psychological identity. Part of the answer as to whether there can be
'private language' is that aspects of every language act are unique and
individual. They form what linguists call an 'idiolect'. Each
communicatory gesture has a private residue. The 'personal lexicon' in
every one of us inevitably qualifies the definitions, connotations,
semantic moves current in public discourse. The concept of a normal or
standard idiom is a statistically-based fiction (though it may, as we
shall see, have real existence in machine translation). The language of
a community, however uniform its social contour, is an inexhaustibly
multiple aggregate of speech-atoms, of finally irreducible personal
meanings.... Thus a human being performs an act of translation, in the
full sense of the word, when receiving a speech-message from any other
human being. (pp. 47-48)

This archive was generated by hypermail 2b29 : Wed Jul 26 2006 - 18:21:16 MET DST