Re: [Corpora-List] ANC, FROWN, Fuzzy Logic

From: Seth Grimes (grimes@altaplana.com)
Date: Thu Jul 27 2006 - 05:17:26 MET DST

    > And no one ever worried, afaik, about whether the compression had to be
    > perfect...

    There's a certain irony contained in this sentence, eh?

                                            Seth

    On Wed, 26 Jul 2006, Mike Maxwell wrote:

    > Rob Freeman wrote:
    > > We've been running around for 50 years or more finding incomplete
    > > compressions. You would think we'd get the hint.
    >
    > I don't get the hint, even after you've told me there is a hint :-).
    >
    > I can certainly believe that human beings internalize a grammar without
    > believing that the grammar needs to be "perfect" in any sense.
    >
    > I can also believe that the grammar does not need to extract every last
    > bit of entropy out of the language (and I mean _language_, not corpus,
    > see below).
    >
    > But let's get down to some actual data, and theories. The degree to
    > which the compression should proceed was precisely the point behind a
    > lot of the arguments--particularly among phonologists, the point is less
    > clear in syntax--over abstractness. To take an example, in one of his
    > papers Morris Halle argued (or maybe just assumed) that such
    > semi-regular verbs in English as 'weep' and 'keep' in fact have a
    > rule-governed past tense ('wept' and 'kept', etc.). I, on the other
    > hand, think it's completely possible that native speakers of English do
    > not extract such a rule (although they do extract the rules for regular
    > past tense verbs). (Of course it's possible that some native speakers
    > do, and others do not, extract such a rule.)
    >
    > Another example along the same lines would be the diphthongizing verbs
    > in Spanish, like 'venir', whose stem diphthongizes to 'vien' when
    > stressed. James Harris has argued for a rule-governed approach, which
    > requires a diacritic. Again, it's perfectly possible that native
    > speakers of Spanish just memorize the irregular stems, i.e. that their
    > internalized grammars don't do perfect compression.
    >
    > In cases like these, linguists can argue--and have argued--for a greater
    > or lesser degree of compression. And no one ever worried, afaik, about
    > whether the compression had to be perfect (although admittedly, there
    > were some pretty abstract analyses in the bad olde days).
    >
    > (BTW, it's unclear to me--as I think another poster pointed out--whether
    > compression of a corpus by a grammar is at all relevant. What grammars
    > do, I would say, is compress the _language_, of which the corpus is but
    > a small sample. One can test whether the grammar works by measuring how
    > well it compresses a given corpus of the language, but I don't see the
    > point of asking whether we perfectly compress some arbitrary corpus.)
    >
    >

    --
    Seth Grimes   Alta Plana Corp, analytical computing & data management
                  Intelligent Enterprise magazine (CMP), Contributing Editor
    grimes@altaplana.com       http://altaplana.com    301-270-0795
    


