Re: Corpora: Chomsky and corpus linguistics

From: Mike Maxwell (mike_maxwell@sil.org)
Date: Thu Apr 26 2001 - 15:53:11 MET DST

    Terry Murphy, quoting Robert De Beaugrande:

    >...the corpus highlights the improbable and unnatural
    >quality of invented data like 'John is eager to please'.

    Concerning the 'improbable' (and therefore rare) nature of certain data
    which has been used to argue for certain generative accounts: This is
    precisely the generativists' point. If some large group of people all have
    the same judgement about the acceptability of certain constructions, and
    those constructions are rare, then how can one explain their consensus? A
    case in point is parasitic gaps. I don't know for sure, but I would guess
    that they are vanishingly rare in corpora, and in the sort of input that
    children get. And yet the first time I heard constructed examples of
    parasitic gaps, I, and the other linguists who were hearing the report,
    immediately reacted the same way: they were "good English." It seems to me
    that there is a datum that needs explaining: you've never (or almost never)
    seen something before, but it is immediately familiar. Group deja vu.

    I hasten to add (as I have said before) that some generativists have
    certainly made questionable grammaticality judgements. Simply put, there
    can be bad data in acceptability/grammaticality judgements. But this
    problem is not limited to acceptability/grammaticality judgements; in fact,
    all sciences have to deal with irreproducible data. (And corpora have lots
    of it, IMHO.)

    Concerning the 'unnatural' quality of certain invented data, I guess I'm
    just not sure what the problem is, or even what definition of '(un)natural'
    is being used here. Is it "unnatural" just because it didn't occur in a
    corpus, or in natural conversation? Or is it "unnatural" in the sense that
    it isn't "real English" (or other language)? If the former, that seems an
    odd definition of "natural" (on a par with claiming that synthesized organic
    chemicals, say, are not really organic); if the latter, what is the evidence
    for the claim? (Or maybe there's another definition of "natural" here.)

    Dr. Murphy himself:
    >Chomsky's comment about corpus linguistics not
    >existing seems to be a logical response from
    >someone whose whole enterprise would be
    >undermined by the widespread adoption of real
    >data as a mediator of conflicting linguistic judgements.

    I doubt whether Chomsky is the least little bit worried about his enterprise
    being undermined by corpus work. But I question the phrase "real data"
    here: there is nothing artificial, I claim, about introspective judgments;
    and the fact that the data in a corpus wasn't produced for purposes of doing
    linguistics does not in itself make it _better_ for doing linguistics. It
    may, in some circumstances, make it worse--circumstances like slips at the
    keyboard, people writing at one in the morning or half drunk, non-native
    speakers, etc. Maybe there is a theory that explains the kinds of
    differences created by these circumstances, and sometimes that theory will
    even involve linguistics (e.g. work that's been done on slips of the
    tongue), but IMHO it's wrong to say that linguistics has to explain the
    output from all such circumstances, or that that sort of data is necessarily
    better than introspective data.

                                             Mike Maxwell
                                             Summer Institute of Linguistics
                                             Mike_Maxwell@sil.org


