Re: [Corpora-List] Looking for linguistic principles

From: Stefan Bordag (sbordag@informatik.uni-leipzig.de)
Date: Sat Oct 15 2005 - 10:59:13 MET DST

    Dear Rob,

    > > Perhaps it was precisely the lack of these materials [large corpora,
    > > availability of machines] which made the structuralist programme
    > > infeasible during the 1950s, rather than some fundamental theoretical
    > > flaw.
    >
    > "Perhaps", but what was the "theoretical flaw"?

    As I wrote, the criticism was probably not directed against the
    'distributional' or structuralist programme at all, but rather at
    behaviorism. Then again, I have unfortunately not read enough about
    phonology to make informed comments on that part of the debate.
    I am still digesting John Goldsmith's answer. ;-)

    > > And I might add a little further up in the same section of Finch's
    > > dissertation:
    > >
    > > This [structuralist] paradigm was criticised by Chomsky (57) for
    > > failing to properly dissociate the definition of what structure
    > > existed in natural language from the procedures which allowed that
    > > structure to be found
    >
    > This may be it, though I'm not clear exactly what Finch means by
    > "failing to ... dissociate the ... structure ... from the
    > procedures..." Does he mean Chomsky observed different procedures
    > resulted in different structures?

    What Finch means is actually fairly clear. The problem Chomsky seemed to
    criticise can be summarised with the following exaggerated example.
    Suppose I want to write a program that finds word classes automatically.
    I have several options. One is simply to put all the correct assignments
    of words to word classes into the program. In that case the program will
    operate flawlessly, of course, and yet it will not discover any kind of
    structure, because I have already put all the structure into it.

    The other option is to write a clustering mechanism that compares words
    by their contexts (the pure distributional method). If the program then
    comes up with several word classes and assigns the words to them, then
    the program has found the structure, not me. My guess is that the
    potential of clustering and of this contrastive method of comparison
    (both of which are independent of the language level they are applied
    to) is what Chomsky did not appreciate, unlikely as that sounds. Then
    again, clustering is no fun (= too much work; see also the Martinet
    citation I offered in my previous email) if there are no computers to
    do it.
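
    To make the second option concrete, here is a minimal sketch of such a
    context-comparison clustering. It is an illustration only: the toy
    corpus, the names context_vectors, cosine and cluster, the cosine
    measure and the 0.5 threshold are all my assumptions for the example,
    not a description of any particular published system.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy corpus (hypothetical); no word classes are given to the program.
CORPUS = [
    "the dog sleeps", "the cat sleeps", "a dog runs",
    "a cat runs", "the dog runs", "a cat sleeps",
]

def context_vectors(sentences):
    """Count, for every word, its immediate left and right neighbours."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            vectors[tokens[i]][("L", tokens[i - 1])] += 1
            vectors[tokens[i]][("R", tokens[i + 1])] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(vectors, threshold=0.5):
    """Single-link grouping: a word joins every class that contains at
    least one member whose context vector is similar enough to its own."""
    classes = []
    for word in sorted(vectors):
        matches = [c for c in classes
                   if any(cosine(vectors[word], vectors[m]) >= threshold for m in c)]
        merged = {word}
        for c in matches:
            merged |= c
            classes.remove(c)
        classes.append(merged)
    return classes

if __name__ == "__main__":
    for word_class in cluster(context_vectors(CORPUS)):
        print(sorted(word_class))
```

    On this toy corpus the words fall into three groups (a/the, cat/dog,
    runs/sleeps) without the number of classes being specified. Note,
    though, that the similarity threshold is itself a parameter chosen by
    the programmer.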

    Then again, Chomsky might also have meant that even if it is possible
    to find the different word classes, that still doesn't help to find
    rules. But as the first automatic grammar induction experiments show,
    this is not really an issue either. Simply put, if the system is allowed
    to posit, say, context-free rules in order to compress a lossless
    representation of the language in question, then it might still find
    them using nothing but context comparisons, i.e. the distributional
    method. It's all about the *kind* of structure that we assume to be in
    the language. If we assume that there are classes of elements, we design
    an algorithm that finds all useful or meaningful classes (free morphemes
    vs. bound morphemes, nouns vs. verbs, etc.) and the assignments to these
    classes. If we assume that there are rules, we design an algorithm that
    finds rules (on the morpheme level, on the sentence level, etc.). But as
    soon as we give the system hints, such as how many word classes to find,
    we are putting into the system the very structure it was supposed to
    find.
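
    As a rough illustration of "rules that compress a lossless
    representation", the sketch below repeatedly replaces the most frequent
    adjacent pair of symbols with a new rule symbol. This is a simple
    pair-replacement scheme in the style of grammar-based compression; it
    is a technique I am swapping in for illustration, not the induction
    experiments mentioned above, and it does not use context comparisons.
    The names induce_rules and expand, the "R0", "R1", ... symbols and the
    min_count parameter are assumptions made for the example.

```python
from collections import Counter

def induce_rules(tokens, min_count=2):
    """Replace the most frequent adjacent pair by a fresh symbol until no
    pair occurs at least min_count times. Assumes that no input token
    already looks like a rule symbol such as "R0"."""
    rules = {}
    seq = list(tokens)
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_count:
            break
        symbol = f"R{len(rules)}"
        rules[symbol] = pair          # context-free rule: symbol -> pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def expand(seq, rules):
    """Rewrite rule symbols back into tokens; recovers the input exactly."""
    out = []
    for s in seq:
        if s in rules:
            out.extend(expand(rules[s], rules))
        else:
            out.append(s)
    return out

if __name__ == "__main__":
    text = "the dog sleeps . the cat sleeps . the dog runs .".split()
    compressed, rules = induce_rules(text)
    print(compressed)
    print(rules)
    print(expand(compressed, rules) == text)
```

    The only structural assumption built in is that the rules are binary
    and context-free; which rules appear is determined by the data.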

    Otherwise, as I said, I cannot comment too much on the phonological
    debate, so I would just refer to the answer of John Goldsmith.

    By the way, Diana Santos has suggested the books "Empirical Linguistics"
    and "Educating Eve" (the latter now in a new edition called "The
    'Language Instinct' Debate") by Geoffrey Sampson, which are highly
    relevant to this discussion.

    Best regards,
    Stefan Bordag

    -- 
    ---------------------------------------------------------------------
    - Bordag Stefan, sbordag@informatik.uni-leipzig.de                  -
    - Institut fuer Informatik, Abt. Automatische Sprachverarbeitung    -
    - Universitaet Leipzig                                              -
    ---------------------------------------------------------------------
    


