Re: [Corpora-List] Looking for linguistic principles

From: Stefan Bordag (sbordag@informatik.uni-leipzig.de)
Date: Fri Oct 14 2005 - 10:44:14 MET DST


    Dear Rob,

    First I would like to provide several relevant citations. One is from
    Martinet (it is my own free translation into English, since I currently
    have only the German version of his introduction to linguistics; it would
    be nice if someone could provide me with the proper English translation):

    Some linguists have stated the ideal of producing a description method [of
    language] that excludes the meaning of meaningful [language] units. [...]
    It would be possible to arrive at a complete description of the language
    and it would be possible to compile a grammar and a lexicon that would
    lack only the definitions [of the words] in the way they are present in
    current lexicons. In reality no linguist has yet had the idea of analysing
    and describing a language he does not know at all in such a way. Such an
    undertaking would by all accounts require an expense of time and energy
    that has deterred even those who consider this approach the only
    theoretically acceptable one. [...] (free translation by the author of
    this work) (Martinet 69)

    The second citation is from Finch's dissertation of 1993:

    Perhaps it was precisely the lack of these materials [large corpora,
    availability of machines] which made the structuralist programme
    infeasible during the 1950s, rather than some fundamental theoretical
    flaw.

    And I might add this, from a little further up in the same section of
    Finch's dissertation:

    This [structuralist] paradigm was criticised by Chomsky (57) for failing
    to properly dissociate the definition of what structure existed in natural
    language from the procedures which allowed that structure to be found, and
    of being too ambitious in any case, there not being enough information in
    a corpus of a natural language to define its structure.

    However, as another set of important citations from Roy Harris' book
    'Saussure and his interpreters' (01) shows:

    There seems to be no indication that Noam Chomsky, founder of modern
    generative linguistics, had ever read or paid attention to the work of
    Saussure until the appearance, in 1959, of the first English translation
    of the Cours de linguistique générale (Baskin 59).

    It appears that Chomsky was criticizing something else (since this puts
    his first reading of Saussure after his 57 publication). Also from Roy
    Harris' book:

    Having recently published a swingeing attack on behaviorism in linguistics
    (Chomsky 1959), Chomsky was looking retrospectively for pre-Bloomfieldian
    champions of 'mentalism' who could be posthumously resurrected as avatars
    heralding his own approach to linguistic theory. The welcome was only a
    cautious one, however, because the Cours de linguistique did not prima
    facie look at all like a generativist treatise in embryo.

    So it looks as though he was attacking behaviorism (which indeed cannot
    provide generalizations), not structuralism (and therefore not the
    distributional methods that are based on structuralism). Structuralism he
    did not really know at that time, and later he simply failed to understand
    (or to acknowledge) it properly, because he was trying to narrow language
    down to a purely generativist point of view (in which, unfortunately, he
    nevertheless mostly succeeded, it appears) and to make his distinction
    between performance and competence prominent.

    For my own part, I would agree that Chomsky's criticism does seem to have
    'killed' most research efforts on distributional methods. But that was
    perhaps also a good thing, because back then the two essential resources
    named by Finch (large corpora and available machines) simply were not
    available, and research in pure grammar (which does not necessarily
    depend on these resources) has brought many insights since.

    However, observing the increasing amount of quite varied work in that
    field, including not only Finch's but also Burghard Rieger's, Andrea
    Lehr's and Reinhard Rapp's (not to forget Andre Martinet), as well as
    applied work such as Goldsmith's and many others, I would not say that
    this objection was never really addressed. It is simply that the field of
    linguistics has grown so large that most 'true' linguists in the sense of
    'generativists' (but also formal semanticists, formal semantics being an
    offspring of generativism) have not yet become fully aware of the
    significance of the reappearance of the distributional method and its new
    potential given the availability of the two essential resources. The
    current disadvantage is that this kind of research is sometimes still
    considered inferior to true generativist research.

    Of course this is also due to the fact that currently only very basic
    notions, such as the distinction between word classes, can be extracted
    fully automatically from raw text, whereas generativists are currently
    occupied with far more intricate questions, as the titles of talks at
    recent conferences show. This might indicate that generality is indeed
    hardly attainable with such methods. But I would also say that the field
    is approaching the moment where 'generality', e.g. grammars of a given
    language, can be extracted in a fully automatic way from raw text, without
    any introspection. First works (e.g. Henrichsen: "GraSp: Grammar learning
    from unlabelled corpora") are already appearing and represent first
    cautious experiments, and I think that several years from now the results
    will already suffice to be of practical use. Of course, the traditional
    objection that manually created grammars are always better will remain for
    a very long time afterwards, counterbalanced only by the simple cost
    factor. ;-)

    Best regards,
    Stefan Bordag

    -- 
    ---------------------------------------------------------------------
    - Bordag Stefan, sbordag@informatik.uni-leipzig.de                  -
    - Institut fuer Informatik, Abt. Automatische Sprachverarbeitung    -
    - Universitaet Leipzig                                              -
    ---------------------------------------------------------------------
    


