Re: [Corpora-List] Grammar checker for English

From: Mike Maxwell (maxwell@ldc.upenn.edu)
Date: Thu Apr 14 2005 - 04:39:00 MET DST


    Corrin Lakeland wrote:
    > On Tue, 12 Apr 2005 23:32, you wrote:
    >
    >> Does anyone have a technique or tool for checking the
    >> grammatical correctness of a sentence?
    >>
    >> A full parser would be computationally too expensive,
    >> so is there a computationally cheap method for this?
    >
    > I do not know of any systems which check if a sentence is
    > well-formed without parsing it, although it is
    > theoretically possible to do. However, there are many
    > parsers that are quite efficient.
    >
    > ... I'm sure there is lots of other work in the field.

    (I didn't see the original msg for some reason, but I'm
    assuming it was posted to Corpora-List, hence a reply is
    appropriate.)

    Like Corrin, I don't know of any work done on testing
    well-formedness without parsing. (Unlike him, I have a hard
    time imagining how that would work--I suppose you could do
    some sort of n-gram tests, but there would be no guarantee
    that there wouldn't be an error at n+1, or for that matter
    that back-off didn't lead to problems at larger n. But
    maybe I just lack imagination :-).)
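
    A minimal sketch of the n-gram idea and of the "error at
    n+1" problem, assuming a four-sentence toy corpus. The
    corpus and the names train and unseen_ngrams are
    hypothetical, made up for illustration, and nothing here
    comes from an existing tool. A sentence counts as
    suspicious when it contains an n-gram never seen in
    training.

```python
from collections import Counter

# Toy training corpus: an assumption for illustration only.
CORPUS = [
    "the dog barks",
    "the dogs bark",
    "the dog near the tree barks",
    "the dogs near the trees bark",
]


def pad(sentence):
    """Tokenize on whitespace and add sentence-boundary markers."""
    return ["<s>"] + sentence.split() + ["</s>"]


def ngrams(tokens, n):
    """Return every contiguous n-gram in tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train(sentences, n):
    """Count the n-grams that occur in the training sentences."""
    counts = Counter()
    for s in sentences:
        counts.update(ngrams(pad(s), n))
    return counts


def unseen_ngrams(sentence, n, corpus=CORPUS):
    """List the n-grams of sentence that never occur in corpus."""
    seen = train(corpus, n)
    return [g for g in ngrams(pad(sentence), n) if g not in seen]


# Subject-verb agreement error spanning four words ("dog ... bark").
bad = "the dog near the trees bark"
# Grammatical, but the bigram ("trees", "barks") is not in the corpus.
good = "the dog near the trees barks"

for n in (2, 3, 4):
    print(n, unseen_ngrams(bad, n))
print(2, unseen_ngrams(good, 2))
```

    On this toy data the ungrammatical sentence passes at n=2
    and n=3 and is flagged only at n=4, while the grammatical
    one is already rejected at n=2 because of data sparsity.
    A real system would use smoothed probabilities rather than
    a seen/unseen test, but the same trade-off applies.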

    At any rate, there is a considerable amount of work done on
    parsing _restricted_ English, with the intention of finding
    ungrammatical sentences where the standard of grammaticality
    is precisely some computational grammar. One domain where
    this has been used is in aircraft manuals, which must be
    read by technicians who do not have English as their first
    language. As I understand it, the version of simplified
    English used in these manuals is restricted both as to its
    vocabulary and its grammar. (I'm not sure how compound
    nouns are treated; maybe there's just a limit on nesting.)
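
    To make "restricted both as to its vocabulary and its
    grammar" concrete, here is a minimal sketch of such a
    checker, assuming a tiny hand-made lexicon and a grammar in
    Chomsky normal form recognized with the CYK algorithm. The
    words, categories, rules, and the name check are all
    hypothetical; they are not the actual simplified-English
    specification or any real checker's rules.

```python
# Hypothetical controlled vocabulary: word -> allowed categories.
LEXICON = {
    "the": {"Det"}, "a": {"Det"},
    "technician": {"N"}, "panel": {"N"}, "bolt": {"N"},
    "removes": {"Vfin"}, "installs": {"Vfin"},   # finite verbs
    "remove": {"Vimp"}, "install": {"Vimp"},     # imperatives
}

# Hypothetical binary rules: (left, right) -> parent categories.
RULES = {
    ("NP", "VP"): {"S"},      # declarative: the technician removes ...
    ("Vimp", "NP"): {"S"},    # imperative: remove the panel
    ("Det", "N"): {"NP"},
    ("Vfin", "NP"): {"VP"},
}


def check(sentence):
    """Return ("ok", []), ("vocabulary", unknown_words) or ("grammar", [])."""
    words = sentence.lower().rstrip(".").split()
    unknown = [w for w in words if w not in LEXICON]
    if unknown:
        return ("vocabulary", unknown)
    n = len(words)
    if n == 0:
        return ("grammar", [])
    # chart[i][j] holds the categories that span words[i..j] inclusive.
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        chart[i][i] = set(LEXICON[w])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):            # split point
                for left in chart[i][k]:
                    for right in chart[k + 1][j]:
                        chart[i][j] |= RULES.get((left, right), set())
    return ("ok", []) if "S" in chart[0][n - 1] else ("grammar", [])


for s in ["Remove the panel.",
          "The technician removes the bolt.",
          "The technician remove the bolt.",
          "Remove the widget."]:
    print(s, check(s))
```

    Under this scheme a sentence is "ungrammatical" exactly
    when the grammar fails to derive it, which is the sense of
    grammaticality described above. The vocabulary test runs
    first so that an out-of-lexicon word is reported as such
    rather than as a parse failure.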

    One of the simplified-English-for-aircraft checkers was done
    by Boeing. I wrote most of the original grammar rules back
    in the mid-1980s, without the intent of restricting it, so
    that it covered all the constructions we could come up with
    (from both generative grammars and descriptive texts like
    Quirk, Greenbaum, Leech and Svartvik, plus testing
    against various text corpora). I believe that after I left
    in 1987, and the restricted English application came up,
    many of the rules were removed so as to accept only the
    desired restricted language. Phil Harrison wrote the
    original parser in Lisp; I am told it was re-written in C
    (or C++?) for speed, and that after the re-write its speed
    was adequate for checking large manuals. (That was in the
    late 1980s or early 1990s. Moore's Law has, I would
    imagine, made its speed more adequate since then :-).)

    -- 
    	Mike Maxwell
    	Linguistic Data Consortium
    	maxwell@ldc.upenn.edu
    


