Re: Corpora: Question about a Brown Corpus tag

From: Frank Henrik Mueller (fhm@sfs.nphil.uni-tuebingen.de)
Date: Thu Sep 14 2000 - 12:57:40 MET DST


    Hello all!

    > on 17 Aug 2000 Eric S Atwell wrote:
    >
    > > Some tag definitions in Brown were clearly
    > > decided by what TAGGIT found computable;
    > > I *guess* linguistic inconsistencies in tagging
    > > some words may be down to drawing boundaries on
    > > grounds of computational tractability rather than
    > > purely linguistic reasons
    >
    > on 17 Aug 2000 Andrew Harley wrote:
    >
    > > This explains how so many taggers can claim 95% or higher success rates!
    >
    > > I also know taggers that tagged IN as "preposition
    > > or conjunction" on the same grounds.
    > ------------------------

    This is a reasonable decision, because you cannot resolve this ambiguity
    from the immediate context alone (which is all most taggers use). It is,
    thus, better to keep the POS information underspecified and resolve the
    ambiguity when you do the parse. Otherwise, your parser has to work with
    unreliable information.
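    To illustrate, here is a minimal Python sketch of what underspecification
    does to tagger scores. It assumes the Brown tags IN (preposition) and CS
    (subordinating conjunction); the merged label "IN|CS", the word list and
    the function names are hypothetical and only serve the illustration.

```python
# Hypothetical list of words that can be either IN (preposition)
# or CS (subordinating conjunction) in Brown-style tagging.
AMBIGUOUS = {"until", "before", "after", "since"}


def underspecify(tagged):
    """Replace IN/CS on ambiguous words with the merged label 'IN|CS'."""
    return [
        (word, "IN|CS") if word.lower() in AMBIGUOUS and tag in {"IN", "CS"}
        else (word, tag)
        for word, tag in tagged
    ]


def accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    hits = sum(g[1] == p[1] for g, p in zip(gold, predicted))
    return hits / len(gold)


gold = [("until", "CS"), ("the", "AT"), ("morning", "NN"), ("comes", "VBZ")]
# A tagger that only sees local context guesses the wrong reading here.
predicted = [("until", "IN"), ("the", "AT"), ("morning", "NN"), ("comes", "VBZ")]

strict = accuracy(gold, predicted)
merged = accuracy(underspecify(gold), underspecify(predicted))
```

    With the fine-grained tags the guess on 'until' counts as an error; with
    the merged label it does not, and the decision is left to the parser.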

    > So what could be the linguistic reasons that Eric was mentioning? For me
    > (with a rather limited linguistic background) the "traditional" criteria
    > for POS determination look quite arbitrary or let's say heuristic.
    >
    > I cannot, for instance, see any advantage of separating "until" in:
    > * until tomorrow (preposition)
    > * until the morning comes (subordinating conjunction)

    I agree that you can (or even should) also leave this underspecified
    until you do a full parse. However, at some point you have to make a
    decision, because you have to annotate clauses and you have to annotate
    prepositional phrases. When 'until' is a conjunction, it gives you a
    good cue for where the clause starts.

    > while not separating "and" in:
    > * you and me (coordinating conjunction)
    > * I go and see (coordinating conjunction)

    As 'and' coordinates constituents of the same kind, you can analyse
    sentences like:

    'I go and see.' as: [CL [NP [N I]] [VP [V go] [CO and] [V see]]]
    (my ad-hoc annotation ;-))

    The use of 'and' does not affect the 'global' structure of the clause.
    However, this is clearly different for 'until' as it introduces a
    prepositional phrase in the one case and a clause in the other.

    Think of the German 'um', which causes the same problem in sentences
    like:

    1. Er rannte, [CL um den Bus zu kriegen].
    (He ran to catch the bus.)

    2. Er rannte [PP um den Bus herum].
    (He ran around the bus.)

    You can leave the decision open until you do a parse, but at that point
    you have to make it. Here, you could use a heuristic like: 'If 'um'
    precedes a noun phrase, try to find a matching clause and, if there is
    one, tag 'um' as a 'subordinating conjunction'; otherwise, attach it to
    the noun phrase and tag it as a 'preposition'.' You can, thus, parse
    and specify your tags at the same time.
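    A rough Python sketch of such a heuristic follows. It is an illustration
    under stated assumptions, not a real parser: tokens are (word, tag) pairs
    with STTS tags, the placeholder label "APPR|KOUI" for the undecided 'um'
    is hypothetical, and "find a matching clause" is reduced to looking for
    'zu' plus an infinitive before the next punctuation mark.

```python
# STTS punctuation tags, used here as crude clause boundaries (assumption).
CLAUSE_END = {"$.", "$,"}


def resolve_um(tokens):
    """Resolve the hypothetical underspecified tag 'APPR|KOUI' on 'um'.

    If 'zu' (PTKZU) followed by an infinitive (VVINF) occurs before the next
    clause boundary, 'um' is tagged KOUI (conjunction with zu-infinitive);
    otherwise it is tagged APPR (preposition).
    """
    resolved = list(tokens)
    for i, (word, tag) in enumerate(tokens):
        if word.lower() != "um" or tag != "APPR|KOUI":
            continue
        new_tag = "APPR"  # default: attach to the noun phrase
        for j in range(i + 1, len(tokens) - 1):
            if tokens[j][1] in CLAUSE_END:
                break
            if tokens[j][1] == "PTKZU" and tokens[j + 1][1] == "VVINF":
                new_tag = "KOUI"  # found the matching infinitive clause
                break
        resolved[i] = (word, new_tag)
    return resolved


clause = [("Er", "PPER"), ("rannte", "VVFIN"), (",", "$,"),
          ("um", "APPR|KOUI"), ("den", "ART"), ("Bus", "NN"),
          ("zu", "PTKZU"), ("kriegen", "VVINF"), (".", "$.")]
phrase = [("Er", "PPER"), ("rannte", "VVFIN"), ("um", "APPR|KOUI"),
          ("den", "ART"), ("Bus", "NN"), ("herum", "APZR"), (".", "$.")]

um_in_clause = resolve_um(clause)[3]
um_in_phrase = resolve_um(phrase)[2]
```

    For the two example sentences above, the first 'um' comes out as KOUI
    and the second as APPR.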

    > Or why should I call the German "entlang" (along) a PREposition,
    > even if it is behind the noun phrase:
    > * den Fluss entlang (along the river)

    In the STTS (Stuttgart-Tuebingen Tag Set) this is called a postposition
    (APPO) in contrast to prepositions (APPR).

    See for details:

    http://www.sfs.nphil.uni-tuebingen.de/Elwis/stts/stts.html

    I hope that helps. Yours, Frank Mueller.

    -- 
    Frank H. Mueller
    Dorfackerstr. 20
    72 074 Tuebingen
    Tel.: p 07071/980797
          d 07071/29 77 152
    

    fhm@sfs.nphil.uni-tuebingen.de http://www.sfs.nphil.uni-tuebingen.de/~fhm


