Re: [Corpora-List] Markup Examples Needed!

From: Eric Atwell (eric@comp.leeds.ac.uk)
Date: Wed Sep 03 2003 - 13:18:10 MET DST

    Peet,

    The AMALGAM project at Leeds University collected a "MULTI-TREEBANK":
    a sample of sentences annotated with 24 rival parsing and PoS-tagging schemes,
    see http://www.comp.leeds.ac.uk/amalgam/amalgam/multi-parsed.html
    and http://www.comp.leeds.ac.uk/amalgam/amalgam/multi-tagged.html

    Parse trees as raw output of 10 rival parsers:
    Alice, DESPAR, ENGCG, Principar, Link, RANLP, Carroll/Briscoe Shallow Parser,
    WordPerfect's Grammatik, Tosca, Sextant;

    Parse trees representing 4 English corpus parsing schemes:
    UPenn, ICE, POW Systemic-Functional Bracketed, POW S-F Numerical;

    PoS-tagged text representing 10 English corpus PoS-tagging schemes:
    Brown, ICE, LLC, LOB, UNIX Parts, POW, SEC, UPenn, BNC-C5, and BNC-C6.

    The sample sentences were from software manuals (though the PoS-tagged samples
    were extended to also include BBC radio and London teenager sentences), see
    http://www.comp.leeds.ac.uk/amalgam/amalgam/corpus/tagged/raw/ipsm_raw.html

    [note: IF YOU HAVE A PARSER/TAGGER, PLEASE VOLUNTEER TO PARSE/TAG THESE
    SENTENCES AND DONATE THE OUTPUT TO THE MULTITREEBANK FOR ALL TO SHARE!]

    Unsurprisingly, the sample does not include your example "Time flies like..."
    - the nearest (in grammatical structure) I could find in the sample was:
    "Select the text you want to protect."

    Alice:
    (SENT (SENT-MOD (UNK-CAT "Select") (NP (DET "the") (NOUN "text")))
    (SENT (VP-ACT (NP "you") (V-TR "want")) (NP NULL-PHON))) (SENT-MOD
    (UNK-CAT "to") (NP "protect"))

    DESPAR:
    VB select 1 --> 8 -
    DT the 2 --> 3 [
    NN text 3 --> 1 + OBJ
    PP you 4 --> 5 " SUB
    VBP want 5 --> 3 ]
    TO to 6 --> 7 -
    VB protect 7 --> 5 -
    . . 8 --> 0 -
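
    [Illustration, not part of the AMALGAM tools: the DESPAR output above
    looks like one token per line with the columns tag, word, token index,
    "-->", head index. That column reading is an assumption made from the
    sample itself, and the trailing columns (the bracket-like marker and
    labels such as OBJ or SUB) are kept as opaque strings. Under those
    assumptions, a minimal Python sketch for listing the dependency arcs
    could look like this; the names read_despar and ROOT are invented here.]

```python
DESPAR = """\
VB select 1 --> 8 -
DT the 2 --> 3 [
NN text 3 --> 1 + OBJ
PP you 4 --> 5 " SUB
VBP want 5 --> 3 ]
TO to 6 --> 7 -
VB protect 7 --> 5 -
. . 8 --> 0 -
"""


def read_despar(text):
    """Read DESPAR-style lines into {index: token record}.

    Assumed columns: tag, word, index, '-->', head index, then anything else.
    """
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        tag, word, idx, arrow, head = fields[:5]
        if arrow != "-->":
            raise ValueError(f"unexpected line: {line!r}")
        rows[int(idx)] = {
            "tag": tag,
            "word": word,
            "head": int(head),
            "extra": fields[5:],  # marker column and optional label, uninterpreted
        }
    return rows


rows = read_despar(DESPAR)

# One (dependent, head) pair per token; head index 0 is shown as ROOT.
arcs = []
for idx in sorted(rows):
    head = rows[idx]["head"]
    head_word = rows[head]["word"] if head in rows else "ROOT"
    arcs.append((rows[idx]["word"], head_word))

for dependent, head_word in arcs:
    print(f"{dependent} -> {head_word}")
```

    [Read this way, "text" depends on "select" and "want" depends on "text",
    while "select" itself attaches to the final full stop.]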

    ENGCG:
    "<Select>"
    "select" <*> <SVO> <SV> <P/for> V IMP VFIN @+FMAINV
    "<the>"
    "the" <Def> DET CENTRAL ART SG/PL @DN>
    "<text>"
    "text" N NOM SG @OBJ
    "<you>"
    "you" <NonMod> PRON PERS NOM SG2/PL2 @SUBJ
    "<want>"
    "want" <SVOC/A> <SVO> <SV> <P/for> V PRES -SG3 VFIN @+FMAINV
    "<to>"
    "to" INFMARK> @INFMARK>
    "<protect>"
    "protect" <SVO> V INF @-FMAINV
    "<$.>"

    Principar:
    (
     (Select ~ V_NP *)
     (the ~ Det < text spec)
     (text ~ N > Select comp1)
     (you ~ N < want subj)
     (want ~ V_CP > text rel)
     (to ~ I > want comp1)
     (protect ~ V_NP > to pred)
     (. )
    )

    Link:
    parse not found

    RANLP:
    (VP/NP select
     (N2+/DET1a the
      (N2-
       (N1/INFMOD
        (N1/RELMOD1 (N1/N text)
         (S/THATLESSREL (S1a (N2+/PRO you) (VP/NP want (TRACE1 E)))))
        (VP/TO to (VP/NP protect (TRACE1 E)))))))

    Carroll/Briscoe Shallow Parser:
    parse not found

    WordPerfect's Grammatik:
    SENTENCE
       |- CLAUSE 1
       | |- VERB ---------------- Select
       | |- DIRECT-OBJECT ------- the text
       |- CLAUSE 2 - RELATIVE
            |- SUBJECT ------------- you
            |- VERB ---------------- want
            |- DIRECT-OBJECT ------- {the text}
            |- VERB-Infinitive ----- to protect
            |- --------------------- .

    Tosca:
    parse not found

    Sextant:
    VP 101 Select select INF 0 0
    NP 2 the the DET 1 1 2 (text) DET
    NP* 2 text text NOUN 2 1 0 (select) DOBJ
    NP* 3 you you PRON 3 0
    VP 102 want want INF 4 0
    VP 102 to to TO 5 0
    VP 102 protect protect INF 6 1 3 (you) SUBJ
    -- 0 . . . 7 0

    UPenn:
    ( (S
        (NP-SBJ (-NONE- *) )
        (VP (VB select)
          (NP
            (NP (DT the) (NN text) )
            (SBAR
              (WHNP-1 (-NONE- 0) )
              (S
                (NP-SBJ-2 (PRP you) )
                (VP (VBP want)
                  (S
                    (NP-SBJ (-NONE- *-2) )
                    (VP (TO to)
                      (VP (VB protect)
                        (NP (-NONE- *T*-1) )))))))))
        (. .) ))
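
    [Illustration, not part of the AMALGAM tools: since the original request
    asked for hierarchical XML-type output, a bracketed tree like the UPenn
    one above can be converted mechanically. The Python sketch below is one
    minimal way to do it. The element name "node", the attribute name "cat"
    and the label "TOP" for the unlabelled outer bracket are all invented
    here; category labels go into an attribute because labels such as -NONE-
    or . are not legal XML element names.]

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

PENN = """
( (S
    (NP-SBJ (-NONE- *) )
    (VP (VB select)
      (NP
        (NP (DT the) (NN text) )
        (SBAR
          (WHNP-1 (-NONE- 0) )
          (S
            (NP-SBJ-2 (PRP you) )
            (VP (VBP want)
              (S
                (NP-SBJ (-NONE- *-2) )
                (VP (TO to)
                  (VP (VB protect)
                    (NP (-NONE- *T*-1) )))))))))
    (. .) ))
"""


def tokenize(text):
    # Each parenthesis is its own token; everything else splits on whitespace.
    return re.findall(r"\(|\)|[^\s()]+", text)


def parse(tokens, pos=0):
    """Parse the bracket that opens at tokens[pos].

    Returns ((label, children), next_pos). Leaves are plain strings.
    """
    pos += 1  # skip the opening "("
    label = "TOP"  # placeholder for a bracket with no label of its own
    if tokens[pos] not in ("(", ")"):
        label = tokens[pos]
        pos += 1
    children = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = parse(tokens, pos)
        else:
            child, pos = tokens[pos], pos + 1
        children.append(child)
    return (label, children), pos + 1  # step past the closing ")"


def to_xml(node, depth=0):
    pad = "  " * depth
    if isinstance(node, str):
        return pad + escape(node)
    label, children = node
    open_tag = f"<node cat={quoteattr(label)}>"
    if len(children) == 1 and isinstance(children[0], str):
        # Preterminal: keep tag and word on one line.
        return f"{pad}{open_tag}{escape(children[0])}</node>"
    inner = "\n".join(to_xml(child, depth + 1) for child in children)
    return f"{pad}{open_tag}\n{inner}\n{pad}</node>"


tree, _ = parse(tokenize(PENN))
xml_text = to_xml(tree)
ET.fromstring(xml_text)  # raises ParseError if the output is not well-formed
print(xml_text)
```

    [The same approach should carry over to other bracketed schemes in the
    sample with a different tokenizer, but that has not been tried here.]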

    ICE:
    PU CL(main,montr,imp)
     VB VP(trans,imp)
      MVB V(trans,imp) {select}
     OD NP()
      DT DTP()
       DTCE ART(def) {the}
      NPHD N(com,sing) {text}
      NPPO CL(depend,montr,pres)
       SU NP()
        NPHD PRON(pers) {you}
       VB VP(montr,pres)
        MVB V(montr,pres) {want}
       OD CL(depend,montr,infin)
        TO PRTCL(to) {to}
        VB VP(montr,infin)
         MVB V(montr,infin) {protect}
     PUNC PUNC(per) {.}

    POW Systemic-Functional Bracketed:
    [Z
        [CL
            [M select]
            [C
                [NGP
                    [DD the]
                    [H text]
                    [Q
                        [CL
                            [S
                                [NGP
                                    [HP you]
                                ]
                            ]
                            [M want]
                            [C
                                [CL
                                    [I to]
                                    [M protect]
                                ]
                            ]
                        ]
                    ]
                ]
            ]
            [E .]
        ]
    ]

    POW S-F Numerical:
    Z CL 1 M select 1 C NGP 2 DD the 2 H text 2 Q CL 3 S NGP HP you 3 M want
    3 C CL 4 I to 4 M protect 1 E .

    Brown:
    select/VB
    the/AT
    text/NN
    you/PPSS
    want/VB
    to/TO
    protect/VB
    ./.

    ICE:
    select/V(montr,infin)
    the/ART(def)
    text/N(com,sing)
    you/PRON(pers)
    want/V(montr,pres)
    to/PRTCL(to)
    protect/V(montr,imp)
    ./PUNC(per)

    LLC:
    select/VA+0
    the/TA
    text/NC
    you/RC
    want/VA+0
    to/PD
    protect/VA+0
    ./.

    LOB:
    select/VB
    the/ATI
    text/NN
    you/PP2
    want/VB
    to/TO
    protect/VB
    ./.

    UNIX Parts:
    select/adj
    the/art
    text/noun
    you/pron
    want/verb
    to/verb
    protect/verb
    ./.

    POW:
    select/P
    the/DD
    text/H
    you/HP
    want/M
    to/I
    protect/M
    ./.

    SEC:
    select/VB
    the/ATI
    text/NN
    you/PP2
    want/VB
    to/TO
    protect/VB
    ./.

    UPenn:
    select/VB
    the/DT
    text/NN
    you/PRP
    want/VBP
    to/TO
    protect/VB
    ./.

    BNC-C5:
    Select/VVB
    the/AT0
    text/NN1
    you/PNP
    want/VVB
    to/TO0
    protect/VVI
    ./PUN

    BNC-C6:
    Select/VV0
    the/AT
    text/NN1
    you/PPY
    want/VV0
    to/TO
    protect/VVI
    ./YSTP
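
    [Illustration, not part of the AMALGAM tools: the word/TAG listings above
    are easy to line up side by side. The Python sketch below does this for
    three of the schemes, copied from the listings; the function names are
    invented here, and it assumes one word/TAG pair per token with the tag
    after the last "/". It also checks which schemes give different tags to
    two chosen words.]

```python
TAGGED = {
    "Brown": "select/VB the/AT text/NN you/PPSS want/VB to/TO protect/VB ./.",
    "UPenn": "select/VB the/DT text/NN you/PRP want/VBP to/TO protect/VB ./.",
    "BNC-C5": "Select/VVB the/AT0 text/NN1 you/PNP want/VVB to/TO0 protect/VVI ./PUN",
}


def read_tagged(text):
    # Split on the LAST slash so that a token such as "./." still works.
    return [tuple(token.rsplit("/", 1)) for token in text.split()]


def align(tagged):
    """Return (scheme names, rows), one row per word: (word, [tag per scheme])."""
    names = list(tagged)
    columns = [read_tagged(tagged[name]) for name in names]
    if len({len(column) for column in columns}) != 1:
        raise ValueError("schemes disagree on the number of tokens")
    rows = []
    for entries in zip(*columns):
        words = {word.lower() for word, _ in entries}
        if len(words) != 1:
            raise ValueError(f"token mismatch: {entries}")
        rows.append((entries[0][0].lower(), [tag for _, tag in entries]))
    return names, rows


def distinguishes(names, rows, word_a, word_b):
    """Names of the schemes that tag word_a and word_b differently.

    Assumes each word occurs only once in the sentence.
    """
    tags = dict(rows)
    return [n for n, a, b in zip(names, tags[word_a], tags[word_b]) if a != b]


names, rows = align(TAGGED)

print("word".ljust(10) + "".join(name.ljust(8) for name in names))
for word, tags in rows:
    print(word.ljust(10) + "".join(tag.ljust(8) for tag in tags))

print(distinguishes(names, rows, "select", "protect"))
```

    [For these three schemes, only BNC-C5 tags imperative "select" differently
    from infinitive "protect"; adding the other listings to TAGGED extends the
    comparison.]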

    On Sat, 30 Aug 2003, peetm wrote:

    > Hi,
    >
    > I'm really interested in seeing alternative mark-ups of the following
    > sentence:
    >
    > "Time flies like an arrow whereas fruit flies like a banana"
    >
    > I know that 'accurate' is entirely subjective - and down to the tagger - but
    > I'd like to see samples of mark-ups produced for this sentence, 'accurate'
    > or not (preferably with an explanation of the mark-up used:
    > methodology/tag set - or with links to the same).
    >
    > I'm especially interested in any mark-up that produces some hierarchical
    > XML-type output.
    >
    > So, if anyone feels like providing me with examples - PLEASE DO SO!
    >
    > Many thanks,
    >
    > peetm
    >
    > email: peet.morris@clg.ox.ac.uk
    >
    > addr: Computational Linguistics Group
    > University of Oxford
    > The Clarendon Institute
    > Walton Street
    > Oxford
    > OX1 2HG

    -- 
    Eric Atwell, CVL: Computer Vision and Language research group
    Distributed Multimedia Systems MSc Tutor & SOCRATES/JYA Tutor
    School of Computing, University of Leeds, LEEDS LS2 9JT
    TEL: 0113-3435761  MOBILE: 0775-1039104 FAX: 0113-3435468
    WWW: http://www.comp.leeds.ac.uk/eric  EMAIL: eric@comp.leeds.ac.uk
    Visit http://www.computingLEEDS.ac.uk - our newsletter for industry
    


