[Corpora-List] Re: similarity

From: Eric Atwell (eric@comp.leeds.ac.uk)
Date: Tue Sep 09 2003 - 23:10:27 MET DST

    Marco,

    I apologise: I (mis-)parsed "Time flies like an arrow"
    as <Imperative-verb> <object-noun-phrase> <adverbial-phrase/clause>,
    then looked in the corpus for another sentence with this structure,
    and found "Select the text you want to protect".

    You are right, "Scrolling changes the display but..."
    IS closer in grammatical structure to an alternative parse,
    <subject-noun-phrase> <verb> <object-noun-phrase>
    (You are welcome to look at the rival parses/taggings of this instead!)
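
One rough way to make "closer in grammatical structure" concrete is to compare PoS-tag sequences rather than words. The sketch below is only an illustration, not the method used to pick the AMALGAM example: the UPenn tags for "Select the text you want to protect." are copied from the multi-tagged sample quoted further down, while the tags for the two readings of "Time flies like an arrow." and for "Scrolling changes the display but does not move the insertion point." are hand-assigned assumptions. The function name `tag_similarity` is hypothetical, and the matching-block ratio from Python's `difflib` is just one possible similarity measure.

```python
from difflib import SequenceMatcher


def tag_similarity(tags_a, tags_b):
    """Return a ratio in [0, 1] for how much two PoS-tag sequences overlap."""
    # autojunk=False keeps the matcher from discarding frequent tags.
    return SequenceMatcher(None, tags_a, tags_b, autojunk=False).ratio()


# UPenn tags copied from the AMALGAM multi-tagged sample.
SELECT = ["VB", "DT", "NN", "PRP", "VBP", "TO", "VB", "."]

# ASSUMPTION: hand-assigned UPenn-style tags, not output of any AMALGAM tagger.
# Imperative reading: "Time (the) flies like an arrow (would)."
TIME_IMPERATIVE = ["VB", "NNS", "IN", "DT", "NN", "."]
# Declarative reading: "Time flies (passes) like an arrow."
TIME_DECLARATIVE = ["NN", "VBZ", "IN", "DT", "NN", "."]
# "Scrolling changes the display but does not move the insertion point."
SCROLLING = ["NN", "VBZ", "DT", "NN", "CC", "VBZ", "RB", "VB",
             "DT", "NN", "NN", "."]

if __name__ == "__main__":
    readings = {"imperative": TIME_IMPERATIVE, "declarative": TIME_DECLARATIVE}
    candidates = {"Select...": SELECT, "Scrolling...": SCROLLING}
    for reading, tags in readings.items():
        for label, other in candidates.items():
            score = tag_similarity(tags, other)
            print(f"{reading:12s} vs {label:13s} {score:.3f}")
```

With these assumed tags, the imperative reading scores closer to "Select..." and the declarative reading closer to "Scrolling...", which is the disagreement discussed above. A flat tag sequence ignores bracketing, so a tree-based measure would be needed to compare the rival parses themselves.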

    Some linguists might say that a sentence can have more than one structure.
    I prefer the "pro-corpus" stance that a sentence should not be parsed
    in isolation but that even for sentence-syntax you need to take
    context into account; and that a sentence will have only one parse
    depending on context (except for comparatively rare cases of deliberate
    ambiguity, e.g. in jokes/puns).

    But this may be leading discussion away from "similarity"...

    Eric Atwell

    On Tue, 9 Sep 2003, Marco Antonio Esteves da Rocha wrote:

    >
    >
    > Hello, all, this is very likely to be a linguist's statement of
    > ignorance as to how an automatic POS tagger works. Even worse (for a
    > linguist), it may mean ignorance of the meaning assigned to the phrase
    > 'grammatical structure'. But I am really curious to know why *select the
    > text you want to protect* is in any way similar to *Time flies like an
    > arrow...* any more than, say, *Scrolling changes the display but does
    > not move the insertion point.* (selected from the sample of sentences in
    > AMALGAM). In fact, I believe the idea of similarity and the various
    > degrees of nearness, in terms of grammatical structure, may actually
    > prove useful for enhancing automatic parsing and POS tagging. So it may
    > be worth discussing.
    >
    > Marco Rocha
    >
    > Eric Atwell wrote:
    >
    > > Peet,
    > >
    > > The AMALGAM project at Leeds University collected a "MULTI-TREEBANK",
    > > a sample of sentences annotated with 24 rival parsing and PoS-tagging schemes,
    > > see http://www.comp.leeds.ac.uk/amalgam/amalgam/multi-parsed.html
    > > and http://www.comp.leeds.ac.uk/amalgam/amalgam/multi-tagged.html
    > >
    > > Parse trees as raw output of 10 rival parsers:
    > > Alice, DESPAR, ENGCG, Principar, Link, RANLP, Carroll/Briscoe Shallow Parser,
    > > WordPerfect's Grammatik, Tosca, Sextant;
    > >
    > > Parse trees representing 4 English corpus parsing schemes:
    > > UPenn, ICE, POW Systemic-Functional Bracketed, POW S-F Numerical
    > >
    > > PoS-tagged text representing 10 English corpus PoS-tagging schemes:
    > > Brown, ICE, LLC, LOB, UNIX Parts, POW, SEC, UPenn, BNC-C5, and BNC-C6.
    > >
    > > The sample sentences were from software manuals (though the PoS-tagged samples
    > > were extended to also include BBC radio and London teenager sentences), see
    > > http://www.comp.leeds.ac.uk/amalgam/amalgam/corpus/tagged/raw/ipsm_raw.html
    > >
    > > [note: IF YOU HAVE A PARSER/TAGGER, PLEASE VOLUNTEER TO PARSE/TAG THESE
    > > SENTENCES AND DONATE THE OUTPUT TO THE MULTITREEBANK FOR ALL TO SHARE!]
    > >
    > > Unsurprisingly, the sample does not include your example "Time flies like..."
    > > - the nearest (in grammatical structure) I could find in the sample was:
    > > "Select the text you want to protect."
    > >
    > >
    > > Alice:
    > > (SENT (SENT-MOD (UNK-CAT "Select") (NP (DET "the") (NOUN "text")))
    > > (SENT (VP-ACT (NP "you") (V-TR "want")) (NP NULL-PHON))) (SENT-MOD
    > > (UNK-CAT "to") (NP "protect"))
    > >
    > > DESPAR:
    > > VB select 1 --> 8 -
    > > DT the 2 --> 3 [
    > > NN text 3 --> 1 + OBJ
    > > PP you 4 --> 5 " SUB
    > > VBP want 5 --> 3 ]
    > > TO to 6 --> 7 -
    > > VB protect 7 --> 5 -
    > > . . 8 --> 0 -
    > >
    > > ENGCG:
    > > "<Select>"
    > > "select" <*> <SVO> <SV> <P/for> V IMP VFIN @+FMAINV
    > > "<the>"
    > > "the" <Def> DET CENTRAL ART SG/PL @DN>
    > > "<text>"
    > > "text" N NOM SG @OBJ
    > > "<you>"
    > > "you" <NonMod> PRON PERS NOM SG2/PL2 @SUBJ
    > > "<want>"
    > > "want" <SVOC/A> <SVO> <SV> <P/for> V PRES -SG3 VFIN @+FMAINV
    > > "<to>"
    > > "to" INFMARK> @INFMARK>
    > > "<protect>"
    > > "protect" <SVO> V INF @-FMAINV
    > > "<$.>"
    > >
    > > Principar:
    > > (
    > > (Select ~ V_NP *)
    > > (the ~ Det < text spec)
    > > (text ~ N > Select comp1)
    > > (you ~ N < want subj)
    > > (want ~ V_CP > text rel)
    > > (to ~ I > want comp1)
    > > (protect ~ V_NP > to pred)
    > > (. )
    > > )
    > >
    > > Link:
    > > parse not found
    > >
    > > RANLP:
    > > (VP/NP select
    > > (N2+/DET1a the
    > > (N2-
    > > (N1/INFMOD
    > > (N1/RELMOD1 (N1/N text)
    > > (S/THATLESSREL (S1a (N2+/PRO you) (VP/NP want (TRACE1 E)))))
    > > (VP/TO to (VP/NP protect (TRACE1 E)))))))
    > >
    > > Carroll/Briscoe Shallow Parser:
    > > parse not found
    > >
    > > WordPerfect's Grammatik:
    > > SENTENCE
    > > |- CLAUSE 1
    > > | |- VERB ---------------- Select
    > > | |- DIRECT-OBJECT ------- the text
    > > |- CLAUSE 2 - RELATIVE
    > > |- SUBJECT ------------- you
    > > |- VERB ---------------- want
    > > |- DIRECT-OBJECT ------- {the text}
    > > |- VERB-Infinitive ----- to protect
    > > |- --------------------- .
    > >
    > > Tosca:
    > > parse not found
    > >
    > > Sextant:
    > > VP 101 Select select INF 0 0
    > > NP 2 the the DET 1 1 2 (text) DET
    > > NP* 2 text text NOUN 2 1 0 (select) DOBJ
    > > NP* 3 you you PRON 3 0
    > > VP 102 want want INF 4 0
    > > VP 102 to to TO 5 0
    > > VP 102 protect protect INF 6 1 3 (you) SUBJ
    > > -- 0 . . . 7 0
    > >
    > > UPenn:
    > > ( (S
    > > (NP-SBJ (-NONE- *) )
    > > (VP (VB select)
    > > (NP
    > > (NP (DT the) (NN text) )
    > > (SBAR
    > > (WHNP-1 (-NONE- 0) )
    > > (S
    > > (NP-SBJ-2 (PRP you) )
    > > (VP (VBP want)
    > > (S
    > > (NP-SBJ (-NONE- *-2) )
    > > (VP (TO to)
    > > (VP (VB protect)
    > > (NP (-NONE- *T*-1) )))))))))
    > > (. .) ))
    > >
    > > ICE:
    > > PU CL(main,montr,imp)
    > > VB VP(trans,imp)
    > > MVB V(trans,imp) {select}
    > > OD NP()
    > > DT DTP()
    > > DTCE ART(def) {the}
    > > NPHD N(com,sing) {text}
    > > NPPO CL(depend,montr,pres)
    > > SU NP()
    > > NPHD PRON(pers) {you}
    > > VB VP(montr,pres)
    > > MVB V(montr,pres) {want}
    > > OD CL(depend,montr,infin)
    > > TO PRTCL(to) {to}
    > > VB VP(montr,infin)
    > > MVB V(montr,infin) {protect}
    > > PUNC PUNC(per) {.}
    > >
    > > POW Systemic-Functional Bracketed:
    > > [Z
    > > [CL
    > > [M select]
    > > [C
    > > [NGP
    > > [DD the]
    > > [H text]
    > > [Q
    > > [CL
    > > [S
    > > [NGP
    > > [HP you]
    > > ]
    > > ]
    > > [M want]
    > > [C
    > > [CL
    > > [I to]
    > > [M protect]
    > > ]
    > > ]
    > > ]
    > > ]
    > > ]
    > > ]
    > > [E .]
    > > ]
    > > ]
    > >
    > > POW S-F Numerical:
    > > Z CL 1 M select 1 C NGP 2 DD the 2 H text 2 Q CL 3 S NGP HP you 3 M want
    > > 3 C CL 4 I to 4 M protect 1 E .
    > >
    > > Brown:
    > > select/VB
    > > the/AT
    > > text/NN
    > > you/PPSS
    > > want/VB
    > > to/TO
    > > protect/VB
    > > ./.
    > >
    > > ICE:
    > > select/V(montr,infin)
    > > the/ART(def)
    > > text/N(com,sing)
    > > you/PRON(pers)
    > > want/V(montr,pres)
    > > to/PRTCL(to)
    > > protect/V(montr,imp)
    > > ./PUNC(per)
    > >
    > > LLC:
    > > select/VA+0
    > > the/TA
    > > text/NC
    > > you/RC
    > > want/VA+0
    > > to/PD
    > > protect/VA+0
    > > ./.
    > >
    > > LOB:
    > > select/VB
    > > the/ATI
    > > text/NN
    > > you/PP2
    > > want/VB
    > > to/TO
    > > protect/VB
    > > ./.
    > >
    > > UNIX Parts:
    > > select/adj
    > > the/art
    > > text/noun
    > > you/pron
    > > want/verb
    > > to/verb
    > > protect/verb
    > > ./.
    > >
    > > POW:
    > > select/P
    > > the/DD
    > > text/H
    > > you/HP
    > > want/M
    > > to/I
    > > protect/M
    > > ./.
    > >
    > > SEC:
    > > select/VB
    > > the/ATI
    > > text/NN
    > > you/PP2
    > > want/VB
    > > to/TO
    > > protect/VB
    > > ./.
    > >
    > > UPenn:
    > > select/VB
    > > the/DT
    > > text/NN
    > > you/PRP
    > > want/VBP
    > > to/TO
    > > protect/VB
    > > ./.
    > >
    > > BNC-C5:
    > > Select/VVB
    > > the/AT0
    > > text/NN1
    > > you/PNP
    > > want/VVB
    > > to/TO0
    > > protect/VVI
    > > ./PUN
    > >
    > > BNC-C6:
    > > Select/VV0
    > > the/AT
    > > text/NN1
    > > you/PPY
    > > want/VV0
    > > to/TO
    > > protect/VVI
    > > ./YSTP
    > >
    > >
    > >
    > >
    > > On Sat, 30 Aug 2003, peetm wrote:
    > >
    > >
    > >>Hi,
    > >>
    > >>I'm really interested in seeing alternative mark-ups of the following
    > >>sentence:
    > >>
    > >>"Time flies like an arrow whereas fruit flies like a banana"
    > >>
    > >>I know that 'accurate' is entirely subjective - and down to the tagger - but
    > >>I'd like to see samples of mark-ups produced for this sentence, 'accurate'
    > >>or not (preferably with an explanation of the mark-up used:
    > >>methodology/tag set - or with links to the same).
    > >>
    > >>I'm especially interested in any mark-up that produces some hierarchical
    > >>XML-type output.
    > >>
    > >>So, if anyone feels like providing me with examples - PLEASE DO SO!
    > >>
    > >>Many thanks,
    > >>
    > >>peetm
    > >>
    > >>email: peet.morris@clg.ox.ac.uk
    > >>
    > >>addr: Computational Linguistics Group
    > >> University of Oxford
    > >> The Clarendon Institute
    > >> Walton Street
    > >> Oxford
    > >> OX1 2HG
    > >>
    > >>================================================
    > >>
    > >>Important: This email is intended for the use of the individual addressee(s)
    > >>named above and may contain information that is confidential, privileged or
    > >>unsuitable for overly sensitive persons with low self-esteem, no sense of
    > >>humour or irrational religious beliefs.
    > >>
    > >>If you are not the intended recipient, then social etiquette demands that
    > >>you fully appropriate the message without trace of the former sender and
    > >>triumphantly claim it as your own. Leaving a former sender's signature on a
    > >>"forwarded" email is very bad form and, while being only a technical breach
    > >>of the Olympic ideal, does in fact constitute an irritating social faux pas.
    > >>
    > >>Further, sending this email to a colleague does not appear to breach the
    > >>provisions of the Copyright Amendment (Digital Agenda) Act 2000 of the
    > >>Commonwealth, because chances are none of the thoughts contained in this
    > >>email are in any sense original...
    > >>
    > >>Finally, if you have received this email in error, shred it immediately,
    > >>then add it to some nutmeg, egg whites and caster sugar. Whisk until stiff
    > >>peaks form, then place it in a warm oven for 40 minutes. Remove promptly and
    > >>let it stand for 2 hours before adding the decorative kiwi fruit and cream.
    > >>Then notify me immediately by return email and eat the original message.
    > >>
    > >
    >

    -- 
    Eric Atwell, Senior Lecturer, Computer Vision and Language research group
    Distributed Multimedia Systems MSc Tutor & SOCRATES/JYA Tutor
    School of Computing, University of Leeds, LEEDS LS2 9JT
    TEL: 0113-3435761  MOBILE: 0775-1039104 FAX: 0113-3435468
    WWW: http://www.comp.leeds.ac.uk/eric  EMAIL: eric@comp.leeds.ac.uk
    Visit http://www.computingLEEDS.ac.uk - our newsletter for industry
    


