Re: [Corpora-List] agent and patient probabilities

From: Ulrike Pado (ulrike@CoLi.Uni-SB.DE)
Date: Wed Jan 24 2007 - 15:43:05 MET


    Jim,

    > For some experiments, we need agent-verb-patient triples where the
    > "goodness" of the agents and patients to the verb varies in strength.
    > The typical way to develop materials for such studies is to have human
    > subjects rate how "good" various items are as agents and patients
    > for particular verbs (e.g., "how likely is a dog to walk?", "how
    > likely is a dog to be walked?"). While this works well, it's of
    > course very labor- (and subject-) intensive. So I'm hoping to automate
    > this.

    Philip Resnik's work is definitely an excellent place to look.

    Beyond that, my work on modelling human language processing might also
    be of interest to you. One large part of my PhD work (the thesis was
    submitted recently) was to build a model that predicts human judgements
    about the plausibility of verb-argument-relation triples.

    Key differences from Resnik's work are a generative formulation (i.e.,
    plausible roles and arguments can be straightforwardly generated given a
    verb) and the use of thematic roles to define the relation between verb
    and argument. We tested the model against literature norming data (e.g.,
    McRae et al. 1998, Trueswell et al. 1994) and against norms we elicited
    ourselves for verb-argument-role triples extracted from corpora.
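
    To give a concrete (and deliberately over-simplified) picture of the
    generative idea: a plausibility score for a verb-role-argument triple
    can be estimated from counts over triples extracted from a
    role-annotated corpus. The sketch below is not the thesis model; the
    triples and the add-one smoothing are placeholders for illustration
    only:

        from collections import Counter

        # Toy generative plausibility score:
        #   Plaus(v, r, a) ~ P(r | v) * P(a | v, r)
        # estimated with add-one smoothing from (verb, role, head) triples.
        # The triples below are invented; in practice they would come from
        # a role-annotated corpus.
        triples = [
            ("arrest", "Agent", "cop"), ("arrest", "Patient", "crook"),
            ("arrest", "Agent", "cop"), ("arrest", "Patient", "suspect"),
        ]

        verb_count = Counter(v for v, r, a in triples)
        verb_role_count = Counter((v, r) for v, r, a in triples)
        triple_count = Counter(triples)
        roles = {r for v, r, a in triples}
        args = {a for v, r, a in triples}

        def plausibility(verb, role, arg):
            # P(role | verb), smoothed over the role inventory
            p_role = (verb_role_count[(verb, role)] + 1.0) / \
                     (verb_count[verb] + len(roles))
            # P(arg | verb, role), smoothed over the argument vocabulary
            p_arg = (triple_count[(verb, role, arg)] + 1.0) / \
                    (verb_role_count[(verb, role)] + len(args))
            return p_role * p_arg

        print(plausibility("arrest", "Agent", "cop"))    # seen, relatively high
        print(plausibility("arrest", "Agent", "crook"))  # unseen as Agent, lower

    In practice, the hard part is generalising to arguments that never
    occur with a given verb and role; the add-one smoothing above is just
    a stand-in for that.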

    Details can be found in
    U. Pado, M. Crocker and F. Keller, Modelling Semantic Role Plausibility
    in Human Sentence Processing. EACL, Trento, 2006.
    and
    U. Pado, F. Keller and M. Crocker, Combining Syntax and Thematic Fit in
    a Probabilistic Model of Sentence Processing. CogSci, Vancouver, 2006.

    In the thesis, I also compare our model to Philip Resnik's and two
    other selectional preference models on the sets of norming data I
    mentioned. I replicate Resnik's original successful evaluation, but our
    model tends to do a bit better at predicting plausibility judgements
    across the different data sets.
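
    In terms of mechanics, this kind of comparison mostly comes down to
    correlating a model's scores with mean human ratings for the same
    items. Here is a minimal sketch using SciPy's spearmanr (the items and
    ratings are invented, and it reuses the toy plausibility() function
    from the sketch above; the evaluations in the thesis are more involved
    than this):

        from scipy.stats import spearmanr

        # Hypothetical norming items with invented mean human ratings
        # (e.g., on a 1-7 scale); model scores come from the toy
        # plausibility() function sketched earlier.
        items = [("arrest", "Agent", "cop"), ("arrest", "Agent", "crook"),
                 ("arrest", "Patient", "crook"), ("arrest", "Patient", "cop")]
        human_ratings = [6.4, 2.1, 6.0, 2.8]
        model_scores = [plausibility(v, r, a) for v, r, a in items]

        rho, p = spearmanr(model_scores, human_ratings)
        print("Spearman rho = %.2f (p = %.3f)" % (rho, p))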

    If you'd like more information or have any questions, please let me know :)

    > I know about the Penn Treebank; are there better and/or less
    > expensive options for US English, or is this just the way to go?

    It might be worthwhile to use a role-annotated corpus to make sure you
    really catch the verb-argument relations you're after.

    The PropBank (role annotations for parts of the Penn Treebank) is the
    largest role-annotated corpus available, and it's American English, but
    you may want to have a look at the FrameNet corpus as well. Its example
    sentences are drawn from the British National Corpus, so its vocabulary
    is much more balanced. For example, I find FrameNet's vocabulary closer
    to "typical" psycholinguistic items than that of PropBank, with its
    bias towards financial language.

    The FrameNet home page is at http://framenet.icsi.berkeley.edu/, and if
    I understand correctly, the corpus is free for research purposes.
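
    If it helps as a starting point, NLTK ships a small PropBank sample,
    and something along the following lines (one possible approach; it
    assumes the 'propbank' and 'treebank' NLTK data packages are installed)
    gives you verb-sense/role counts, with resolving the argument head
    words via the Treebank parses as the next step:

        from collections import Counter
        from nltk.corpus import propbank

        # Requires the NLTK data packages 'propbank' (a sample of the
        # annotations) and 'treebank' (for the parse trees); install with
        #   nltk.download('propbank'); nltk.download('treebank')

        # Count how often each PropBank role label occurs with each verb sense.
        role_counts = Counter()
        for inst in propbank.instances():
            verb = inst.roleset                     # e.g. 'expose.01'
            for _pointer, argid in inst.arguments:  # e.g. 'ARG0', 'ARGM-TMP'
                role_counts[(verb, argid)] += 1

        for (verb, argid), n in role_counts.most_common(10):
            print(verb, argid, n)

        # To get actual verb-role-head triples, resolve inst.predicate and
        # the argument pointers against the parse tree in inst.tree.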

    Regards,

    Ulrike

    -- 
      Ulrike Pado
      Computational Linguistics
      Saarland University
      D-66041 Saarbrücken
      www.coli.uni-sb.de/~ulrike


