RE: [Corpora-List] agent and patient probabilities

From: Adam Kilgarriff (adam@lexmasterclass.com)
Date: Thu Jan 25 2007 - 15:19:28 MET


    Jim,

    Take a look at the Sketch Engine (http://www.sketchengine.co.uk -
    self-registration for free trial). It finds the triples you are looking for
    in a fully corpus-based way.

    Resources like the Penn Treebank are way too small. The BNC (100 million
    words) supports a pretty good list of high-salience triples, as you'll see
    on the site.
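
    To make "high-salience" concrete, here is a minimal sketch that ranks
    verb-object pairs by pointwise mutual information (PMI). PMI is only a
    stand-in association measure chosen for illustration, not necessarily
    the statistic the Sketch Engine uses, and the pairs and the names
    pmi_scores and OBJECT_PAIRS are invented for the example.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Return log2 PMI for every observed (verb, noun) pair.

    PMI is used here as an illustrative association measure only.
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(verb for verb, _ in pairs)
    noun_counts = Counter(noun for _, noun in pairs)
    total = len(pairs)
    return {
        (verb, noun): math.log2(count * total / (verb_counts[verb] * noun_counts[noun]))
        for (verb, noun), count in pair_counts.items()
    }


# Hypothetical verb-object pairs, standing in for relations pulled from a corpus.
OBJECT_PAIRS = [
    ("walk", "dog"),
    ("walk", "dog"),
    ("walk", "street"),
    ("cross", "street"),
    ("cross", "street"),
    ("feed", "dog"),
]

scores = pmi_scores(OBJECT_PAIRS)

# List pairs from most to least strongly associated.
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

    With so few observations the scores mean little, which is the point
    about corpus size: rare pairs get unstable estimates unless the corpus
    is large.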

    Adam Kilgarriff

    -----Original Message-----
    From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    Behalf Of Jim Magnuson
    Sent: 23 January 2007 04:31
    To: corpora@lists.uib.no
    Subject: [Corpora-List] agent and patient probabilities

    I'm a psycholinguist rather than a computational linguist, with a
    "newbie" question.

    For some experiments, we need agent-verb-patient triples in which the
    "goodness" of the agents and patients for the verb varies in strength.
    The typical way to develop materials for such studies is to have human
    subjects rate how "good" various items are as agents and patients for
    particular verbs (e.g., "how likely is a dog to walk?", "how likely
    is a dog to be walked?"). While this works well, it's of course very
    labor (and subject) intensive. So I'm hoping to automate this.
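
    As a rough sketch of what automating this could look like, the
    following estimates how often a noun fills the agent or patient slot of
    a verb, using relative frequencies over extracted triples. The triples
    are made up, the name role_probabilities is hypothetical, and a raw
    relative frequency is just one simple proxy for rated "goodness".

```python
from collections import Counter

# Hypothetical (agent, verb, patient) triples, as might be extracted from a
# parsed corpus. None marks a missing role (e.g. an intransitive use).
TRIPLES = [
    ("owner", "walk", "dog"),
    ("owner", "walk", "dog"),
    ("girl", "walk", "dog"),
    ("dog", "walk", None),
    ("dog", "chase", "cat"),
]


def role_probabilities(triples, verb):
    """Return (P(agent | verb), P(patient | verb)) as relative frequencies."""
    agents = Counter(a for a, v, p in triples if v == verb and a is not None)
    patients = Counter(p for a, v, p in triples if v == verb and p is not None)

    def normalise(counts):
        total = sum(counts.values())
        if total == 0:
            return {}
        return {word: n / total for word, n in counts.items()}

    return normalise(agents), normalise(patients)


agent_probs, patient_probs = role_probabilities(TRIPLES, "walk")

# "How likely is a dog to walk?" versus "how likely is a dog to be walked?"
print("dog as agent of walk:", agent_probs.get("dog", 0.0))
print("dog as patient of walk:", patient_probs.get("dog", 0.0))
```

    Items spanning a range of these estimates could then be checked against
    a small set of human ratings before being used as materials.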

    I'm looking for recommendations for parsed corpora and tools to use
    (with the goal of getting this going ASAP).

    I know about the Penn Treebank; are there better and/or less
    expensive options for US English, or is this just the way to go?

    I'm an okay Perl programmer, and computer savvy; are there tools that
    would be helpful?

    Thanks very much,

    jim


