Re: [Corpora-List] Summary of responses: relational databases and phrasal verbs

From: Timothy Baldwin (tbaldwin@csli.stanford.edu)
Date: Wed Oct 29 2003 - 23:31:52 MET

    > Within the past month or so I posted two queries to CORPORA, one dealing
    > with creating a tagger using relational databases, and the other with
    > the frequency of phrasal verbs in English. I received a number of
    > replies, which are found at the following URL:
    >
    > http://davies-linguistics.byu.edu/responses/corpora1.htm

    As a last-minute response to the question of phrasal verb frequency, I have
    compiled a list of verb particles extracted from the written portion of the
    BNC, along with frequencies and valence counts for each. The list is
    downloadable from:

    mwe.stanford.edu/resources/

    along with a link to a slightly outdated description of the extraction
    technique used. The frequencies are almost certainly underestimates, as I was
    more interested in precision than recall when I put this data together, and
    the valence judgements should be taken with a pinch of salt. By way of note,
    this is the data reported in:

    Villavicencio, Aline (2003) Verb-Particle Constructions and Lexical Resources,
    In Proceedings of the ACL-2003 Workshop on Multiword Expressions: Analysis,
    Acquisition and Treatment, Sapporo, Japan.

    Baldwin, Timothy, Colin Bannard, Takaaki Tanaka and Dominic Widdows (2003) An
    Empirical Model of Multiword Expression Decomposability, In Proceedings of the
    ACL-2003 Workshop on Multiword Expressions: Analysis, Acquisition and
    Treatment, Sapporo, Japan, pp. 89-96.

    Bannard, Colin, Timothy Baldwin and Alex Lascarides (2003) A Statistical
    Approach to the Semantics of Verb-Particles, In Proceedings of the ACL-2003
    Workshop on Multiword Expressions: Analysis, Acquisition and Treatment,
    Sapporo, Japan, pp. 65-72.

    I hope this helps,

    Tim


