RE: [Corpora-List] Variant verbal government extraction

From: Adam Kilgarriff (adam@lexmasterclass.com)
Date: Fri Feb 23 2007 - 05:23:51 MET

    Mikhail,

     

    The algorithm you want is:

    In a large corpus

         For each verb

                Find how often it occurs in the pattern <VERB PRONOUN>

                Find how often it occurs in the pattern <VERB to PRONOUN>

                Compute a statistic that shows how high both these numbers
                are relative to the overall frequency of the verb

    Sort the verbs according to the statistic

     

    Now you have a starter set for examining which verbs show the behaviour you
    want to investigate.
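
    The steps above could be sketched roughly as follows. This is only an
    illustration, not a tested tool: it assumes the corpus is already
    tokenised, lemmatised and tagged, and that each sentence is a list of
    (word, lemma, tag) triples. The tag names "VERB" and "PRON", the function
    name rank_variant_verbs, and the choice of statistic (the smaller of the
    two pattern counts divided by the verb's total frequency) are all
    assumptions made for the example; any statistic that rewards verbs where
    both patterns are frequent would do.

```python
from collections import Counter


def rank_variant_verbs(sentences, min_freq=1):
    """Rank verb lemmas by how often they occur in BOTH patterns.

    sentences: iterable of lists of (word, lemma, tag) triples.
    The tags "VERB" and "PRON" are a hypothetical coarse tagset.
    Returns rows (lemma, score, n_direct, n_with_to, n_total), best first.
    """
    total = Counter()    # overall frequency of each verb lemma
    direct = Counter()   # occurrences of <VERB PRONOUN>
    with_to = Counter()  # occurrences of <VERB to PRONOUN>

    for sent in sentences:
        for i, (_word, lemma, tag) in enumerate(sent):
            if tag != "VERB":
                continue
            total[lemma] += 1
            nxt = sent[i + 1] if i + 1 < len(sent) else None
            nxt2 = sent[i + 2] if i + 2 < len(sent) else None
            if nxt is not None and nxt[2] == "PRON":
                direct[lemma] += 1
            elif (nxt is not None and nxt[0].lower() == "to"
                  and nxt2 is not None and nxt2[2] == "PRON"):
                with_to[lemma] += 1

    rows = []
    for lemma, n in total.items():
        if n < min_freq:
            continue
        # Assumed statistic: high only when both patterns are well
        # represented relative to the verb's overall frequency.
        score = min(direct[lemma], with_to[lemma]) / n
        rows.append((lemma, score, direct[lemma], with_to[lemma], n))

    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    toy = [
        [("I", "I", "PRON"), ("wrote", "write", "VERB"), ("you", "you", "PRON")],
        [("I", "I", "PRON"), ("wrote", "write", "VERB"), ("to", "to", "ADP"),
         ("you", "you", "PRON")],
        [("I", "I", "PRON"), ("saw", "see", "VERB"), ("you", "you", "PRON")],
    ]
    for row in rank_variant_verbs(toy):
        print(row)
```

    On a real corpus you would probably raise min_freq so that rare verbs
    with one or two chance hits do not float to the top of the list.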

     

    All the relevant frequencies are available for, e.g., the BNC in the Sketch
    Engine, http://www.sketchengine.co.uk, where you can define the patterns in
    CQL (Corpus Query Language, from Stuttgart University). We don't currently
    have a nice web interface for robots, but will have one shortly. In the
    meantime, ask us and we can set things up to help you (e.g. by allowing you
    robot access, in which case you would need to scrape web pages).
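
    To give an idea of what the two patterns might look like in CQL, here is
    a small helper that just builds the query strings. It is a sketch under
    assumptions: the attribute names (word, lemma, tag) and the tag values
    "VV." (lexical verbs) and "PNP" (personal pronouns) assume a CLAWS5-style
    tagging of the BNC, so check the tagset of the corpus you actually query.
    The function name cql_patterns is made up for the example.

```python
def cql_patterns(lemma=None, verb_tag="VV.", pron_tag="PNP"):
    """Build CQL strings for <VERB PRONOUN> and <VERB to PRONOUN>.

    lemma: optionally restrict the verb slot to a single lemma.
    verb_tag, pron_tag: regular expressions over the tag attribute;
    the defaults assume CLAWS5-style tags and may need changing.
    """
    if lemma is None:
        verb = f'[tag="{verb_tag}"]'
    else:
        verb = f'[lemma="{lemma}" & tag="{verb_tag}"]'
    pron = f'[tag="{pron_tag}"]'
    return {
        "verb_pronoun": f"{verb} {pron}",
        "verb_to_pronoun": f'{verb} [word="to"] {pron}',
    }


if __name__ == "__main__":
    for name, query in cql_patterns("write").items():
        print(name, query)
```

    The frequencies of the two queries, together with the overall frequency
    of the verb, are the three numbers the algorithm needs for each verb.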

     

    Regards,

     

                Adam

     

     

    -----Original Message-----
    From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    Behalf Of Mikhail Kopotev
    Sent: 22 February 2007 13:15
    Cc: CORPORA@UIB.NO
    Subject: [Corpora-List] Variant verbal government extraction

     

    Dear all,

    Does anyone know how to recognize and extract variations of verbal
    government, such as "to write you" / "to write to you", from a corpus?

    Since I am interested in Russian morphosyntactic changes, I would like you
    to point me to any tools or methods, rather than obtained results,
    concerning English or any other language.

    Many thanks,

    Mikhail Kopotev
    Researcher
    Department of Slavonic
    and Baltic Languages and Literatures
    University of Helsinki


