[Corpora-List] congressional-speech dataset available

From: Lillian Lee (llee@cs.cornell.edu)
Date: Thu Dec 14 2006 - 07:02:59 MET

    The "congressional speech" corpus and associated graph information
    used in our EMNLP 2006 paper, "Get out the vote: Determining support
    or opposition from Congressional floor-debate transcripts", are now
    available.

    Specifically, the data includes speeches as individual documents,
    together with:

        * automatically derived labels for whether the speakers supported
          the legislation under discussion or not, allowing for
          experiments with this kind of sentiment analysis

        * indications of which debate each speech comes from (and the
          position within the debate), allowing for consideration of
          conversational structure

        * indications of by-name references between speakers, allowing for
          experiments with agreement classification (if one determines the
          "true" labels from the support/oppose labels assigned to the
          pair of speakers in question)

        * the edge weights and other information we derived to create the
          graphs used in our experiments on this data, facilitating
          implementation of alternative graph-based classification methods
          on those same graphs
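
    As a rough illustration of the agreement-classification idea in the
    third item above, "true" agreement labels for by-name references
    could be derived from the support/oppose labels along the following
    lines. This is only a sketch: the speaker identifiers, the dictionary
    and list layout, and the function name agreement_labels are
    hypothetical and do not reflect the actual file format of the
    distribution.

```python
from typing import Dict, Iterable, List, Tuple


def agreement_labels(
    stance: Dict[str, bool],
    references: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, bool]]:
    """Label each by-name reference as agreement (True) or disagreement (False).

    stance maps a (hypothetical) speaker id to True for "support" and
    False for "oppose". references holds (referring speaker, referenced
    speaker) pairs. A pair counts as agreement when both speakers carry
    the same support/oppose label.
    """
    labels = []
    for source, target in references:
        # Skip references where either speaker has no support/oppose label.
        if source not in stance or target not in stance:
            continue
        labels.append((source, target, stance[source] == stance[target]))
    return labels


# Made-up example data, for illustration only.
stance = {"speaker_a": True, "speaker_b": False, "speaker_c": True}
references = [
    ("speaker_a", "speaker_b"),
    ("speaker_a", "speaker_c"),
    ("speaker_b", "speaker_d"),  # speaker_d has no label, so this pair is dropped
]
pairs = agreement_labels(stance, references)
```

    Here a reference is treated as agreement exactly when the two
    speakers share a label; pairs involving an unlabeled speaker are
    simply left out rather than guessed.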

    The download site is:
    http://www.cs.cornell.edu/home/llee/data/convote.html

    Matt Thomas, Bo Pang, and Lillian Lee


