[Corpora-List] congressional-speech dataset available

From: Lillian Lee (llee@cs.cornell.edu)
Date: Thu Dec 14 2006 - 07:02:59 MET

The "congressional speech" corpus and the associated graph information
used in our EMNLP 2006 paper, "Get out the vote: Determining support or
opposition from Congressional floor-debate transcripts", are now
available.

Specifically, the data includes speeches as individual documents,
together with:

    * automatically-derived labels for whether the speakers supported
      the legislation under discussion or not, allowing for
      experiments with this kind of sentiment analysis

    * indications of which debate each speech comes from (and the
      position within the debate), allowing for consideration of
      conversational structure

    * indications of by-name references between speakers, allowing for
      experiments with agreement classification (if one determines the
      "true" labels from the support/oppose labels assigned to the
      pair of speakers in question)

    * the edge weights and other information we derived to create the
      graphs used in our experiments on this data, so that alternative
      graph-based classification methods can be implemented on the
      same graphs
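
As an illustration of the agreement-classification idea in the third
item, here is a minimal sketch of deriving "true" agreement labels from
the support/oppose labels of a pair of speakers. It does not read the
released files: the speaker IDs, the in-memory dict and list layout, and
the function name agreement_labels are all hypothetical, chosen only to
show the rule that a by-name reference counts as agreement when both
speakers carry the same label.

```python
from typing import Dict, List, Tuple


def agreement_labels(
    stance: Dict[str, bool],
    references: List[Tuple[str, str]],
) -> List[Tuple[str, str, bool]]:
    """Label each by-name reference as agreement (True) or disagreement (False).

    stance maps a (hypothetical) speaker ID to True for "support" and
    False for "oppose". references lists (referring speaker, referenced
    speaker) pairs. A pair is skipped when either speaker has no label.
    """
    labelled = []
    for source, target in references:
        if source in stance and target in stance:
            # Agreement holds exactly when both speakers share a label.
            labelled.append((source, target, stance[source] == stance[target]))
    return labelled


# Toy data, invented for illustration only.
stance = {"speaker_A": True, "speaker_B": False, "speaker_C": True}
references = [
    ("speaker_A", "speaker_C"),  # both support
    ("speaker_B", "speaker_A"),  # oppose vs. support
    ("speaker_C", "speaker_D"),  # speaker_D has no label, so it is skipped
]
labels = agreement_labels(stance, references)
```

Skipping pairs with a missing label is a design choice of this sketch,
not something stated about the corpus.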

The download site is:
http://www.cs.cornell.edu/home/llee/data/convote.html

Matt Thomas, Bo Pang, and Lillian Lee
