[Corpora-List] CNN Transcripts

From: Mark Davies (Mark_Davies@byu.edu)
Date: Wed Nov 16 2005 - 18:31:10 MET

Next message: Andreea Irina Constantinescu: "[Corpora-List] Summary on "Computers and motivation""

Previous message: Marie-Paule PERY-WOODLEY: "[Corpora-List] CFP: DISCOURSE AND DOCUMENT"
Next in thread: David Graff: "Re: [Corpora-List] CNN Transcripts"
Reply: David Graff: "Re: [Corpora-List] CNN Transcripts"
Reply: Stephanie M. Strassel: "Re: [Corpora-List] CNN Transcripts"

Has anyone here done much with the CNN transcripts:
http://transcripts.cnn.com/TRANSCRIPTS/ ?

I'm aware of one publication (below), but would be interested in others
as well:

Hoffmann, Sebastian. "From Web-Page to Mega-Corpus: The CNN
Transcripts." In: Marianne Hundt, Nadja Nesselhauf and Carolin Biewer
(eds.) Corpus Linguistics and the Web. Amsterdam: Rodopi.

I'm also aware of some LDC corpora that contain CNN transcripts, but in
general these appear to be either from the newspaper or from scripted
news broadcasts, e.g.:

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC98T25
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T11

At any rate, even though the genre/register of these transcripts is
fairly homogeneous, they do contain more than 170 million words of
unscripted spoken English, so they seem like they might be a nice resource.

Thanks in advance for any information that you might have.

Mark Davies

=================================================

Mark Davies
Assoc. Prof., Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906

http://davies-linguistics.byu.edu

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **

=================================================


This archive was generated by hypermail 2b29 : Wed Nov 16 2005 - 18:57:40 MET