Substitutes for spoken corpora

Mark Davies (mdavies@rs6000.cmp.ilstu.edu)
Mon, 15 Jul 1996 12:07:21 -0500

I've got a multi-million-word corpus of both historical and modern
Portuguese (both European and Brazilian). The one gap, however, is a
corpus of *spoken* European Portuguese. I've searched and searched, and am
pretty much convinced that there is no such animal out there.

So, I'm trying to find a substitute, one that will model the spoken language
fairly well. I was thinking that one possibility might be some recent plays
by playwrights whose works have a style that imitates colloquial Portuguese
quite well.

I'm wondering, however, how valid this is methodologically. While it's not
an actual spoken corpus, it's probably as close as I'm going to get (unless
I get on a plane to Lisbon, record a large number of speech samples,
transcribe them, etc. - which isn't very likely). Any comments /
suggestions on using substitute materials like this?

==================================================================
Mark Davies, Assistant Professor, Spanish Linguistics
Dept. of Foreign Languages, Illinois State University
Normal, IL 61790-4300

Voice:309/438-7975 email:mdavies@ilstu.edu
Fax:309/438-8038 http://www.ilstu.edu/~mdavies/
==================================================================