Dear Corpora Listers,
I have two queries concerning English speech corpora.
1. I am looking for an English speech corpus that is part-of-speech
tagged and in which sound files, transcriptions and part-of-speech tags
are aligned. It should also be of considerable size (> 100,000 word
tokens, if possible). Can anyone point me towards pertinent corpora?
So far I have found only one corpus that meets all the criteria mentioned
above: the Boston University Radio News Corpus.
2. Despite hours of effort and the help of experienced colleagues, I
have not managed to open the example files of the BU Radio News Corpus
properly, whether with PRAAT, Wavesurfer, or Transcriber. All three
programs open the sound file (.sph) without problems, but none of them
can access the files with the transcription or the part-of-speech tags
and align this information with the sound wave. Can anyone help? Which
program(s) can do the job?
Any help will be greatly appreciated.
Many thanks in advance!
Best regards,
Ingo Plag
--
Ingo Plag
Linguistics Research Center
University of California at Santa Cruz
Santa Cruz CA 95060 USA
phone (+1)-831-459-3823
fax (+1)-831-459-3334 (c/o Junko Ito)
phone at home: (+1)-831-429-1306