Re: Corpora: rewrite rules for speech

From: James L. Fidelholtz (jfidel@siu.buap.mx)
Date: Wed Oct 25 2000 - 01:40:04 MET DST

    On Mon, 23 Oct 2000, Jim Magnuson wrote:

    >Hi. I am trying to compute estimates of, e.g., diphone transitional
    >probabilities in conversational speech. So far I have worked with the
    >CallHome database from the LDC. What I'm working with are orthographic
    >transcripts of telephone conversations. I've replaced all of the
    >orthographic forms with phonemic citation forms. This gives me very
    >different estimates of diphone probabilities than, e.g., written corpora
    >or frequency-weighted dictionaries.
    >
    >However, citation forms are obviously not ideal. For my purposes, it is
    >not worth investing in retranscribing the corpus phonetically. But I would
    >like to improve my estimates by applying phonological rules to my corpus
    >of phonemic citation forms. Could anyone point me towards a source of such
    >rules for American English? I've started working on my own, but would
    >rather not reinvent anything.
    >
    Jim:
            The Commodore 64 had a pretty decent text-to-speech program
    (I think it was written in C64 BASIC, which should make it easy to
    read off the rules and adapt them for your purposes). I can't get at
    it any more, and I don't remember the name, but it should be
    traceable somewhere on the web.
            Another tack: there is a book edited by Philip A. Luelsdorff
    (1987. _Orthography and phonology_. Amsterdam: John Benjamins) with
    articles which should be of some help, and not only for English.
    Though it is old and grotty, you might also find some help in:

    Hultzén, Lee S.; Joseph H. D. Allen Jr.; and Murray S. Miron. 1964.
    _Tables of transitional frequencies of English phonemes_. Urbana: U of
    Illinois Press.

    Even older and grottier, but maybe useful, is:

    Dewey, Godfrey. 1923. _Relativ [sic] frequency of English speech
    sounds_. Cambridge: Harvard U. Press. (I think this is still in
    print.)

    Luelsdorff, in particular, has more (earlier) stuff of interest,
    including, I believe, one book in the Mouton blue series. Just look
    around a good library a little. A lot of work has definitely been
    done on this.
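
    For concreteness, here is a rough sketch of the kind of computation
    being asked about: apply one rewrite rule to phonemic citation forms,
    then estimate diphone transitional probabilities P(b | a) from
    adjacent phone pairs. None of this comes from the sources above. The
    ARPAbet-style symbols, the vowel set, the function names and the
    two-word "corpus" are all assumptions made for illustration, and the
    flapping rule is deliberately oversimplified (it ignores stress and
    word boundaries).

```python
from collections import Counter

# Hypothetical ARPAbet-style vowel set; extend it to match your own inventory.
VOWELS = {"AA", "AE", "AH", "AO", "EH", "ER", "IH", "IY", "OW", "UH", "UW"}


def apply_flapping(phones):
    """Toy rewrite rule: T or D between two vowels becomes the flap DX.

    Real American English flapping also depends on stress; that is ignored here.
    """
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if (phones[i] in {"T", "D"}
                and phones[i - 1] in VOWELS
                and phones[i + 1] in VOWELS):
            out[i] = "DX"
    return out


def diphone_transitional_probs(utterances):
    """Return {(a, b): P(b | a)} estimated from adjacent phone pairs."""
    pair_counts = Counter()
    first_counts = Counter()
    for phones in utterances:
        for a, b in zip(phones, phones[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1  # a counted only when something follows it
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


# Two made-up citation forms standing in for a corpus.
citation = [
    ["B", "AH", "T", "ER"],  # "butter"
    ["B", "AH", "T"],        # "but"
]
rewritten = [apply_flapping(u) for u in citation]

probs_citation = diphone_transitional_probs(citation)
probs_rewritten = diphone_transitional_probs(rewritten)
```

    Comparing probs_citation with probs_rewritten for a pair such as
    ("AH", "T") shows how much a single rule shifts the estimates; a
    realistic rule set would be an ordered list of such functions applied
    in sequence.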
                    Jim

    -- 
    James L. Fidelholtz			e-mail: jfidel@siu.buap.mx
    Posgrado en Ciencias del Lenguaje	tel.: +(52-2)229-5500 x5705
    Instituto de Ciencias Sociales y Humanidades	fax: +(01-2) 229-5681
    Benemérita Universidad Autónoma de Puebla, MÉXICO
    


