Re: [Corpora-List] grapheme-to-phoneme mapping

From: Simon King (Simon.King@ed.ac.uk)
Date: Fri Aug 19 2005 - 11:13:50 MET DST


    n.chipere@reading.ac.uk wrote:
    > Dear all
    >
    > I am looking for a word list that specifies direct grapheme-phoneme
    > mappings. The lists I'm familiar with, e.g. the CMU Pronunciation Dictionary,
    > the MRC Psycholinguistic Database, the Moby Pronunciator and the Oxford
    > Learner's Dictionary, all provide phonetic transcriptions for entire words but do not
    > indicate which particular grapheme(s) correspond(s) to which particular
    > phoneme(s).

    The letter-to-sound decision tree from Festival can do this: you
    provide a letter, plus some left and right context letters, and it
    predicts the phonemes (zero or more) that the letter maps to.

    http://www.cstr.ed.ac.uk/projects/festival/
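
    As a rough illustration of the input such a model sees, here is a
    minimal Python sketch. It is not Festival's code or API: the names
    letter_contexts and toy_predict, the "#" padding symbol, the window
    width and the phoneme symbols are all assumptions made up for this
    example, and toy_predict is a pair of hand-written rules standing in
    for a trained decision tree.

```python
PAD = "#"  # assumed padding symbol for positions outside the word


def letter_contexts(word, width=2):
    """Return one (left, letter, right) tuple per letter of `word`."""
    word = word.lower()
    padded = PAD * width + word + PAD * width
    contexts = []
    for i in range(width, width + len(word)):
        left = padded[i - width:i]
        right = padded[i + 1:i + 1 + width]
        contexts.append((left, padded[i], right))
    return contexts


def toy_predict(left, letter, right):
    """Hypothetical stand-in for a trained tree: returns zero or more phonemes."""
    if letter == "x":
        return ["k", "s"]  # one letter mapping to two phonemes
    if letter == "e" and right.startswith(PAD) and left[-1:] not in ("", PAD):
        return []  # word-final 'e' treated as silent: zero phonemes
    return [letter]  # fallback: pretend the letter is its own phoneme


def letters_to_phonemes(word, width=2):
    """Pair each letter of `word` with the phonemes predicted for it."""
    return [(letter, toy_predict(left, letter, right))
            for left, letter, right in letter_contexts(word, width)]
```

    Calling letters_to_phonemes("axe") pairs "a" with one phoneme, "x"
    with two and the final "e" with none, which is the kind of
    per-letter mapping asked about above.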

    To train the model, the letters and phonemes in an existing dictionary
    are aligned using dynamic programming (after the mapping is hand-seeded,
    I think) - see

    Alan W Black, Kevin Lenzo, and Vincent Pagel. Issues in building general
    letter to sound rules. In The Third ESCA Workshop in Speech Synthesis,
    pages 77-80, 1998.

    available from

    http://www.cstr.ed.ac.uk/publications/
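
    To make the alignment step concrete, here is a minimal Python
    sketch of a dynamic-programming alignment between a word's letters
    and its dictionary phonemes. It only loosely follows the idea
    described above and is not the procedure from the paper or from
    Festival: the function name align, the `allowed` table (playing the
    role of the hand-seeded mapping), the limit of two phonemes per
    letter, the unit penalty and the phoneme symbols are all
    assumptions for illustration.

```python
def align(letters, phones, allowed, max_phones=2, penalty=1):
    """Align each letter to 0..max_phones consecutive phonemes.

    `allowed` maps a letter to a set of phoneme tuples considered
    plausible for it (the hand-seeded part); () means "silent".
    Pairings not listed in `allowed` cost `penalty`, listed ones cost 0.
    Returns (pairs, total_cost).
    """
    n, m = len(letters), len(phones)
    inf = float("inf")
    # cost[i][j]: cheapest way to align the first i letters to the first j phonemes
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(n):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for k in range(max_phones + 1):
                if j + k > m:
                    break
                chunk = tuple(phones[j:j + k])
                step = 0 if chunk in allowed.get(letters[i], ()) else penalty
                if cost[i][j] + step < cost[i + 1][j + k]:
                    cost[i + 1][j + k] = cost[i][j] + step
                    back[i + 1][j + k] = k  # phonemes consumed by letter i
    if cost[n][m] == inf:
        raise ValueError("no alignment: too many phonemes for these letters")
    # Trace back from the final cell to recover the letter/phoneme pairs.
    pairs, j = [], m
    for i in range(n, 0, -1):
        k = back[i][j]
        pairs.append((letters[i - 1], tuple(phones[j - k:j])))
        j -= k
    pairs.reverse()
    return pairs, cost[n][m]


# Illustrative seed table; the symbols are made up for this example.
ALLOWED = {
    "a": {("ae",), ("ey",)},
    "x": {("k", "s")},
    "e": {(), ("eh",)},
}
```

    For example, align("axe", ["ae", "k", "s"], ALLOWED) pairs "a" with
    ("ae",), "x" with ("k", "s") and "e" with the empty tuple, at total
    cost 0. Running this over every dictionary entry would give the
    letter-level training pairs for a classifier.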

    I believe this model is as accurate as anything else available, but it
    will of course have a much higher error rate than a proper lexicon (e.g.
    our Unisyn lexicon, http://www.cstr.ed.ac.uk/projects/unisyn/).

    Simon

    -- 
    Dr. Simon King                               Simon.King@ed.ac.uk
    Centre for Speech Technology Research          www.cstr.ed.ac.uk
    For MSc/PhD info, visit  www.hcrc.ed.ac.uk/language-at-edinburgh
    


