Re: Corpora: edit distance and spell checking

From: Kristina Kjellson (kristina.kjellson@nst.as)
Date: Mon Dec 03 2001 - 11:05:41 MET


    Has anyone used the Perl module String::Approx successfully to spell-check
    a corpus? Or does anyone have another suggestion? Our aim is to generate a
    lexicon from the corpus, but because of the topic there are many frequent
    spelling mistakes.

    /Kristina Kjellson
    Language engineer
    Nordisk språkteknologi, Norway
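Whatever tool is used, the core step is an edit-distance comparison between corpus words and a list of trusted words. Below is a minimal Python sketch, not String::Approx's API. It assumes a hypothetical heuristic: words that occur at least `min_freq` times are treated as correct, and each rarer word is mapped to the most frequent trusted word within `max_dist` edits. The names `suggest_corrections`, `min_freq` and `max_dist` are illustrative only.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as the shorter string
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the letters match)
            ))
        prev = cur
    return prev[-1]


def suggest_corrections(tokens, min_freq=3, max_dist=1):
    """Map each rare word to the most frequent trusted word within max_dist edits.

    Hypothetical heuristic: words seen at least min_freq times are assumed correct.
    """
    freq = Counter(tokens)
    trusted = [w for w, n in freq.items() if n >= min_freq]
    corrections = {}
    for word, n in freq.items():
        if n >= min_freq:
            continue
        candidates = [
            t for t in trusted
            if abs(len(t) - len(word)) <= max_dist  # cheap filter: distance >= length difference
            and levenshtein(word, t) <= max_dist
        ]
        if candidates:
            corrections[word] = max(candidates, key=lambda t: freq[t])
    return corrections


print(suggest_corrections("the spelling is the spelling not the speling".split(), min_freq=2))
```

Because the misspellings in this corpus are themselves frequent, a frequency threshold alone would let common misspellings into the trusted list. Seeding that list from an existing lexicon, or reviewing the proposed pairs by hand, would be safer. Comparing every rare word with every trusted word is also quadratic, which is where cheap lower bounds like those discussed below help as pre-filters.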

    -----Original Message-----
    From: Bruce L. Lambert, Ph.D. [mailto:lambertb@uic.edu]
    Sent: 30 November 2001 19:43
    To: CORPORA@HD.UIB.NO
    Subject: Re: Corpora: approximations (bounds) for edit distance

    Maybe I'm missing something, but the upper bound on edit distance between
    two strings is always the length of the longer string, and the lower bound
    is always zero (when the strings are identical).

    -bruce
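These bounds hold for unit-cost (Levenshtein) edit distance. The upper bound works because one can substitute the aligned positions and then insert or delete the remaining characters of the longer string. The lower bound can be tightened slightly to the difference in lengths, since each surplus character needs at least one insertion or deletion. A minimal sketch (the function name `length_bounds` is hypothetical):

```python
def length_bounds(a: str, b: str) -> tuple[int, int]:
    """Return (lower, upper) bounds on unit-cost Levenshtein distance from lengths alone.

    lower: each surplus character in the longer string costs at least one insert/delete.
    upper: substitute the min(len) aligned positions, then insert/delete the rest.
    """
    return abs(len(a) - len(b)), max(len(a), len(b))


print(length_bounds("kitten", "sitting"))  # true distance is 3
```

The length-based lower bound is zero whenever the two strings have equal length, which is why the original question asks for richer statistics computed at preprocessing time.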

    At 06:43 PM 11/29/01 +0000, Computer Researcher wrote:
    >Hi,
    >
    >Does anyone know good approximations (lower and/or upper bounds) for edit
    >distance, using statistics that can be computed by preprocessing the
    >strings?
    >
    >At preprocessing time we can transform each string into a set of numbers
    >(e.g., a multi-dimensional vector) and then use these vectors to
    >approximate the edit distance between strings.
    >
    >I found a paper by Hadlock, F. (1988) proposing a lower bound based on
    >the frequencies of the letters in the strings. Assuming the alphabet is
    >the same for all strings, all frequency vectors have the same number of
    >dimensions. He then defines a distance metric over these vectors such
    >that the distance in the vector space is a lower bound on the actual
    >edit distance.
    >
    >Do you know of any other methods that achieve a similar goal?
    >
    >Thanks for your attention,
    >
    >CR
    >
    >
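Below is a minimal sketch of a letter-frequency lower bound of the kind described above. The argument is a standard one, but I have not checked that it matches Hadlock's exact definition, and the function names are hypothetical. Let `pos` be the total number of letters that string a has in surplus over b, and `neg` the total surplus of b over a. An insertion or deletion changes one letter count by 1, and a substitution lowers one count and raises another, so each edit moves `pos` and `neg` by at most 1 each. Both must reach 0, so the edit distance is at least `max(pos, neg)`.

```python
from collections import Counter


def letter_counts(s: str) -> Counter:
    """Frequency vector for a string; computed once per string at preprocessing time."""
    return Counter(s)


def frequency_distance(ca: Counter, cb: Counter) -> int:
    """Lower bound on unit-cost Levenshtein distance from two precomputed letter-count vectors."""
    pos = sum((ca - cb).values())  # Counter subtraction keeps only positive counts
    neg = sum((cb - ca).values())
    return max(pos, neg)


print(frequency_distance(letter_counts("kitten"), letter_counts("sitting")))  # true distance is 3
```

Since `pos - neg` equals the difference in string lengths, this bound is never weaker than the length-difference bound. It is blind to letter order, though: for anagrams such as "abc" and "cab" it returns 0, even though the edit distance is 2.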



    This archive was generated by hypermail 2b29 : Mon Dec 03 2001 - 11:16:27 MET