Re: Corpora: approximations (bounds) for edit distance

From: Bruce L. Lambert, Ph.D. (lambertb@uic.edu)
Date: Fri Nov 30 2001 - 19:42:54 MET

  • Next message: Gregory Aist: "Re: Corpora: approximations (bounds) for edit distance"

    Maybe I'm missing something, but the upper bound on edit distance between
    two strings is always the length of the longer string, and the lower bound
    is always zero (when the strings are identical).

    -bruce

    At 06:43 PM 11/29/01 +0000, Computer Researcher wrote:
    >Hi,
    >
    >Does anyone know good approximations (lower and/or upper bounds) to edit
    >distance? (by using some statistical numbers that can be found by
    >preprocessing of the strings)
    >
    >In the preprocess time we can transform the strings to a bunch of numbers
    >(e.g., multi-dimensional vectors); and then use these vectors to
    >approximate the edit distance between strings.
    >
    >I found a paper by Hadlock, F. (1988), proposing a "lower bound" by using
    >frequencies of the letters in the string. Assuming that the alphabet is
    >same for all strings, all frequency vectors will have same number of
    >dimensions. And he defines a distance metric over these vectors, so that
    >this distance (in the vector space) is a lower-bound to the actual edit
    >distance.
    >
    >Do you know any other method that can achieve a similar goal?
    >
    >Thanks for your attention,
    >
    >CR
    >
    >_________________________________________________________________
    >Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp
    >



    This archive was generated by hypermail 2b29 : Fri Nov 30 2001 - 19:55:28 MET