Re: Corpora: approximations (bounds) for edit distance

From: COMP staff (csrluk@comp.polyu.edu.hk)
Date: Sat Dec 01 2001 - 11:48:49 MET

Next message: Computer Researcher: "Re: Corpora: approximations (bounds) for edit distance"

Previous message: Bruce L. Lambert: "Re: Corpora: approximations (bounds) for edit distance"
Maybe in reply to: Computer Researcher: "Corpora: approximations (bounds) for edit distance"
Next in thread: Computer Researcher: "Re: Corpora: approximations (bounds) for edit distance"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Bruce,

> I agree. I was describing the simplest possible edit distance algorithm,
> with mismatch costs equal to 1 and correct substitutions equal to 0. Edit
> distance gets interesting and powerful when one uses a matrix of
> substitution costs, one cost for each possible
> substitution/insertion/deletion in an alphabet, and when the costs reflect
> some inverse function of similarity or confusability "in the real world".
>
> The real problem with edit distance is that is slow. What approaches people
> are using to speed up edit distance calculations? One heuristic approach
> that has been tried is to first use a fast approach like ngram similarity,
> then only compute edit distance on pairs whose ngram similarities exceed
> some threshold. Are there other methods?

There were some research in fast Viterbi, A* algorithms, Dijsktra in the speech
processing community (around early 1990s in IEEE ICASSP), which they
are relevant. Some kind of prunning would help if string lengths are
very long. However, my knowledge is limited to here. My experience with approximate
matching words is not that slow. If you want to do approximate match of a set
of strings with 1 string, obviously a different algo should be used.

Best,

Robert Luk

Next message: Computer Researcher: "Re: Corpora: approximations (bounds) for edit distance"
Previous message: Bruce L. Lambert: "Re: Corpora: approximations (bounds) for edit distance"
Maybe in reply to: Computer Researcher: "Corpora: approximations (bounds) for edit distance"
Next in thread: Computer Researcher: "Re: Corpora: approximations (bounds) for edit distance"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2b29 : Sat Dec 01 2001 - 19:14:07 MET