Over a year ago I got the impulse to start a blog. As it turned out, it was truly just an impulse because I made only a few posts before I abandoned the effort. But a couple of my posts, which concerned the Levenshtein distance algorithm, got an interesting number of hits. For those who don’t know, the Levenshtein distance algorithm finds the number of edits (i.e., deletions, insertions, and substitutions) needed to turn one string into another. It’s used, for example, in spell checkers and index searching to find near matches. My previous posts were interesting because I listed implementations of the algorithm in Java and Fortran, and the Fortran post got twice the hits of the Java post. I definitely was not expecting that level of interest in what many dismiss as a dead language!
Anyway, below are both implementations for those who care.
Fortran
integer function distance (a,b) implicit none ! define variables integer :: len_a, len_b, i, j, cost character(len=*), intent(in) :: a, b ! matrix for calculating levenshtein distance integer, dimension(0:len_trim(a), 0:len_trim(b)) :: leven_mat integer, dimension(3) :: three_vals len_a = len_trim(a) len_b = len_trim(b) do i = 0,len_a leven_mat(i,0) = i end do do j = 0, len_b leven_mat(0,j) = j end do do i = 1, len_a do j = 1, len_b if (a(i:i) == b(j:j)) then cost = 0 else cost = 1 end if three_vals(1) = leven_mat(i-1,j) + 1 three_vals(2) = leven_mat(i, j-1) + 1 three_vals(3) = leven_mat(i-1,j-1) + cost leven_mat(i,j) = minval(three_vals) end do end do distance = int(leven_mat(len_a,len_b)) end function distance
Java
public class Levenshtein { public static int distance(String wordA, String wordB){ int cost, i, j, min_val; int[][] leven_matrix = new int[wordA.length() + 1][wordB.length() + 1]; for (i = 0; i <= wordA.length(); i++){ leven_matrix[i][0] = i; } for (j = 0; j <= wordB.length(); j++){ leven_matrix[0][j] = j; } for (i = 1; i <= wordA.length(); i++){ for (j = 1; j <= wordB.length(); j++){ if (wordA.charAt(i-1) == wordB.charAt(j-1)) cost = 0; else cost = 1; min_val = leven_matrix[i-1][j] + 1; if (min_val > leven_matrix[i][j-1] + 1) min_val = leven_matrix[i][j-1] + 1; if (min_val > leven_matrix[i-1][j-1] + cost) min_val = leven_matrix[i-1][j-1] + cost; leven_matrix[i][j] = min_val; } } return leven_matrix[wordA.length()][wordB.length()]; } }