Skip to content

Instantly share code, notes, and snippets.

@Vicfred
Created April 19, 2025 17:48
Show Gist options
  • Save Vicfred/ff110fc1e681d50c910c853e758a4c85 to your computer and use it in GitHub Desktop.
Save Vicfred/ff110fc1e681d50c910c853e758a4c85 to your computer and use it in GitHub Desktop.
levenshtein

Levenshtein Edit Distance – Dynamic Programming Walk‑through

1. State definition

dp[i][j] = minimum edits needed to change
           the first i characters of A (A[0..i‑1])
           into the first j characters of B (B[0..j‑1])

i ranges 0‥|A|, j ranges 0‥|B|. The answer is dp[|A|][|B|].


2. Base cases

  • Empty prefixes
    • dp[0][j] = j — need j insertions to build B[0..j‑1] from an empty string.
    • dp[i][0] = i — need i deletions to erase A[0..i‑1] to get the empty string.

3. Transition / recurrence

For every i ≥ 1, j ≥ 1:

if A[i‑1] == B[j‑1]:          # current characters match
    dp[i][j] = dp[i‑1][j‑1]   #   no new edit needed
else:                         # characters differ
    dp[i][j] = 1 + min(
                  dp[i‑1][j],   # delete A[i‑1]
                  dp[i][j‑1],   # insert B[j‑1]
                  dp[i‑1][j‑1]  # substitute A[i‑1] → B[j‑1]
                )

The +1 counts the chosen edit on top of the best smaller sub‑problem.


4. Algorithm skeleton (pseudocode)

function editDistance(A, B):
    n ← |A|, m ← |B|
    allocate (n+1) × (m+1) array dp

    for i = 0 .. n: dp[i][0] ← i
    for j = 0 .. m: dp[0][j] ← j

    for i = 1 .. n:
        for j = 1 .. m:
            if A[i-1] == B[j-1]:
                dp[i][j] ← dp[i-1][j-1]
            else:
                dp[i][j] ← 1 + min(dp[i-1][j],     # deletion
                                   dp[i][j-1],     # insertion
                                   dp[i-1][j-1])   # substitution
    return dp[n][m]

Time: O(n × m)
Space: O(n × m) (often reduced to O(min(n,m)) by keeping only two rows).


5. Small worked example

Distance between A = "kitten" and B = "sitting":

ε s i t t i n g
ε 0 1 2 3 4 5 6 7
k 1 1 2 3 4 5 6 7
i 2 2 1 2 3 4 5 6
t 3 3 2 1 2 3 4 5
t 4 4 3 2 1 2 3 4
e 5 5 4 3 2 2 3 4
n 6 6 5 4 3 3 2 3

Bottom‑right dp[6][7] = 3, so 3 edits:

kitten → sitten (substitute k→s)
sitten → sittin (substitute e→i)
sittin → sitting (insert g)


6. Why this is dynamic programming

  • Optimal sub‑structure: aligning full prefixes uses optimal alignments of smaller prefixes.
  • Overlapping sub‑problems: many (i,j) pairs repeat; caching them (the table) avoids recomputation.

7. Common refinements

Goal Idea
Save memory Keep only two rows (previous, current).
Trace edit operations Store chosen op during filling or back‑track from dp[n][m].
Weighted edits Replace the +1 with different costs.
Damerau distance Add transpose case using dp[i‑2][j‑2] + 1.

Next steps

  • Show two‑row optimisation code in C++, Python, or Haskell.
  • Explain bit‑parallel DP for fixed alphabets.

Just ask if you’d like any of these!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment