Created
April 19, 2025 17:48
-
-
Save Vicfred/ff110fc1e681d50c910c853e758a4c85 to your computer and use it in GitHub Desktop.
Revisions
-
Vicfred created this gist
Apr 19, 2025 .There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -0,0 +1,113 @@ # Levenshtein Edit Distance – Dynamic Programming Walk‑through ## 1. State definition ```cpp dp[i][j] = minimum edits needed to change the first i characters of A (A[0..i‑1]) into the first j characters of B (B[0..j‑1]) ``` `i` ranges 0‥|A|, `j` ranges 0‥|B|. The answer is `dp[|A|][|B|]`. --- ## 2. Base cases * **Empty prefixes** * `dp[0][j] = j` — need `j` insertions to build `B[0..j‑1]` from an empty string. * `dp[i][0] = i` — need `i` deletions to erase `A[0..i‑1]` to get the empty string. --- ## 3. Transition / recurrence For every `i ≥ 1`, `j ≥ 1`: ```cpp if A[i‑1] == B[j‑1]: # current characters match dp[i][j] = dp[i‑1][j‑1] # no new edit needed else: # characters differ dp[i][j] = 1 + min( dp[i‑1][j], # delete A[i‑1] dp[i][j‑1], # insert B[j‑1] dp[i‑1][j‑1] # substitute A[i‑1] → B[j‑1] ) ``` The **+1** counts the chosen edit on top of the best smaller sub‑problem. --- ## 4. Algorithm skeleton (pseudocode) ```cpp function editDistance(A, B): n ← |A|, m ← |B| allocate (n+1) × (m+1) array dp for i = 0 .. n: dp[i][0] ← i for j = 0 .. m: dp[0][j] ← j for i = 1 .. n: for j = 1 .. m: if A[i-1] == B[j-1]: dp[i][j] ← dp[i-1][j-1] else: dp[i][j] ← 1 + min(dp[i-1][j], # deletion dp[i][j-1], # insertion dp[i-1][j-1]) # substitution return dp[n][m] ``` *Time*: **O(n × m)** *Space*: **O(n × m)** (often reduced to **O(min(n,m))** by keeping only two rows). --- ## 5. Small worked example Distance between `A = "kitten"` and `B = "sitting"`: | | ε | s | i | t | t | i | n | g | |------|---|---|---|---|---|---|---|---| | **ε**| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | | **k**| 1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | | **i**| 2 | 2 | 1 | 2 | 3 | 4 | 5 | 6 | | **t**| 3 | 3 | 2 | 1 | 2 | 3 | 4 | 5 | | **t**| 4 | 4 | 3 | 2 | 1 | 2 | 3 | 4 | | **e**| 5 | 5 | 4 | 3 | 2 | 2 | 3 | 4 | | **n**| 6 | 6 | 5 | 4 | 3 | 3 | 2 | 3 | Bottom‑right `dp[6][7] = 3`, so 3 edits: `kitten → sitten` (substitute **k→s**) `sitten → sittin` (substitute **e→i**) `sittin → sitting` (insert **g**) --- ## 6. Why this is dynamic programming * **Optimal sub‑structure**: aligning full prefixes uses optimal alignments of smaller prefixes. * **Overlapping sub‑problems**: many `(i,j)` pairs repeat; caching them (the table) avoids recomputation. --- ## 7. Common refinements | Goal | Idea | |------|------| | Save memory | Keep only two rows (`previous`, `current`). | | Trace edit operations | Store chosen op during filling or back‑track from `dp[n][m]`. | | Weighted edits | Replace the `+1` with different costs. | | Damerau distance | Add transpose case using `dp[i‑2][j‑2] + 1`. | --- ### Next steps * Show two‑row optimisation code in C++, Python, or Haskell. * Explain bit‑parallel DP for fixed alphabets. Just ask if you’d like any of these!