Skip to content

Instantly share code, notes, and snippets.

@Vicfred
Created April 19, 2025 17:48
Show Gist options
  • Save Vicfred/ff110fc1e681d50c910c853e758a4c85 to your computer and use it in GitHub Desktop.
Save Vicfred/ff110fc1e681d50c910c853e758a4c85 to your computer and use it in GitHub Desktop.

Revisions

  1. Vicfred created this gist Apr 19, 2025.
    113 changes: 113 additions & 0 deletions levenshtein.md
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,113 @@
    # Levenshtein Edit Distance – Dynamic Programming Walk‑through

    ## 1. State definition

    ```cpp
    dp[i][j] = minimum edits needed to change
    the first i characters of A (A[0..i‑1])
    into the first j characters of B (B[0..j‑1])
    ```
    `i` ranges 0‥|A|, `j` ranges 0‥|B|. The answer is `dp[|A|][|B|]`.
    ---
    ## 2. Base cases
    * **Empty prefixes**
    * `dp[0][j] = j` — need `j` insertions to build `B[0..j‑1]` from an empty string.
    * `dp[i][0] = i` — need `i` deletions to erase `A[0..i‑1]` to get the empty string.
    ---
    ## 3. Transition / recurrence
    For every `i ≥ 1`, `j ≥ 1`:
    ```cpp
    if A[i‑1] == B[j‑1]: # current characters match
    dp[i][j] = dp[i‑1][j‑1] # no new edit needed
    else: # characters differ
    dp[i][j] = 1 + min(
    dp[i‑1][j], # delete A[i‑1]
    dp[i][j‑1], # insert B[j‑1]
    dp[i‑1][j‑1] # substitute A[i‑1] → B[j‑1]
    )
    ```

    The **+1** counts the chosen edit on top of the best smaller sub‑problem.

    ---

    ## 4. Algorithm skeleton (pseudocode)

    ```cpp
    function editDistance(A, B):
    n ← |A|, m ← |B|
    allocate (n+1) × (m+1) array dp

    for i = 0 .. n: dp[i][0] ← i
    for j = 0 .. m: dp[0][j] ← j

    for i = 1 .. n:
    for j = 1 .. m:
    if A[i-1] == B[j-1]:
    dp[i][j] ← dp[i-1][j-1]
    else:
    dp[i][j] ← 1 + min(dp[i-1][j], # deletion
    dp[i][j-1], # insertion
    dp[i-1][j-1]) # substitution
    return dp[n][m]
    ```
    *Time*: **O(n × m)**
    *Space*: **O(n × m)** (often reduced to **O(min(n,m))** by keeping only two rows).
    ---
    ## 5. Small worked example
    Distance between `A = "kitten"` and `B = "sitting"`:
    | | ε | s | i | t | t | i | n | g |
    |------|---|---|---|---|---|---|---|---|
    | **ε**| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
    | **k**| 1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
    | **i**| 2 | 2 | 1 | 2 | 3 | 4 | 5 | 6 |
    | **t**| 3 | 3 | 2 | 1 | 2 | 3 | 4 | 5 |
    | **t**| 4 | 4 | 3 | 2 | 1 | 2 | 3 | 4 |
    | **e**| 5 | 5 | 4 | 3 | 2 | 2 | 3 | 4 |
    | **n**| 6 | 6 | 5 | 4 | 3 | 3 | 2 | 3 |
    Bottom‑right `dp[6][7] = 3`, so 3 edits:
    `kitten → sitten` (substitute **k→s**)
    `sitten → sittin` (substitute **e→i**)
    `sittin → sitting` (insert **g**)
    ---
    ## 6. Why this is dynamic programming
    * **Optimal sub‑structure**: aligning full prefixes uses optimal alignments of smaller prefixes.
    * **Overlapping sub‑problems**: many `(i,j)` pairs repeat; caching them (the table) avoids recomputation.
    ---
    ## 7. Common refinements
    | Goal | Idea |
    |------|------|
    | Save memory | Keep only two rows (`previous`, `current`). |
    | Trace edit operations | Store chosen op during filling or back‑track from `dp[n][m]`. |
    | Weighted edits | Replace the `+1` with different costs. |
    | Damerau distance | Add transpose case using `dp[i‑2][j‑2] + 1`. |
    ---
    ### Next steps
    * Show two‑row optimisation code in C++, Python, or Haskell.
    * Explain bit‑parallel DP for fixed alphabets.
    Just ask if you’d like any of these!