Edit Distance

Interview Prep

HardDynamic Programmingdpstrings

The problem

Given two strings a and b, return the minimum number of single-character edits — insert, delete, replace — required to turn a into b. Also called Levenshtein distance.

Pattern: 2D DP over prefix lengths

Let dp[i][j] be the edit distance between the first i characters of a and the first j characters of b. The recurrence is the whole problem.

If a[i-1] == b[j-1]:                  dp[i][j] = dp[i-1][j-1]
Else:                                  dp[i][j] = 1 + min(
                                          dp[i-1][j],     # delete a[i-1]
                                          dp[i][j-1],     # insert b[j-1]
                                          dp[i-1][j-1],   # replace
                                       )

Boundaries:                            dp[0][j] = j
                                       dp[i][0] = i

Each cell looks at its top, left, and top-left neighbors. A char match is free; a mismatch is the cheapest of the three operations, plus 1.

Solution — full 2D table

def min_distance(a: str, b: str) -> int:
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]

    for i in range(n + 1):
        dp[i][0] = i        # deleting all of a[:i]
    for j in range(m + 1):
        dp[0][j] = j        # inserting all of b[:j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(
                    dp[i - 1][j],       # delete a[i-1]
                    dp[i][j - 1],       # insert b[j-1]
                    dp[i - 1][j - 1],   # replace a[i-1] with b[j-1]
                )
    return dp[n][m]

O(n · m) time, O(n · m) space. The time is unavoidable for this problem. The space is wasteful — each row only needs the row above.

Solution — O(min(n, m)) space

def min_distance(a: str, b: str) -> int:
    if len(a) < len(b):
        a, b = b, a                     # ensure |b| <= |a| → smaller row
    n, m = len(a), len(b)

    prev = list(range(m + 1))           # i = 0 row
    cur = [0] * (m + 1)

    for i in range(1, n + 1):
        cur[0] = i
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1]
            else:
                cur[j] = 1 + min(prev[j], cur[j - 1], prev[j - 1])
        prev, cur = cur, prev
    return prev[m]

Same O(n · m) time, but only two rows in memory. Always make b the shorter string so the row length is the smaller of the two. The trick is the rolling pair of arrays — we never need rows older than one back.

Walkthrough

a = "horse", b = "ros"

       ""  r  o  s
   ""   0  1  2  3
    h   1  1  2  3
    o   2  2  1  2
    r   3  2  2  2
    s   4  3  3  2
    e   5  4  4  3   ← dp[5][3] = 3, the answer.

Read the table row by row. The bottom-right cell is the answer. Three edits suffice: horse → rorse (replace h with r) → rose (delete r at index 1) → ros (delete e). The DP doesn't tell you which edits; backtracking through the table reconstructs them if you need to.

Edge cases

Either string empty. Distance is the length of the other.
Identical strings. Distance is 0.
Wildly different lengths. Distance is at least |n - m| — every length difference is a forced insert or delete.

Longest Common Subsequence. Same 2D structure; the recurrence is +1 on match, max of two neighbors otherwise.
One Edit Distance. "Is the distance exactly 1?" Solvable in O(n) without DP — case analysis on length difference.
Minimum ASCII Delete Sum. Variant where the cost is character codepoint rather than 1.