Edit Distance

Interview Prep

HardDynamic Programmingdpstrings

The problem

Given two strings a and b, return the minimum number of single-character edits — insert, delete, replace — required to turn a into b. Also called Levenshtein distance.

Pattern: 2D DP over prefix lengths

Let dp[i][j] be the edit distance between the first i characters of a and the first j characters of b. The recurrence is the whole problem.

If a[i-1] == b[j-1]:                  dp[i][j] = dp[i-1][j-1]
Else:                                  dp[i][j] = 1 + min(
                                          dp[i-1][j],     # delete a[i-1]
                                          dp[i][j-1],     # insert b[j-1]
                                          dp[i-1][j-1],   # replace
                                       )

Boundaries:                            dp[0][j] = j
                                       dp[i][0] = i

Each cell looks at its top, left, and top-left neighbors. A char match is free; a mismatch is the cheapest of the three operations, plus 1.

Solution — full 2D table

def min_distance(a: str, b: str) -> int:
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]

    for i in range(n + 1):
        dp[i][0] = i        # deleting all of a[:i]
    for j in range(m + 1):
        dp[0][j] = j        # inserting all of b[:j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(
                    dp[i - 1][j],       # delete a[i-1]
                    dp[i][j - 1],       # insert b[j-1]
                    dp[i - 1][j - 1],   # replace a[i-1] with b[j-1]
                )
    return dp[n][m]

O(n · m) time, O(n · m) space. The time is unavoidable for this problem. The space is wasteful — each row only needs the row above.

Solution — O(min(n, m)) space

def min_distance(a: str, b: str) -> int:
    if len(a) < len(b):
        a, b = b, a                     # ensure |b| <= |a| → smaller row
    n, m = len(a), len(b)

    prev = list(range(m + 1))           # i = 0 row
    cur = [0] * (m + 1)

    for i in range(1, n + 1):
        cur[0] = i
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1]
            else:
                cur[j] = 1 + min(prev[j], cur[j - 1], prev[j - 1])
        prev, cur = cur, prev
    return prev[m]

Same O(n · m) time, but only two rows in memory. Always make b the shorter string so the row length is the smaller of the two. The trick is the rolling pair of arrays — we never need rows older than one back.

Walkthrough

a = "horse", b = "ros"

       ""  r  o  s
   ""   0  1  2  3
    h   1  1  2  3
    o   2  2  1  2
    r   3  2  2  2
    s   4  3  3  2
    e   5  4  4  3   ← dp[5][3] = 3, the answer.

Read the table row by row. The bottom-right cell is the answer. Three edits suffice: horserorse (replace h with r) → rose (delete r at index 1) → ros (delete e). The DP doesn't tell you which edits; backtracking through the table reconstructs them if you need to.

Edge cases

Related