Edit Distance
Interview Prep
The problem
Given two strings a and b, return the
minimum number of single-character edits — insert, delete, replace —
required to turn a into b. Also called
Levenshtein distance.
Pattern: 2D DP over prefix lengths
Let dp[i][j] be the edit distance between the first
i characters of a and the first
j characters of b. The recurrence is the
whole problem.
If a[i-1] == b[j-1]: dp[i][j] = dp[i-1][j-1]
Else: dp[i][j] = 1 + min(
dp[i-1][j], # delete a[i-1]
dp[i][j-1], # insert b[j-1]
dp[i-1][j-1], # replace
)
Boundaries: dp[0][j] = j
dp[i][0] = i Each cell looks at its top, left, and top-left neighbors. A char match is free; a mismatch is the cheapest of the three operations, plus 1.
Solution — full 2D table
def min_distance(a: str, b: str) -> int:
n, m = len(a), len(b)
dp = [[0] * (m + 1) for _ in range(n + 1)]
for i in range(n + 1):
dp[i][0] = i # deleting all of a[:i]
for j in range(m + 1):
dp[0][j] = j # inserting all of b[:j]
for i in range(1, n + 1):
for j in range(1, m + 1):
if a[i - 1] == b[j - 1]:
dp[i][j] = dp[i - 1][j - 1]
else:
dp[i][j] = 1 + min(
dp[i - 1][j], # delete a[i-1]
dp[i][j - 1], # insert b[j-1]
dp[i - 1][j - 1], # replace a[i-1] with b[j-1]
)
return dp[n][m] O(n · m) time, O(n · m) space. The time is unavoidable for this problem. The space is wasteful — each row only needs the row above.
Solution — O(min(n, m)) space
def min_distance(a: str, b: str) -> int:
if len(a) < len(b):
a, b = b, a # ensure |b| <= |a| → smaller row
n, m = len(a), len(b)
prev = list(range(m + 1)) # i = 0 row
cur = [0] * (m + 1)
for i in range(1, n + 1):
cur[0] = i
for j in range(1, m + 1):
if a[i - 1] == b[j - 1]:
cur[j] = prev[j - 1]
else:
cur[j] = 1 + min(prev[j], cur[j - 1], prev[j - 1])
prev, cur = cur, prev
return prev[m]
Same O(n · m) time, but only two rows in memory. Always make
b the shorter string so the row length is the smaller
of the two. The trick is the rolling pair of arrays — we never need
rows older than one back.
Walkthrough
a = "horse", b = "ros"
"" r o s
"" 0 1 2 3
h 1 1 2 3
o 2 2 1 2
r 3 2 2 2
s 4 3 3 2
e 5 4 4 3 ← dp[5][3] = 3, the answer.
Read the table row by row. The bottom-right cell is the answer.
Three edits suffice: horse → rorse
(replace h with r) → rose (delete r at index 1) →
ros (delete e). The DP doesn't tell you which edits;
backtracking through the table reconstructs them if you need to.
Edge cases
- Either string empty. Distance is the length of the other.
- Identical strings. Distance is 0.
- Wildly different lengths. Distance is at least
|n - m|— every length difference is a forced insert or delete.
Related
- Longest Common Subsequence. Same 2D structure; the recurrence is +1 on match, max of two neighbors otherwise.
- One Edit Distance. "Is the distance exactly 1?" Solvable in O(n) without DP — case analysis on length difference.
- Minimum ASCII Delete Sum. Variant where the cost is character codepoint rather than 1.