Regular Expression Matching

Interview Prep

LegendaryDynamic Programmingdpstrings

The problem

Implement regex matching with two metacharacters:

The match must cover the entire string (not just a substring). The classic ambiguity: "a*" means "zero or more a's", treated as a single pattern unit.

Why this is famously tricky

The * isn't a postfix on the preceding character in isolation — it's a quantifier that creates a variable-length match. The recursion has to decide, at each step, whether to "consume zero copies of the starred pattern" (move past it in the pattern) or "consume one copy" (move forward in the string but keep the same pattern position). Almost everyone gets one of those two branches wrong on the first try.

Recursive solution (for intuition)

def is_match_recursive(s: str, p: str) -> bool:
    if not p:
        return not s
    first = bool(s) and (p[0] == s[0] or p[0] == '.')
    if len(p) >= 2 and p[1] == '*':
        # Either consume 0 chars of '<char>*' and skip it, or consume 1 char if first matches
        return is_match_recursive(s, p[2:]) or (first and is_match_recursive(s[1:], p))
    return first and is_match_recursive(s[1:], p[1:])

Correct, exponential time on pathological inputs (the classic "aaaa...a", "a*a*a*...a*" blowup). Memoising on the pair (i, j) of suffix indices is the standard fix — same recurrence, polynomial time.

Pattern: 2D DP on prefix lengths

Define dp[i][j] = does s[:i] fully match p[:j]? Three cases for each cell:

Edge case the unwary trip on: an empty string can match patterns like "a*b*c*". Initialize dp[0][j] accordingly by propagating along even-indexed '*' columns.

Bottom-up DP

def is_match(s: str, p: str) -> bool:
    """dp[i][j] = does s[:i] match p[:j]?  O(m*n)."""
    m, n = len(s), len(p)
    dp = [[False] * (n + 1) for _ in range(m + 1)]
    dp[0][0] = True

    # Handle empty s vs patterns like a*, a*b*, a*b*c*
    for j in range(1, n + 1):
        if p[j - 1] == '*':
            dp[0][j] = dp[0][j - 2]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if p[j - 1] == '*':
                # zero occurrences of <char>*
                zero = dp[i][j - 2]
                # one or more: only if the preceding pattern char matches s[i-1]
                prev = p[j - 2]
                one_or_more = dp[i - 1][j] and (prev == s[i - 1] or prev == '.')
                dp[i][j] = zero or one_or_more
            elif p[j - 1] == '.' or p[j - 1] == s[i - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            # else: stays False
    return dp[m][n]

Trace

s = "aab", p = "c*a*b"

dp is (4 x 6).  rows = s prefix length 0..3, cols = p prefix length 0..5.

Initialization:
  dp[0][0] = True.
  dp[0][2] = dp[0][0] = True  (c* matches empty)
  dp[0][4] = dp[0][2] = True  (c*a* matches empty)
  dp[0][1] = dp[0][3] = dp[0][5] = False

Filling row i=1 (s="a"):
  j=1 p="c"  : 'c' != 'a' -> False
  j=2 p="c*" : zero=dp[1][0]=False, one+ requires prev='c' to match 'a' -> False
  j=3 p="c*a": p[2]='a' matches s[0]='a' -> dp[1][3] = dp[0][2] = True
  j=4 p="c*a*": zero=dp[1][2]=False, one+: prev='a' matches 'a', dp[0][4]=True -> True
  j=5 p="c*a*b": p[4]='b' vs s[0]='a' -> False

Filling row i=2 (s="aa"):
  ... eventually dp[2][4] = True (c*a* matches "aa")
  dp[2][5] False (no 'b' yet)

Filling row i=3 (s="aab"):
  j=5 p="c*a*b": p[4]='b' matches s[2]='b' -> dp[3][5] = dp[2][4] = True.

return dp[3][5] = True   ✓

Complexity

Variations worth knowing