Word Break

Interview Prep

StandardDynamic Programmingdpstrings

The problem

Given a string s and a dictionary of words, decide whether s can be segmented into a sequence of one or more dictionary words (each word may be reused). Return a boolean.

Why brute force explodes

def word_break_brute(s: str, words: set[str]) -> bool:
    """Try every split. Exponential in the worst case."""
    if not s: return True
    for i in range(1, len(s) + 1):
        if s[:i] in words and word_break_brute(s[i:], words):
            return True
    return False

The recursion has exponentially many overlapping subproblems — "aaaaaab" with dict aaa is the classic pathological case. Memoising on the starting position immediately collapses it to O(n²) (or O(n · max_word_len) with the optimization below).

Pattern: prefix DP on segmentability

Let dp[i] = True iff s[:i] can be segmented. Base case dp[0] = True. Recurrence: dp[i] = True iff there exists j < i with dp[j] = True and s[j:i] in the dictionary. Answer is dp[n].

Optimization: the inner j loop only has to look back as far as the longest word in the dictionary. That changes the overall complexity from O(n² · L) (with string slice cost) to O(n · max_len · L), which is much tighter when the dictionary words are short relative to the input.

Memoized recursion

def word_break_memo(s: str, words: set[str]) -> bool:
    """Same recursion, memoized by tail length."""
    cache = {}
    def helper(i):
        if i == len(s): return True
        if i in cache: return cache[i]
        for j in range(i + 1, len(s) + 1):
            if s[i:j] in words and helper(j):
                cache[i] = True
                return True
        cache[i] = False
        return False
    return helper(0)

Bottom-up DP

def word_break(s: str, words: list[str]) -> bool:
    """Bottom-up DP. dp[i] = True iff s[:i] is segmentable."""
    w = set(words)
    max_len = max(map(len, words)) if words else 0
    n = len(s)
    dp = [False] * (n + 1)
    dp[0] = True
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            if dp[j] and s[j:i] in w:
                dp[i] = True
                break
    return dp[n]

Trace

s = "leetcode", words = {"leet", "code"}
max_len = 4

dp[0] = True

i=1, j in [0..0]: s[0:1]="l"   not in words. dp[1]=False.
i=2, j in [0..1]: s[0:2]="le", s[1:2]="e"   neither in words. dp[2]=False.
i=3, j in [0..2]: "lee","ee","e" none in. dp[3]=False.
i=4, j in [0..3]: s[0:4]="leet" IN! dp[0]=True -> dp[4]=True. break.
i=5, j in [1..4]: s[1:5]="eetc"... none in. dp[5]=False.
i=6, j in [2..5]: none in.  dp[6]=False.
i=7, j in [3..6]: "tcod","code","ode","de" — wait, "code" is s[4:8] not in [3..6].
  j=4: s[4:7]="cod"   not in words.
  Others fail. dp[7]=False.
i=8, j in [4..7]: s[4:8]="code" IN! dp[4]=True -> dp[8]=True.

return dp[8] = True

Complexity

Variations worth knowing