Word Ladder

Interview Prep

HardGraphsgraphsbfsshortest-path

The problem

Given two words begin and end of equal length, and a word list, find the length of the shortest transformation sequence from begin to end. Each step changes exactly one letter, and every intermediate word must appear in the word list (begin doesn't need to). Length is the number of words including endpoints; return 0 if impossible.

Pattern: BFS on an implicit graph

Vertices are words. Edges connect words that differ in exactly one letter. The shortest path from begin to end in this graph is the answer. BFS finds shortest paths in unweighted graphs in O(V + E) time.

The trick is not "should I use BFS?" — that's obvious. The trick is making neighbor lookup fast. Naive "compare every pair of words" is O(W² · L) where W is the word list size and L is the word length. The pattern-bucketing trick brings it down to O(W · L²).

Naive: O(W²·L) neighbor lookup

For each word popped, scan all other words and check if they differ by one letter. Correct but quadratic in the dictionary size — TLE on the LeetCode version where W can be ~5000.

from collections import deque

def ladder_length_naive(begin: str, end: str, word_list: list[str]) -> int:
    words = set(word_list)
    if end not in words:
        return 0

    q = deque([(begin, 1)])
    visited = {begin}
    while q:
        word, depth = q.popleft()
        if word == end:
            return depth
        for other in words:
            if other not in visited and differs_by_one(word, other):
                visited.add(other)
                q.append((other, depth + 1))
    return 0

def differs_by_one(a: str, b: str) -> bool:
    return sum(x != y for x, y in zip(a, b)) == 1

Optimal: pattern bucketing

Pre-compute a hash from each "with one letter replaced by *" pattern to the words matching it. A word has L patterns; two words are neighbors iff they share at least one pattern. Now neighbor lookup is O(L) per word: try each L patterns, read off the bucket.

Total work: each word generates L patterns; each pattern has on average ~few words; BFS visits each word once and each edge once. Final complexity: O(W · L² + E) where E is total adjacency size.

from collections import deque, defaultdict

def ladder_length(begin: str, end: str, word_list: list[str]) -> int:
    words = set(word_list)
    if end not in words:
        return 0

    # Build pattern → words index. Each word w generates len(w) patterns
    # of the form w[:i] + '*' + w[i+1:]. Two words are neighbors iff they
    # share at least one such pattern.
    L = len(begin)
    buckets: dict[str, list[str]] = defaultdict(list)
    for w in words | {begin}:
        for i in range(L):
            buckets[w[:i] + '*' + w[i+1:]].append(w)

    q = deque([(begin, 1)])
    visited = {begin}
    while q:
        word, depth = q.popleft()
        if word == end:
            return depth
        for i in range(L):
            pat = word[:i] + '*' + word[i+1:]
            for other in buckets[pat]:
                if other not in visited:
                    visited.add(other)
                    q.append((other, depth + 1))
            buckets[pat] = []      # avoid re-expanding from this pattern
    return 0

Trace

begin = "hit", end = "cog"
word_list = ["hot", "dot", "dog", "lot", "log", "cog"]

Pattern buckets (some):
  "*it":  ["hit"]
  "h*t":  ["hit", "hot"]
  "hi*":  ["hit"]
  "*ot":  ["hot", "dot", "lot"]
  "h*t":  ["hit", "hot"]
  "ho*":  ["hot"]
  ...

BFS from "hit":
  depth=1: hit
  depth=2: hot                    (via "h*t" bucket; hit's only neighbor)
  depth=3: dot, lot               (both share "*ot" with hot)
  depth=4: dog, log               (dog via "do*" with dot; log via "lo*" with lot)
  depth=5: cog                    (via "*og" with dog and log)

return 5

Complexity

Preprocessing: O(W · L²) — each of W words builds L patterns, each of length L.
BFS: O(W · L²) — each pattern is scanned at most once, total work bounded by total bucket size.
Space: O(W · L) for the pattern buckets.

Variations worth knowing

Word Ladder II: return all shortest paths, not just the length. Standard BFS won't suffice — you need to track multiple parents per node, then backtrack from end. Watch out for visited-set semantics: a node can be at the shortest distance via multiple paths; mark visited per level rather than globally.
Bidirectional BFS: run BFS from both begin and end simultaneously, meeting in the middle. Reduces work from O(b^d) to O(2 · b^(d/2)) — the square root of the search space.
Weighted variant: if letter changes have costs (Levenshtein-with-weights), this becomes Dijkstra rather than BFS. Same pattern-bucket idea applies.