Word Ladder
Interview Prep
The problem
Given two words begin and end of equal
length, and a word list, find the length of the shortest
transformation sequence from begin to end.
Each step changes exactly one letter, and every intermediate word
must appear in the word list (begin doesn't need to).
Length is the number of words including endpoints; return 0 if
impossible.
Pattern: BFS on an implicit graph
Vertices are words. Edges connect words that differ in exactly one
letter. The shortest path from begin to end
in this graph is the answer. BFS finds shortest paths in
unweighted graphs in O(V + E) time.
The trick is not "should I use BFS?" — that's obvious. The trick
is making neighbor lookup fast. Naive "compare every pair of
words" is O(W² · L) where W is the word
list size and L is the word length. The
pattern-bucketing trick brings it down to O(W · L²).
Naive: O(W²·L) neighbor lookup
For each word popped, scan all other words and check if they differ by one letter. Correct but quadratic in the dictionary size — TLE on the LeetCode version where W can be ~5000.
from collections import deque
def ladder_length_naive(begin: str, end: str, word_list: list[str]) -> int:
words = set(word_list)
if end not in words:
return 0
q = deque([(begin, 1)])
visited = {begin}
while q:
word, depth = q.popleft()
if word == end:
return depth
for other in words:
if other not in visited and differs_by_one(word, other):
visited.add(other)
q.append((other, depth + 1))
return 0
def differs_by_one(a: str, b: str) -> bool:
return sum(x != y for x, y in zip(a, b)) == 1 Optimal: pattern bucketing
Pre-compute a hash from each "with one letter replaced by *"
pattern to the words matching it. A word has L
patterns; two words are neighbors iff they share at least one
pattern. Now neighbor lookup is O(L) per word: try
each L patterns, read off the bucket.
Total work: each word generates L patterns; each
pattern has on average ~few words; BFS visits each word once and
each edge once. Final complexity: O(W · L² + E) where
E is total adjacency size.
from collections import deque, defaultdict
def ladder_length(begin: str, end: str, word_list: list[str]) -> int:
words = set(word_list)
if end not in words:
return 0
# Build pattern → words index. Each word w generates len(w) patterns
# of the form w[:i] + '*' + w[i+1:]. Two words are neighbors iff they
# share at least one such pattern.
L = len(begin)
buckets: dict[str, list[str]] = defaultdict(list)
for w in words | {begin}:
for i in range(L):
buckets[w[:i] + '*' + w[i+1:]].append(w)
q = deque([(begin, 1)])
visited = {begin}
while q:
word, depth = q.popleft()
if word == end:
return depth
for i in range(L):
pat = word[:i] + '*' + word[i+1:]
for other in buckets[pat]:
if other not in visited:
visited.add(other)
q.append((other, depth + 1))
buckets[pat] = [] # avoid re-expanding from this pattern
return 0 Trace
begin = "hit", end = "cog"
word_list = ["hot", "dot", "dog", "lot", "log", "cog"]
Pattern buckets (some):
"*it": ["hit"]
"h*t": ["hit", "hot"]
"hi*": ["hit"]
"*ot": ["hot", "dot", "lot"]
"h*t": ["hit", "hot"]
"ho*": ["hot"]
...
BFS from "hit":
depth=1: hit
depth=2: hot (via "h*t" bucket; hit's only neighbor)
depth=3: dot, lot (both share "*ot" with hot)
depth=4: dog, log (dog via "do*" with dot; log via "lo*" with lot)
depth=5: cog (via "*og" with dog and log)
return 5 Complexity
- Preprocessing:
O(W · L²)— each ofWwords buildsLpatterns, each of lengthL. - BFS:
O(W · L²)— each pattern is scanned at most once, total work bounded by total bucket size. - Space:
O(W · L)for the pattern buckets.
Variations worth knowing
- Word Ladder II: return all shortest
paths, not just the length. Standard BFS won't suffice — you
need to track multiple parents per node, then
backtrack from
end. Watch out for visited-set semantics: a node can be at the shortest distance via multiple paths; mark visited per level rather than globally. - Bidirectional BFS: run BFS from both
beginandendsimultaneously, meeting in the middle. Reduces work fromO(b^d)toO(2 · b^(d/2))— the square root of the search space. - Weighted variant: if letter changes have costs (Levenshtein-with-weights), this becomes Dijkstra rather than BFS. Same pattern-bucket idea applies.