Use a heap (array-indexed binary tree with parent ≥ children) to maintain a partial order that gives O(log n) extract-max and O(log n) insert — the building block of priority queues, top-k, median maintenance, k-way merge, and Dijkstra.
1 worked example · 7 practice problems · 2 check problems
Worked example: Sort an array with heap sort
Problem. Sort the array a = [4, 10, 3, 5, 1] in place using heap sort. Show every sift_down call across both phases (build, then extract). Verify the final array is non-decreasing.
Diagnosis. Heap sort has two phases. Build phase: turn the input array into a max-heap by calling sift_down at every internal node from the deepest backward to the root. Extract phase: swap a[0] (the max) with the last position of the heap, shrink the heap by one, and call sift_down at the root to restore the heap property. Repeat until the heap has one element. The array is then sorted in place.
Predict before reading on: how many sift_down calls will be made in total? Split it across the two phases before reading on.
Build phase. With n=5, the deepest internal node is at index ⌊n/2⌋−1=1. Call sift_down at indices 1 and 0 in order.
sift_down(a, 1, 5)
Index 1 holds 10. Children at indices 3 (5) and 4 (1). Largest child = 5. 5 < 10, so no swap. Sift terminates.
a = [4, 10, 3, 5, 1] (unchanged)
sift_down(a, 0, 5)
Index 0 holds 4. Children at 1 (10) and 2 (3). Largest child = 10. Swap.
a = [10, 4, 3, 5, 1]
Continue at index 1 (where 4 lives now). Children at 3 (5) and 4 (1). Largest = 5. Swap.
a = [10, 5, 3, 4, 1]
Continue at index 3. Leaf. Done.
After build phase: a = [10, 5, 3, 4, 1] — a max-heap.
Predict before reading on: looking at the max-heap above, what's the largest element and which index holds it? What does the extract phase do with that fact?
Extract phase. Four iterations (for n−1=4 extractions). Each iteration: swap a[0] with a[end], shrink the heap to a[0..end), then sift_down(a, 0, end).
end = 4
Swap a[0]=10 with a[4]=1. The max is now in its final sorted slot.
a = [1, 5, 3, 4, 10] sorted region: [10]
sift_down(a, 0, 4). Children of 1: a[1]=5, a[2]=3. Largest=5. Swap.
a = [5, 1, 3, 4, 10]
Continue at index 1. Children: a[3]=4. (a[4] is outside the heap.) 4 > 1. Swap.
Swap a[0]=3 with a[1]=1. Heap shrinks to one element. Loop ends.
a = [1, 3, 4, 5, 10] sorted!
Verification. Final array is [1, 3, 4, 5, 10]. Non-decreasing. ✓ Element multiset matches the input multiset (no values were lost or duplicated, since every operation is an in-place swap).
Sift-down call count: 2 in the build phase + 4 in the extract phase = 6 total. (Sanity-check: ⌊n/2⌋+(n−1)=2+4=6 for n=5.)
def sift_down(a, root, end): while True: l, r = 2*root + 1, 2*root + 2 largest = root if l < end and a[l] > a[largest]: largest = l if r < end and a[r] > a[largest]: largest = r if largest == root: return a[root], a[largest] = a[largest], a[root] root = largestdef heap_sort(a): n = len(a) for i in range(n//2 - 1, -1, -1): sift_down(a, i, n) for end in range(n - 1, 0, -1): a[0], a[end] = a[end], a[0] sift_down(a, 0, end) return aprint(heap_sort([4, 10, 3, 5, 1])) # [1, 3, 4, 5, 10]
Articulate: state in one sentence what the heap data structure buys you and why each operation costs O(logn).
Practice problems
Seven problems, seven different application surfaces. Each one uses sift_down, sift_up, extract_max, or insert on a heap — the same building blocks heap sort used. Try each one before revealing.
P.1OS task scheduling — priority queue
An operating system uses a max-heap to pick the next task to run by priority (higher number = more urgent). The following events occur in order:
Find the analogue:
the heap's root holds the max. Which heap primitive corresponds to "pick the next task to run"? Which one corresponds to arrive?
show answer
B (prio 7), then D (prio 5), then A (prio 3).
After arrivals A, B, C, the heap (by priority) has B at the root. First run() = extract_max → B. The heap rebalances with A (prio 3) on top.
Then D (prio 5) arrives — insert bubbles D up to the root since 5 > 3. Heap is now {D, A, C}.
Second run() → D. Heap = {A, C}. Third run() → A.
P.2streaming top-k — min-heap of fixed size
Find the top-3 largest values in the stream below using a min-heap of size 3. Process elements in order. After all elements are processed, what's in the heap?
stream = [5, 1, 9, 2, 8, 4, 7, 3, 6, 0]
Find the analogue:
heap sort uses a MAX-heap because it wants the largest element first. Why does top-k use a MIN-heap instead? What does the root of that min-heap represent in the top-k setting?
show answer
Heap ends holding {7, 8, 9} — the three largest seen.
Fill the heap with the first 3: {5, 1, 9} as a min-heap → root is 1. For each later element x: if x > root, pop root and insert x; otherwise discard. The root always represents "the smallest of the current top-3" — anything smaller than it can never break in.
Trace: see 2 (≤1? no, >1 → swap). 8 (>2 → swap). 4 (>5? no). 7 (>5 → swap). 3, 6, 0 all skip.
Cost: O(nlogk) with constant memory in k, not n.
P.3online median — two-heap technique
For each element of the stream below, report the running median. Use a max-heap (lower half) plus a min-heap (upper half), keeping the two sizes within 1 of each other.
stream = [4, 9, 2, 6, 1]
Find the analogue:
a single heap of size n gives O(logn) for find-max, but O(n) for find-median. How does splitting the data across two heaps make find-median O(1)?
show answer
Running medians: 4, 6.5, 4, 5, 4.
After 4: lo={4}, hi={}. Median = 4.
After 9: lo={4}, hi={9}. Even size → median = (4 + 9) / 2 = 6.5.
After 2: lo={4, 2} (2 ≤ lo.top=4 → lo), hi={9}. Median = lo.top = 4.
After 6: lo={4, 2}, hi={6, 9} (6 > lo.top → hi). Median = (4 + 6) / 2 = 5.
After 1: lo={4, 2, 1}, hi={6, 9}. Median = lo.top = 4.
The split invariant — every value in lo ≤ every value in hi, sizes balanced — makes median = lo.top (odd total) or the mean of the two tops (even total).
P.4k-way merge of sorted streams
Three sorted lists need to be merged into one sorted output:
Use a min-heap of (value, list_id, position) entries. What is the output order, element by element?
Find the analogue:
heap sort holds the whole array's elements in the heap. K-way merge holds only one element per list at any moment. Why is that enough? What property of the input streams justifies the smaller heap?
show answer
Output: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
Init heap with each list's head: {(1, L1), (2, L2), (3, L3)}. Extract-min, then push the next element from that list. Repeat. Heap size stays bounded at k=3 regardless of total length.
Justification: each input list is sorted, so the next element from any list is at least as large as the previous one. The minimum overall must be the minimum of the current heads.
Cost: O(Nlogk) for N total elements.
P.5Dijkstra's shortest-path frontier
Run Dijkstra's algorithm from source A on this weighted graph. Use a min-heap of (distance, node) as the frontier.
What is the shortest distance from A to D, and what is the path?
Find the analogue:
Dijkstra repeatedly extracts the closest unprocessed node from the frontier. That's the heap's extract_min. Why does this greedy extraction produce correct shortest paths?
show answer
Distance = 4. Path: A → B → C → D.
Trace:
Frontier {(0, A)}. Pop A. Relax: B → 1, C → 4. Frontier {(1, B), (4, C)}.
Pop B. Relax: C via B → 3 (better than 4), D via B → 6. Frontier {(3, C), (4, C), (6, D)} (the stale (4, C) is harmless — we'll skip it when popped).
Pop (3, C). Relax: D via C → 4 (better than 6). Frontier has (4, C) stale, (4, D), (6, D) stale.
Pop (4, C) — already settled. Skip.
Pop (4, D). D settled at distance 4.
Correctness: when extracted, a node's distance is final because all paths through unprocessed nodes are at least as long (edges are non-negative).
P.6partial-heap repair after a single corruption
You're given a valid max-heap:
a = [10, 8, 9, 7, 1, 4, 5, 6, 2, 3]
A bug overwrites a[1], leaving:
a = [10, 0, 9, 7, 1, 4, 5, 6, 2, 3]
Restore the max-heap with a SINGLE sift_down call. Which index do you sift from, and what is the final array?
Find the analogue:build_max_heap calls sift_down on every internal node. Here you need just one call. What's true about the rest of the array that lets one call suffice?
show answer
Sift from index 1 (where the corruption is). The rest of the array is already heap-valid — only the subtree rooted at index 1 violates the property, and sift_down's contract is exactly: "fix one node whose two child subtrees are already heaps."
Trace: 0 at index 1; children a[3]=7, a[4]=1. Largest = 7 → swap. a = [10, 7, 9, 0, 1, 4, 5, 6, 2, 3]. Continue at index 3: children a[7]=6, a[8]=2. Largest = 6 → swap. a = [10, 7, 9, 6, 1, 4, 5, 0, 2, 3]. Continue at index 7: no children in the array. Done.
Final: [10, 7, 9, 6, 1, 4, 5, 0, 2, 3].
P.7data compression — Huffman coding
Build a Huffman tree from these character frequencies using a min-heap. At each step, extract the two smallest-frequency subtrees and combine them into a node whose frequency is their sum; reinsert.
{ A: 5, B: 2, C: 1, D: 3, E: 4 }
For each combine step, list the two frequencies that were combined.
Find the analogue:
Huffman needs the two smallest at each step, not just one. How do you get the second-smallest from a min-heap with O(log n) work?
Two extract-mins per step. n−1 steps total. Cost: O(nlogn).
Check problems
Two problems designed to resist pattern-matching against the practice set.
Check 1derivation
The page asserts that build_max_heap runs in O(n) — not O(nlogn) as a naive analysis would suggest. Derive this O(n) bound from first principles.
Use the fact that in a complete binary tree of n nodes, the number of nodes at height h from the bottom is at most ⌈n/2h+1⌉ and that sift_down at a node of height h costs at most O(h) swaps. You'll need a closed form for h=0∑∞h⋅xh when 0<x<1.
show solution sketch
Total work is the sum, over all internal nodes, of sift_down's cost at that node:
T(n)≤h=0∑⌊log2n⌋2h+1n⋅h
Pull n/2 out and let x=1/2:
T(n)≤2nh=0∑∞h⋅(21)h
The standard identity h=0∑∞hxh=(1−x)2x gives, at x=1/2:
h=0∑∞h⋅(21)h=(1/2)21/2=2
So T(n)≤n/2⋅2=n. Therefore build_max_heap is O(n). ∎
Intuition: most internal nodes are near the bottom of the tree (half are at height 1, a quarter at height 2, …) where sift_down is cheap. The geometric decay of node count overwhelms the linear growth of per-node cost, so the total stays linear.
Check 2transfer
The skyline problem: given a list of rectangular buildings (left, right, height), compute the outline of their union as a sequence of (x, height) key points sorted by x. Adjacent points share the height in between; the final point has height 0.
Use a max-heap to track the heights of buildings currently "active" at the sweep line. Produce the skyline. (None of the practice problems used heaps for geometric sweeping — this is a transfer.)
Sweep-line trace, tracking the max-heap of active heights:
x = 1: building 1 enters (h=11). Top = 11. Key point (1, 11).
x = 2: building 2 enters (h=6). Top still 11. No new key point.
x = 3: building 3 enters (h=13). Top = 13. Key point (3, 13).
x = 5: building 1 exits. Top still 13 (building 3 dominates). No new key point.
x = 7: building 2 exits. Top still 13. No new key point.
x = 9: building 3 exits. Heap empty. Top = 0. Key point (9, 0).
x = 12: building 4 enters (h=7). Top = 7. Key point (12, 7).
x = 16: building 4 exits. Heap empty. Key point (16, 0).
Algorithm. Create events: each building emits a (left, +h) entry-event and a (right, -h) exit-event. Sort all events by x. Sweep left to right. Maintain a max-heap of currently-active heights. For each event:
Entry: push h onto the heap.
Exit: remove h from the heap (lazily — see below).
After each event at position x, peek the heap's top: that's the current skyline height. If it differs from the previous skyline height, emit a new key point (x, top).
Subtle point. Standard heaps don't support remove an arbitrary value in O(logn). The trick is lazy deletion: keep a multiset (or counter) of "heights pending removal." When peeking the top, pop and discard any top values that are pending. Each value is pushed once and popped at most once, so the amortized cost stays O(logn) per event.
Why this is genuinely a transfer. Heap sort and all the practice problems use a heap as a priority queue over one stream of inputs. The skyline problem uses a heap as a spatial data structure indexed by a sweep line — the inputs aren't a stream, they're geometric objects, and "active" changes as the sweep crosses each edge. Same heap primitives; different problem structure.