Heap Sort

Algorithms

What's the idea?

So you want to sort a list of numbers. This list of numbers needs to be sorted greatest to least. The way you envision to do this is I'll just go left to right and pick out the largest on each time. If the list is $n$ long and you have to do the scan $n$ times. The algorithm will be done in $O (n^{2})$ . I think we can do better.

What if we could write the list a different way. We write the list into a binary tree. See, when you are using a list, it is actually a graph with nodes. It's just a boring graph. The immediate neighbors are to the right and to the left. The secret to heap sort is that you can use a special index formula to redefine the nearby nodes in the graph.

Back to our original strategy. Heap sort is the same plan — repeatedly pull the biggest out except you use this "special index formula" to rewrite the list as a tree and then apply the search. By using the tree structure the lookup costs you $lo g n$ instead of $n$ . Multiply that by $n$ extractions and you're at $n lo g n$ instead of $n^{2}$ .

The clever arrangement is called a heap, and the whole thing happens inside the original array — no second array, no pointers, no extra memory beyond a couple of variables. To get there we need a few pieces. Let me show you each one before we glue them together.

Piece 1 — An array can be read as a tree

Here's the first key idea. Take an ordinary array, and pretend the indices line up as a binary tree. Index $0$ is the top. Indices $1$ and $2$ are its two children. Indices $3, 4, 5, 6$ are the next layer, two children per parent. $7$ through $14$ are the layer after that. And so on, level by level, until you run out of array.

The family relationships are pure arithmetic. If you're sitting at index $i$ , then:

your parent is at $⌊(i - 1) /2 ⌋$ ,
your left child is at $2 i + 1$ ,
your right child is at $2 i + 2$ .

No tree object actually exists. Nothing is allocated. The array is still an array. We just look at it as a tree because the arithmetic above gives us a level-by-level walk. That's enough.

A position with at least one child still inside the array is an internal node. A position with no children in the array is a leaf. Work it out: for an array of length $n$ , the leaves are exactly the back half — indices $⌊ n /2 ⌋$ through $n - 1$ . The internal nodes are the front half.

Walking through it makes the formula concrete. Press Step to drop one index at a time into its tree slot. Each step shows the parent calculation that picked the slot — the same arithmetic sift-down and build-heap will lean on later.

Piece 2 — The max-heap property

Now here's the constraint we want to set up. We say the array is a max-heap if every parent is at least as big as both its children. That's the only rule. The constraint runs vertically, parent to child — siblings can be in either order, and any two nodes that aren't directly above-or-below each other have no required relationship. Just parent $\geq$ child, applied to every parent-child pair.

Why want this? Because of one beautiful consequence: the biggest value is always at the top. Look — the root is bigger than its two children. Each of them is bigger than their children. Their children are bigger than their children. Walking that chain down, the root is bigger than everything below it, which is everything in the heap. So $a [0]$ is the max of the whole array, no searching, no scanning, just look at index 0.

That's the whole reason for the heap shape. Once we have it, finding the maximum is free.

Piece 3 — Sift-down (bubble sort, but on a tree)

This is the workhorse. Both halves of heap sort are just repeated calls to sift-down, so it's worth understanding it on its own, before we put it to use.

Sift-down takes one node whose value might be too small for where it sits — its children are bigger than it. The job is to push that value down through the tree until it's big enough for its position. At each step: look at the two children, pick the bigger of the two, and if it's bigger than the value we're carrying, swap them. Then continue from the child's old slot. If the value we're carrying is already $\geq$ both children, stop — we're done.

The connection to bubble sort

If you've seen bubble sort, sift-down should look familiar. Bubble sort takes a flat array and, in each pass, walks left to right, swapping any adjacent pair that's out of order. After one pass the biggest value has bubbled up to the right end. The mechanic is: compare with neighbor, swap if needed, move on.

Sift-down is doing the same thing — except the "neighbor" isn't the slot next door, it's one of your two children in the tree. And we swap with the bigger of the two, because that's the child that was actually beating us. So sift-down is bubble sort, but on a tree path instead of a flat array.

That single change — from flat to tree — is what gives us the speedup. Bubble sort walks past $n$ slots per pass; the tree is only $lo g_{2} n$ levels deep, so sift-down only walks past about $lo g n$ nodes. $lo g n$ vs. $n$ — that's the win, hidden in this one operation.

The contract

Sift-down only works correctly under one assumption: the two subtrees below the node it's called on must already be heaps. If that's true, sift-down's swap-with-bigger-child trick actually settles the value into a valid spot. If it's not true, you'll just be moving junk into more junk.

Concretely: $sift_down (a, r, e)$ assumes the subtrees rooted at $2 r + 1$ and $2 r + 2$ (within $[0, e)$ ) are already max-heaps. After it returns, the full subtree rooted at $r$ — including the node itself — is a max-heap. Same set of values; just shuffled.

The cost

One sift-down call walks at most one root-to-leaf path of the subtree. Tree depth in our setup is $lo g_{2} n$ , so each sift-down costs at most $O (lo g n)$ swaps and twice that in comparisons. Cheap.

Code

def sift_down(a, root, end):
    """Push a[root] down through the subtree over indices [root, end)
    until every parent is >= both its children."""
    while True:
        l = 2 * root + 1     # left child index
        r = 2 * root + 2     # right child index
        largest = root
        if l < end and a[l] > a[largest]:
            largest = l
        if r < end and a[r] > a[largest]:
            largest = r
        if largest == root:        # parent is already >= both children
            return
        a[root], a[largest] = a[largest], a[root]
        root = largest             # continue from the demoted position

The two interesting lines are the child-comparison logic (figure out which of root, l, r holds the largest value) and the swap-and-continue (root = largest). The early-return when largest == root is what stops the walk when the value is finally heavy enough for its current spot.

Piece 4 — Building the heap

We've got sift-down. Now we use it to turn a random array into a max-heap. The plan: call sift-down on every internal node, but in the right order.

The right order is bottom-up — from the deepest internal node (index $⌊ n /2 ⌋ - 1$ ) backward to the root (index $0$ ). Why bottom-up and not top-down? Because of sift-down's contract from Piece 3: it only works if the children' subtrees are already heaps. If we went top-down, none of them would be. Going bottom-up gives us that guarantee for free — when we reach a node, both its children' subtrees have larger indices, and we already finished those. They're already heaps. Sift-down is legal.

def build_max_heap(a):
    n = len(a)
    # Walk internal nodes from the deepest backward to the root.
    # When sift_down runs at index i, its children' subtrees have already
    # been fixed (they had larger indices, processed earlier).
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)

Why we skip the leaves

Sift-down at a leaf is a no-op. There's nothing below to compare against, so the heap property is trivially satisfied at that position. We don't bother — the loop only walks the front half of the array, where the internal nodes live.

The unexpectedly low cost

Naïve guess: $n /2$ sift-downs, each up to $lo g n$ , total $O (n lo g n)$ . Wrong, and it's a little surprising why. Most of the sift-downs are not worst-case. Half of the internal nodes are right above the leaves — they can sift down at most one level. A quarter of the internal nodes sit two levels above the leaves — at most two levels of sifting. An eighth, three. The sum is $h = 0 \sum l o g n \frac{n}{2 ^{h + 1}} \cdot h \leq n$ a geometric series that stays linear in $n$ . So building the heap is $O (n)$ , not $O (n lo g n)$ . The heavy-but-rare top sift-downs are dominated by the cheap-but-many bottom ones.

Putting it together — heap sort

Now we have all the pieces. The whole algorithm:

Build the heap (Piece 4). Largest value lands at index 0.
Swap $a [0]$ with the last position of the heap. The max is now at the back of the array, in its final sorted slot. The heap shrinks by one.
The swap broke the heap at the root. Sift-down (Piece 3) fixes it. Now the new max is at index 0.
Repeat steps 2 and 3 until the heap has one element left. The array is sorted.

def heap_sort(a):
    n = len(a)
    build_max_heap(a)                  # phase 1: O(n)
    for end in range(n - 1, 0, -1):    # phase 2: n-1 extractions
        a[0], a[end] = a[end], a[0]    # max → its final slot
        sift_down(a, 0, end)           # restore heap on [0, end)
    return a

Why one sift-down is enough

After the swap in step 2, the only place the heap is broken is at the root. The root's two subtrees were not touched by the swap, so they're still valid heaps. That matches sift-down's contract exactly: children' subtrees are heaps, just the parent is wrong. One call cleans it up.

Total cost

Phase 1 is $O (n)$ . Phase 2 is $n - 1$ sift-downs each $O (lo g n)$ , totalling $O (n lo g n)$ . Phase 2 dominates. Heap sort is $O (n lo g n)$ in the worst, average, and best case — it doesn't get faster on already-sorted input.

In-place, not stable

Heap sort uses no extra storage beyond the array itself. Just two or three variables. $O (1)$ space.

It's not stable, though. When sift-down picks a child to swap with, it doesn't care which one was originally on the left or right; equal values can therefore end up in a different relative order than they started.

The necessarily-true facts

A short list of claims the algorithm leans on. Each one is true by definition or by a one-line argument; together they're why heap sort works and why it costs what it costs.

The biggest value sits at the top of a max-heap. Root $\geq$ children $\geq$ grandchildren $\geq \dots$ So the root is bigger than everything below it, which is everything. $a [0]$ is the max — no search.
Leaves don't need work. A leaf has no children inside the array. There's nothing to compare against, so the heap property is trivially satisfied at that position. Build-heap only walks the internal nodes (front half of the array), not all $n$ indices.
Sift-down's contract. If you call sift-down at a node whose two subtrees are already heaps, then on return the full subtree (including the node) is a heap. If that precondition isn't met, sift-down doesn't fix the mess below — it can only settle one value through already-valid subtrees.
The bottom-up build satisfies the contract every time. When build-heap calls sift-down at index $i$ , both child subtrees (rooted at $2 i + 1$ and $2 i + 2$ ) have larger indices, so they were processed in earlier iterations. Already heaps. Sift-down is legal at every step of the loop.
The phase-2 swap only disturbs the root. Swapping $a [0] \leftrightarrow a [end]$ doesn't touch any other entry, so the children' subtrees of the root are still valid heaps. Only the root is potentially out of place, and that's exactly the case sift-down is built for.
One sift-down per extraction is enough. Combining invariants 3 and 5: after each phase-2 swap, sift-down at the root has its precondition met, so a single call restores the heap.
Sift-down costs at most $lo g_{2} n$ swaps. It walks one root-to-leaf path. The tree's height is bounded by $lo g_{2} n$ .
Phase 1 costs $O (n)$ , not $O (n lo g n)$ . Most internal nodes sit near the bottom of the tree, where their sift-downs are short. The tier-by-tier sum of work is geometric and stays below $2 n$ .
Phase 2 costs $O (n lo g n)$ . $n - 1$ sift-down calls, each at most $O (lo g n)$ by invariant 7.
Heap sort is $O (n lo g n)$ overall, $O (1)$ extra space, not stable. Phase 2 dominates the runtime. The algorithm only swaps within the input array. Equal values can end up reordered because sift-down picks a child based on size, not on original position.

When you'd actually use it

Quick sort is usually faster in practice because its memory access pattern is cache-friendly and its inner loop is tight. But quick sort's worst case is $O (n^{2})$ — degenerate pivot choices can blow up. Heap sort's worst case is the same as its best case: $O (n lo g n)$ .

The standard hybrid sorts (introsort in C++, pdqsort in Rust) use heap sort as a parachute. They run quick sort by default, but monitor the recursion depth; if it goes too deep — a sign that pivots are choosing badly — they switch to heap sort and finish the job at $O (n lo g n)$ guaranteed. That's the niche heap sort fills: not the everyday workhorse, but the safety net that prevents the everyday workhorse from going pathological.

Common questions

In sift-down, why pick the larger child to swap with? Why not just any child that's bigger than the parent?

Imagine you swapped with the smaller of the two children. The new parent is now whatever value the smaller child had. But the other child (the larger one) is still sitting there, still bigger than what's now in the parent slot. You haven't fixed anything! The heap property is still broken. By picking the larger child, you guarantee the new parent is at least as big as both children at this level. Both inequalities are satisfied in one swap.

If sift-down is bubble sort on a tree, why bother learning bubble sort separately?

Bubble sort is the simplest way to see the pattern: swap, walk, swap, walk. Sift-down is the same pattern with the walk happening on a tree path instead of an array. Once you've seen bubble sort, sift-down is "the same idea, on a smarter data structure." The smarter data structure is what gives heap sort its $lo g n$ instead of bubble sort's $n$ .

In phase 2, why does swapping the root to the back actually sort the array?

Because the root is the maximum of everything still in the heap. So when you swap it to position end, you've placed the largest remaining value at the rightmost slot of what'll become the sorted region. Next iteration, with a smaller heap, places the second-largest one slot to the left. And so on. The sorted region grows leftward, each new entry strictly less than the one to its right. That's sorted order.

Could phase 2 be written as $n$ calls to a generic extract_max?

Conceptually it is. The trick is what we do with the extracted max: instead of returning it, we park it in the slot the heap is about to give up by shrinking. The same array holds the heap on the left and the sorted output on the right, with a moving boundary. extract_max done in-place.

Worked examples

Build a max-heap from an unsorted array → Take an unsorted array of 7 numbers and transform it into a max-heap in O(n), running sift-down at each internal node from the deepest backward to the root.

Browse all worked examples →

Exercises

A full exercise set is available for this topic, structured as one worked example + 7 practice problems (across 7 surface contexts) + 2 pattern-resistant check problems.

Open the Heap Sort exercise set → Use a heap (array-indexed binary tree with parent ≥ children) to maintain a partial order that gives O(log n) extract-max and O(log n) insert — the building block of priority queues, top-k, median maintenance, k-way merge, and Dijkstra.