Kth Largest Element

Interview Prep

StandardHeap / Priority Queueheapquickselect

The problem

Given an unsorted array of integers and an integer k, return the k-th largest element. "Largest" is in sorted-descending order; k = 1 is the maximum. Equivalent problem: k-th smallest if you flip the comparison.

Three solutions worth knowing

This problem is special because three very different algorithms each apply, with different trade-offs.

Sort: O(n log n). Trivially correct. Right answer if k is large or the array is small.
Min-heap of size k: O(n log k). Streaming-friendly: works on any iterable without storing the whole array. The right answer when k ≪ n, or for top-k on a live stream.
Quickselect: O(n) expected time, O(n²) worst case (mitigated with random pivots). The right answer when you really do need linear time on a single static array.

Baseline: sort

def kth_largest_sort(nums: list[int], k: int) -> int:
    """O(n log n).  Trivial baseline."""
    return sorted(nums, reverse=True)[k - 1]

Heap of size k

import heapq

def kth_largest_heap(nums: list[int], k: int) -> int:
    """O(n log k).  Maintain a min-heap of size k; its root is the k-th largest."""
    h = []
    for x in nums:
        if len(h) < k:
            heapq.heappush(h, x)
        elif x > h[0]:
            heapq.heapreplace(h, x)         # pop + push in one step
    return h[0]

Quickselect (expected O(n))

Quickselect is quicksort that only recurses into the side containing the target index. The expected number of comparisons is linear because the recurrence T(n) = T(n/2) + n sums to 2n. Pivot selection matters: a deterministic "first element" pivot causes O(n²) on sorted input; a random pivot makes the worst case astronomically unlikely. Median-of-medians selects a pivot deterministically in linear time and gives O(n) worst-case — beautiful theoretically, rarely used in practice because of the constant factor.

import random

def kth_largest(nums: list[int], k: int) -> int:
    """O(n) average via quickselect.  Find element at index k-1 in a sorted-desc view."""
    target = k - 1                          # 0-indexed position in the sorted-desc array
    lo, hi = 0, len(nums) - 1
    a = nums[:]                             # don't mutate caller's array

    while True:
        pivot_idx = partition(a, lo, hi)
        if pivot_idx == target:
            return a[pivot_idx]
        if pivot_idx < target:
            lo = pivot_idx + 1
        else:
            hi = pivot_idx - 1

def partition(a, lo, hi):
    """Lomuto-style partition around a RANDOM pivot, sorting DESC.
    Returns the final index of the pivot.
    Elements left of pivot >= pivot value; elements right of pivot < pivot value.
    """
    p = random.randint(lo, hi)
    a[p], a[hi] = a[hi], a[p]
    pivot = a[hi]
    store = lo
    for i in range(lo, hi):
        if a[i] > pivot:                    # use >= to handle duplicates more evenly
            a[i], a[store] = a[store], a[i]; store += 1
    a[store], a[hi] = a[hi], a[store]
    return store

Trace

nums = [3, 2, 1, 5, 6, 4], k = 2

Heap walk (k=2):
  x=3: h=[3]
  x=2: h=[2, 3]      (size 2)
  x=1: 1 <= h[0]=2, skip
  x=5: 5 > 2, replace top: h=[3, 5]
  x=6: 6 > 3, replace top: h=[5, 6]
  x=4: 4 <= 5, skip

return h[0] = 5    (2nd largest)

Quickselect (target index 1, i.e. 2nd largest):
  partition entire array around some pivot, say 4:
    rearranged to [5, 6, 4, 1, 2, 3] (>= 4 on left, < 4 on right; pivot at index 2)
    pivot_idx = 2, target = 1.   pivot_idx > target -> hi = 1.
  partition [5, 6] around some pivot, say 6:
    [6, 5], pivot 6 ends at index 0.
    pivot_idx = 0, target = 1.   pivot_idx < target -> lo = 1.
  partition [5] around 5:
    pivot_idx = 1, target = 1.   match -> return a[1] = 5.

Complexity summary

Sort: O(n log n) time, O(n) space (or in-place).
Heap: O(n log k) time, O(k) space.
Quickselect: O(n) expected, O(n²) worst; O(log n) stack on average.

Variations worth knowing

Top-k frequent elements: heap of size k over (count, value). Or bucket-sort by frequency for O(n).
k-th smallest in a streaming setting: max-heap of size k; replace the root whenever a smaller value arrives. Symmetric to the top-k heap.
Median of a stream: two heaps (see Find Median from Data Stream).
k-th smallest in a sorted matrix: binary search on the value range, count entries ≤ midpoint with a staircase walk. Or a heap of size k.
Selection in unsorted external memory: sample buckets, count, recurse into the right one. Used in distributed databases for approximate quantiles.