Algorithms

Binary Search — exercises

Recognize a problem as the search for the unique boundary of a monotonic predicate; maintain an invariant interval [lo, hi] with a known predicate sign at each end; halve the interval per step until the boundary is found in O(log n) time.

1 worked example · 7 practice problems · 2 check problems

Worked example: classic sorted-array search

Problem. Find $13$ in the sorted array [1, 3, 5, 7, 9, 11, 13, 15]. Trace each iteration of binary search by hand. Report the final index.

Diagnosis. The array is sorted, so the predicate $P (i) = (arr[i] \geq target)$ is monotonic in $i$ : it stays false up to some boundary index $k$ , then is true from $k$ onward. Binary search finds $k$ in $O (lo g n)$ by halving an interval where the boundary must live.

Predict before reading on: predict the number of iterations needed. The array has 8 elements; binary search halves the search range each step. Roughly how many halvings to isolate the answer?

Iteration trace. Maintain $lo, hi$ as inclusive indices. At each step compute $mid = (lo + hi) /2$ , compare with the target, and shrink the side that can't contain it.

iter	lo	hi	mid	arr[mid]	action
1	0	7	3	7	$7 < 13$ , search right: $lo = 4$
2	4	7	5	11	$11 < 13$ , search right: $lo = 6$
3	6	7	6	13	found at index 6

Three iterations. $lo g_{2} (8) = 3$ , exactly the theoretical bound. Each iteration discards half the remaining candidates.

Predict before reading on: what would happen at iteration 2 if we accidentally wrote $lo = mid$ instead of $lo = mid + 1$ ? Trace one step and see.

The invariant. Throughout the loop, the target (if present) lies in $[lo, hi]$ . The boundaries shrink monotonically — never moving past $mid$ — so the loop terminates. The size of the interval halves each iteration, so total work is $O (lo g n)$ .

Verification.

def search(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

print(search([1, 3, 5, 7, 9, 11, 13, 15], 13))   # 6

Articulate: state in one sentence what binary search is "actually about" — not sorted arrays specifically, but the deeper property it exploits.

Practice problems

Seven problems, seven different surfaces. The unifying move is "find the boundary of a monotonic predicate by halving." The predicate changes between problems; the halving structure doesn't.

P.1 classic sorted-array lookup, edge cases

In [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]:

(a) Where is $14$ ? How many iterations?

(b) Search for $11$ . What does the algorithm return, and why?

Find the analogue: same algorithm as the worked example, three different targets. The interesting cases are (b) and (c) where the target isn't in the array — what happens to $lo$ and $hi$ at termination?

show answer

(a) $14$ is at index 6. Iteration 1: $mid = 4, arr [4] = 10 < 14, lo = 5$ . Iteration 2: $mid = 7, arr [7] = 16 > 14, hi = 6$ . Iteration 3: $mid = 5, arr [5] = 12 < 14, lo = 6$ . Iteration 4: $mid = 6, arr [6] = 14$ , found.

(b) $11$ isn't in the array. The loop exits with $lo = 5, hi = 4$ (lo passes hi), returns -1. At exit, $lo$ points to where 11 would be inserted to keep the array sorted — a useful invariant.

(c) $21$ is bigger than every element. After successive lo-bumps, $lo$ goes past $hi$ , terminates with $lo = 10, hi = 9$ . Returns -1.

P.2 lower-bound with duplicates

Find the first occurrence of $5$ in [1, 3, 5, 5, 5, 7, 9, 11]. The vanilla search from the worked example returns some index of 5 (probably the middle one). Modify it to return the leftmost.

Find the analogue: vanilla search uses the predicate $arr [i] \geq target$ . Lower-bound uses the same predicate but doesn't stop when it finds an equal element — it keeps shrinking $hi$ to look for a smaller index that still satisfies it. Modify the comparison.

show answer

Index 2 (the first 5). Use a "predicate" search rather than equality:

def lower_bound(arr, target):
    lo, hi = 0, len(arr)          # half-open [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid              # don't bracket past mid — could be leftmost
    return lo

print(lower_bound([1, 3, 5, 5, 5, 7, 9, 11], 5))    # 2

Two structural changes from the worked example: (i) half-open interval ( $hi$ exclusive) so out-of-bounds insertion has a natural index; (ii) $hi = mid$ not $mid - 1$ — because the element at mid could still be the leftmost answer.

This pattern — find the first index where a monotonic predicate becomes true — is the most general form of binary search. Vanilla "find a target" is a degenerate case.

P.3 integer square root via number theory

Compute $⌊ 50 ⌋$ (the integer square root: the largest $k$ with $k^{2} \leq 50$ ) using binary search. Then generalize: $⌊ n ⌋$ for arbitrary positive integer $n$ .

Find the analogue: no array here. What plays the role of the array? What's the monotonic predicate? Read off the answer at termination.

show answer

The "array" is the integers $0, 1, 2, \dots, n$ . The predicate is $P (k) = (k^{2} > n)$ . $P$ is monotonic in $k$ : false for small $k$ (their squares are still $\leq n$ ), true for large $k$ . The largest $k$ with $\neg P (k)$ is the integer square root.

def isqrt(n):
    lo, hi = 0, n
    while lo <= hi:
        mid = (lo + hi) // 2
        sq  = mid * mid
        if sq == n:
            return mid
        elif sq < n:
            lo = mid + 1
        else:
            hi = mid - 1
    return hi              # largest k with k² ≤ n

print(isqrt(50))           # 7  (since 7² = 49 ≤ 50 < 64 = 8²)
print(isqrt(64))           # 8

Two iterations on $n = 50$ : $mid = 25, 625 > 50, hi = 24$ . $mid = 12, 144 > 50, hi = 11$ . $mid = 5, 25 < 50, lo = 6$ . $mid = 8, 64 > 50, hi = 7$ . $mid = 6, 36 < 50, lo = 7$ . $mid = 7, 49 < 50, lo = 8$ . Loop exits with $hi = 7$ .

This is the same algorithm as the worked example, but the "array" is virtual — defined by a function rather than stored explicitly. Anywhere a monotonic predicate lives on an ordered domain, binary search applies, whether the domain is in memory or not.

P.4 rotated sorted array — transformed monotonicity

A sorted array has been rotated by an unknown amount: [4, 5, 6, 7, 0, 1, 2]. The array is no longer globally sorted, but it has a special structure (one rotation point). Find $0$ in $O (lo g n)$ time.

Find the analogue: the array isn't sorted, so the vanilla predicate doesn't apply. But it has the property that at least one half of the array around any mid-point is sorted. Identify which half is sorted, then check whether the target lives in that half. If yes, search that half; if no, search the other.

show answer

Index 4. The algorithm:

def search_rotated(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        # Identify the sorted half
        if arr[lo] <= arr[mid]:                   # left [lo..mid] is sorted
            if arr[lo] <= target < arr[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        else:                                       # right [mid..hi] is sorted
            if arr[mid] < target <= arr[hi]:
                lo = mid + 1
            else:
                hi = mid - 1
    return -1

print(search_rotated([4, 5, 6, 7, 0, 1, 2], 0))   # 4

Trace on [4,5,6,7,0,1,2] with target 0: $mid = 3, arr [3] = 7$ . Left half [4,5,6,7] is sorted (since $arr [lo] = 4 \leq 7$ ), but 0 is not in [4, 7), so search right: $lo = 4$ . $mid = 5, arr [5] = 1, \neq = 0$ . Left half [0,1] is sorted, 0 is in [0, 1), so search left: $hi = 4$ . $mid = 4, arr [4] = 0$ , found.

The key insight: the predicate "is target in [lo, mid] sorted half?" is monotonic given the side identification. Even when the array as a whole isn't sorted, the halving structure survives because each step preserves a sorted-half invariant.

P.5 peak element — local monotonicity

Given an array [1, 3, 7, 8, 5, 4, 2], find any peak — an index $i$ where $arr [i - 1] \leq arr [i] \geq arr [i + 1]$ (using $- \infty$ for out-of-bounds neighbors). The array isn't sorted — but binary search still works in $O (lo g n)$ . Why?

Find the analogue: the array isn't sorted, so the monotonic-predicate-on-values formulation doesn't apply directly. But there's a local property: at any $mid$ , look at the slope around it. If $arr [mid + 1] > arr [mid]$ , the array is rising — there must be a peak to the right. If falling, a peak is to the left. Recurse on the half that contains a peak.

show answer

Peak at index 3 (value 8). The algorithm:

def find_peak(arr):
    lo, hi = 0, len(arr) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid] < arr[mid + 1]:
            lo = mid + 1                  # rising, peak is to the right
        else:
            hi = mid                       # falling or flat, peak is at mid or left
    return lo

print(find_peak([1, 3, 7, 8, 5, 4, 2]))    # 3

Why this works: at each step, the "is there a peak in [lo, hi]?" invariant is maintained. If the array rises at mid, the values continue to grow into the right half before they have to start falling — there must be a peak there. The halving is justified by this monotonic-direction predicate, even though the array as a whole has no global order.

The takeaway is the same as P.4: binary search isn't about sorted arrays specifically. It's about any situation where you can rule out half the search space in $O (1)$ based on a property at the midpoint. Sortedness is one way to enable that ruling; local monotonicity is another.

P.6 parametric search — binary search on the answer

You have packages with weights $[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]$ and must ship them in 5 days, in order, on a conveyor with daily capacity $C$ . What's the minimum $C$ that works?

Find the analogue: no array of capacities to search through — you have to compute the answer. But there's a monotonic predicate hiding: $P (C) =$ "can we ship in 5 days at capacity $C$ ?" is false below some threshold and true above. Binary search on $C$ .

show answer

The minimum capacity is 15. Construct the predicate as a greedy simulation, then binary search on $C$ :

def can_ship(packages, days, cap):
    used, current = 1, 0
    for w in packages:
        if w > cap: return False
        if current + w > cap:
            used += 1
            current = w
        else:
            current += w
    return used <= days

def min_capacity(packages, days):
    lo, hi = max(packages), sum(packages)
    while lo < hi:
        mid = (lo + hi) // 2
        if can_ship(packages, days, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

print(min_capacity([1,2,3,4,5,6,7,8,9,10], 5))   # 15

Bounds: $lo = max (packages)$ (any smaller and a single package wouldn't fit), $hi = \sum (packages)$ (one giant day always works). Binary search finds the smallest $C$ where can_ship returns true.

This is "binary search on the answer." It applies whenever (i) the answer is a number in some range, (ii) you can check whether any candidate answer works in $O (f (n))$ time, and (iii) the "works"/"doesn't work" predicate is monotonic. Then the total cost is $O (f (n) lo g (range))$ . Pattern shows up in capacity allocation, scheduling, resource provisioning, optimal-cut problems — a huge class of optimization problems.

P.7 real-valued root finding

Find a positive root of $f (x) = x^{3} - x - 1$ to absolute tolerance $1 0^{- 6}$ . (There's one real root between 1 and 2.) Note that $f (1) = - 1 < 0$ and $f (2) = 5 > 0$ , so the intermediate value theorem guarantees a root in between.

Find the analogue: no discrete array; the "indices" are real numbers in $[1, 2]$ . The predicate $P (x) = (f (x) > 0)$ is monotonic on this interval because $f$ is strictly increasing there. Binary search reduces to $hi - lo \to 0$ .

show answer

Root: $x \approx 1.32472$ .

def bisect_root(f, lo, hi, tol=1e-6):
    assert f(lo) * f(hi) < 0     # opposite signs
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) == 0:
            return mid
        elif f(lo) * f(mid) < 0:
            hi = mid              # root in left half
        else:
            lo = mid              # root in right half
    return (lo + hi) / 2

print(bisect_root(lambda x: x**3 - x - 1, 1.0, 2.0))   # ≈ 1.32472

This is the bisection method from numerical analysis. Same algorithm as integer binary search, but with continuous bookkeeping and a tolerance-based termination. Number of iterations to achieve tolerance $ε$ on an interval of width $W$ is $lo g_{2} (W / ε)$ — same $O (lo g)$ as discrete binary search.

Practical note: bisection has linear convergence rate $1/2$ per step. Newton's method converges quadratically when it works but can diverge; bisection always works given a sign-change bracket, but never beats linear. Hybrid methods (Brent's algorithm) combine the two — fast when Newton would work, robust when it wouldn't.

Check problems

Two problems that resist pattern-matching against the practice set. The first proves the running-time bound formally; the second is a diagnosis exercise on a notoriously bug-prone family of implementations.

Check 1 derivation

Prove that binary search on a sorted array of $n$ elements has worst-case running time $O (lo g n)$ , terminates on every input, and has the correctness property: if the target is present, it's found; if absent, the loop exits with $lo > hi$ .

Specifically:

(a) Define an invariant that holds at the start of each iteration. State it precisely.

(b) Show the invariant is preserved by every iteration.

(d) Derive a recurrence for the size of $[lo, hi]$ after $k$ iterations. Use it to bound the number of iterations.

show solution sketch

(a) Invariant. If the target is in the array, then the target's index lies in the inclusive range $[lo, hi]$ . Equivalently: $arr [i] < target$ for all $i < lo$ , and $arr [i] > target$ for all $i > hi$ .

(b) Preservation. Suppose the invariant holds at iteration start. We compute $mid$ and compare:

If $arr [mid] = target$ : return immediately. (No need to maintain invariant past return.)
If $arr [mid] < target$ : by sortedness, $arr [i] < target$ for all $i \leq mid$ . So the target's index, if it exists, is in $[mid + 1, hi]$ . Setting $lo = mid + 1$ preserves the invariant.
If $arr [mid] > target$ : symmetric, setting $hi = mid - 1$ preserves the invariant.

(c) Correctness on termination. The loop ends when either (i) $arr [mid] = target$ , which finds the target by direct check, or (ii) $lo > hi$ , in which case the invariant gives "if the target is in the array, its index is in $[lo, hi]$ " — but $[lo, hi]$ is empty, so the target isn't present. Return -1. ✓

(d) Iteration bound. Let $s_{k} = hi - lo + 1$ be the size of the range at iteration $k$ . The next iteration sets either $lo \to mid + 1$ or $hi \to mid - 1$ . In both cases, $s_{k + 1} \leq ⌊ s_{k} /2 ⌋$ (because $mid$ is roughly the midpoint and we discard half plus the midpoint itself).

Starting from $s_{0} = n$ , after $k$ iterations $s_{k} \leq n / 2^{k}$ . The loop exits when $s_{k} = 0$ , which happens by $k = ⌈ lo g_{2} n ⌉ + 1$ .

So the total number of iterations is at most $⌈ lo g_{2} n ⌉ + 1 = O (lo g n)$ . Each iteration does $O (1)$ work, so the total running time is $O (lo g n)$ . ✓

Check 2 diagnosis

The following code is supposed to search for a target in a sorted array. It compiles, runs on most inputs, and looks correct at a glance. But there's a bug that produces incorrect results on specific inputs and may also cause an out-of-bounds access.

def search(arr, target):
    lo, hi = 0, len(arr)            # ← note: hi = len(arr), not len(arr) - 1
    while lo <= hi:                  # ← but uses <=
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

(a) Find the bug. Describe it precisely (one sentence).

(b) Construct a small example input where the bug manifests. Trace the iteration that goes wrong.

show solution sketch

(a) The bug is an off-by-one inconsistency in the bracket convention. The initialization $hi = len (arr)$ uses a half-open upper bound ( $hi$ is one past the last valid index), but the loop body treats $hi$ as a closed upper bound by using $lo \leq hi$ and $hi = mid - 1$ . The two conventions are mixed.

(b) Example. Search for any target in [1, 2, 3], say target = 3. Iteration 1: $lo = 0, hi = 3$ , $mid = 1$ , $arr [1] = 2 < 3$ , $lo = 2$ . Iteration 2: $lo = 2, hi = 3$ , $mid = 2$ , $arr [2] = 3$ , returns 2. ✓ for this case. Now try a target larger than every element, like target = 5. Iteration 1: $mid = 1, 2 < 5, lo = 2$ . Iteration 2: $lo = 2, hi = 3, mid = 2, arr [2] = 3 < 5, lo = 3$ . Iteration 3: $lo = 3, hi = 3, mid = 3$ , but arr[3] is an out-of-bounds access. Python raises IndexError. (In C, this would be undefined behavior.)

Fix to closed-interval convention: change initialization to $hi = len (arr) - 1$ . The loop body's $lo \leq hi$ and $hi = mid - 1$ are correct for this convention.
Fix to half-open convention: keep $hi = len (arr)$ , but change the loop to $lo < hi$ and the upper update to $hi = mid$ (not $mid - 1$ ).

Pick one and stick with it; mixing them is exactly how off-by-one bugs in binary search get into production. The Knuth-attributed observation that "binary search is easy until you have to write it" comes from this: the algorithm is one paragraph of pseudocode, but the boundary convention has to be exact.