Algorithms

Binary Search — exercises

Recognize a problem as the search for the unique boundary of a monotonic predicate; maintain an invariant interval [lo, hi] with a known predicate sign at each end; halve the interval per step until the boundary is found in O(log n) time.

1 worked example · 7 practice problems · 2 check problems

Worked example: classic sorted-array search

Problem. Find in the sorted array [1, 3, 5, 7, 9, 11, 13, 15]. Trace each iteration of binary search by hand. Report the final index.

Diagnosis. The array is sorted, so the predicate is monotonic in : it stays false up to some boundary index , then is true from onward. Binary search finds in by halving an interval where the boundary must live.

Predict before reading on: predict the number of iterations needed. The array has 8 elements; binary search halves the search range each step. Roughly how many halvings to isolate the answer?

Iteration trace. Maintain as inclusive indices. At each step compute , compare with the target, and shrink the side that can't contain it.

iterlohimidarr[mid]action
10737, search right:
247511, search right:
367613found at index 6

Three iterations. , exactly the theoretical bound. Each iteration discards half the remaining candidates.

Predict before reading on: what would happen at iteration 2 if we accidentally wrote instead of ? Trace one step and see.

The invariant. Throughout the loop, the target (if present) lies in . The boundaries shrink monotonically — never moving past — so the loop terminates. The size of the interval halves each iteration, so total work is .

Verification.

def search(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

print(search([1, 3, 5, 7, 9, 11, 13, 15], 13))   # 6

Articulate: state in one sentence what binary search is "actually about" — not sorted arrays specifically, but the deeper property it exploits.


Practice problems

Seven problems, seven different surfaces. The unifying move is "find the boundary of a monotonic predicate by halving." The predicate changes between problems; the halving structure doesn't.

P.1 classic sorted-array lookup, edge cases

In [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]:

(a) Where is ? How many iterations?

(b) Search for . What does the algorithm return, and why?

(c) Search for . Trace the final iteration carefully.

Find the analogue: same algorithm as the worked example, three different targets. The interesting cases are (b) and (c) where the target isn't in the array — what happens to and at termination?

show answer

(a) is at index 6. Iteration 1: . Iteration 2: . Iteration 3: . Iteration 4: , found.

(b) isn't in the array. The loop exits with (lo passes hi), returns -1. At exit, points to where 11 would be inserted to keep the array sorted — a useful invariant.

(c) is bigger than every element. After successive lo-bumps, goes past , terminates with . Returns -1.

P.2 lower-bound with duplicates

Find the first occurrence of in [1, 3, 5, 5, 5, 7, 9, 11]. The vanilla search from the worked example returns some index of 5 (probably the middle one). Modify it to return the leftmost.

Find the analogue: vanilla search uses the predicate . Lower-bound uses the same predicate but doesn't stop when it finds an equal element — it keeps shrinking to look for a smaller index that still satisfies it. Modify the comparison.

show answer

Index 2 (the first 5). Use a "predicate" search rather than equality:

def lower_bound(arr, target):
    lo, hi = 0, len(arr)          # half-open [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid              # don't bracket past mid — could be leftmost
    return lo

print(lower_bound([1, 3, 5, 5, 5, 7, 9, 11], 5))    # 2

Two structural changes from the worked example: (i) half-open interval ( exclusive) so out-of-bounds insertion has a natural index; (ii) not — because the element at mid could still be the leftmost answer.

This pattern — find the first index where a monotonic predicate becomes true — is the most general form of binary search. Vanilla "find a target" is a degenerate case.

P.3 integer square root via number theory

Compute (the integer square root: the largest with ) using binary search. Then generalize: for arbitrary positive integer .

Find the analogue: no array here. What plays the role of the array? What's the monotonic predicate? Read off the answer at termination.

show answer

The "array" is the integers . The predicate is . is monotonic in : false for small (their squares are still ), true for large . The largest with is the integer square root.

def isqrt(n):
    lo, hi = 0, n
    while lo <= hi:
        mid = (lo + hi) // 2
        sq  = mid * mid
        if sq == n:
            return mid
        elif sq < n:
            lo = mid + 1
        else:
            hi = mid - 1
    return hi              # largest k with k² ≤ n

print(isqrt(50))           # 7  (since 7² = 49 ≤ 50 < 64 = 8²)
print(isqrt(64))           # 8

Two iterations on : . . . . . . Loop exits with .

This is the same algorithm as the worked example, but the "array" is virtual — defined by a function rather than stored explicitly. Anywhere a monotonic predicate lives on an ordered domain, binary search applies, whether the domain is in memory or not.

P.4 rotated sorted array — transformed monotonicity

A sorted array has been rotated by an unknown amount: [4, 5, 6, 7, 0, 1, 2]. The array is no longer globally sorted, but it has a special structure (one rotation point). Find in time.

Find the analogue: the array isn't sorted, so the vanilla predicate doesn't apply. But it has the property that at least one half of the array around any mid-point is sorted. Identify which half is sorted, then check whether the target lives in that half. If yes, search that half; if no, search the other.

show answer

Index 4. The algorithm:

def search_rotated(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        # Identify the sorted half
        if arr[lo] <= arr[mid]:                   # left [lo..mid] is sorted
            if arr[lo] <= target < arr[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        else:                                       # right [mid..hi] is sorted
            if arr[mid] < target <= arr[hi]:
                lo = mid + 1
            else:
                hi = mid - 1
    return -1

print(search_rotated([4, 5, 6, 7, 0, 1, 2], 0))   # 4

Trace on [4,5,6,7,0,1,2] with target 0: . Left half [4,5,6,7] is sorted (since ), but 0 is not in [4, 7), so search right: . . Left half [0,1] is sorted, 0 is in [0, 1), so search left: . , found.

The key insight: the predicate "is target in [lo, mid] sorted half?" is monotonic given the side identification. Even when the array as a whole isn't sorted, the halving structure survives because each step preserves a sorted-half invariant.

P.5 peak element — local monotonicity

Given an array [1, 3, 7, 8, 5, 4, 2], find any peak — an index where (using for out-of-bounds neighbors). The array isn't sorted — but binary search still works in . Why?

Find the analogue: the array isn't sorted, so the monotonic-predicate-on-values formulation doesn't apply directly. But there's a local property: at any , look at the slope around it. If , the array is rising — there must be a peak to the right. If falling, a peak is to the left. Recurse on the half that contains a peak.

show answer

Peak at index 3 (value 8). The algorithm:

def find_peak(arr):
    lo, hi = 0, len(arr) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid] < arr[mid + 1]:
            lo = mid + 1                  # rising, peak is to the right
        else:
            hi = mid                       # falling or flat, peak is at mid or left
    return lo

print(find_peak([1, 3, 7, 8, 5, 4, 2]))    # 3

Why this works: at each step, the "is there a peak in [lo, hi]?" invariant is maintained. If the array rises at mid, the values continue to grow into the right half before they have to start falling — there must be a peak there. The halving is justified by this monotonic-direction predicate, even though the array as a whole has no global order.

The takeaway is the same as P.4: binary search isn't about sorted arrays specifically. It's about any situation where you can rule out half the search space in based on a property at the midpoint. Sortedness is one way to enable that ruling; local monotonicity is another.

P.6 parametric search — binary search on the answer

You have packages with weights and must ship them in 5 days, in order, on a conveyor with daily capacity . What's the minimum that works?

Find the analogue: no array of capacities to search through — you have to compute the answer. But there's a monotonic predicate hiding: "can we ship in 5 days at capacity ?" is false below some threshold and true above. Binary search on .

show answer

The minimum capacity is 15. Construct the predicate as a greedy simulation, then binary search on :

def can_ship(packages, days, cap):
    used, current = 1, 0
    for w in packages:
        if w > cap: return False
        if current + w > cap:
            used += 1
            current = w
        else:
            current += w
    return used <= days

def min_capacity(packages, days):
    lo, hi = max(packages), sum(packages)
    while lo < hi:
        mid = (lo + hi) // 2
        if can_ship(packages, days, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

print(min_capacity([1,2,3,4,5,6,7,8,9,10], 5))   # 15

Bounds: (any smaller and a single package wouldn't fit), (one giant day always works). Binary search finds the smallest where can_ship returns true.

This is "binary search on the answer." It applies whenever (i) the answer is a number in some range, (ii) you can check whether any candidate answer works in time, and (iii) the "works"/"doesn't work" predicate is monotonic. Then the total cost is . Pattern shows up in capacity allocation, scheduling, resource provisioning, optimal-cut problems — a huge class of optimization problems.

P.7 real-valued root finding

Find a positive root of to absolute tolerance . (There's one real root between 1 and 2.) Note that and , so the intermediate value theorem guarantees a root in between.

Find the analogue: no discrete array; the "indices" are real numbers in . The predicate is monotonic on this interval because is strictly increasing there. Binary search reduces to .

show answer

Root: .

def bisect_root(f, lo, hi, tol=1e-6):
    assert f(lo) * f(hi) < 0     # opposite signs
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) == 0:
            return mid
        elif f(lo) * f(mid) < 0:
            hi = mid              # root in left half
        else:
            lo = mid              # root in right half
    return (lo + hi) / 2

print(bisect_root(lambda x: x**3 - x - 1, 1.0, 2.0))   # ≈ 1.32472

This is the bisection method from numerical analysis. Same algorithm as integer binary search, but with continuous bookkeeping and a tolerance-based termination. Number of iterations to achieve tolerance on an interval of width is — same as discrete binary search.

Practical note: bisection has linear convergence rate per step. Newton's method converges quadratically when it works but can diverge; bisection always works given a sign-change bracket, but never beats linear. Hybrid methods (Brent's algorithm) combine the two — fast when Newton would work, robust when it wouldn't.


Check problems

Two problems that resist pattern-matching against the practice set. The first proves the running-time bound formally; the second is a diagnosis exercise on a notoriously bug-prone family of implementations.

Check 1 derivation

Prove that binary search on a sorted array of elements has worst-case running time , terminates on every input, and has the correctness property: if the target is present, it's found; if absent, the loop exits with .

Specifically:

(a) Define an invariant that holds at the start of each iteration. State it precisely.

(b) Show the invariant is preserved by every iteration.

(c) Use the invariant to argue correctness on termination.

(d) Derive a recurrence for the size of after iterations. Use it to bound the number of iterations.

show solution sketch

(a) Invariant. If the target is in the array, then the target's index lies in the inclusive range . Equivalently: for all , and for all .

(b) Preservation. Suppose the invariant holds at iteration start. We compute and compare:

  • If : return immediately. (No need to maintain invariant past return.)
  • If : by sortedness, for all . So the target's index, if it exists, is in . Setting preserves the invariant.
  • If : symmetric, setting preserves the invariant.

(c) Correctness on termination. The loop ends when either (i) , which finds the target by direct check, or (ii) , in which case the invariant gives "if the target is in the array, its index is in " — but is empty, so the target isn't present. Return -1. ✓

(d) Iteration bound. Let be the size of the range at iteration . The next iteration sets either or . In both cases, (because is roughly the midpoint and we discard half plus the midpoint itself).

Starting from , after iterations . The loop exits when , which happens by .

So the total number of iterations is at most . Each iteration does work, so the total running time is . ✓

Check 2 diagnosis

The following code is supposed to search for a target in a sorted array. It compiles, runs on most inputs, and looks correct at a glance. But there's a bug that produces incorrect results on specific inputs and may also cause an out-of-bounds access.

def search(arr, target):
    lo, hi = 0, len(arr)            # ← note: hi = len(arr), not len(arr) - 1
    while lo <= hi:                  # ← but uses <=
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

(a) Find the bug. Describe it precisely (one sentence).

(b) Construct a small example input where the bug manifests. Trace the iteration that goes wrong.

(c) Propose the smallest possible fix. There are at least two valid options; name both.

show solution sketch

(a) The bug is an off-by-one inconsistency in the bracket convention. The initialization uses a half-open upper bound ( is one past the last valid index), but the loop body treats as a closed upper bound by using and . The two conventions are mixed.

(b) Example. Search for any target in [1, 2, 3], say target = 3. Iteration 1: , , , . Iteration 2: , , , returns 2. ✓ for this case. Now try a target larger than every element, like target = 5. Iteration 1: . Iteration 2: . Iteration 3: , but arr[3] is an out-of-bounds access. Python raises IndexError. (In C, this would be undefined behavior.)

(c) Two valid fixes (pick one):

  • Fix to closed-interval convention: change initialization to . The loop body's and are correct for this convention.
  • Fix to half-open convention: keep , but change the loop to and the upper update to (not ).

Pick one and stick with it; mixing them is exactly how off-by-one bugs in binary search get into production. The Knuth-attributed observation that "binary search is easy until you have to write it" comes from this: the algorithm is one paragraph of pseudocode, but the boundary convention has to be exact.