Skip to main content

DSAPath · 18 min

Binary Search

Why This Matters

Binary search is the canonical O(logn)O(\log n) algorithm. It runs faster than the time it takes to read its own input out loud — a billion-element array takes 30 comparisons, not a billion. Understanding why halving works, and why the implementation is so easy to break, is a prerequisite to every divide-and-conquer algorithm that follows.

Jon Bentley reported in Programming Pearls (1986) that 90% of professional programmers, asked to implement binary search after a binary-search lecture, introduced a bug. Joshua Bloch found in 2006 that the standard textbook implementation had a latent integer-overflow bug that broke Arrays.binarySearch in the Java standard library and bsearch in C. The lesson: the algorithm is short, the proof is short, and the implementation is still easy to get wrong.

Specification

Input. A sorted array A[0..n1]A[0..n-1] in non-decreasing order, and a target value tt.

Output. Either an index ii with A[i]=tA[i] = t, or the report that no such index exists.

Cost. Θ(logn)\Theta(\log n) comparisons. Termination guaranteed.

The Algorithm

Maintain two indices \ell (low) and hh (high) such that the target, if present, lies in A[..h]A[\ell..h]. Initialize =0,h=n1\ell = 0, h = n - 1. At each step, compute the midpoint m=+(h)/2m = \ell + (h - \ell) / 2 (more on this below). Compare A[m]A[m] against tt:

  • If A[m]=tA[m] = t, return mm.
  • If A[m]<tA[m] < t, the target is in the upper half: set =m+1\ell = m + 1.
  • If A[m]>tA[m] > t, the target is in the lower half: set h=m1h = m - 1.

Continue while h\ell \le h. If the loop exits, the target is not in the array.

The Loop Invariant

The correctness of binary search rests on a single invariant maintained at every iteration:

If tt is in AA, then tt is in A[..h]A[\ell..h].

Initially =0,h=n1\ell = 0, h = n - 1, so A[..h]A[\ell..h] is the whole array — the invariant trivially holds. At each step, the comparison at mm either finds tt (we return) or eliminates exactly half of the current range. The invariant is preserved because the eliminated half cannot contain tt given the sortedness of AA. When the loop exits with >h\ell > h, the range A[..h]A[\ell..h] is empty, so tt was not in the array.

The Midpoint Trap

The textbook midpoint formula is

m=(+h)/2.m = (\ell + h) / 2.

This is wrong on real machines. If +h\ell + h exceeds the maximum integer width (typically 23112^{31} - 1 for 32-bit signed ints), the addition overflows and the result wraps around to a negative number. Indexing A[m] with a negative index is undefined behavior in C, an ArrayIndexOutOfBoundsException in Java, a crash in most languages.

The fix, attributed to Bloch (2006):

m=+(h)/2.m = \ell + (h - \ell) / 2.

This is mathematically equivalent for non-negative h\ell \le h but does not overflow. Use this form. The bug it fixes was latent in standard libraries for over twenty years before Bloch noticed it.

Lower-Bound and Upper-Bound Variants

In practice you often want not "is tt in the array" but "where would tt go?" or "what is the first index t\ge t?" These are the lower-bound and upper-bound variants. Pseudocode for lower bound (smallest ii with A[i]tA[i] \ge t):

lo, hi = 0, n
while lo < hi:
    m = lo + (hi - lo) / 2
    if A[m] < t: lo = m + 1
    else:        hi = m
return lo  // n if all elements < t

The differences from plain binary search:

  • hh starts at nn, not n1n - 1. The answer can be nn (target larger than every element).
  • The loop condition is <h\ell < h, not h\ell \le h.
  • The else branch sets h=mh = m, not h=m1h = m - 1. Closing the half-open interval [,h)[\ell, h).

Use lower-bound when you need to insert into a sorted array, look up by floor or ceiling, or count occurrences. C++ STL std::lower_bound and Python's bisect.bisect_left both implement this variant.

Common Mistakes

Off-by-one in the loop condition. Plain binary search uses <= h; lower-bound uses < h. Mixing them creates infinite loops or skipped elements.

Forgetting the half-open vs closed interval distinction. Plain binary search works on [,h][\ell, h] (closed). Lower-bound works on [,h)[\ell, h) (half-open). These need different update rules. Keep one in mind at a time.

The midpoint overflow — discussed above. If you ever write (lo + hi) / 2 in production code, this page should make you uncomfortable.

Searching an unsorted array. Binary search requires sortedness. The algorithm gives wrong answers, not "not found," on unsorted input. If you cannot prove the input is sorted, sort it first or use linear search.

References

  • Bentley, J. Programming Pearls (2nd ed., 2000). Ch. 4 — "Writing Correct Programs."
  • Bloch, J. (2006). "Extra, Extra — Read All About It: Nearly All Binary Searches and Mergesorts are Broken." Google Research Blog.
  • Cormen, Leiserson, Rivest, Stein. Introduction to Algorithms (4th ed., 2022). §2.3.5 — Binary search.
  • Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching (2nd ed., 1998). §6.2.1 — Binary tree search.

Next Topics

  • Big-O notation — the asymptotic-analysis foundation that makes "O(logn)O(\log n)" a precise claim.
  • Arrays and pointers — the random-access substrate binary search depends on.