Condition Number and Numerical Stability

Linear Algebra

The condition number measures how sensitively the output of a numerical problem depends on the input. It is a property of the PROBLEM, not of any algorithm used to solve it: a problem with condition number $1 0^{14}$ will lose roughly fourteen decimal digits of accuracy to input perturbations regardless of whether you use Gaussian elimination, QR, SVD, or any other method. Ill-conditioning is not a numerical bug — it is a structural property that says small errors in the input are amplified into large errors in the output, and no algorithm can fully escape it.

For a matrix $A$ (in the context of solving $A x = b$ or computing eigenvalues), the condition number is:

κ (A) = ∥ A ∥ \cdot ∥ A^{- 1} ∥.

In the most common case — the 2-norm — this is exactly the ratio of largest to smallest singular value:

κ_{2} (A) = \frac{σ _{m a x} ( A )}{σ _{m i n} ( A )} .

A well-conditioned matrix has $κ \sim O (1)$ — every direction is stretched by roughly the same amount. An ill-conditioned matrix has $κ ≫ 1$ — some directions are stretched much more than others, and the inverse compresses by the same factor, amplifying any noise that pointed in those directions. $κ = \infty$ means the matrix is singular: at least one direction is annihilated completely.

The rule of thumb

When you solve $A x = b$ with a BACKWARD-STABLE algorithm (Gaussian elimination with partial pivoting, QR, SVD-based) on a computer with unit round-off $ϵ_{mach}$ , the relative error in the computed solution is bounded by:

\frac{∥ x ^ - x ∥}{∥ x ∥} ≲ κ (A) \cdot ϵ_{mach} .

In IEEE double precision, $ϵ_{mach} \approx 1 0^{- 16}$ , so:

digits of accuracy in x \approx 16 - lo g_{10} κ (A) .

A matrix with $κ = 1 0^{6}$ gives ten correct digits; with $κ = 1 0^{10}$ you keep six; with $κ = 1 0^{15}$ you have one; with $κ > 1 0^{16}$ you have nothing. This is the most useful numerical-stability rule in practice — back-of-envelope, but rarely wrong.

Sources of ill-conditioning

Ill-conditioning is not exotic — it shows up everywhere, often invisibly:

Polynomial fitting in the monomial basis. The Vandermonde matrix has condition number growing like $e^{n}$ in the degree. Always use orthogonal polynomial bases (Chebyshev, Legendre) instead, or fit via SVD with truncation.
The Hilbert matrix $H_{ij} = 1/ (i + j - 1)$ . Famously ill-conditioned: $κ (H_{n}) \sim e^{3.5 n}$ . Appears in least-squares fitting of ${x^{k}}$ on the unit interval.
Near-linearly-dependent columns. If two features in a design matrix are nearly collinear, the matrix is nearly singular. Standard in regression with correlated predictors; regularization (Tikhonov / ridge / lasso) is the response.
Discretization of PDEs. Stiffness matrices from finite-difference Poisson have $κ \sim h^{- 2}$ where $h$ is mesh spacing — refining the mesh by 10× makes the system 100× harder to solve. The reason multigrid and preconditioning exist.
Forming $A^{T} A$ . Always doubles the condition number: $κ (A^{T} A) = κ (A)^{2}$ . Solving least squares via the normal equations is therefore numerically inferior to QR or SVD.

Demonstration

# Condition number is a property of the PROBLEM, not the algorithm.
# This script measures: (1) the condition number of standard matrices,
# (2) how accuracy of solving Ax=b degrades as kappa(A) grows.

import numpy as np
from scipy.linalg import hilbert

print("Condition numbers of canonical matrices:")
print(f"  Identity (10x10):        kappa = {np.linalg.cond(np.eye(10)):.2e}")
print(f"  Random Gaussian (10x10): kappa = {np.linalg.cond(np.random.randn(10,10)):.2e}")
for n in [5, 10, 15, 20]:
    H = hilbert(n)
    print(f"  Hilbert ({n:2d}x{n}):         kappa = {np.linalg.cond(H):.2e}")
print("(Hilbert kappa grows exponentially with n — classic ill-conditioning.)")

# Rule of thumb: solving Ax=b in double precision (~16 digits) loses
# log10(kappa(A)) digits to round-off in the worst case.
print("\nForward error of np.linalg.solve(H, b) where b = H @ [1,1,...,1]:")
print("  n   kappa(A)       rel error    expected (log10 ratio)")
for n in [5, 8, 11, 14]:
    H = hilbert(n)
    x_true = np.ones(n)
    b = H @ x_true
    x_hat = np.linalg.solve(H, b)
    rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
    kappa = np.linalg.cond(H)
    print(f"  {n:2d}  {kappa:.2e}    {rel_err:.2e}     ~{np.log10(kappa)-15:+.1f} digits lost")

Output:

Condition numbers of canonical matrices:
  Identity (10x10):        kappa = 1.00e+00
  Random Gaussian (10x10): kappa = 1.44e+01
  Hilbert ( 5x5):          kappa = 4.77e+05
  Hilbert (10x10):         kappa = 1.60e+13
  Hilbert (15x15):         kappa = 2.32e+17
  Hilbert (20x20):         kappa = 1.08e+18
(Hilbert kappa grows exponentially with n — classic ill-conditioning.)

Forward error of np.linalg.solve(H, b) where b = H @ [1,1,...,1]:
  n   kappa(A)       rel error    expected (log10 ratio)
   5  4.77e+05    2.08e-13     ~-9.3 digits lost
   8  1.53e+10    1.58e-07     ~-4.8 digits lost
  11  5.22e+14    3.82e-03     ~-0.3 digits lost
  14  2.43e+17    1.06e+01     ~+2.4 digits lost

Read across the rows: as $κ (H_{n})$ grows from $1 0^{5}$ to $1 0^{17}$ , the forward error of the linear solve grows in lockstep — by 18 orders of magnitude — and by $n = 14$ the "solution" has no correct digits at all. This is not a bug in $np.linalg.solve$ : the underlying LAPACK routine is backward-stable. The problem is the problem. Asking what $x$ satisfies $H_{14} x = b$ for a given numerical $b$ is, in IEEE double precision, ill-posed.

Condition number vs algorithm stability

These two ideas are often confused; they are independent.

Condition number is a property of the problem (the matrix, the function being computed). It is intrinsic and cannot be improved by changing algorithms — only by reformulating the problem.
Numerical stability is a property of an algorithm. An algorithm is BACKWARD STABLE if the computed answer is the exact answer to a slightly perturbed problem; it is FORWARD STABLE if the computed answer is close to the exact answer. The right algorithm doesn't amplify round-off beyond what the conditioning already forces.

The two interact via the rule of thumb above. A backward-stable algorithm achieves forward error $\sim κ \cdot ϵ_{mach}$ — the best one can hope for. An UNSTABLE algorithm can do MUCH worse. Examples:

Gaussian elimination WITHOUT pivoting is unstable: small pivots amplify round-off. WITH partial pivoting it is backward stable for almost all matrices encountered in practice.
Classical Gram-Schmidt is forward unstable; MODIFIED Gram-Schmidt is much better, and Householder QR is the gold standard.
Solving least squares via the normal equations squares the condition number. SVD or QR-based least squares incurs the linear factor only. This is the same problem; the algorithm choice changes the achievable accuracy by orders of magnitude.

Trefethen and Bau (Numerical Linear Algebra, 1997) state the principle cleanly: "Accuracy = conditioning × stability." You can't beat the conditioning; you can only fail to live up to it by choosing a poor algorithm.

What to do about it

Three responses to ill-conditioning, in order of preference:

Reformulate the problem. Fit in a Chebyshev basis instead of monomial. Centre and scale your data before regression. Solve $A x = b$ instead of $A^{T} A x = A^{T} b$ . Use SVD instead of normal equations.
Regularize. Add a small ridge term $(A^{T} A + λ I) x = A^{T} b$ to bound the smallest eigenvalue away from zero. Truncate the SVD by dropping singular values below a threshold. Sacrifices a controllable amount of bias for a controllable amount of variance.
Use extended precision. Quad precision (~34 digits) lets you handle $κ \sim 1 0^{30}$ before running out. Rarely the right answer — usually you should fix the conditioning instead — but occasionally the cheapest fix.

Singular value decomposition — makes the condition number explicit as $σ_{1} / σ_{n}$ and provides the standard regularization route.
Conjugate gradient — convergence rate governed by $κ$ ; preconditioning is the response.
Householder QR — the gold-standard backward-stable algorithm for orthogonalisation.
Gaussian elimination — stable with partial pivoting; unstable without.