Condition Number and Numerical Stability
Linear Algebra
The condition number measures how sensitively the output of a numerical problem depends on the input. It is a property of the PROBLEM, not of any algorithm used to solve it: a problem with condition number will lose roughly fourteen decimal digits of accuracy to input perturbations regardless of whether you use Gaussian elimination, QR, SVD, or any other method. Ill-conditioning is not a numerical bug — it is a structural property that says small errors in the input are amplified into large errors in the output, and no algorithm can fully escape it.
For a matrix (in the context of solving or computing eigenvalues), the condition number is:
In the most common case — the 2-norm — this is exactly the ratio of largest to smallest singular value:
A well-conditioned matrix has — every direction is stretched by roughly the same amount. An ill-conditioned matrix has — some directions are stretched much more than others, and the inverse compresses by the same factor, amplifying any noise that pointed in those directions. means the matrix is singular: at least one direction is annihilated completely.
The rule of thumb
When you solve with a BACKWARD-STABLE algorithm (Gaussian elimination with partial pivoting, QR, SVD-based) on a computer with unit round-off , the relative error in the computed solution is bounded by:
In IEEE double precision, , so:
A matrix with gives ten correct digits; with you keep six; with you have one; with you have nothing. This is the most useful numerical-stability rule in practice — back-of-envelope, but rarely wrong.
Sources of ill-conditioning
Ill-conditioning is not exotic — it shows up everywhere, often invisibly:
- Polynomial fitting in the monomial basis. The Vandermonde matrix has condition number growing like in the degree. Always use orthogonal polynomial bases (Chebyshev, Legendre) instead, or fit via SVD with truncation.
- The Hilbert matrix . Famously ill-conditioned: . Appears in least-squares fitting of on the unit interval.
- Near-linearly-dependent columns. If two features in a design matrix are nearly collinear, the matrix is nearly singular. Standard in regression with correlated predictors; regularization (Tikhonov / ridge / lasso) is the response.
- Discretization of PDEs. Stiffness matrices from finite-difference Poisson have where is mesh spacing — refining the mesh by 10× makes the system 100× harder to solve. The reason multigrid and preconditioning exist.
- Forming . Always doubles the condition number: . Solving least squares via the normal equations is therefore numerically inferior to QR or SVD.
Demonstration
# Condition number is a property of the PROBLEM, not the algorithm.
# This script measures: (1) the condition number of standard matrices,
# (2) how accuracy of solving Ax=b degrades as kappa(A) grows.
import numpy as np
from scipy.linalg import hilbert
print("Condition numbers of canonical matrices:")
print(f" Identity (10x10): kappa = {np.linalg.cond(np.eye(10)):.2e}")
print(f" Random Gaussian (10x10): kappa = {np.linalg.cond(np.random.randn(10,10)):.2e}")
for n in [5, 10, 15, 20]:
H = hilbert(n)
print(f" Hilbert ({n:2d}x{n}): kappa = {np.linalg.cond(H):.2e}")
print("(Hilbert kappa grows exponentially with n — classic ill-conditioning.)")
# Rule of thumb: solving Ax=b in double precision (~16 digits) loses
# log10(kappa(A)) digits to round-off in the worst case.
print("\nForward error of np.linalg.solve(H, b) where b = H @ [1,1,...,1]:")
print(" n kappa(A) rel error expected (log10 ratio)")
for n in [5, 8, 11, 14]:
H = hilbert(n)
x_true = np.ones(n)
b = H @ x_true
x_hat = np.linalg.solve(H, b)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
kappa = np.linalg.cond(H)
print(f" {n:2d} {kappa:.2e} {rel_err:.2e} ~{np.log10(kappa)-15:+.1f} digits lost") Output:
Condition numbers of canonical matrices:
Identity (10x10): kappa = 1.00e+00
Random Gaussian (10x10): kappa = 1.44e+01
Hilbert ( 5x5): kappa = 4.77e+05
Hilbert (10x10): kappa = 1.60e+13
Hilbert (15x15): kappa = 2.32e+17
Hilbert (20x20): kappa = 1.08e+18
(Hilbert kappa grows exponentially with n — classic ill-conditioning.)
Forward error of np.linalg.solve(H, b) where b = H @ [1,1,...,1]:
n kappa(A) rel error expected (log10 ratio)
5 4.77e+05 2.08e-13 ~-9.3 digits lost
8 1.53e+10 1.58e-07 ~-4.8 digits lost
11 5.22e+14 3.82e-03 ~-0.3 digits lost
14 2.43e+17 1.06e+01 ~+2.4 digits lost Read across the rows: as grows from to , the forward error of the linear solve grows in lockstep — by 18 orders of magnitude — and by the "solution" has no correct digits at all. This is not a bug in : the underlying LAPACK routine is backward-stable. The problem is the problem. Asking what satisfies for a given numerical is, in IEEE double precision, ill-posed.
Condition number vs algorithm stability
These two ideas are often confused; they are independent.
- Condition number is a property of the problem (the matrix, the function being computed). It is intrinsic and cannot be improved by changing algorithms — only by reformulating the problem.
- Numerical stability is a property of an algorithm. An algorithm is BACKWARD STABLE if the computed answer is the exact answer to a slightly perturbed problem; it is FORWARD STABLE if the computed answer is close to the exact answer. The right algorithm doesn't amplify round-off beyond what the conditioning already forces.
The two interact via the rule of thumb above. A backward-stable algorithm achieves forward error — the best one can hope for. An UNSTABLE algorithm can do MUCH worse. Examples:
- Gaussian elimination WITHOUT pivoting is unstable: small pivots amplify round-off. WITH partial pivoting it is backward stable for almost all matrices encountered in practice.
- Classical Gram-Schmidt is forward unstable; MODIFIED Gram-Schmidt is much better, and Householder QR is the gold standard.
- Solving least squares via the normal equations squares the condition number. SVD or QR-based least squares incurs the linear factor only. This is the same problem; the algorithm choice changes the achievable accuracy by orders of magnitude.
Trefethen and Bau (Numerical Linear Algebra, 1997) state the principle cleanly: "Accuracy = conditioning × stability." You can't beat the conditioning; you can only fail to live up to it by choosing a poor algorithm.
What to do about it
Three responses to ill-conditioning, in order of preference:
- Reformulate the problem. Fit in a Chebyshev basis instead of monomial. Centre and scale your data before regression. Solve instead of . Use SVD instead of normal equations.
- Regularize. Add a small ridge term to bound the smallest eigenvalue away from zero. Truncate the SVD by dropping singular values below a threshold. Sacrifices a controllable amount of bias for a controllable amount of variance.
- Use extended precision. Quad precision (~34 digits) lets you handle before running out. Rarely the right answer — usually you should fix the conditioning instead — but occasionally the cheapest fix.
Related
- Singular value decomposition — makes the condition number explicit as and provides the standard regularization route.
- Conjugate gradient — convergence rate governed by ; preconditioning is the response.
- Householder QR — the gold-standard backward-stable algorithm for orthogonalisation.
- Gaussian elimination — stable with partial pivoting; unstable without.