Singular Value Decomposition
Linear Algebra
The singular value decomposition factors any real matrix as
with orthogonal, orthogonal, and rectangular-diagonal with non-negative entries on the diagonal. The are the SINGULAR VALUES, the columns of and are the LEFT and RIGHT singular vectors. Unlike eigendecomposition, which doesn't exist for non-diagonalizable matrices and is awkward for rectangular ones, the SVD exists for every matrix and is essentially unique (up to sign choices and the freedom to rotate within singular subspaces of repeated singular values).
The decomposition is THE foundational object of linear algebra in the same sense the eigendecomposition is foundational for square Hermitian matrices: it diagonalizes the action of in a way that respects the inner product on both sides. Almost every theorem about norms, condition numbers, low-rank approximation, and pseudoinverses ultimately reduces to a statement about singular values.
Geometric picture
Read right-to-left, as three sequential transformations of : (1) rotates the coordinate frame so the input is expressed in the right-singular-vector basis; (2) stretches each axis by the corresponding (or zeros it out, if ); (3) rotates again into the left-singular-vector frame. Every linear map is — geometrically — "rotate, stretch along axes, rotate again". The singular values measure the stretching; the singular vectors are the special axes along which the stretching is pure (no shear, no cross-coupling).
Two consequences fall out immediately. — the operator 2-norm equals the largest singular value, since the maximum stretching factor is what the operator norm measures. And equals the number of non-zero singular values, since "destroys" any input direction whose .
Connection to eigendecomposition
For any , the matrices and are symmetric positive semi-definite, so they have orthonormal eigendecompositions. Substituting :
So: is the eigenvector matrix of , the eigenvector matrix of , and the squared singular values are the (shared) eigenvalues of either. This is the textbook proof of existence — but it is also a TERRIBLE way to compute the SVD numerically, because forming squares the condition number. Real SVD algorithms (Golub-Kahan bidiagonalisation followed by an implicit QR sweep) work on directly and never form .
Eckart-Young: best low-rank approximation
The most-used fact about the SVD. Truncating the decomposition to the top singular triples,
gives the unique best rank- approximation to in BOTH the spectral (operator 2-) norm and the Frobenius norm:
The Eckart-Young theorem (1936) is the foundation of every "compress this matrix" technique: image compression, PCA / principal components, recommender systems' latent factors, model order reduction in PDEs. If 's singular values decay fast, you can throw away almost all of them and lose almost nothing — the bound says exactly how much you lose.
The pseudoinverse
For a rectangular or rank-deficient , the Moore-Penrose pseudoinverse is given by:
For least-squares , the minimum-norm solution is . Computing it via SVD is the most numerically stable least-squares route — markedly better than the normal equations when is ill-conditioned, because forming doubles the condition number whereas SVD works directly on . The TRUNCATED pseudoinverse (drop singular values below some tolerance) is the standard regularization trick for ill-posed inverse problems.
Code
# SVD decomposition of a near-low-rank matrix, plus Eckart-Young
# truncation and the connection to A^T A.
import numpy as np
np.random.seed(1)
n, r = 40, 4
# Build a rank-r matrix from orthonormal U, V and known singular values
U_true = np.linalg.qr(np.random.randn(n, r))[0]
V_true = np.linalg.qr(np.random.randn(n, r))[0]
sigma_true = np.array([10.0, 4.0, 2.0, 0.5])
A_lr = U_true @ np.diag(sigma_true) @ V_true.T
# Add small Gaussian noise — matrix becomes full rank but nearly low-rank
A = A_lr + 0.01 * np.random.randn(n, n)
U, S, Vt = np.linalg.svd(A, full_matrices=False)
print("Top 8 singular values:")
print(f" {S[:8]}")
print("(First 4 close to [10, 4, 2, 0.5]; the rest are O(noise).)")
# Eckart-Young: best rank-k approximation in spectral and Frobenius norm
def truncate(U, S, Vt, k):
return U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
print("\nBest-rank-k approximation: ||A - A_k||_F vs sqrt(sum_{i>k} sigma_i^2):")
for k in [1, 2, 3, 4, 5, 10]:
Ak = truncate(U, S, Vt, k)
err_actual = np.linalg.norm(A - Ak, 'fro')
err_theory = np.sqrt(np.sum(S[k:]**2))
print(f" k={k:2d}: {err_actual:.4f} vs {err_theory:.4f}")
# Connection to eigendecomposition of A^T A
eigs = np.linalg.eigvalsh(A.T @ A)
print(f"\nsigma_max from SVD = {S[0]:.6f}")
print(f"sqrt(lambda_max of A^T A) = {np.sqrt(eigs[-1]):.6f}") Output:
Top 8 singular values:
[9.98948824 4.00100258 1.98685247 0.51553587 0.11850767 0.11375271
0.10832342 0.10496257]
(First 4 close to [10, 4, 2, 0.5]; the rest are O(noise).)
Best-rank-k approximation: ||A - A_k||_F vs sqrt(sum_{i>k} sigma_i^2):
k= 1: 4.5117 vs 4.5117
k= 2: 2.0851 vs 2.0851
k= 3: 0.6325 vs 0.6325
k= 4: 0.3664 vs 0.3664
k= 5: 0.3467 vs 0.3467
k=10: 0.2603 vs 0.2603
sigma_max from SVD = 9.989488
sqrt(lambda_max of A^T A) = 9.989488 Three things worth noticing. (1) The top four singular values recover the embedded ones very accurately; the remaining 36 are noise-floor at . (2) The Frobenius error of the rank- truncation matches the Eckart-Young theoretical value to machine precision at every . (3) The largest singular value equals the square root of the largest eigenvalue of to six decimal places — the SVD/eigendecomposition connection in action.
What it's used for
- Least squares and regression. Solve via the truncated pseudoinverse; this is what numpy's
np.linalg.lstsqdoes. Numerically stable even when is rank-deficient. - PCA / principal components. The principal components of a centred data matrix are the right singular vectors of ; the explained variances are . Computing PCA via SVD on directly is the standard recipe — no covariance-matrix construction needed.
- Image / data compression. Store only the top singular triples. Reconstruction error bounded by .
- Determining rank in finite precision. The numerical rank is the number of singular values above some tolerance (typically ). Counting eigenvalues of gives the same answer in principle but loses precision.
- Latent-factor models. Recommender systems, topic models, embeddings — all built on truncated SVD or its randomised / streaming variants.
- Model order reduction. PDE simulations: project a high-dim state onto the top singular vectors of a snapshot matrix. The reduced model captures the dominant dynamics for orders of magnitude less work.
Computing the SVD
The standard dense algorithm (Golub-Kahan-Reinsch, 1965-1970) has two phases. (1) Reduce to upper bidiagonal form via Householder reflections from both sides — analogous to the Hessenberg reduction inside symmetric eigensolvers. (2) Iteratively diagonalize the bidiagonal matrix via implicit QR sweeps. Cost: for an matrix. For very large or sparse problems, RANDOMISED SVD (Halko-Martinsson-Tropp 2009) computes a near-optimal rank- approximation in roughly , using a random projection followed by a small SVD.
Related
- Householder QR — the reflections used to reduce to bidiagonal form during SVD computation.
- Condition number — , a property the SVD makes manifest.
- Lanczos iteration — applied to (without forming it), gives a streaming way to compute the largest few singular values of a sparse matrix.