Singular Value Decomposition

Linear Algebra

The singular value decomposition factors any real matrix as

with orthogonal, orthogonal, and rectangular-diagonal with non-negative entries on the diagonal. The are the SINGULAR VALUES, the columns of and are the LEFT and RIGHT singular vectors. Unlike eigendecomposition, which doesn't exist for non-diagonalizable matrices and is awkward for rectangular ones, the SVD exists for every matrix and is essentially unique (up to sign choices and the freedom to rotate within singular subspaces of repeated singular values).

The decomposition is THE foundational object of linear algebra in the same sense the eigendecomposition is foundational for square Hermitian matrices: it diagonalizes the action of in a way that respects the inner product on both sides. Almost every theorem about norms, condition numbers, low-rank approximation, and pseudoinverses ultimately reduces to a statement about singular values.

Geometric picture

Read right-to-left, as three sequential transformations of : (1) rotates the coordinate frame so the input is expressed in the right-singular-vector basis; (2) stretches each axis by the corresponding (or zeros it out, if ); (3) rotates again into the left-singular-vector frame. Every linear map is — geometrically — "rotate, stretch along axes, rotate again". The singular values measure the stretching; the singular vectors are the special axes along which the stretching is pure (no shear, no cross-coupling).

Two consequences fall out immediately. — the operator 2-norm equals the largest singular value, since the maximum stretching factor is what the operator norm measures. And equals the number of non-zero singular values, since "destroys" any input direction whose .

Connection to eigendecomposition

For any , the matrices and are symmetric positive semi-definite, so they have orthonormal eigendecompositions. Substituting :

So: is the eigenvector matrix of , the eigenvector matrix of , and the squared singular values are the (shared) eigenvalues of either. This is the textbook proof of existence — but it is also a TERRIBLE way to compute the SVD numerically, because forming squares the condition number. Real SVD algorithms (Golub-Kahan bidiagonalisation followed by an implicit QR sweep) work on directly and never form .

Eckart-Young: best low-rank approximation

The most-used fact about the SVD. Truncating the decomposition to the top singular triples,

gives the unique best rank- approximation to in BOTH the spectral (operator 2-) norm and the Frobenius norm:

The Eckart-Young theorem (1936) is the foundation of every "compress this matrix" technique: image compression, PCA / principal components, recommender systems' latent factors, model order reduction in PDEs. If 's singular values decay fast, you can throw away almost all of them and lose almost nothing — the bound says exactly how much you lose.

The pseudoinverse

For a rectangular or rank-deficient , the Moore-Penrose pseudoinverse is given by:

For least-squares , the minimum-norm solution is . Computing it via SVD is the most numerically stable least-squares route — markedly better than the normal equations when is ill-conditioned, because forming doubles the condition number whereas SVD works directly on . The TRUNCATED pseudoinverse (drop singular values below some tolerance) is the standard regularization trick for ill-posed inverse problems.

Code

# SVD decomposition of a near-low-rank matrix, plus Eckart-Young
# truncation and the connection to A^T A.

import numpy as np

np.random.seed(1)
n, r = 40, 4

# Build a rank-r matrix from orthonormal U, V and known singular values
U_true = np.linalg.qr(np.random.randn(n, r))[0]
V_true = np.linalg.qr(np.random.randn(n, r))[0]
sigma_true = np.array([10.0, 4.0, 2.0, 0.5])
A_lr = U_true @ np.diag(sigma_true) @ V_true.T

# Add small Gaussian noise — matrix becomes full rank but nearly low-rank
A = A_lr + 0.01 * np.random.randn(n, n)

U, S, Vt = np.linalg.svd(A, full_matrices=False)
print("Top 8 singular values:")
print(f"  {S[:8]}")
print("(First 4 close to [10, 4, 2, 0.5]; the rest are O(noise).)")

# Eckart-Young: best rank-k approximation in spectral and Frobenius norm
def truncate(U, S, Vt, k):
    return U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

print("\nBest-rank-k approximation: ||A - A_k||_F  vs  sqrt(sum_{i>k} sigma_i^2):")
for k in [1, 2, 3, 4, 5, 10]:
    Ak = truncate(U, S, Vt, k)
    err_actual = np.linalg.norm(A - Ak, 'fro')
    err_theory = np.sqrt(np.sum(S[k:]**2))
    print(f"  k={k:2d}: {err_actual:.4f}  vs  {err_theory:.4f}")

# Connection to eigendecomposition of A^T A
eigs = np.linalg.eigvalsh(A.T @ A)
print(f"\nsigma_max from SVD      = {S[0]:.6f}")
print(f"sqrt(lambda_max of A^T A) = {np.sqrt(eigs[-1]):.6f}")

Output:

Top 8 singular values:
  [9.98948824 4.00100258 1.98685247 0.51553587 0.11850767 0.11375271
 0.10832342 0.10496257]
(First 4 close to [10, 4, 2, 0.5]; the rest are O(noise).)

Best-rank-k approximation: ||A - A_k||_F  vs  sqrt(sum_{i>k} sigma_i^2):
  k= 1: 4.5117  vs  4.5117
  k= 2: 2.0851  vs  2.0851
  k= 3: 0.6325  vs  0.6325
  k= 4: 0.3664  vs  0.3664
  k= 5: 0.3467  vs  0.3467
  k=10: 0.2603  vs  0.2603

sigma_max from SVD      = 9.989488
sqrt(lambda_max of A^T A) = 9.989488

Three things worth noticing. (1) The top four singular values recover the embedded ones very accurately; the remaining 36 are noise-floor at . (2) The Frobenius error of the rank- truncation matches the Eckart-Young theoretical value to machine precision at every . (3) The largest singular value equals the square root of the largest eigenvalue of to six decimal places — the SVD/eigendecomposition connection in action.

What it's used for

Computing the SVD

The standard dense algorithm (Golub-Kahan-Reinsch, 1965-1970) has two phases. (1) Reduce to upper bidiagonal form via Householder reflections from both sides — analogous to the Hessenberg reduction inside symmetric eigensolvers. (2) Iteratively diagonalize the bidiagonal matrix via implicit QR sweeps. Cost: for an matrix. For very large or sparse problems, RANDOMISED SVD (Halko-Martinsson-Tropp 2009) computes a near-optimal rank- approximation in roughly , using a random projection followed by a small SVD.

Related