Lanczos Iteration

Linear Algebra

Symmetric Lanczos is the Hermitian special case of Arnoldi iteration — and the simplification it brings is one of the cleanest payoffs in numerical linear algebra. When $A = A^{T}$ , the upper-Hessenberg projection produced by Arnoldi collapses to a TRIDIAGONAL matrix, and the orthogonalisation against all previous Krylov vectors collapses to a THREE-TERM recursion against only the two most recent. Cost per step drops from $O (jn)$ to $O (n)$ ; storage drops from $O (mn)$ to $O (n)$ . That collapse is what makes Lanczos the workhorse for extremal eigenvalues of symmetric sparse matrices and for evaluating $f (A) b$ on systems too large to factor.

Why three terms

Build an orthonormal basis ${q_{0}, q_{1}, \dots}$ of the Krylov subspace $K_{m} (A, b)$ . By orthogonality, the new vector $A q_{j}$ expands as $A q_{j} = \sum_{i = 0}^{j + 1} h_{ij} q_{i}$ with $h_{ij} = ⟨ q_{i} ∣ A q_{j} ⟩$ . For Hermitian $A$ we can use $⟨ q_{i} ∣ A q_{j} ⟩ = ⟨ A q_{i} ∣ q_{j} ⟩$ . Both $q_{i}$ and $A q_{i}$ live in the Krylov subspace of dimension $i + 2$ , so the inner product vanishes whenever $j > i + 1$ . Combining with the standard Arnoldi argument ( $h_{ij} = 0$ for $i > j + 1$ ) gives:

h_{ij} = 0 whenever ∣ i - j ∣ > 1.

So only THREE coefficients per column are non-zero: the diagonal $α_{j} = ⟨ q_{j} ∣ A q_{j} ⟩$ and the two off-diagonal entries $β_{j} = ⟨ q_{j + 1} ∣ A q_{j} ⟩ = ⟨ q_{j} ∣ A q_{j + 1} ⟩$ (the latter two equal by symmetry). The matrix $T = Q^{T} A Q$ is tridiagonal:

T = α_{0} β_{0} β_{0} α_{1} β_{1} β_{1} α_{2} ⋱ ⋱ ⋱ β_{m - 2} β_{m - 2} α_{m - 1} .

And the recursion that builds the basis is:

A q_{j} = β_{j - 1} q_{j - 1} + α_{j} q_{j} + β_{j} q_{j + 1} .

Rearranging, $β_{j} q_{j + 1} = A q_{j} - α_{j} q_{j} - β_{j - 1} q_{j - 1}$ . Computing the next basis vector requires only the current and previous one — never the older ones. That's the famous "three-term recursion" and the reason Lanczos's memory footprint is constant in the chain length.

The algorithm

# Symmetric Lanczos — the Hermitian special case of Arnoldi.
# Three-term recursion in place of full Gram-Schmidt:
#    A q_j = beta_{j-1} q_{j-1} + alpha_j q_j + beta_j q_{j+1}
# Output: Q (n x m) with orthonormal columns, and tridiag T with diagonal
# alpha and off-diagonal beta:
#    Q^T A Q = T,  with T = tridiag(beta, alpha, beta).
# Crucially, only THREE vectors (q_prev, q, q_next) need to be in memory
# at once — independent of how long the chain runs.

import numpy as np

def lanczos(A, b, m):
    n = A.shape[0]
    Q = np.zeros((n, m))
    alphas = np.zeros(m)
    betas  = np.zeros(m - 1)
    q = b / np.linalg.norm(b)
    q_prev = np.zeros(n)
    beta = 0.0
    for j in range(m):
        Q[:, j] = q
        v = A @ q - beta * q_prev          # subtract the j-1 direction
        alpha = q @ v                      # diagonal entry
        alphas[j] = alpha
        v = v - alpha * q                  # subtract the j direction
        if j < m - 1:
            beta = np.linalg.norm(v)
            betas[j] = beta
            if beta < 1e-12:
                return Q[:, :j+1], alphas[:j+1], betas[:j]
            q_prev = q
            q = v / beta
    return Q, alphas, betas

# ─── Validation on a symmetric, well-conditioned matrix ─────────────────
np.random.seed(0)
n, m = 40, 15
A = np.random.randn(n, n)
A = (A + A.T) / 2 + n * np.eye(n)        # symmetric, SPD by construction
b = np.random.randn(n)

Q, alphas, betas = lanczos(A, b, m)
T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)

print(f"max |Q^T Q - I|   = {np.max(np.abs(Q.T @ Q - np.eye(m))):.2e}")
print(f"max |Q^T A Q - T| = {np.max(np.abs(Q.T @ A @ Q - T)):.2e}")

ritz = np.sort(np.linalg.eigvalsh(T))
true = np.sort(np.linalg.eigvalsh(A))
print(f"Smallest:  Ritz {ritz[0]:.5f}    true {true[0]:.5f}    diff {abs(ritz[0]-true[0]):.2e}")
print(f"Largest:   Ritz {ritz[-1]:.5f}    true {true[-1]:.5f}    diff {abs(ritz[-1]-true[-1]):.2e}")

# Demonstration: loss of orthogonality at long chain length
Q_long, _, _ = lanczos(A, b, m=35)
print(f"After m=35 iters: max |Q^T Q - I| = {np.max(np.abs(Q_long.T @ Q_long - np.eye(35))):.2e}")

Output on a 40×40 symmetric positive-definite matrix with $m = 15$ :

max |Q^T Q - I|   = 2.46e-13
max |Q^T A Q - T| = 8.87e-12
Smallest:  Ritz 30.88945    true 30.88945    diff 7.68e-07
Largest:   Ritz 48.20060    true 48.20061    diff 1.28e-05
After m=35 iters: max |Q^T Q - I| = 4.10e-03

Three things to read off. (1) Orthogonality holds to $\sim 1 0^{- 13}$ and the tridiagonal relation $Q^{T} A Q = T$ to $\sim 1 0^{- 12}$ — both at the level of typical accumulated round-off. (2) The smallest and largest Ritz values (eigenvalues of $T$ ) match the true extremal eigenvalues of $A$ to many decimal places after just 15 iterations on a 40-dim problem. This rapid convergence at the SPECTRAL EDGES is the central practical strength of Lanczos for symmetric eigenproblems. (3) The third number reveals the famous gotcha.

Loss of orthogonality

Extending the chain to 35 iterations, the residual orthogonality error climbs to $\sim 1 0^{- 3}$ — eleven orders of magnitude worse than at iteration 15. This is not a bug in the algorithm; it is intrinsic to the no-re-orthogonalisation Lanczos recursion in finite precision. As Ritz values converge to true eigenvalues, the directions they correspond to get amplified in the basis. Tiny round-off errors that point along already-converged directions are not subtracted by the three-term recursion (since those directions only show up in $q_{j - 1}, q_{j}$ ), so they accumulate.

The effect was identified by Paige (1971-1976) and analyzed in detail by Cullum and Willoughby. The pathology in the eigenvalue problem is the production of "GHOST" Ritz values: spurious duplicate eigenvalues that appear when a converged Ritz value is re-discovered along a drifted basis vector. Distinguishing ghosts from genuine multiplicities requires either re-orthogonalisation (expensive, restores Arnoldi cost) or post-hoc filtering (Cullum-Willoughby ghost test). For most extremal-eigenvalue applications a small partial re-orthogonalisation against converged Ritz vectors is enough.

For the resolvent / continued-fraction applications (see bi-orthogonal Lanczos and Liouville-Lanczos), loss of orthogonality is essentially benign: ghost eigenvalues contribute zero residue to the matrix element you actually care about. The chain coefficients are still the continued-fraction coefficients of the true resolvent.

What it's used for

Extremal eigenvalues of symmetric sparse matrices. Ground states of large Hamiltonians, lowest-frequency vibrational modes, second-smallest eigenvalue of a graph Laplacian. After $m ≪ n$ iterations the smallest few Ritz values are converged to machine precision in the well-separated cases.
Matrix functions $f (A) b$ . The Lanczos approximation is $f (A) b \approx ∥ b ∥ Q_{m} f (T_{m}) e_{1}$ . Useful for $exp (A) b$ in quantum time-propagation, $A^{- 1/2} b$ in Gaussian sampling, and anything Lanczos-quadrature.
Tridiagonalisation as a precursor. Many algorithms (e.g. for spectral density estimation) want a tridiagonal that can be analyzed downstream.
The conjugate-gradient connection. CG for solving $A x = b$ is mathematically Lanczos on $A$ with a clever reorganisation of the iterates so the basis is never stored explicitly. The CG residuals are scalar multiples of the Lanczos basis vectors.

Related methods

Power iteration — keeps only the latest direction; converges to the dominant eigenvector. Lanczos keeps all of them and gets many extremal eigenvalues at once.
Arnoldi iteration — the non-symmetric parent. Without $A = A^{T}$ , the tridiagonal collapse fails and you're stuck with full Gram-Schmidt at every step.
Bi-orthogonal Lanczos — recovers tridiagonal structure for non-Hermitian $A$ by maintaining two Krylov sequences. The Liouville-Lanczos TDDFT method uses this generalization.
Davidson method — preconditioned Lanczos-flavoured iteration tuned for the eigenvalue problems in electronic-structure theory.
Conjugate gradient — Lanczos's reorganisation for solving $A x = b$ when $A$ is SPD.