Arnoldi Iteration
Linear Algebra
Arnoldi iteration is the standard way to build an orthonormal basis of a Krylov subspace for a NON-Hermitian matrix. It is to power iteration what Gram-Schmidt is to a single vector: instead of repeatedly multiplying by and discarding everything but the dominant direction, you keep ALL the directions visited, orthonormalise them as you go, and end up with a basis of together with the projection of onto that basis.
The output is a pair with orthonormal and upper HESSENBERG (zeros below the first sub-diagonal). The pair satisfies the Arnoldi relation:
where is the first columns of . This identity is exact in exact arithmetic: every direction lies in the span of by construction, and the coefficients of that expansion are exactly the entries of column of . The Hessenberg structure is the algorithmic content: a generic non-Hermitian matrix is NOT tridiagonalised by Arnoldi (you need bi-orthogonal Lanczos for that), but it IS Hessenberged — every entry below the first sub-diagonal vanishes.
Why Hessenberg and not tridiagonal
Building the Krylov basis means computing at step and orthonormalising it against the previous basis vectors. The new vector has a priori NO reason to be orthogonal to , so we have to project it explicitly against all previous vectors before normalizing. Those inner products are exactly the non-zero entries in column of at and above the diagonal. The only zero we get for free is below the sub-diagonal: for , simply because lives in the Krylov subspace of dimension , so it has no component along basis vectors numbered . That's the Hessenberg structure, and that's all you get without further symmetry.
If is Hermitian, the OFF-DIAGONAL block also vanishes — for by the same Krylov argument applied to . That collapse from Hessenberg to tridiagonal is precisely what makes Lanczos a special case of Arnoldi with constant-memory three-term recursion.
The algorithm
At iteration :
- Form .
- For : compute and subtract . This is MODIFIED Gram-Schmidt — each subtraction uses the updated , which is dramatically more stable in finite precision than computing all the inner products with the ORIGINAL first (classical Gram-Schmidt).
- Set and .
- If , you've hit "lucky breakdown" — the Krylov subspace has closed and the current eigenvalues of are exact eigenvalues of . Stop.
Each step costs one mat-vec (the ) plus work and storage for the orthogonalisation against previous vectors. The total cost of Arnoldi iterations is work and storage — a far cry from the constant-memory Lanczos recursion. For most problems this is fine; for the very large operators that motivate Liouville-Lanczos linear response, it's the deal-breaker.
Code and validation
# Arnoldi iteration with modified Gram-Schmidt.
# After m steps the columns of Q (n x m+1) are an orthonormal basis of
# the Krylov subspace K_{m+1}(A, b) = span{b, A b, ..., A^m b}, and H
# ((m+1) x m) is upper Hessenberg with H_{i,j} = <q_i | A q_j>.
#
# Two governing identities (both checked below):
# A Q[:, :m] = Q H (Arnoldi relation)
# Q^T Q = I (orthonormality)
import numpy as np
def arnoldi(A, b, m):
n = A.shape[0]
Q = np.zeros((n, m + 1))
H = np.zeros((m + 1, m))
Q[:, 0] = b / np.linalg.norm(b)
for j in range(m):
v = A @ Q[:, j]
# Modified Gram-Schmidt: subtract projections one at a time, reusing
# the updated v. Classical GS would compute all H[i,j] first; MGS
# has much better finite-precision behavior.
for i in range(j + 1):
H[i, j] = Q[:, i] @ v
v = v - H[i, j] * Q[:, i]
H[j + 1, j] = np.linalg.norm(v)
if H[j + 1, j] < 1e-12:
# Lucky breakdown — invariant subspace found.
return Q[:, :j + 1], H[:j + 1, :j]
Q[:, j + 1] = v / H[j + 1, j]
return Q, H
# ─── Validation on a random non-Hermitian matrix ────────────────────────
np.random.seed(0)
n, m = 30, 12
A = np.random.randn(n, n)
b = np.random.randn(n)
Q, H = arnoldi(A, b, m)
print(f"max |Q^T Q - I| = {np.max(np.abs(Q.T @ Q - np.eye(m+1))):.2e}")
print(f"max |H below sub-diag| = {np.max(np.abs(np.tril(H[:m, :m], -2))):.2e}")
print(f"max |A Q_m - Q H| = {np.max(np.abs(A @ Q[:, :m] - Q @ H)):.2e}")
# Ritz values: eigenvalues of the Hessenberg block — approximations to
# the eigenvalues of A. Convergence is fastest for extreme eigenvalues.
ritz = sorted(np.linalg.eigvals(H[:m, :m]), key=lambda z: -abs(z))[:4]
true_eigs = sorted(np.linalg.eigvals(A), key=lambda z: -abs(z))[:4]
for r, t in zip(ritz, true_eigs):
print(f" Ritz {r.real:+.4f}{r.imag:+.4f}j true {t.real:+.4f}{t.imag:+.4f}j") Output on a random 30×30 non-Hermitian matrix with :
max |Q^T Q - I| = 4.44e-16
max |H below sub-diag| = 0.00e+00
max |A Q_m - Q H| = 6.66e-16
Ritz -5.9768+0.0000j true -4.5495+2.8331j
Ritz +5.4420+0.0000j true -4.5495-2.8331j
Ritz -4.6370+2.6934j true +5.3037+0.0000j
Ritz -4.6370-2.6934j true -0.5891+5.1874j All three structural identities hold to machine precision. The Ritz values — eigenvalues of the leading Hessenberg block — are the Arnoldi-projected approximations to the eigenvalues of . Notice they converge to extremal eigenvalues first (largest magnitude), not to a uniform sampling of the spectrum. That bias is shared with power iteration, for the same reason: each step of Arnoldi includes another -multiplication, so dominant directions grow faster than the rest.
What it's used for
- Eigenvalue solvers. Run Arnoldi to length , diagonalize the small Hessenberg block, and read off approximate extremal eigenvalues. RESTARTED Arnoldi (implicitly restarted; Krylov-Schur) keeps moderate by repeatedly compressing the basis to the wanted Ritz vectors. This is what ARPACK and most production sparse-eigensolvers do under the hood.
- GMRES — Arnoldi-based iterative solver for non-Hermitian . After Arnoldi steps you solve a small least-squares problem to get the best approximation to in the Krylov subspace. The whole framework rests on the Arnoldi relation.
- Matrix-function evaluation. For any analytic : . Useful when is too large to factor but mat-vecs are cheap — exponentials of large sparse matrices, time-propagation in quantum dynamics.
Related methods
- Power iteration — the single-vector ancestor; Arnoldi keeps the whole history.
- Symmetric Lanczos — the special case ; tridiagonal projection, three-term recursion, constant memory.
- Bi-orthogonal Lanczos — non-Hermitian alternative that recovers tridiagonal projection at the cost of two Krylov sequences. Use this when memory matters; use Arnoldi when robustness matters.
- Davidson method — preconditioned Arnoldi-like iteration tuned for the eigenvalue problems in electronic-structure theory.