Arnoldi Iteration

Linear Algebra

Arnoldi iteration is the standard way to build an orthonormal basis of a Krylov subspace for a NON-Hermitian matrix. It is to power iteration what Gram-Schmidt is to a single vector: instead of repeatedly multiplying by $A$ and discarding everything but the dominant direction, you keep ALL the directions visited, orthonormalise them as you go, and end up with a basis of $K_{m} (A, b) = span {b, A b, A^{2} b, \dots, A^{m - 1} b}$ together with the projection of $A$ onto that basis.

The output is a pair $(Q, H)$ with $Q \in R^{n \times (m + 1)}$ orthonormal and $H \in R^{(m + 1) \times m}$ upper HESSENBERG (zeros below the first sub-diagonal). The pair satisfies the Arnoldi relation:

A Q_{m} = Q_{m + 1} H,

where $Q_{m}$ is the first $m$ columns of $Q$ . This identity is exact in exact arithmetic: every direction $A q_{j}$ lies in the span of ${q_{0}, \dots, q_{j + 1}}$ by construction, and the coefficients of that expansion are exactly the entries of column $j$ of $H$ . The Hessenberg structure is the algorithmic content: a generic non-Hermitian matrix is NOT tridiagonalised by Arnoldi (you need bi-orthogonal Lanczos for that), but it IS Hessenberged — every entry below the first sub-diagonal vanishes.

Why Hessenberg and not tridiagonal

Building the Krylov basis means computing $A q_{j}$ at step $j$ and orthonormalising it against the previous basis vectors. The new vector $A q_{j}$ has a priori NO reason to be orthogonal to $q_{0}, q_{1}, \dots, q_{j - 1}$ , so we have to project it explicitly against all $j + 1$ previous vectors before normalizing. Those $j + 1$ inner products are exactly the non-zero entries in column $j$ of $H$ at and above the diagonal. The only zero we get for free is below the sub-diagonal: $⟨ q_{i} ∣ A q_{j} ⟩ = 0$ for $i > j + 1$ , simply because $A q_{j}$ lives in the Krylov subspace of dimension $j + 1$ , so it has no component along basis vectors numbered $j + 2, j + 3, \dots$ . That's the Hessenberg structure, and that's all you get without further symmetry.

If $A$ is Hermitian, the OFF-DIAGONAL block also vanishes — $⟨ q_{i} ∣ A q_{j} ⟩ = ⟨ A q_{i} ∣ q_{j} ⟩ = 0$ for $i < j - 1$ by the same Krylov argument applied to $q_{i}$ . That collapse from Hessenberg to tridiagonal is precisely what makes Lanczos a special case of Arnoldi with constant-memory three-term recursion.

The algorithm

At iteration $j$ :

Form $v = A q_{j}$ .
For $i = 0, 1, \dots, j$ : compute $H_{ij} = ⟨ q_{i} ∣ v ⟩$ and subtract $v \leftarrow v - H_{ij} q_{i}$ . This is MODIFIED Gram-Schmidt — each subtraction uses the updated $v$ , which is dramatically more stable in finite precision than computing all the inner products with the ORIGINAL $v$ first (classical Gram-Schmidt).
Set $H_{j + 1, j} = ∥ v ∥$ and $q_{j + 1} = v / H_{j + 1, j}$ .
If $H_{j + 1, j} \approx 0$ , you've hit "lucky breakdown" — the Krylov subspace has closed and the current eigenvalues of $H [: j, : j]$ are exact eigenvalues of $A$ . Stop.

Each step costs one mat-vec (the $A q_{j}$ ) plus $O (jn)$ work and $O (j)$ storage for the orthogonalisation against $j + 1$ previous vectors. The total cost of $m$ Arnoldi iterations is $O (m^{2} n)$ work and $O (mn)$ storage — a far cry from the constant-memory Lanczos recursion. For most problems this is fine; for the very large operators that motivate Liouville-Lanczos linear response, it's the deal-breaker.

Code and validation

# Arnoldi iteration with modified Gram-Schmidt.
# After m steps the columns of Q (n x m+1) are an orthonormal basis of
# the Krylov subspace K_{m+1}(A, b) = span{b, A b, ..., A^m b}, and H
# ((m+1) x m) is upper Hessenberg with H_{i,j} = <q_i | A q_j>.
#
# Two governing identities (both checked below):
#    A Q[:, :m] = Q H            (Arnoldi relation)
#    Q^T Q      = I              (orthonormality)

import numpy as np

def arnoldi(A, b, m):
    n = A.shape[0]
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        v = A @ Q[:, j]
        # Modified Gram-Schmidt: subtract projections one at a time, reusing
        # the updated v. Classical GS would compute all H[i,j] first; MGS
        # has much better finite-precision behavior.
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ v
            v = v - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v)
        if H[j + 1, j] < 1e-12:
            # Lucky breakdown — invariant subspace found.
            return Q[:, :j + 1], H[:j + 1, :j]
        Q[:, j + 1] = v / H[j + 1, j]
    return Q, H

# ─── Validation on a random non-Hermitian matrix ────────────────────────
np.random.seed(0)
n, m = 30, 12
A = np.random.randn(n, n)
b = np.random.randn(n)

Q, H = arnoldi(A, b, m)

print(f"max |Q^T Q - I|        = {np.max(np.abs(Q.T @ Q - np.eye(m+1))):.2e}")
print(f"max |H below sub-diag| = {np.max(np.abs(np.tril(H[:m, :m], -2))):.2e}")
print(f"max |A Q_m - Q H|      = {np.max(np.abs(A @ Q[:, :m] - Q @ H)):.2e}")

# Ritz values: eigenvalues of the Hessenberg block — approximations to
# the eigenvalues of A. Convergence is fastest for extreme eigenvalues.
ritz      = sorted(np.linalg.eigvals(H[:m, :m]), key=lambda z: -abs(z))[:4]
true_eigs = sorted(np.linalg.eigvals(A),        key=lambda z: -abs(z))[:4]
for r, t in zip(ritz, true_eigs):
    print(f"  Ritz {r.real:+.4f}{r.imag:+.4f}j   true {t.real:+.4f}{t.imag:+.4f}j")

Output on a random 30×30 non-Hermitian matrix with $m = 12$ :

max |Q^T Q - I|        = 4.44e-16
max |H below sub-diag| = 0.00e+00
max |A Q_m - Q H|      = 6.66e-16
  Ritz -5.9768+0.0000j   true -4.5495+2.8331j
  Ritz +5.4420+0.0000j   true -4.5495-2.8331j
  Ritz -4.6370+2.6934j   true +5.3037+0.0000j
  Ritz -4.6370-2.6934j   true -0.5891+5.1874j

All three structural identities hold to machine precision. The Ritz values — eigenvalues of the leading $m \times m$ Hessenberg block — are the Arnoldi-projected approximations to the eigenvalues of $A$ . Notice they converge to extremal eigenvalues first (largest magnitude), not to a uniform sampling of the spectrum. That bias is shared with power iteration, for the same reason: each step of Arnoldi includes another $A$ -multiplication, so dominant directions grow faster than the rest.

What it's used for

Eigenvalue solvers. Run Arnoldi to length $m ≪ n$ , diagonalize the small Hessenberg block, and read off approximate extremal eigenvalues. RESTARTED Arnoldi (implicitly restarted; Krylov-Schur) keeps $m$ moderate by repeatedly compressing the basis to the wanted Ritz vectors. This is what ARPACK and most production sparse-eigensolvers do under the hood.
GMRES — Arnoldi-based iterative solver for non-Hermitian $A x = b$ . After $m$ Arnoldi steps you solve a small $(m + 1) \times m$ least-squares problem to get the best approximation to $x$ in the Krylov subspace. The whole framework rests on the Arnoldi relation.
Matrix-function evaluation. For any analytic $f$ : $f (A) b \approx ∥ b ∥ Q_{m} f (H_{m}) e_{1}$ . Useful when $A$ is too large to factor but mat-vecs are cheap — exponentials of large sparse matrices, time-propagation in quantum dynamics.

Related methods

Power iteration — the single-vector ancestor; Arnoldi keeps the whole history.
Symmetric Lanczos — the special case $A = A^{T}$ ; tridiagonal projection, three-term recursion, constant memory.
Bi-orthogonal Lanczos — non-Hermitian alternative that recovers tridiagonal projection at the cost of two Krylov sequences. Use this when memory matters; use Arnoldi when robustness matters.
Davidson method — preconditioned Arnoldi-like iteration tuned for the eigenvalue problems in electronic-structure theory.