Arnoldi Iteration

Linear Algebra

Arnoldi iteration is the standard way to build an orthonormal basis of a Krylov subspace for a NON-Hermitian matrix. It is to power iteration what Gram-Schmidt is to a single vector: instead of repeatedly multiplying by and discarding everything but the dominant direction, you keep ALL the directions visited, orthonormalise them as you go, and end up with a basis of together with the projection of onto that basis.

The output is a pair with orthonormal and upper HESSENBERG (zeros below the first sub-diagonal). The pair satisfies the Arnoldi relation:

where is the first columns of . This identity is exact in exact arithmetic: every direction lies in the span of by construction, and the coefficients of that expansion are exactly the entries of column of . The Hessenberg structure is the algorithmic content: a generic non-Hermitian matrix is NOT tridiagonalised by Arnoldi (you need bi-orthogonal Lanczos for that), but it IS Hessenberged — every entry below the first sub-diagonal vanishes.

Why Hessenberg and not tridiagonal

Building the Krylov basis means computing at step and orthonormalising it against the previous basis vectors. The new vector has a priori NO reason to be orthogonal to , so we have to project it explicitly against all previous vectors before normalizing. Those inner products are exactly the non-zero entries in column of at and above the diagonal. The only zero we get for free is below the sub-diagonal: for , simply because lives in the Krylov subspace of dimension , so it has no component along basis vectors numbered . That's the Hessenberg structure, and that's all you get without further symmetry.

If is Hermitian, the OFF-DIAGONAL block also vanishes — for by the same Krylov argument applied to . That collapse from Hessenberg to tridiagonal is precisely what makes Lanczos a special case of Arnoldi with constant-memory three-term recursion.

The algorithm

At iteration :

  1. Form .
  2. For : compute and subtract . This is MODIFIED Gram-Schmidt — each subtraction uses the updated , which is dramatically more stable in finite precision than computing all the inner products with the ORIGINAL first (classical Gram-Schmidt).
  3. Set and .
  4. If , you've hit "lucky breakdown" — the Krylov subspace has closed and the current eigenvalues of are exact eigenvalues of . Stop.

Each step costs one mat-vec (the ) plus work and storage for the orthogonalisation against previous vectors. The total cost of Arnoldi iterations is work and storage — a far cry from the constant-memory Lanczos recursion. For most problems this is fine; for the very large operators that motivate Liouville-Lanczos linear response, it's the deal-breaker.

Code and validation

# Arnoldi iteration with modified Gram-Schmidt.
# After m steps the columns of Q (n x m+1) are an orthonormal basis of
# the Krylov subspace K_{m+1}(A, b) = span{b, A b, ..., A^m b}, and H
# ((m+1) x m) is upper Hessenberg with H_{i,j} = <q_i | A q_j>.
#
# Two governing identities (both checked below):
#    A Q[:, :m] = Q H            (Arnoldi relation)
#    Q^T Q      = I              (orthonormality)

import numpy as np

def arnoldi(A, b, m):
    n = A.shape[0]
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        v = A @ Q[:, j]
        # Modified Gram-Schmidt: subtract projections one at a time, reusing
        # the updated v. Classical GS would compute all H[i,j] first; MGS
        # has much better finite-precision behavior.
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ v
            v = v - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v)
        if H[j + 1, j] < 1e-12:
            # Lucky breakdown — invariant subspace found.
            return Q[:, :j + 1], H[:j + 1, :j]
        Q[:, j + 1] = v / H[j + 1, j]
    return Q, H

# ─── Validation on a random non-Hermitian matrix ────────────────────────
np.random.seed(0)
n, m = 30, 12
A = np.random.randn(n, n)
b = np.random.randn(n)

Q, H = arnoldi(A, b, m)

print(f"max |Q^T Q - I|        = {np.max(np.abs(Q.T @ Q - np.eye(m+1))):.2e}")
print(f"max |H below sub-diag| = {np.max(np.abs(np.tril(H[:m, :m], -2))):.2e}")
print(f"max |A Q_m - Q H|      = {np.max(np.abs(A @ Q[:, :m] - Q @ H)):.2e}")

# Ritz values: eigenvalues of the Hessenberg block — approximations to
# the eigenvalues of A. Convergence is fastest for extreme eigenvalues.
ritz      = sorted(np.linalg.eigvals(H[:m, :m]), key=lambda z: -abs(z))[:4]
true_eigs = sorted(np.linalg.eigvals(A),        key=lambda z: -abs(z))[:4]
for r, t in zip(ritz, true_eigs):
    print(f"  Ritz {r.real:+.4f}{r.imag:+.4f}j   true {t.real:+.4f}{t.imag:+.4f}j")

Output on a random 30×30 non-Hermitian matrix with :

max |Q^T Q - I|        = 4.44e-16
max |H below sub-diag| = 0.00e+00
max |A Q_m - Q H|      = 6.66e-16
  Ritz -5.9768+0.0000j   true -4.5495+2.8331j
  Ritz +5.4420+0.0000j   true -4.5495-2.8331j
  Ritz -4.6370+2.6934j   true +5.3037+0.0000j
  Ritz -4.6370-2.6934j   true -0.5891+5.1874j

All three structural identities hold to machine precision. The Ritz values — eigenvalues of the leading Hessenberg block — are the Arnoldi-projected approximations to the eigenvalues of . Notice they converge to extremal eigenvalues first (largest magnitude), not to a uniform sampling of the spectrum. That bias is shared with power iteration, for the same reason: each step of Arnoldi includes another -multiplication, so dominant directions grow faster than the rest.

What it's used for

Related methods