Casida's Equation

Perturbation Methods

Casida's equation is a method of choice when you want excited-state properties out of DFT. The setup is like tapping a wineglass: you apply a small oscillating perturbation — usually a weak light field — and sweep the frequency. At most frequencies nothing happens. At certain special frequencies the system rings: the electrons absorb the photon, the dipole oscillates with definite amplitude, and that's an excitation. The frequencies where it rings are the excitation energies, and how loudly each one rings is the oscillator strength. Casida's equation is the matrix equation that takes the ground-state Kohn-Sham orbitals and tells you exactly which frequencies those are and how strongly each one rings.

Setup: linear response of a closed-shell reference

Quick aside on virtual orbitals: they're real orbitals — don't let the occupied orbitals hear that, but yeah, they are. When you solve an SCF problem you get a set of orbitals; some get filled with electrons, some don't, and the empty ones are called "virtual." Think of them as extra seats in a room: they exist, they're available, no one's sitting there right now. Why nobody's in them: promoting an electron from occupied $i$ into virtual $a$ costs energy $ε_{a} - ε_{i}$ , and in the ground state there's nothing to pay that bill — no field, no thermal kick, nothing. So the electrons fill the bottom of the stack and the rest stay empty. Exciting the molecule means supplying exactly that bill from outside — which is what the oscillating field is going to do in the next section.

Setup: a molecule in its ground state, solved by the standard SCF machinery (Kohn-Sham for DFT, Hartree-Fock for HF). You get a stack of orbitals. The bottom $N /2$ are occupied by paired electrons ( $ψ_{i}$ , energy $ε_{i}$ ); the rest are empty (virtuals $ψ_{a}$ , energy $ε_{a}$ ). That's the static ground state — what a normal DFT calculation gives you.

Now turn on a weak oscillating field — a laser shining on the molecule, say. The electrons respond. To first order in the field strength, the response at each driving frequency $ω$ splits into two pieces. $X_{ia}$ tells you how strongly the field is mixing occupied orbital $i$ with virtual orbital $a$ — that's an electron getting promoted. $Y_{ia}$ tells you the time-reverse: how strongly the same $i \leftrightarrow a$ coupling shows up in the opposite direction, an electron coming back down. Both appear together at every frequency; linear-response quantum mechanics is symmetric under time reversal, so you can't have one without the other.

Plug this into the time-dependent Kohn-Sham equation and keep only first-order terms in the field. The algebra — worked out in the deep dive at the bottom of the page — gives a coupled matrix equation for $X$ and $Y$ . In block form:

(A B^{*} B A^{*}) (X Y) = ω (10 0 - 1) (X Y) .

What in the name of Charles Hermite is going on here? Why is this matrix non-Hermitian? Look closely. The matrix on the left is actually Hermitian. The culprit is one entry, hiding in plain sight: the -1 in the lower-right of the metric on the right-hand side. Without that minus sign the equation would be an ordinary Hermitian eigenvalue problem. With it, the metric is indefinite (eigenvalues $+ 1$ and $- 1$ rather than both $+ 1$ ), and folding it into the left-hand matrix to turn this into an ordinary eigenvalue problem produces something non-Hermitian. The minus sign isn't an accident; it encodes time-reversal symmetry. Every positive-frequency root $ω$ comes with a $- ω$ partner representing the same transition run backwards. For real orbitals (the usual case) $A$ and $B$ are real symmetric, and:

A_{ia, jb} = δ_{ij} δ_{ab} (ε_{a} - ε_{i}) + (ia ∥ jb),

B_{ia, jb} = (ia ∥ bj) .

Two pieces to unpack. The diagonal of $A$ is the orbital-energy gap — what it would cost to promote an electron from $i$ to $a$ if you ignored interactions. Everything else, the $(ia ∥ jb)$ terms, is the interaction part: how the $i \to a$ transition density Coulomb-repels and exchanges with the $j \to b$ transition density. TDHF uses Coulomb minus exchange; TDDFT uses Coulomb plus the XC kernel $f_{x c}$ . The two blocks differ in what they couple. $A$ couples excitations to other excitations. $B$ couples excitations to de-excitations — the time-reversal partner of the same physics — and the Tamm-Dancoff approximation just throws that block away.

Spin adaptation: singlets and triplets

A closed-shell molecule has equal numbers of $α$ and $β$ electrons paired in every orbital — full spin symmetry. That symmetry makes the Casida matrix split into independent blocks. The blocks correspond to singlet excitations (spin-symmetric: same amplitude for $α \to α$ and $β \to β$ transitions) and triplet excitations (antisymmetric: opposite-sign amplitudes for $α \to α$ and $β \to β$ , plus two more components that flip spin). The three triplets are degenerate by symmetry, so you solve them all by solving one. Net: two independent problems of size $N_{occ} N_{vir}$ each, instead of one giant problem four times the size:

A_{ia, jb}^{singlet} = δ_{ij} δ_{ab} (ε_{a} - ε_{i}) + 2 (ia ∣ jb) - (ij ∣ ab),

B_{ia, jb}^{singlet} = 2 (ia ∣ jb) - (ib ∣ ja),

A_{ia, jb}^{triplet} = δ_{ij} δ_{ab} (ε_{a} - ε_{i}) - (ij ∣ ab),

B_{ia, jb}^{triplet} = - (ib ∣ ja) .

The factor of 2 in front of $(ia ∣ jb)$ on the singlet side comes from the spin-symmetric combination; the triplet has no Coulomb piece in $A$ at all (the symmetric Coulomb interaction cancels between the $α \to α$ and $β \to β$ components of a triplet). What's left of the triplet is purely exchange — this is the origin of Hund's rule in the orbital language: triplets sit lower than singlets because they lose Coulomb repulsion.

Why it's a pseudo-eigenvalue problem

The minus sign in the metric is awkward, but when $A - B$ is positive-definite — which it is whenever the ground state is a stable minimum and not a saddle — you can fold the equation into an ordinary Hermitian eigenvalue problem half the size. Here is the algebra.

Write the block equation out as two coupled vector equations. With $A$ and $B$ real symmetric:

A X + B Y = ω X, BX + A Y = - ωY .

Add the two equations and subtract them. The sum gives one equation in $X + Y$ and $X - Y$ , the difference gives the other:

(A + B) (X + Y) = ω (X - Y), (A - B) (X - Y) = ω (X + Y) .

Solve the second for $X + Y$ (using that $A - B$ is invertible) and substitute into the first. The cross-coupling drops out and what's left is an ordinary eigenvalue problem for $X - Y$ alone, with eigenvalues $ω^{2}$ :

(A + B) (A - B) (X - Y) = ω^{2} (X - Y) .

The operator $(A + B) (A - B)$ is not Hermitian — products of Hermitian operators usually aren't. But it is similar to a Hermitian operator. Take the symmetric square root $(A - B)^{1/2}$ (which exists and is real because $A - B$ is positive-definite), define a rescaled eigenvector

Z = (A - B)^{1/2} (X - Y),

and conjugate the eigenvalue problem by $(A - B)^{1/2}$ . The result is Hermitian:

(A - B)^{1/2} (A + B) (A - B)^{1/2} Z = ω^{2} Z .

The eigenvalues are $ω^{2}$ , so excitation energies come out as square roots. The eigenvector $Z$ recovers the physical amplitudes via $X - Y = (A - B)^{- 1/2} Z$ and $X + Y = ω^{- 1} (A - B)^{1/2} Z$ . A triplet instability — an eigenvalue of $A - B$ going negative — is the signature that the closed-shell reference is unstable against a spin-symmetry-breaking deformation, and Casida loudly tells you so by failing to take a real square root.

The Tamm-Dancoff approximation

Drop $B$ and the whole problem collapses to ordinary Hermitian diagonalization of $A$ :

A X = ω^{TDA} X .

This is the Tamm-Dancoff approximation (TDA), or "configuration-interaction singles with the DFT orbital basis" depending on who's talking. Cheaper, more numerically stable (no square-root of a possibly-near-singular matrix), and immune to the triplet-instability failure mode. The price: TDA excitation energies are systematically too high, because the de-excitation coupling $B$ would have lowered the levels through a second-order-like correction and that correction is gone. For valence singlets the TDA error is a few tenths of an eV; for charge-transfer states and Rydberg-like states it can be larger.

Oscillator strengths and transition densities

Once you've got a Casida root $(ω_{n}, X_{n}, Y_{n})$ , the transition density — the spatial pattern of where the electron sloshes during this excitation — is a weighted sum over occupied-virtual orbital products:

ρ_{n}^{T} (r) = ia \sum (X_{ia}^{(n)} + Y_{ia}^{(n)}) ψ_{i} (r) ψ_{a} (r),

Multiply this by $r$ and integrate to get the transition dipole — that's what couples to photons. The oscillator strength $f_{n} = \frac{2}{3} ω_{n} ∣ ⟨ 0∣ r ∣ n ⟩ ∣^{2}$ packages it into a dimensionless intensity. The absorption spectrum is then one delta function per root, weighted by $f_{n}$ and parked at $ω_{n}$ ; broaden them into Lorentzians and you've got something to lay over a measured spectrum.

H2 in 6-31G: working through it by hand

H2 in a minimal-ish basis is the perfect testbed: one occupied orbital ( $σ_{g}$ ), a handful of virtuals, the whole Casida problem is a 3×3 in each spin block. Below: do an RHF SCF, build the four Casida tensors directly from the molecular ERIs, diagonalize, and check against PySCF's library TDHF. The two implementations should agree to machine precision because they are literally the same equation; the point of writing it out is that the equation is no longer a black box.

"""
Casida's equation for H2 (TDHF flavor).

We build the linear-response problem from scratch from a restricted
Hartree-Fock reference: form the Casida A and B matrices in the
occupied-virtual MO basis, project into the singlet and triplet
spin-adapted blocks, solve the pseudo-eigenvalue problem, and verify
against PySCF's built-in TDHF.

  [ A  B ][X]       [ 1   0 ][X]
  [ B  A ][Y] = w   [ 0  -1 ][Y]

When A - B is positive-definite (true at the HF minimum for stable
closed-shell molecules), this is equivalent to the Hermitian problem

  (A - B)^{1/2} (A + B) (A - B)^{1/2} Z = w^2 Z.

The Tamm-Dancoff approximation drops B and diagonalizes A directly.
TDDFT differs only in the two-electron kernel: replace -K_exchange by
the XC kernel f_xc. The matrix structure is identical, so the code is
identical down to the slot where you fill in the kernel.
"""

import numpy as np
from pyscf import gto, scf, tdscf, ao2mo

# ---- 1. Reference: restricted HF on H2 at R = 0.74 A --------------------
mol = gto.M(atom='H 0 0 0; H 0 0 0.74', basis='6-31g', verbose=0)
mf  = scf.RHF(mol).run()

nocc = mol.nelectron // 2
nmo  = mf.mo_coeff.shape[1]
nvir = nmo - nocc
eps  = mf.mo_energy
print(f"H2 / 6-31G  nocc = {nocc}, nvir = {nvir},  E_HF = {mf.e_tot:.6f} Ha")
print(f"HOMO-LUMO gap = {eps[nocc] - eps[nocc-1]:.6f} Ha "
      f"({(eps[nocc] - eps[nocc-1]) * 27.211386:.3f} eV)")

# ---- 2. Two-electron integrals (chemist notation) in the MO basis --------
eri_ao = mol.intor('int2e')
eri_mo = ao2mo.incore.full(eri_ao, mf.mo_coeff)

o = slice(0, nocc)
v = slice(nocc, nmo)

# All four-index tensors are stored with index order (i, a, j, b).
iajb = eri_mo[o, v, o, v]
ijab = eri_mo[o, o, v, v].transpose(0, 2, 1, 3)
ibja = eri_mo[o, v, o, v].transpose(0, 3, 2, 1)

# Orbital-energy gap on the diagonal of A: (eps_a - eps_i) delta_ij delta_ab
delta = np.zeros((nocc, nvir, nocc, nvir))
for i in range(nocc):
    for a in range(nvir):
        delta[i, a, i, a] = eps[nocc + a] - eps[i]

# ---- 3. Spin-adapted Casida A and B --------------------------------------
A_singlet = delta + 2.0 * iajb - ijab
B_singlet =         2.0 * iajb - ibja
A_triplet = delta              - ijab
B_triplet =                    - ibja

def casida_omega(A, B):
    """Solve [[A,B],[B,A]] [X;Y] = w [[1,0],[0,-1]] [X;Y].
    Returns the sorted positive excitation energies."""
    n = A.shape[0] * A.shape[1]
    A2 = 0.5 * (A.reshape(n, n) + A.reshape(n, n).T)
    B2 = 0.5 * (B.reshape(n, n) + B.reshape(n, n).T)
    ApB = A2 + B2
    AmB = A2 - B2
    w, U = np.linalg.eigh(AmB)
    sqrt_AmB = (U * np.sqrt(np.maximum(w, 0))) @ U.T
    M = sqrt_AmB @ ApB @ sqrt_AmB
    M = 0.5 * (M + M.T)
    omega2 = np.linalg.eigvalsh(M)
    return np.sort(np.sqrt(np.maximum(omega2, 0)))

def tda_omega(A):
    """Tamm-Dancoff: just diagonalize A."""
    n = A.shape[0] * A.shape[1]
    A2 = 0.5 * (A.reshape(n, n) + A.reshape(n, n).T)
    return np.sort(np.linalg.eigvalsh(A2))

om_S = casida_omega(A_singlet, B_singlet)
om_T = casida_omega(A_triplet, B_triplet)
om_S_tda = tda_omega(A_singlet)
om_T_tda = tda_omega(A_triplet)

# ---- 4. PySCF reference (full TDHF) --------------------------------------
nstates = nocc * nvir
tdhf_S = tdscf.TDHF(mf); tdhf_S.singlet = True;  tdhf_S.nstates = nstates; tdhf_S.kernel()
tdhf_T = tdscf.TDHF(mf); tdhf_T.singlet = False; tdhf_T.nstates = nstates; tdhf_T.kernel()
ref_S = np.sort(tdhf_S.e)
ref_T = np.sort(tdhf_T.e)

print(f"\nMax abs error vs PySCF TDHF: singlet {np.max(np.abs(om_S - ref_S)):.2e}, "
      f"triplet {np.max(np.abs(om_T - ref_T)):.2e}")

# ---- 5. Singlet-triplet splitting and TDA error --------------------------
Ha_eV = 27.211386
print("\nLowest excitations of H2 / 6-31G:")
print(f"  S0 -> T1   {om_T[0]:.4f} Ha = {om_T[0]*Ha_eV:6.3f} eV   (TDA: {om_T_tda[0]*Ha_eV:6.3f} eV)")
print(f"  S0 -> S1   {om_S[0]:.4f} Ha = {om_S[0]*Ha_eV:6.3f} eV   (TDA: {om_S_tda[0]*Ha_eV:6.3f} eV)")
print(f"  singlet - triplet splitting: {(om_S[0]-om_T[0])*Ha_eV:.3f} eV")
print(f"  TDA overshoots S1 by {(om_S_tda[0]-om_S[0])*Ha_eV:.3f} eV "
      f"(de-excitation coupling that TDA discards)")

Output:

H2 / 6-31G  nocc = 1, nvir = 3,  E_HF = -1.126755 Ha
HOMO-LUMO gap = 0.834290 Ha (22.702 eV)

Max abs error vs PySCF TDHF: singlet 2.89e-15, triplet 2.22e-15

Lowest excitations of H2 / 6-31G:
  S0 -> T1   0.3599 Ha =  9.793 eV   (TDA: 10.316 eV)
  S0 -> S1   0.5520 Ha = 15.020 eV   (TDA: 15.248 eV)
  singlet - triplet splitting: 5.226 eV
  TDA overshoots S1 by 0.228 eV (de-excitation coupling that TDA discards)

Agreement with PySCF is at the level of $1 0^{- 15}$ Hartree across all three roots in each spin block — floating-point noise, as it should be. The triplet sits 5.2 eV below the singlet (Hund's rule in action), and the TDA bumps the singlet up by 0.23 eV.

Two things to flag about these numbers. First, they are HF-quality — TDHF, not TDDFT. The experimental H2 S0→T1 excitation is about 10.6 eV and S0→S1 (B¹Σu+) is about 12.6 eV; we're overshooting badly because HF ignores correlation and 6-31G has no diffuse functions for the diffuse excited states. Switching to TDDFT with a hybrid functional and a basis with diffuse augmentation (aug-cc-pVDZ or larger) brings it into experimental range. Second, this is the entire spectrum at this level — three singlet roots and three triplet roots, exhausting the $N_{occ} N_{vir} = 3$ dimension of the problem. There is no information past that without enlarging the basis.

Why it doesn't scale to solids

For a molecule with a few hundred basis functions you get an $N_{occ} N_{vir}$ Casida matrix of size $\sim 1 0^{4}$ , which dense LAPACK handles in seconds. For a solid in a plane-wave basis with a dense k-mesh the dimension blows up to $1 0^{6}$ and beyond, and worse, the spectrum becomes a continuum — there is no point resolving a million individual roots when what you want is the smooth function $Im χ (ω)$ . That is the regime where Liouville-Lanczos takes over: never form the Liouvillian, never diagonalize it, get the whole spectrum from one matrix-free Krylov chain plus a continued-fraction terminator.

Deep dive: where A and B come from

Before diving into the algebra, an aside on the single-particle business that grinds my gears. Every single-particle model — Hartree-Fock, Kohn-Sham, the nuclear shell model — works by calculating a stack of single-particle levels and then filling them using some occupation rule (RHF, UHF, even shell, odd shell). You are approximating a many-body Hamiltonian by populating a single-particle level diagram. That's a strange thing to do when you write it out, but it works astonishingly well, and the derivation below lives entirely inside that approximation.

The starting point is the time-dependent Kohn-Sham equation for the one-particle density matrix:

i \partial_{t} \overset{γ}{^} (t) = [\hat{H}_{KS} [γ, t], \overset{γ}{^} (t)] .

The Hamiltonian on the right is the instantaneous one, and it itself depends on the density. To first order in the perturbation, it splits into three pieces:

\hat{H}_{KS} (t) = \hat{H}_{KS}^{(0)} + \hat{V}_{ext}^{'} (t) + \int K_{HXC} (r, r^{'}) δ ρ (r^{'}, t) d r^{'} .

The first piece is the static KS Hamiltonian — its eigenstates are the $ψ_{i}$ and $ψ_{a}$ , eigenvalues $ε_{i}$ and $ε_{a}$ . The second is the external perturbation (the laser). The third is the self-consistent response: when the density wobbles by $δ ρ$ , the Hartree-XC potential wobbles with it, and the wobble acts back on the orbitals. The kernel $K_{HXC}$ is the static response function — Coulomb minus exchange in TDHF, Coulomb plus the XC kernel $f_{x c}$ in TDDFT.

Now expand the density matrix to first order: $\overset{γ}{^} (t) = \overset{γ}{^}_{0} + \overset{γ}{^}_{1} (t)$ , where $\overset{γ}{^}_{0}$ is the ground-state projector and $\overset{γ}{^}_{1}$ is the wobble. The wobble has to live in the off-diagonal occupied-virtual block of the density matrix (the diagonal blocks can't change at first order — particle number is fixed). There are two independent corners: virtual-occupied (call its amplitude $X$ ) and occupied-virtual (amplitude $Y$ ):

\overset{γ}{^}_{1} (r, r^{'}, t) = ia \sum [X_{ia} (t) ψ_{a} (r) ψ_{i}^{*} (r^{'}) + Y_{ia} (t) ψ_{i} (r) ψ_{a}^{*} (r^{'})] .

Plug $\overset{γ}{^}_{1}$ into the TDKS commutator and keep only first-order terms. The static-Hamiltonian piece gives the orbital-energy difference:

[\hat{H}_{KS}^{(0)}, \overset{γ}{^}_{1}] = ia \sum [X_{ia} (ε_{a} - ε_{i}) ψ_{a} ψ_{i}^{*} - Y_{ia} (ε_{a} - ε_{i}) ψ_{i} ψ_{a}^{*}] .

The self-consistent piece $[\hat{V}_{HXC}^{'} [\overset{γ}{^}_{1}], \overset{γ}{^}_{0}]$ contributes the two-electron matrix elements — the kernel $K_{HXC}$ sandwiched between orbital products. For a monochromatic field at frequency $ω$ , equate coefficients of $e^{- iω t}$ on both sides of $i \partial_{t} \overset{γ}{^}_{1} = [\cdot, \cdot]$ and project onto orbital pairs $(ψ_{a}, ψ_{j})$ . What drops out is exactly the Casida block equation, with matrix elements:

A_{ia, jb} = δ_{ij} δ_{ab} (ε_{a} - ε_{i}) + ⟨ ψ_{a} ψ_{j} ∣ K_{HXC} ∣ ψ_{i} ψ_{b} ⟩,

B_{ia, jb} = ⟨ ψ_{a} ψ_{b} ∣ K_{HXC} ∣ ψ_{i} ψ_{j} ⟩ .

The bracket notation $⟨ ψ_{a} ψ_{j} ∣ K_{HXC} ∣ ψ_{i} ψ_{b} ⟩$ means the spatial integral $\int ψ_{a}^{*} (r) ψ_{j}^{*} (r^{'}) K_{HXC} (r, r^{'}) ψ_{i} (r) ψ_{b} (r^{'}) d r d r^{'}$ — the kernel acting on a product of two transition densities. That's the $(ia ∥ jb)$ notation used above. The difference between $A$ and $B$ is which orbital pair you put on each side of the kernel: $A$ pairs $(a, j)$ on the left with $(i, b)$ on the right; $B$ pairs $(a, b)$ with $(i, j)$ . That's the only difference between excitation-excitation coupling and excitation-de-excitation coupling.

Related on this site

Liouville-Lanczos is the matrix-free, frequency-sweep cousin of Casida for solids and large systems. Lanczos iteration is the underlying Krylov method; bi-orthogonal Lanczos is the non-Hermitian variant needed when the operator (the Liouvillian) is non-symmetric.