Self-Learning Monte Carlo

Machine Learning

Monte Carlo sampling is the workhorse of statistical physics. You want $⟨ O ⟩$ at temperature $T$ ; you generate spin configurations weighted by $e^{- β H}$ , average $O$ over them. The catch is that consecutive samples are correlated, and near a critical point the correlation persists for a long time. Flipping spins one at a time, the system needs $\sim L^{z}$ sweeps to forget where it started, with $z \approx 2.17$ for 2D Ising. Doubling the lattice doesn't just quadruple your sampling cost — it adds another factor of $\sim 4$ from the slowdown.

For pure nearest-neighbor Ising there's a famous fix. Build a connected cluster of like spins via Fortuin-Kasteleyn bond percolation, flip the whole cluster at once. That's the Wolff algorithm. It satisfies detailed balance for the NN Ising Hamiltonian and kills the critical slowing down: $z$ drops to $\approx 0.35$ .

The trouble is Wolff is bespoke. The bond-construction probability $p_{add} = 1 - e^{- 2 β J}$ comes from a calculation tied to the specific NN Ising bond structure. Add a next-nearest-neighbor coupling, a plaquette term, anything other than $J s_{i} s_{j}$ on nearest neighbors, and the construction breaks. You either invent a new cluster algorithm for your specific model (good luck) or fall back to single-spin updates and eat the slowdown.

Self-learning Monte Carlo is a different bargain. Learn an effective Hamiltonian from data, do cluster moves on the effective model where Wolff works, and correct the mismatch via Metropolis-Hastings. If the surrogate is close to the truth, the correction is gentle and you inherit the cluster speedup.

The detailed-balance trick

Call the target distribution $π (s) \propto e^{- β H_{true} (s)}$ . Pick any effective Hamiltonian $H_{eff}$ and run Wolff using its couplings. Wolff satisfies detailed balance for $π_{eff} (s) \propto e^{- β H_{eff} (s)}$ — for whatever model it was tuned to, not for the real one we want.

So define the SLMC move: build a Wolff cluster using $J_{eff}$ , propose the cluster flip, accept with probability

α = min (1, exp [- β (Δ H_{true} - Δ H_{eff})])

where $Δ H_{true}$ and $Δ H_{eff}$ are the energy changes the true and effective Hamiltonians "see" under the proposed flip. Wolff's construction handles $Δ H_{eff}$ perfectly — that's what it was built for — so the MH factor only has to absorb what the surrogate got wrong. If the surrogate is exact, $Δ H_{true} - Δ H_{eff} = 0$ and we accept everything (vanilla Wolff). If the surrogate is random noise, the MH factor swings wildly and most moves get rejected (we've gained nothing). The sweet spot is in between: the surrogate captures most of $H_{true}$ , the proposed clusters are still well-correlated regions of the true model, and a small MH correction handles the residual.

The example: J₁-J₂ Ising

To make this concrete, take the square-lattice Ising model with nearest-neighbor and next-nearest-neighbor diagonal couplings:

H_{true} = - J_{1} ⟨ ij ⟩ \sum s_{i} s_{j} - J_{2} ⟨⟨ ij ⟩⟩ \sum s_{i} s_{j}

For $J_{2} / J_{1} = 0.2$ this is ferromagnetic in both bonds; $T_{c}$ sits around $2.95$ in units of $J_{1}$ . Wolff alone can't handle this — the cluster construction is tied to one bond type. Local Metropolis works but slows down near $T_{c}$ .

The surrogate we'll fit is just NN-only: $H_{eff} = - J_{eff} \sum_{⟨ ij ⟩} s_{i} s_{j}$ . One parameter. Fitting it is the simplest piece of supervised learning there is — gather samples $(s, E_{true} (s))$ from a short burst of local-update MC, compute the NN-bond sum $Q (s) = \sum_{⟨ ij ⟩} s_{i} s_{j}$ for each, fit $E_{true} \approx - J_{eff} Q + c$ by ordinary least squares.

The fit lands at $J_{eff} \approx 1.25$ . The surrogate inflates the apparent NN coupling to absorb the average effect of the $J_{2}$ bonds — this is the renormalized coupling at the temperatures we sample. The diagonal ferromagnetic bonds enhance the apparent NN attraction by about 25%.

The code

"""
Self-Learning Monte Carlo (SLMC) for the 2D J1-J2 Ising model.

Reference: Liu, Qi, Meng, Fu, Phys. Rev. B 95, 041101(R) (2017).
"""

import numpy as np

# ---------------------------------------------------------------------
# Model: 2D Ising with NN + NNN couplings on a periodic square lattice.
# ---------------------------------------------------------------------

L     = 16
J1    = 1.0
J2    = 0.2
T     = 2.95           # close to the J1+J2 critical temperature
BETA  = 1.0 / T


def nn_sum(spins):
    """Sum over nearest-neighbor bonds (right and down)."""
    return ((spins * np.roll(spins, -1, 0)).sum()
          + (spins * np.roll(spins, -1, 1)).sum())


def nnn_sum(spins):
    """Sum over next-nearest-neighbor diagonal bonds."""
    return ((spins * np.roll(np.roll(spins, -1, 0), -1, 1)).sum()
          + (spins * np.roll(np.roll(spins, -1, 0),  1, 1)).sum())


def H_true(spins, J1, J2):
    return -J1 * nn_sum(spins) - J2 * nnn_sum(spins)


def H_eff(spins, J_eff):
    return -J_eff * nn_sum(spins)


# ---------------------------------------------------------------------
# Local Metropolis sweep on the true Hamiltonian.
# ---------------------------------------------------------------------

def local_sweep(spins, beta, J1, J2, rng):
    L = spins.shape[0]
    for _ in range(L * L):
        i = rng.integers(L)
        j = rng.integers(L)
        s = spins[i, j]
        nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
            + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        nnn = (spins[(i + 1) % L, (j + 1) % L] + spins[(i + 1) % L, (j - 1) % L]
             + spins[(i - 1) % L, (j + 1) % L] + spins[(i - 1) % L, (j - 1) % L])
        dE = 2.0 * s * (J1 * nn + J2 * nnn)
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            spins[i, j] = -s
    return spins


# ---------------------------------------------------------------------
# Wolff cluster on the *surrogate* Hamiltonian.
# Then MH-accept the cluster flip against the true H.
# ---------------------------------------------------------------------

def slmc_step(spins, beta, J_eff, J1, J2, rng):
    L = spins.shape[0]
    p_add = 1.0 - np.exp(-2.0 * beta * J_eff)

    i0 = rng.integers(L)
    j0 = rng.integers(L)
    seed = spins[i0, j0]
    in_cluster = np.zeros_like(spins, dtype=bool)
    in_cluster[i0, j0] = True
    stack = [(i0, j0)]
    while stack:
        i, j = stack.pop()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = (i + di) % L, (j + dj) % L
            if (not in_cluster[ni, nj]
                    and spins[ni, nj] == seed
                    and rng.random() < p_add):
                in_cluster[ni, nj] = True
                stack.append((ni, nj))

    proposed = spins.copy()
    proposed[in_cluster] *= -1

    dH_true = H_true(proposed, J1, J2) - H_true(spins, J1, J2)
    dH_eff  = H_eff (proposed, J_eff) - H_eff (spins, J_eff)

    if rng.random() < np.exp(-beta * (dH_true - dH_eff)):
        return proposed, 1, in_cluster.sum()
    return spins, 0, in_cluster.sum()


# ---------------------------------------------------------------------
# Train the surrogate: fit J_eff via OLS on (NN-sum, true energy) pairs
# gathered from a short burst of local MC.
# ---------------------------------------------------------------------

def fit_surrogate(spins, beta, J1, J2, n_burn, n_train, rng):
    for _ in range(n_burn):
        spins = local_sweep(spins, beta, J1, J2, rng)
    Q  = np.empty(n_train)
    E  = np.empty(n_train)
    for t in range(n_train):
        spins = local_sweep(spins, beta, J1, J2, rng)
        Q[t] = nn_sum(spins)
        E[t] = H_true(spins, J1, J2)
    # E ≈ β1·Q + β0  ⇒  H_eff = −J_eff·Q,   so J_eff = −β1.
    A = np.column_stack([Q, np.ones_like(Q)])
    coefs, *_ = np.linalg.lstsq(A, E, rcond=None)
    J_eff = -coefs[0]
    return J_eff, spins


# ---------------------------------------------------------------------
# Integrated autocorrelation time (sum until ρ(k) first goes negative).
# ---------------------------------------------------------------------

def tau_int(x):
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.mean(x * x)
    if var == 0:
        return 0.5
    n = len(x)
    tau = 0.5
    for k in range(1, n // 4):
        rho = np.mean(x[:-k] * x[k:]) / var
        if rho < 0:
            break
        tau += rho
    return tau


# ---------------------------------------------------------------------
# Main: train surrogate, then race local vs SLMC.
# ---------------------------------------------------------------------

def order_obs(spins):
    """|m|² — invariant under global Z2 flip, so the autocorrelation
    measurement isn't confounded by which symmetry sector each chain
    happens to settle in."""
    m = spins.mean()
    return m * m


rng = np.random.default_rng(7)
spins0 = rng.choice([-1, 1], size=(L, L)).astype(np.int8)

print(f"2D Ising  L={L}  J1={J1}  J2={J2}  T={T}")
print(f"true H = -J1·Σ_NN s·s - J2·Σ_NNN s·s\n")

J_eff, spins = fit_surrogate(
    spins0.copy(), BETA, J1, J2, n_burn=200, n_train=800, rng=rng
)
print(f"Surrogate fit:  J_eff = {J_eff:.4f}   (vs J1 = {J1})\n")

N_BURN_PROD = 200
N_MEAS      = 4000

# --- Local-update baseline -------------------------------------------
spins_l = spins0.copy()
for _ in range(N_BURN_PROD):
    spins_l = local_sweep(spins_l, BETA, J1, J2, rng)
m_local = np.empty(N_MEAS)
for t in range(N_MEAS):
    spins_l = local_sweep(spins_l, BETA, J1, J2, rng)
    m_local[t] = order_obs(spins_l)
tau_local = tau_int(m_local)

# --- SLMC -------------------------------------------------------------
spins_s = spins0.copy()
for _ in range(N_BURN_PROD):
    spins_s = local_sweep(spins_s, BETA, J1, J2, rng)
m_slmc      = np.empty(N_MEAS)
acc_count   = 0
cluster_size_sum = 0
for t in range(N_MEAS):
    spins_s, accepted, csize = slmc_step(spins_s, BETA, J_eff, J1, J2, rng)
    acc_count        += accepted
    cluster_size_sum += csize
    m_slmc[t] = order_obs(spins_s)
tau_slmc = tau_int(m_slmc)
acc_rate = acc_count / N_MEAS
mean_csize = cluster_size_sum / N_MEAS

print("Production runs:")
print(f"  measurements per chain          : {N_MEAS}")
print(f"  ⟨m²⟩  local                      : {m_local.mean():.4f}")
print(f"  ⟨m²⟩  SLMC                       : {m_slmc.mean():.4f}\n")
print(f"  τ_int (local sweeps)            : {tau_local:8.2f}")
print(f"  τ_int (SLMC cluster moves)      : {tau_slmc:8.2f}")
print(f"  raw speedup (τ_local / τ_SLMC)  : {tau_local / tau_slmc:8.2f}×\n")
print(f"  SLMC acceptance rate            : {acc_rate:.3f}")
print(f"  mean cluster size (out of {L*L})  : {mean_csize:6.1f}")

Running it

2D Ising  L=16  J1=1.0  J2=0.2  T=2.95
true H = -J1·Σ_NN s·s - J2·Σ_NNN s·s

Surrogate fit:  J_eff = 1.2497   (vs J1 = 1.0)

Production runs:
  measurements per chain          : 4000
  ⟨m²⟩  local                      : 0.4610
  ⟨m²⟩  SLMC                       : 0.4622

  τ_int (local sweeps)            :    31.80
  τ_int (SLMC cluster moves)      :     4.38
  raw speedup (τ_local / τ_SLMC)  :     7.26×

  SLMC acceptance rate            : 0.708
  mean cluster size (out of 256)  :  109.5

What the numbers say

The speedup is real. The integrated autocorrelation time of $⟨ m^{2} ⟩$ is $τ_{local} \approx 32$ sweeps versus $τ_{SLMC} \approx 4$ cluster moves — about $7 \times$ at $L = 16$ near $T_{c}$ . The ratio grows with system size because local has $z \approx 2.17$ while cluster updates have $z \approx 0.35$ , so the relative gain scales like $L^{1.8}$ . At $L = 64$ we'd expect closer to $100 \times$ .

The surrogate is good enough. The MH acceptance rate is 71%. Cluster size averages $\sim 110$ out of 256 spins — about half the lattice, the right ballpark for critical clusters. Each SLMC step is one correlated move on $\sim 100$ spins; each local sweep is 256 random single-spin updates. Per spin flip the cost is comparable, but per decorrelated sample SLMC is dramatically cheaper.

Same physics. $⟨ m^{2} ⟩$ matches between the two methods to four decimal places. That's the point of an asymptotically exact algorithm — same answer, just faster.

Where this goes

The surrogate doesn't have to be linear. The original SLMC papers use simple polynomial models for spin systems; later work uses neural networks for fermion sign-problem cases where the relevant correlations are non-local and hard to write down by hand. Whatever the surrogate, the trick is the same: it has to be cheap to evaluate, easy to do cluster moves on (or otherwise easy to sample from), and close enough to the truth that MH doesn't reject too often.

The big payoff is in quantum many-body simulations where each MC step costs a fermion determinant. SLMC trains a bosonic surrogate that approximates the determinantal weights, proposes moves cheaply, and accepts/rejects against the real action. On the half-filled square-lattice Hubbard model, Liu, Qi, Meng, and Fu showed about two orders of magnitude of wall-clock speedup this way.

What it doesn't fix

Surrogate quality bounds everything. If $H_{true}$ has interactions on scales the surrogate can't represent — long-range, multi-body, or symmetry-breaking — the MH acceptance crashes and SLMC reverts to a slow random walk through configuration space. For an NN-only surrogate facing a strongly frustrated $J_{2}$ , you'd see this.

Training has cost. You pay upfront in local-MC sweeps to gather $(s, E)$ pairs and fit the surrogate. For very expensive likelihoods (auxiliary-field QMC) this is negligible. For cheap models it's noticeable. There's also a self-consistency wrinkle: as the surrogate improves you might want to re-fit on samples drawn from the surrogate, not from local MC. The original papers iterate this in stages.

It doesn't help if the bottleneck isn't autocorrelation. If each local step is slow for its own reasons (a determinant, a long-range force) and decorrelation is already fast, SLMC gives you nothing. The two have to compose: SLMC reduces the number of samples needed; a good surrogate reduces the cost of each proposal.

Core idea

Wolff is fast because it proposes moves drawn from the equilibrium distribution of the model it was built for. SLMC fakes a Wolff for a model you can't build one for directly — by learning a close-enough surrogate, doing cluster moves on the surrogate, and patching the gap with Metropolis-Hastings.

Exercises

A full exercise set is available for this topic, structured as one worked example + 7 practice problems (across 7 surface contexts) + 2 pattern-resistant check problems.

Open the Self-Learning Monte Carlo exercise set → Compute the SLMC Metropolis-Hastings acceptance α = min(1, exp(-β(ΔH_true − ΔH_eff))) for a proposed move on the surrogate; reason about how surrogate quality, temperature, and proposal structure determine acceptance and overall sampling efficiency.

References

Liu, J., Qi, Y., Meng, Z. Y., & Fu, L. (2017). Self-learning Monte Carlo method. Phys. Rev. B 95, 041101(R).
Wolff, U. (1989). Collective Monte Carlo updating for spin systems. Phys. Rev. Lett. 62, 361.
Huang, L., & Wang, L. (2017). Accelerated Monte Carlo simulations with restricted Boltzmann machines. Phys. Rev. B 95, 035105.
Nagai, Y., Shen, H., Qi, Y., Liu, J., & Fu, L. (2017). Self-learning Monte Carlo method: Continuous-time algorithm. Phys. Rev. B 96, 161102(R).