Machine Learning

Self-Learning Monte Carlo — exercises

Compute the SLMC Metropolis-Hastings acceptance α = min(1, exp(-β(ΔH_true − ΔH_eff))) for a proposed move on the surrogate; reason about how surrogate quality, temperature, and proposal structure determine acceptance and overall sampling efficiency.

1 worked example · 7 practice problems · 2 check problems

Worked example: SLMC acceptance for a 4-spin ring

Problem. A 4-spin Ising ring has true Hamiltonian with , and surrogate with . At , the current state is . A Wolff cluster built on the surrogate proposes flipping spins 3 and 4 to give . Compute the MH acceptance probability.

Diagnosis. SLMC's MH acceptance simplifies to because Wolff on the surrogate already accounts for exactly in its proposal distribution. Compute both and directly from bond sums and plug in.

Predict before reading on: before computing: is the proposed move favorable under the true Hamiltonian (i.e., )? Reading off the spin configurations, what's the obvious heuristic?

Bond accounting. For a 4-spin ring, the NN bonds are and the NNN bonds are . Compute bond sums on each state:

stateNN sumNNN sum
0−2
42

Energy changes.

The surrogate predicts a slightly better move than the truth delivers: vs . The surrogate is "too optimistic" by 0.2 units of energy.

Predict before reading on: predict the sign of deviating from 1. If the surrogate is too optimistic, will MH reject some moves or accept them all?

MH acceptance.

About 82% of moves of this type are accepted. The MH correction "tax" for the surrogate's optimism is roughly 18% of proposals.

Verification.

import numpy as np

def H(s, J1, J2):
    n = len(s)
    NN  = sum(s[i] * s[(i+1) % n] for i in range(n))
    NNN = sum(s[i] * s[(i+2) % n] for i in range(n)) / 2
    return -J1 * NN - J2 * NNN

s_old = np.array([+1, +1, -1, -1])
s_new = np.array([+1, +1, +1, +1])

dH_true = H(s_new, 1.0, 0.2) - H(s_old, 1.0, 0.2)
dH_eff  = H(s_new, 1.25, 0) - H(s_old, 1.25, 0)
alpha   = min(1, np.exp(-1.0 * (dH_true - dH_eff)))
print(f"α = {alpha:.4f}")    # ≈ 0.8187

Articulate: state in one sentence what the MH factor is correcting for, and what it would equal if the surrogate were exact.


Practice problems

Seven problems. The first sharpens the bond-accounting muscle; the next test scaling and verification; the last two transfer the technique to other domains (Bayesian inference, multi-fidelity Monte Carlo).

P.1 spin-config bond accounting on an antiferromagnetic state

Same 4-spin ring with . Current state (Néel order). A proposed move flips only spin 1: .

Compute , , and .

Find the analogue: same bond-counting move as the worked example, different state. The relevant bonds touching spin 1 are for NN and — wait, are there two distinct NNN bonds touching spin 1, or one? Recheck the worked example's bond enumeration.

show answer

NN bonds: . NNN bonds: . Two bonds involving spin 1 in each set.

State : NN sum ; NNN sum .

.

State : NN sum ; NNN sum .

.

.

. Lower acceptance than the worked example because the surrogate is more wrong here — it predicted but the truth gave . The big mismatch (1.4 units) makes the MH correction noticeably penalize this kind of move.

P.2 temperature scaling of acceptance

Take the worked example's move (). Compute at temperatures (i.e., ). What's the qualitative limit as ? As ?

Find the analogue: only enters the acceptance through the exponent. The factor that changes between moves is ; temperature is the dial.

show answer

With :

  • :
  • :
  • :
  • :

Limits: as , , the MH factor — at high temperature the algorithm accepts almost everything because thermal fluctuations dominate over the surrogate's mistakes. As , , any positive kills the acceptance — the surrogate must be essentially exact to be useful near zero temperature.

This is one reason SLMC works best at moderate temperatures (near for critical-slowing-down applications): the MH correction is gentle enough that acceptance stays high, but the surrogate-vs-truth mismatch is large enough that proposing via the surrogate genuinely matters.

P.3 detailed-balance verification

For the worked example states and , verify the SLMC update satisfies detailed balance for the true distribution . Specifically:

(a) Compute .

(b) The Wolff-on-surrogate proposal has the property . Compute this ratio.

(c) Compute the acceptance ratio .

(d) Verify , equivalently .

Find the analogue: this is the proof from the concept page, made concrete on the worked example's states. Compute each factor numerically and check that detailed balance holds.

show answer

(a) . .

(b) . .

(c) Forward MH factor: . Since this is , .

Reverse MH factor: . Since this is , .

Ratio .

(d) Detailed balance demands (rearranged). Plug in: RHS , matches to three decimals. ✓

The MH factor exactly compensates for the surrogate's bias in the proposal distribution, sending the chain to the true equilibrium regardless of how the proposal was generated.

P.4 surrogate-quality sensitivity

The worked example used (a fit). What if the surrogate is poorly chosen? Compute for the same move at using . Interpret the trend: why does for under-estimated , and why does it crash for over-estimated values?

Find the analogue: controls how much energy the surrogate thinks the move saves. The MH correction is the gap between the surrogate's prediction and the truth.

show answer

With fixed, .

  • : , gap , .
  • : gap , .
  • : gap , .
  • : , gap , .
  • : gap , .

Under-fit surrogates (gap negative): the surrogate thinks the move is worse than the truth, so the truth's improvement "rewards" the MH ratio — every move accepted. Sounds good, but in practice it means the surrogate isn't doing its proposal job: it's not capturing the easy correlated moves that make cluster proposals useful. The chain then can't decorrelate fast because the surrogate proposals look like single-spin flips.

Over-fit surrogates (gap positive, large): surrogate thinks every move is wonderful, MH rejects most of them. Acceptance crashes. Average accepted move is rare; chain stalls.

Sweet spot: the gap is small in absolute value. SLMC papers typically tune the surrogate by minimizing the variance of , which is the loss whose minimization aligns with maximizing acceptance.

P.5 acceptance rate as diagnostic

You run an SLMC chain for 10,000 steps on a complicated model and observe the following:

  • Run A: mean acceptance , autocorrelation time steps.
  • Run B: mean acceptance , autocorrelation time steps.
  • Run C: mean acceptance , autocorrelation time steps.

Which run is healthy? What's likely wrong with each unhealthy run? What would you do to fix it?

Find the analogue: mean acceptance is the macroscopic average of the per-move from P.4. Combined with autocorrelation time, it diagnoses what's gone wrong in the sampling.

show answer

Run A is healthy. 85% acceptance and short autocorrelation: the surrogate is close enough that the MH correction is mild, and the cluster moves are large enough to decorrelate the chain fast.

Run B is "over-confident surrogate." Acceptance 10% means the surrogate-vs-truth gap is large on most proposals — the surrogate is over-estimating how good cluster moves are, and the truth keeps rejecting them. drops some, but per-step cost is wasted. Fix: re-fit the surrogate on samples from a longer warmup, or use a more expressive surrogate (e.g., add NNN to ).

Run C is "trivial surrogate." Acceptance 99% — every move accepted — but is large. This is the signature of a surrogate so weak it proposes near-identity moves (e.g., single-spin flips disguised as clusters). The MH correction is mild because the proposals don't change much. Fix: use a richer surrogate that actually produces correlated cluster proposals, even if it lowers to 60-80%. The product (cluster size) is the right thing to maximize, not alone.

General rule for MCMC: in the 50-90% range is a good sign with the right move kernel. Very high or very low almost always means the move kernel is mismatched to the problem.

P.6 Bayesian inference with cheap likelihood

A Bayesian inference problem has a parameter with prior and likelihood that requires running an expensive simulator. Training a neural-network surrogate likelihood gives a cheap approximation. Construct an SLMC-style sampler that targets the true posterior using for proposals.

(a) Write down the proposal distribution and the MH acceptance ratio explicitly.

(b) When is this useful? When does it fail?

Find the analogue: the technique extends to any setting where you have a fast surrogate for an expensive quantity inside an MCMC acceptance criterion. In Bayesian inference, "expensive Hamiltonian" becomes "expensive likelihood." Same formula, different domain.

show answer

(a) Proposal: draw from any MCMC kernel against the surrogate posterior — e.g., a few HMC or Metropolis steps targeting . Call this proposal . It satisfies detailed balance for , giving .

Acceptance:

The priors cancel (when the same prior shows up in both factors):

If you write (with absorbed into the log-likelihood scale), this is exactly the SLMC formula. The expensive likelihood gets evaluated only once per accepted move; the cheap surrogate handles the expensive exploration.

(b) Useful when: simulator evaluations dominate cost (radiative transfer, climate models, gravitational-wave templates, cosmological N-body) and the surrogate captures most of the likelihood structure. Massive wall-clock speedups when the ratio of "expensive eval cost" to "proposal cost" is large.

Fails when: the surrogate misses high-density regions (over-confident in the wrong places), or when the surrogate is biased — the MH correction can correct for this in principle, but the variance of blows up and acceptance crashes. Fix: train the surrogate on the chain's actual samples (active learning), or use multi-level/delayed-rejection variants.

P.7 multi-fidelity Monte Carlo framework

SLMC is one example of a broader pattern called two-level Monte Carlo: a low-fidelity model is used to propose, a high-fidelity model is used to correct. List the analogues of (surrogate, target) in three other settings:

(a) Delayed-rejection adaptive Metropolis

(b) Multi-level Monte Carlo for PDE-constrained quantities

(c) Hamiltonian Monte Carlo with surrogate gradients

For each, identify what plays the role of and what plays the role of , and what corresponds to the MH acceptance step.

Find the analogue: identify the recurring two-component pattern: a cheap thing that drives exploration, and an expensive thing that vetoes mistakes. Different domains, same architectural move.

show answer

(a) Delayed-rejection adaptive Metropolis (DRAM): A first proposal from a cheap (often Gaussian-random-walk) distribution. If rejected, a second proposal is drawn from a more expensive (adaptive, covariance-tuned) distribution. The first proposal plays the role of -driven (cheap and wrong-but-fast); the second proposal corrects toward . MH acceptance happens at both stages, with the second-stage formula accounting for the first-stage rejection.

(b) Multi-level Monte Carlo (MLMC): Computing where solves an expensive PDE. Decompose the expectation as across fidelity levels . The cheap coarse-grid solution is the ; the fine-grid is . There's no MH acceptance — the correction is additive rather than ratio-based — but the architecture is the same: cheap thing gets most of the answer, expensive thing fixes the remainder.

(c) HMC with surrogate gradients: HMC needs at every leapfrog step. Use a neural-network gradient for the integration (cheap), then accept/reject the final state using the true Hamiltonian (expensive log-density evaluation, but only once per trajectory). The leapfrog with surrogate gradients plays the role of Wolff-on-surrogate; the final MH acceptance corrects to the true target.

The shared abstraction: a fast inner loop driven by a learned approximation, plus a slower outer-loop correction step that guarantees the right asymptotic answer. This pattern shows up everywhere expensive likelihoods or gradients are the bottleneck; surrogate models trained from chain samples are now standard in cosmology, biophysics, and climate inference.


Check problems

Two problems that don't pattern-match against the practice set. The first works through a non-trivial detailed-balance proof in a different setting; the second tests when the algorithm itself becomes counterproductive.

Check 1 derivation

The concept page states that the SLMC acceptance simplifies to because the Wolff-on-surrogate proposal distribution satisfies .

Derive this simplification from the full Metropolis-Hastings ratio:

with . Identify exactly where the Wolff-on-surrogate property of is used.

Then explain in 100-150 words why the SLMC paper's authors had to prove the Wolff-on-surrogate has this detailed-balance property — what would go wrong if a different proposal scheme were used that didn't satisfy it?

show solution sketch

Derivation. Substitute the targets into the MH ratio:

Wolff-on-surrogate guarantees , so .

Substituting:

Taking gives the stated formula. ✓

The Wolff-on-surrogate property is used exactly once, to rewrite . Without it, the proposal asymmetry doesn't collapse to a function of alone, and the MH ratio retains an explicit ratio that the algorithm doesn't know how to evaluate.

Why the proof matters. Wolff is one of a small handful of cluster algorithms with a known, closed-form expression for . Most ad-hoc cluster constructions don't satisfy any such identity — you can build clusters in clever ways, but you typically can't write down the proposal probability in closed form because it involves summing over all the paths that could have generated the same cluster. Without that, the MH ratio is uncomputable, and you can't certify that the chain samples the right distribution. The Liu et al. SLMC paper's contribution is the observation that Wolff does give you the needed identity, so you can plug in any reasonable surrogate Hamiltonian (linear in spins, polynomial, NN-only, etc.) and still get an exact MCMC algorithm for the true target.

Check 2 diagnosis

The following pseudocode claims to implement SLMC for the J1-J2 Ising model. It runs without errors and produces sensible-looking magnetization values on small lattices. But it has a subtle bug that breaks detailed balance, and the chain's stationary distribution is not the true J1+J2 equilibrium.

def slmc_step(spins, beta, J_eff, J1, J2, rng):
    # Build Wolff cluster on surrogate
    cluster = build_wolff_cluster(spins, beta, J_eff, rng)
    proposed = spins.copy()
    proposed[cluster] *= -1

    # MH acceptance against true Hamiltonian
    dH_true = H_true(proposed, J1, J2) - H_true(spins, J1, J2)

    # Bug: forgets to account for surrogate's contribution
    if rng.random() < np.exp(-beta * dH_true):
        return proposed
    return spins

Identify the bug. Explain in 150-250 words why this breaks detailed balance, what the chain converges to instead (qualitatively), and how the corrected version differs.

show solution sketch

The bug. The acceptance criterion uses alone, treating Wolff-on-surrogate as a symmetric proposal. The actual SLMC MH ratio requires — the surrogate's energy change subtracts out the proposal asymmetry of Wolff-on-surrogate.

The fix:

dH_eff = H_eff(proposed, J_eff) - H_eff(spins, J_eff)
if rng.random() < np.exp(-beta * (dH_true - dH_eff)):
    return proposed

Why this breaks detailed balance. Wolff with bond probability generates moves whose ratio equals not 1. By treating it as symmetric (using ), the algorithm effectively samples from , modulated by Wolff's bias.

Concretely, the chain converges to a distribution proportional to — the wrong target. For ferromagnetic > 0, the chain over-emphasizes magnetized configurations (because the cumulative double-counts the ferromagnetic coupling). On a small lattice this may not be obvious — small magnetic susceptibility issues — but on a larger system you'd see shifted, magnetization curves systematically wrong, and Binder cumulant intercepts off.

This is exactly the kind of bug that would pass a "the code runs and outputs reasonable numbers" sanity check but produce systematically biased physics. It's why SLMC implementations should be validated against a known cluster algorithm (Wolff alone, on a model where it works) in the limit , and against a brute-force enumeration on tiny lattices.