Compute the SLMC Metropolis-Hastings acceptance α = min(1, exp(-β(ΔH_true − ΔH_eff))) for a proposed move on the surrogate; reason about how surrogate quality, temperature, and proposal structure determine acceptance and overall sampling efficiency.
1 worked example · 7 practice problems · 2 check problems
Worked example: SLMC acceptance for a 4-spin ring
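Problem. A 4-spin Ising ring has true Hamiltonian $H_{\text{true}} = -J_1 \sum_{\langle ij \rangle} s_i s_j - J_2 \sum_{\langle\langle ij \rangle\rangle} s_i s_j$ with $J_1 = 1$, $J_2 = 0.2$, and surrogate $H_{\text{eff}} = -J_{\text{eff}} \sum_{\langle ij \rangle} s_i s_j$ with $J_{\text{eff}} = 1.25$. At $\beta = 1$, the current state is $s = (+1, +1, -1, -1)$. A Wolff cluster built on the surrogate proposes flipping spins 3 and 4 to give $s' = (+1, +1, +1, +1)$. Compute the MH acceptance probability.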
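Diagnosis. SLMC's MH acceptance simplifies to $\alpha = \min\left(1, e^{-\beta(\Delta H_{\text{true}} - \Delta H_{\text{eff}})}\right)$ because the Wolff update on the surrogate satisfies detailed balance with respect to $e^{-\beta H_{\text{eff}}}$, so its proposal ratio contributes exactly $e^{-\beta \Delta H_{\text{eff}}}$. Compute both $\Delta H_{\text{true}}$ and $\Delta H_{\text{eff}}$ directly from bond sums and plug in.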
Predict before reading on: is the proposed move favorable under the true Hamiltonian (i.e., $\Delta H_{\text{true}} < 0$)? Reading off the spin configurations, what's the obvious heuristic?
Bond accounting. For a 4-spin ring, the NN bonds are (1,2),(2,3),(3,4),(4,1) and the NNN bonds are (1,3),(2,4). Compute bond sums on each state:
| state | NN sum | NNN sum | $H_{\text{true}}$ | $H_{\text{eff}}$ |
|---|---|---|---|---|
| $s = (+,+,-,-)$ | 0 | −2 | 0 + 0.4 = 0.4 | 0 |
| $s' = (+,+,+,+)$ | 4 | 2 | −4 − 0.4 = −4.4 | −5.0 |
Energy changes.
$\Delta H_{\text{true}} = -4.4 - 0.4 = -4.8$
$\Delta H_{\text{eff}} = -5.0 - 0 = -5.0$
The surrogate predicts a slightly better move than the truth delivers: ΔHeff=−5.0 vs ΔHtrue=−4.8. The surrogate is "too optimistic" by 0.2 units of energy.
Predict before reading on: will $\alpha$ come out below 1? If the surrogate is too optimistic, will MH reject some moves or accept them all?
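Acceptance. With $\beta = 1$: $\alpha = \min\left(1, e^{-\beta(\Delta H_{\text{true}} - \Delta H_{\text{eff}})}\right) = \min(1, e^{-0.2}) \approx 0.819$. About 82% of moves of this type are accepted. The MH correction "tax" for the surrogate's optimism is roughly 18% of proposals.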
Verification.
```python
import numpy as np

def H(s, J1, J2):
    n = len(s)
    NN = sum(s[i] * s[(i+1) % n] for i in range(n))
    # On a 4-ring, i -> i+2 visits each NNN bond twice, hence the / 2.
    NNN = sum(s[i] * s[(i+2) % n] for i in range(n)) / 2
    return -J1 * NN - J2 * NNN

s_old = np.array([+1, +1, -1, -1])
s_new = np.array([+1, +1, +1, +1])
dH_true = H(s_new, 1.0, 0.2) - H(s_old, 1.0, 0.2)
dH_eff = H(s_new, 1.25, 0) - H(s_old, 1.25, 0)
alpha = min(1, np.exp(-1.0 * (dH_true - dH_eff)))
print(f"α = {alpha:.4f}")  # ≈ 0.8187
```
Articulate: state in one sentence what the MH factor $e^{-\beta(\Delta H_{\text{true}} - \Delta H_{\text{eff}})}$ is correcting for, and what it would equal if the surrogate were exact.
Practice problems
Seven problems. The first sharpens the bond-accounting muscle; the next test scaling and verification; the last two transfer the technique to other domains (Bayesian inference, multi-fidelity Monte Carlo).
P.1 · spin-config bond accounting on an antiferromagnetic state
Same 4-spin ring with J1=1,J2=0.2,β=1,Jeff=1.25. Current state s=(+1,−1,+1,−1) (Néel order). A proposed move flips only spin 1: s′=(−1,−1,+1,−1).
Compute ΔHtrue, ΔHeff, and α.
Find the analogue:
same bond-counting move as the worked example, different state. The relevant bonds touching spin 1 are (1,2),(4,1) for NN and (1,3),(3,1)=(1,3) — wait, are there two distinct NNN bonds touching spin 1, or one? Recheck the worked example's bond enumeration.
Answer.
NN bonds: (1,2),(2,3),(3,4),(4,1). NNN bonds: (1,3),(2,4). Two NN bonds involve spin 1, namely (1,2) and (4,1), but only one NNN bond does, namely (1,3).
State $s = (+,-,+,-)$: NN sum $= -1-1-1-1 = -4$; NNN sum $= (+)(+) + (-)(-) = +2$.
$H_{\text{true}}(s) = 4 - 0.4 = +3.6$, $H_{\text{eff}}(s) = +5.0$.
State $s' = (-,-,+,-)$: NN sum $= +1-1-1+1 = 0$; NNN sum $= (-)(+) + (-)(-) = 0$.
$H_{\text{true}}(s') = 0$, $H_{\text{eff}}(s') = 0$.
$\Delta H_{\text{true}} = -3.6$, $\Delta H_{\text{eff}} = -5.0$, $\Delta H_{\text{true}} - \Delta H_{\text{eff}} = +1.4$.
$\alpha = \min(1, e^{-1.4}) \approx 0.247$. Lower acceptance than the worked example because the surrogate is more wrong here: it predicted $-5$ but the truth gave $-3.6$. The big mismatch (1.4 units) makes the MH correction noticeably penalize this kind of move.
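Bond sums are easy to get wrong by hand, so a short numerical check can help. The sketch below assumes the same ring convention as the worked example (NN bonds between neighbours on the ring, NNN bonds (1,3) and (2,4)); the helper names `bond_sums` and `energy` are illustrative and not part of the original page.

```python
import math

def bond_sums(s):
    """Return (NN sum, NNN sum) for a 4-spin ring."""
    n = len(s)
    nn = sum(s[i] * s[(i + 1) % n] for i in range(n))
    # i -> i+2 visits each NNN bond twice on a 4-ring, so halve the sum.
    nnn = sum(s[i] * s[(i + 2) % n] for i in range(n)) // 2
    return nn, nnn

def energy(s, j1, j2=0.0):
    """Energy -j1 * (NN sum) - j2 * (NNN sum); j2=0 gives the surrogate form."""
    nn, nnn = bond_sums(s)
    return -j1 * nn - j2 * nnn

beta = 1.0
s_old = (+1, -1, +1, -1)   # Néel state
s_new = (-1, -1, +1, -1)   # spin 1 flipped

dH_true = energy(s_new, 1.0, 0.2) - energy(s_old, 1.0, 0.2)
dH_eff = energy(s_new, 1.25) - energy(s_old, 1.25)
alpha = min(1.0, math.exp(-beta * (dH_true - dH_eff)))

print("bond sums:", bond_sums(s_old), bond_sums(s_new))
print(f"dH_true = {dH_true:.2f}, dH_eff = {dH_eff:.2f}, alpha = {alpha:.3f}")
```

Swapping in other states or couplings reuses the same two helpers, which makes it a convenient way to check the remaining problems.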
P.2 · temperature scaling of acceptance
Take the worked example's move (ΔHtrue−ΔHeff=+0.2). Compute α at temperatures T∈{0.5,1,2,5} (i.e., β∈{2,1,0.5,0.2}). What's the qualitative limit as T→∞? As T→0?
Find the analogue: $\beta$ only enters the acceptance through the exponent. The factor that changes between moves is $\Delta H_{\text{true}} - \Delta H_{\text{eff}}$; temperature is the dial.
Answer.
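With $\Delta H_{\text{true}} - \Delta H_{\text{eff}} = +0.2$:
$T = 0.5$: $\alpha = e^{-2 \cdot 0.2} = e^{-0.4} \approx 0.670$
$T = 1$: $\alpha = e^{-0.2} \approx 0.819$
$T = 2$: $\alpha = e^{-0.1} \approx 0.905$
$T = 5$: $\alpha = e^{-0.04} \approx 0.961$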
Limits: as T→∞, β→0, the MH factor →1 — at high temperature the algorithm accepts almost everything because thermal fluctuations dominate over the surrogate's mistakes. As T→0, β→∞, any positive ΔHtrue−ΔHeff kills the acceptance — the surrogate must be essentially exact to be useful near zero temperature.
This is one reason SLMC works best at moderate temperatures (near $T_c$ for critical-slowing-down applications): the MH correction is gentle enough that acceptance stays high, while local updates decorrelate slowly enough there that cluster proposals built on the surrogate genuinely pay off.
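The temperature dependence can be tabulated in a few lines. This is a minimal sketch that takes the gap $\Delta H_{\text{true}} - \Delta H_{\text{eff}} = 0.2$ from the worked example as given; the function name `acceptance` is an illustrative choice.

```python
import math

gap = 0.2  # dH_true - dH_eff for the worked-example move

def acceptance(beta, gap):
    """SLMC acceptance min(1, exp(-beta * gap))."""
    return min(1.0, math.exp(-beta * gap))

# The four temperatures from the problem, plus two extremes to show the limits.
for T in (0.01, 0.5, 1.0, 2.0, 5.0, 1000.0):
    print(f"T = {T:8.2f}  beta = {1.0 / T:8.3f}  alpha = {acceptance(1.0 / T, gap):.3f}")
```

The two extreme rows illustrate the limits discussed above: acceptance collapses at very low temperature and approaches 1 at very high temperature.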
P.3 · detailed-balance verification
For the worked example states $s = (+,+,-,-)$ and $s' = (+,+,+,+)$, verify the SLMC update satisfies detailed balance for the true distribution $\pi(s) \propto e^{-\beta H_{\text{true}}(s)}$. Specifically:
(a) Compute $\pi(s')/\pi(s)$.
(b) The Wolff-on-surrogate proposal has the property $g(s \to s')/g(s' \to s) = e^{-\beta(H_{\text{eff}}(s') - H_{\text{eff}}(s))}$. Compute this ratio.
Find the analogue:
this is the proof from the concept page, made concrete on the worked example's states. Compute each factor numerically and check that detailed balance holds.
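(c) Forward MH factor: $e^{-\beta(\Delta H_{\text{true}} - \Delta H_{\text{eff}})} = e^{-0.2} \approx 0.819$. Since this is $< 1$, $\alpha(s \to s') = 0.819$.
Reverse MH factor: $e^{-\beta(-\Delta H_{\text{true}} + \Delta H_{\text{eff}})} = e^{+0.2} \approx 1.221$. Since this is $> 1$, $\alpha(s' \to s) = 1$.
Ratio $\alpha(s \to s')/\alpha(s' \to s) = 0.819$.
(d) Detailed balance, $\pi(s)\,g(s \to s')\,\alpha(s \to s') = \pi(s')\,g(s' \to s)\,\alpha(s' \to s)$, rearranges to $\pi(s')/\pi(s) = [g(s \to s')/g(s' \to s)] \cdot [\alpha(s \to s')/\alpha(s' \to s)]$. With $\pi(s')/\pi(s) = e^{4.8} \approx 121.51$ from (a) and $g(s \to s')/g(s' \to s) = e^{5.0} \approx 148.41$ from (b), the RHS is $148.41 \times 0.8187 \approx 121.51$, which matches the LHS. ✓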
The MH factor exactly compensates for the surrogate's bias in the proposal distribution, sending the chain to the true equilibrium π regardless of how the proposal was generated.
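The same check can be run numerically without rounding the intermediate factors. The sketch below takes the energy changes from the worked example and assumes the proposal-ratio property quoted in part (b); variable names are illustrative.

```python
import math

beta = 1.0
dH_true, dH_eff = -4.8, -5.0   # energy changes from the worked example

pi_ratio = math.exp(-beta * dH_true)   # pi(s') / pi(s)
g_ratio = math.exp(-beta * dH_eff)     # g(s->s') / g(s'->s), property assumed from (b)

# Forward and reverse acceptance probabilities.
a_fwd = min(1.0, math.exp(-beta * (dH_true - dH_eff)))
a_rev = min(1.0, math.exp(+beta * (dH_true - dH_eff)))

rhs = g_ratio * a_fwd / a_rev
print(f"pi ratio = {pi_ratio:.4f}, g ratio * alpha ratio = {rhs:.4f}")
print("detailed balance holds:", math.isclose(pi_ratio, rhs, rel_tol=1e-9))
```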
P.4 · surrogate-quality sensitivity
The worked example used Jeff=1.25 (a fit). What if the surrogate is poorly chosen? Compute α for the same move (+,+,−,−)→(+,+,+,+) at β=1 using Jeff∈{0.5,1.0,1.25,2.0,5.0}. Interpret the trend: why does α=1 for under-estimated Jeff, and why does it crash for over-estimated values?
Find the analogue: $J_{\text{eff}}$ controls how much energy the surrogate thinks the move saves. The MH correction is the gap between the surrogate's prediction and the truth.
Answer.
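With $\Delta H_{\text{true}} = -4.8$ fixed, $\Delta H_{\text{eff}} = -4 J_{\text{eff}}$, and gap $= \Delta H_{\text{true}} - \Delta H_{\text{eff}}$:
$J_{\text{eff}} = 0.5$: $\Delta H_{\text{eff}} = -2$, gap $= -4.8 - (-2) = -2.8$, $\alpha = \min(1, e^{+2.8}) = 1$.
$J_{\text{eff}} = 1.0$: $\Delta H_{\text{eff}} = -4$, gap $= -0.8$, $\alpha = 1$.
$J_{\text{eff}} = 1.25$: $\Delta H_{\text{eff}} = -5$, gap $= +0.2$, $\alpha \approx 0.819$.
$J_{\text{eff}} = 2.0$: $\Delta H_{\text{eff}} = -8$, gap $= +3.2$, $\alpha \approx 0.041$.
$J_{\text{eff}} = 5.0$: $\Delta H_{\text{eff}} = -20$, gap $= +15.2$, $\alpha \approx 2.5 \times 10^{-7}$.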
Under-estimated $J_{\text{eff}}$ (gap negative): the surrogate thinks the move is worse than it truly is, so the truth's extra improvement pushes the MH ratio above 1 and every such move is accepted. Sounds good, but in practice it means the surrogate isn't doing its proposal job: a weak $J_{\text{eff}}$ gives small Wolff clusters, so it misses the easy correlated moves that make cluster proposals useful. The chain then can't decorrelate fast because the surrogate proposals look like single-spin flips.
Over-estimated $J_{\text{eff}}$ (gap positive, large): the surrogate thinks every aligning move is wonderful, and MH rejects most of them. Acceptance crashes; accepted moves are rare and the chain stalls.
Sweet spot: the gap is small in absolute value. SLMC fits typically minimize the mean-squared mismatch between $H_{\text{true}}$ and $H_{\text{eff}}$ over sampled configurations, which keeps $\Delta H_{\text{true}} - \Delta H_{\text{eff}}$ small on typical moves and therefore keeps acceptance high.
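The trend is easy to reproduce by scanning $J_{\text{eff}}$. This is a minimal sketch for the single move studied here, using $\Delta H_{\text{eff}} = -4 J_{\text{eff}}$ from the answer above; the dictionary `results` is an illustrative name.

```python
import math

beta = 1.0
dH_true = -4.8   # fixed by the true Hamiltonian for this move

results = {}
for j_eff in (0.5, 1.0, 1.25, 2.0, 5.0):
    dH_eff = -4.0 * j_eff            # NN sum changes from 0 to 4
    gap = dH_true - dH_eff
    results[j_eff] = min(1.0, math.exp(-beta * gap))
    print(f"J_eff = {j_eff:4.2f}  gap = {gap:+6.2f}  alpha = {results[j_eff]:.3g}")
```

Note that this scans one move only. Judging a surrogate in practice would need the distribution of gaps over many proposals, not a single value.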
P.5 · acceptance rate as diagnostic
You run an SLMC chain for 10,000 steps on a complicated model and observe the following:
Run A: mean acceptance $\bar{\alpha} = 0.85$, autocorrelation time $\tau = 5$ steps.
Run B: mean acceptance $\bar{\alpha} = 0.10$, autocorrelation time $\tau = 30$ steps.
Run C: mean acceptance $\bar{\alpha} = 0.99$, autocorrelation time $\tau = 200$ steps.
Which run is healthy? What's likely wrong with each unhealthy run? What would you do to fix it?
Find the analogue:
mean acceptance is the macroscopic average of the per-move α from P.4. Combined with autocorrelation time, it diagnoses what's gone wrong in the sampling.
Answer.
Run A is healthy. 85% acceptance and short autocorrelation: the surrogate is close enough that the MH correction is mild, and the cluster moves are large enough to decorrelate the chain fast.
Run B is "over-confident surrogate." Acceptance 10% means the surrogate-vs-truth gap is large on most proposals: the surrogate is over-estimating how good cluster moves are, and the truth keeps rejecting them. The few accepted cluster moves still decorrelate the chain somewhat ($\tau = 30$), but most true-energy evaluations are spent on proposals that get rejected. Fix: re-fit the surrogate on samples from a longer warmup, or use a more expressive surrogate (e.g., add NNN to $H_{\text{eff}}$).
Run C is "trivial surrogate." Acceptance 99% means nearly every move is accepted, but $\tau = 200$ is large. This is the signature of a surrogate so weak it proposes near-identity moves (e.g., single-spin flips disguised as clusters). The MH correction is mild because the proposals don't change much. Fix: use a richer surrogate that actually produces correlated cluster proposals, even if it lowers $\bar{\alpha}$ to 60-80%. The product $\bar{\alpha} \times$ (cluster size) is a better figure of merit than $\bar{\alpha}$ alone.
Rule of thumb for surrogate-driven cluster proposals: $\bar{\alpha}$ in the 50-90% range is a good sign with the right move kernel. Very high or very low $\bar{\alpha}$ usually means the move kernel is mismatched to the problem. Other kernels have different targets; random-walk Metropolis, for instance, is usually tuned to a much lower acceptance rate.
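Diagnosing runs like A, B, and C requires an estimate of $\tau$ from the chain itself. The sketch below is one simple estimator, not necessarily the one used for the numbers above. It assumes the convention $\tau_{\text{int}} = \tfrac{1}{2} + \sum_{t \ge 1} \rho(t)$ and truncates the sum at the first non-positive autocorrelation. It is tried on a synthetic AR(1) series, for which the exact value is $(1+\phi)/(2(1-\phi))$; the function name and parameters are illustrative.

```python
import numpy as np

def integrated_autocorr_time(x, max_lag=1000):
    """Estimate tau_int = 1/2 + sum_t rho(t), stopping at the first rho(t) <= 0."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    c0 = np.dot(x, x) / n                      # lag-0 autocovariance
    tau = 0.5
    for t in range(1, min(max_lag, n - 1)):
        rho = np.dot(x[:-t], x[t:]) / ((n - t) * c0)
        if rho <= 0.0:                         # noise dominates beyond this lag
            break
        tau += rho
    return tau

rng = np.random.default_rng(0)
n, phi = 200_000, 0.9

# Synthetic correlated chain: AR(1) with exact tau_int = (1 + phi) / (2 (1 - phi)) = 9.5.
noise = rng.normal(size=n)
ar1 = np.empty(n)
ar1[0] = noise[0]
for i in range(1, n):
    ar1[i] = phi * ar1[i - 1] + noise[i]

tau_ar1 = integrated_autocorr_time(ar1)
tau_white = integrated_autocorr_time(rng.normal(size=n))   # uncorrelated reference
print(f"AR(1): tau ~ {tau_ar1:.2f} (exact 9.5); white noise: tau ~ {tau_white:.2f}")
```

In an SLMC run, `x` would be the time series of an observable such as the magnetization, recorded once per outer step.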
P.6 · Bayesian inference with cheap likelihood
A Bayesian inference problem has a parameter $\theta$ with prior $p(\theta)$ and likelihood $p(y \mid \theta)$ that requires running an expensive simulator. Training a neural-network surrogate likelihood $\hat{L}(\theta)$ gives a cheap approximation. Construct an SLMC-style sampler that targets the true posterior $p(\theta \mid y) \propto p(y \mid \theta)\,p(\theta)$ using $\hat{L}$ for proposals.
(a) Write down the proposal distribution and the MH acceptance ratio explicitly.
(b) When is this useful? When does it fail?
Find the analogue:
the technique extends to any setting where you have a fast surrogate for an expensive quantity inside an MCMC acceptance criterion. In Bayesian inference, "expensive Hamiltonian" becomes "expensive likelihood." Same formula, different domain.
Answer.
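(a) Proposal: draw $\theta'$ from a reversible MCMC kernel that targets the surrogate posterior $\hat{p}(\theta \mid y) \propto \hat{L}(\theta)\,p(\theta)$, e.g., a few HMC or Metropolis steps targeting $\hat{p}$. Call this proposal $g(\theta \to \theta')$. Because it satisfies detailed balance for $\hat{p}$, $g(\theta \to \theta')/g(\theta' \to \theta) = \hat{p}(\theta' \mid y)/\hat{p}(\theta \mid y)$.
Substitute this into the MH ratio $\dfrac{p(\theta' \mid y)\,g(\theta' \to \theta)}{p(\theta \mid y)\,g(\theta \to \theta')}$. The priors cancel (the same prior appears in the true and the surrogate posterior):
$$\alpha = \min\left(1, \frac{p(y \mid \theta')/\hat{L}(\theta')}{p(y \mid \theta)/\hat{L}(\theta)}\right)$$
If you write $\log p(y \mid \theta) - \log \hat{L}(\theta) \equiv -\beta(H_{\text{true}} - H_{\text{eff}})$ (with $\beta$ absorbed into the log-likelihood scale), this is exactly the SLMC formula. The expensive likelihood is evaluated only once per outer proposal, for the final MH check, since the value at the current $\theta$ can be cached; the cheap surrogate handles the exploration in between.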
(b) Useful when: simulator evaluations dominate cost (radiative transfer, climate models, gravitational-wave templates, cosmological N-body) and the surrogate captures most of the likelihood structure. Massive wall-clock speedups when the ratio of "expensive eval cost" to "proposal cost" is large.
Fails when: the surrogate misses high-density regions (over-confident in the wrong places), or when the surrogate is biased. The MH correction can correct for this in principle, but the variance of $\log p(y \mid \theta) - \log \hat{L}(\theta)$ blows up and acceptance crashes. Fix: train the surrogate on the chain's actual samples (active learning), or use multi-level/delayed-rejection variants.
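The construction in (a) can be sketched on a toy problem. Everything specific below is an assumption made for illustration: a one-dimensional $\theta$, a Gaussian prior, a Gaussian stand-in for the expensive likelihood, and a deliberately biased Gaussian stand-in for the trained surrogate. The inner kernel is plain random-walk Metropolis on the surrogate posterior, which is reversible with respect to it.

```python
import math
import random

def log_prior(theta):          # hypothetical N(0, 3^2) prior
    return -0.5 * (theta / 3.0) ** 2

def log_like_true(theta):      # stand-in for the expensive simulator likelihood
    return -0.5 * ((theta - 1.0) / 0.5) ** 2

def log_like_surr(theta):      # stand-in for the surrogate, deliberately biased
    return -0.5 * ((theta - 1.1) / 0.6) ** 2

def inner_chain(theta, n_steps, step, rng):
    """Random-walk Metropolis on the surrogate posterior (cheap exploration)."""
    lp = log_like_surr(theta) + log_prior(theta)
    for _ in range(n_steps):
        prop = theta + rng.gauss(0.0, step)
        lp_prop = log_like_surr(prop) + log_prior(prop)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = prop, lp_prop
    return theta

def sample(n_outer=20000, n_inner=10, step=0.5, seed=0):
    rng = random.Random(seed)
    theta, samples, accepted = 0.0, [], 0
    for _ in range(n_outer):
        prop = inner_chain(theta, n_inner, step, rng)
        # Outer correction: only the true/surrogate likelihood ratio survives.
        log_ratio = ((log_like_true(prop) - log_like_surr(prop))
                     - (log_like_true(theta) - log_like_surr(theta)))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta, accepted = prop, accepted + 1
        samples.append(theta)
    return samples, accepted / n_outer

samples, acc_rate = sample()
post_mean = sum(samples) / len(samples)
print(f"acceptance = {acc_rate:.2f}, posterior mean estimate = {post_mean:.3f}")
```

For these toy Gaussians the exact posterior mean is $4/(4 + 1/9) \approx 0.973$, so the estimate can be checked directly. With a real simulator, `log_like_true` would be the expensive call, and its value at the current state would be cached instead of recomputed.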
P.7 · multi-fidelity Monte Carlo framework
SLMC is one example of a broader pattern called two-level Monte Carlo: a low-fidelity model is used to propose, a high-fidelity model is used to correct. List the analogues of (surrogate, target) in three other settings:
(a) Delayed-rejection adaptive Metropolis
(b) Multi-level Monte Carlo for PDE-constrained quantities
(c) Hamiltonian Monte Carlo with surrogate gradients
For each, identify what plays the role of Heff and what plays the role of Htrue, and what corresponds to the MH acceptance step.
Find the analogue:
identify the recurring two-component pattern: a cheap thing that drives exploration, and an expensive thing that vetoes mistakes. Different domains, same architectural move.
Answer.
(a) Delayed-rejection adaptive Metropolis (DRAM): A first proposal from a cheap (often Gaussian-random-walk) distribution. If rejected, a second proposal is drawn from a more expensive (adaptive, covariance-tuned) distribution. The first proposal plays the role of Heff-driven (cheap and wrong-but-fast); the second proposal corrects toward Htrue. MH acceptance happens at both stages, with the second-stage formula accounting for the first-stage rejection.
(b) Multi-level Monte Carlo (MLMC): Computing $\mathbb{E}[f(u)]$ where $u$ solves an expensive PDE. Decompose the expectation as $\mathbb{E}[f(u_0)] + \sum_{\ell=1}^{L} \mathbb{E}[f(u_\ell) - f(u_{\ell-1})]$ across fidelity levels $0, 1, \ldots, L$. The cheap coarse-grid solution $u_0$ is the $H_{\text{eff}}$; the fine-grid $u_L$ is $H_{\text{true}}$. There's no MH acceptance (the correction is additive rather than ratio-based), but the architecture is the same: cheap thing gets most of the answer, expensive thing fixes the remainder.
(c) HMC with surrogate gradients: HMC needs $\nabla \log p(\theta \mid y)$ at every leapfrog step. Use a neural-network gradient $\nabla \log \hat{p}(\theta \mid y)$ for the integration (cheap), then accept/reject the final state using the true Hamiltonian (expensive log-density evaluation, but only once per trajectory). The leapfrog with surrogate gradients plays the role of Wolff-on-surrogate; the final MH acceptance corrects to the true target.
The shared abstraction: a fast inner loop driven by a learned approximation, plus a slower outer-loop correction step that guarantees the right asymptotic answer. This pattern shows up wherever expensive likelihoods or gradients are the bottleneck; surrogate models trained from chain samples are increasingly common in cosmology, biophysics, and climate inference.
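Case (c) is short enough to sketch in full. The toy below assumes a standard-normal target and a hypothetical learned gradient that is 10% too small; both are stand-ins chosen for illustration. Leapfrog driven by a position-only force is still reversible and volume-preserving, so accepting or rejecting on the true Hamiltonian keeps the true target invariant.

```python
import math
import random

def neg_log_p(theta):              # true potential U(theta): standard normal target
    return 0.5 * theta * theta

def surrogate_grad_U(theta):       # hypothetical learned gradient, 10% too small
    return 0.9 * theta

def leapfrog(theta, mom, eps, n_steps):
    """Leapfrog integration driven by the surrogate gradient only."""
    mom -= 0.5 * eps * surrogate_grad_U(theta)
    for i in range(n_steps):
        theta += eps * mom
        if i < n_steps - 1:
            mom -= eps * surrogate_grad_U(theta)
    mom -= 0.5 * eps * surrogate_grad_U(theta)
    return theta, mom

def hmc(n_samples=20000, eps=0.3, n_steps=5, seed=1):
    rng = random.Random(seed)
    theta, out, accepted = 0.0, [], 0
    for _ in range(n_samples):
        mom = rng.gauss(0.0, 1.0)
        h_old = neg_log_p(theta) + 0.5 * mom * mom
        th_new, mom_new = leapfrog(theta, mom, eps, n_steps)
        # One true-density evaluation per trajectory for the MH correction.
        h_new = neg_log_p(th_new) + 0.5 * mom_new * mom_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            theta, accepted = th_new, accepted + 1
        out.append(theta)
    return out, accepted / n_samples

draws, acc = hmc()
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(f"acceptance = {acc:.2f}, mean = {mean:.3f}, variance = {var:.3f}")
```

Despite the biased gradient, the sample mean and variance should approach 0 and 1. The bias shows up only as a somewhat lower acceptance rate, which is the HMC counterpart of the surrogate gap in SLMC.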
Check problems
Two problems that don't pattern-match against the practice set. The first works through a non-trivial detailed-balance proof in a different setting; the second tests when the algorithm itself becomes counterproductive.
Check 1 · derivation
The concept page states that the SLMC acceptance simplifies to $\alpha = \min\left(1, e^{-\beta(\Delta H_{\text{true}} - \Delta H_{\text{eff}})}\right)$ because the Wolff-on-surrogate proposal distribution $g$ satisfies $g(s \to s')/g(s' \to s) = e^{-\beta(H_{\text{eff}}(s') - H_{\text{eff}}(s))}$.
Derive this simplification from the full Metropolis-Hastings ratio:
$$\alpha(s \to s') = \min\left(1, \frac{\pi(s')\,g(s' \to s)}{\pi(s)\,g(s \to s')}\right)$$
with $\pi(s) \propto e^{-\beta H_{\text{true}}(s)}$. Identify exactly where the Wolff-on-surrogate property of $g$ is used.
Then explain in 100-150 words why the SLMC paper's authors had to prove the Wolff-on-surrogate g has this detailed-balance property — what would go wrong if a different proposal scheme were used that didn't satisfy it?
Solution sketch.
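Derivation. Substitute the target $\pi$ and the proposal ratio into the MH ratio:
$$\frac{\pi(s')\,g(s' \to s)}{\pi(s)\,g(s \to s')} = e^{-\beta(H_{\text{true}}(s') - H_{\text{true}}(s))} \cdot e^{+\beta(H_{\text{eff}}(s') - H_{\text{eff}}(s))} = e^{-\beta(\Delta H_{\text{true}} - \Delta H_{\text{eff}})}$$
Taking $\min(1, \cdot)$ of this gives the stated $\alpha$.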
The Wolff-on-surrogate property is used exactly once, to rewrite g(s′→s)/g(s→s′). Without it, the proposal asymmetry doesn't collapse to a function of Heff alone, and the MH ratio retains an explicit g ratio that the algorithm doesn't know how to evaluate.
Why the proof matters. Wolff is one of a small handful of cluster algorithms with a known, closed-form expression for g(s→s′)/g(s′→s). Most ad-hoc cluster constructions don't satisfy any such identity — you can build clusters in clever ways, but you typically can't write down the proposal probability in closed form because it involves summing over all the paths that could have generated the same cluster. Without that, the MH ratio is uncomputable, and you can't certify that the chain samples the right distribution. The Liu et al. SLMC paper's contribution is the observation that Wolff does give you the needed identity, so you can plug in any reasonable surrogate Hamiltonian (linear in spins, polynomial, NN-only, etc.) and still get an exact MCMC algorithm for the true target.
Check 2 · diagnosis
The following pseudocode claims to implement SLMC for the J1-J2 Ising model. It runs without errors and produces sensible-looking magnetization values on small lattices. But it has a subtle bug that breaks detailed balance, and the chain's stationary distribution is not the true J1+J2 equilibrium.
Identify the bug. Explain in 150-250 words why this breaks detailed balance, what the chain converges to instead (qualitatively), and how the corrected version differs.
Solution sketch.
The bug. The acceptance criterion uses $e^{-\beta \Delta H_{\text{true}}}$ alone, treating Wolff-on-surrogate as a symmetric proposal. The actual SLMC MH ratio requires $e^{-\beta(\Delta H_{\text{true}} - \Delta H_{\text{eff}})}$: the surrogate's energy change subtracts out the proposal asymmetry of Wolff-on-surrogate.
Why this breaks detailed balance. Wolff with bond probability $p_{\text{add}} = 1 - e^{-2\beta J_{\text{eff}}}$ generates moves whose ratio $g(s \to s')/g(s' \to s)$ equals $e^{-\beta \Delta H_{\text{eff}}}$, not 1. Accepting with $e^{-\beta \Delta H_{\text{true}}}$ as if the proposal were symmetric leaves that factor uncancelled, so the chain satisfies detailed balance for $\tilde{\pi}(s) \propto e^{-\beta H_{\text{true}}(s)}\, e^{-\beta H_{\text{eff}}(s)}$ instead of for $\pi$.
Concretely, the chain converges to a distribution proportional to $e^{-\beta(H_{\text{true}} + H_{\text{eff}})}$, which is the wrong target. For ferromagnetic $J_{\text{eff}} > 0$, the chain over-emphasizes magnetized configurations (because adding $H_{\text{eff}}$ double-counts the ferromagnetic coupling). On a small lattice the resulting shift in observables may be too small to notice, but on a larger system you'd see $T_c$ shifted, magnetization curves systematically wrong, and Binder cumulant intercepts off.
This is exactly the kind of bug that would pass a "the code runs and outputs reasonable numbers" sanity check but produce systematically biased physics. It's why SLMC implementations should be validated against a known cluster algorithm (Wolff alone, on a model where it works) in the limit J2→0, and against a brute-force enumeration on tiny lattices.
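A brute-force check of this kind fits in a few dozen lines for the 4-spin ring. The sketch below is not the listing the problem refers to, and it swaps Wolff for a simpler inner proposal: a single-spin Metropolis flip on the surrogate, chosen because its proposal probabilities are trivial to enumerate. It shares the property that matters, $g(s \to s')/g(s' \to s) = e^{-\beta \Delta H_{\text{eff}}}$. All function names are illustrative.

```python
import itertools
import math
import numpy as np

BETA, J1, J2, JEFF = 1.0, 1.0, 0.2, 1.25
STATES = list(itertools.product((-1, 1), repeat=4))
INDEX = {s: k for k, s in enumerate(STATES)}

def energy(s, j1, j2=0.0):
    n = len(s)
    nn = sum(s[i] * s[(i + 1) % n] for i in range(n))
    nnn = sum(s[i] * s[(i + 2) % n] for i in range(n)) / 2  # each NNN bond counted twice
    return -j1 * nn - j2 * nnn

def transition_matrix(buggy):
    """Exact 16x16 transition matrix: surrogate Metropolis flip, then outer MH step."""
    P = np.zeros((len(STATES), len(STATES)))
    for a, s in enumerate(STATES):
        for i in range(4):
            t = s[:i] + (-s[i],) + s[i + 1:]
            dH_eff = energy(t, JEFF) - energy(s, JEFF)
            dH_true = energy(t, J1, J2) - energy(s, J1, J2)
            g = 0.25 * min(1.0, math.exp(-BETA * dH_eff))    # inner proposal probability
            gap = dH_true if buggy else dH_true - dH_eff     # buggy: symmetric-proposal rule
            P[a, INDEX[t]] += g * min(1.0, math.exp(-BETA * gap))
        P[a, a] += 1.0 - P[a].sum()                          # otherwise stay put
    return P

def boltzmann(energies):
    w = np.exp(-BETA * np.array(energies))
    return w / w.sum()

def satisfies_detailed_balance(P, pi):
    flow = pi[:, None] * P          # flow[a, b] = pi_a * P(a -> b)
    return bool(np.allclose(flow, flow.T))

target = boltzmann([energy(s, J1, J2) for s in STATES])
wrong = boltzmann([energy(s, J1, J2) + energy(s, JEFF) for s in STATES])

print("correct rule, true target :", satisfies_detailed_balance(transition_matrix(False), target))
print("buggy rule,   true target :", satisfies_detailed_balance(transition_matrix(True), target))
print("buggy rule,   exp(-b(Ht+He)):", satisfies_detailed_balance(transition_matrix(True), wrong))
```

The buggy rule fails detailed balance for the true target but passes it for the distribution proportional to $e^{-\beta(H_{\text{true}} + H_{\text{eff}})}$, in line with the diagnosis above.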