Where ABC Breaks

Simulation-Based Inference

ABC is clean and correct. So why does anyone need anything else? Four reasons, mostly. They're not subtle — each one is something you can feel as soon as you try ABC on anything bigger than the pinball.

Wasted simulations

Go back to the widget from the previous post. Set $\epsilon$ to a loose starting value and run 500 simulations. The accept rate hovers around 15–20%. Now drag $\epsilon$ down to 0.5. The posterior tightens, which is what you wanted, but the accept rate falls under 2%. To get the same number of accepted samples you need an order of magnitude more simulations.

The $\epsilon \to 0$ limit is unbiased, but the cost in simulator runs is brutal. For the pinball, where one simulation is microseconds, this hurts but you survive. For a Lotka-Volterra system at seconds per simulation, you're suddenly looking at hours. For a climate model at minutes per simulation, ABC is over before it starts.

The waste is structural, not algorithmic. ABC discards any simulation that doesn't land inside the tolerance ball, which for any sharp posterior is most of them. There's no way to recycle that information into refining future proposals or learning anything global about the model. Every simulation that doesn't hit is just gone.
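A rough way to see the waste outside the widget is the sketch below. It is not the pinball: the simulator (`simulate`), the Uniform(-5, 5) prior, and the observed value are all assumptions picked to keep the example short. It runs plain rejection ABC at three tolerances and reports how many simulations get thrown away.

```python
import numpy as np

def simulate(theta, rng):
    # Hypothetical toy simulator (not the pinball): one noisy reading of theta.
    return theta + rng.normal(0.0, 1.0, size=np.shape(theta))

def rejection_abc(x_obs, eps, n_sims, rng):
    # Draw parameters from a Uniform(-5, 5) prior (an assumption for this sketch).
    theta = rng.uniform(-5.0, 5.0, size=n_sims)
    x = simulate(theta, rng)
    # Keep only the draws whose simulated output lands within eps of x_obs.
    accepted = theta[np.abs(x - x_obs) < eps]
    return accepted, accepted.size / n_sims

rng = np.random.default_rng(0)
n_sims = 100_000
rates = {}
for eps in (2.0, 0.5, 0.1):
    accepted, rates[eps] = rejection_abc(x_obs=1.0, eps=eps, n_sims=n_sims, rng=rng)
    wasted = n_sims - accepted.size
    print(f"eps={eps}: accept rate {rates[eps]:.3f}, discarded {wasted} simulations")
```

In this toy setup the accept rate is roughly proportional to `eps`, so every halving of the tolerance doubles the simulation bill for the same number of accepted samples.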

The curse of dimensionality

ABC's accept criterion is "simulated $x$ within $\epsilon$ of $x_{\text{obs}}$". In one dimension that's an interval, a slice through the line. In two dimensions it's a disk in the plane. In $d$ dimensions it's a $d$-dimensional ball whose volume collapses like $\epsilon^d$.

Suppose you've calibrated $\epsilon$ so the accept rate is 10% in 1D. In 2D with the same per-coordinate precision, the accept rate falls to roughly 1%. In 5D, around $10^{-5}$. In 10D, $10^{-10}$. In 100D (and 100-dimensional observations are routine: a time series with 100 timepoints, an image with 100 pixels) you're at roughly $10^{-100}$. You would need far more simulations than there are atoms in the observable universe, about $10^{80}$, to expect a single accepted sample.

This isn't "ABC needs a faster computer". It's "ABC is structurally the wrong tool for high-dimensional $x$". The geometry of the tolerance ball loses to the geometry of the ambient space, and no amount of compute fixes that.
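The collapse is easy to check numerically. The sketch below makes several assumptions: the "simulator output" is just independent Uniform(0, 1) coordinates, the observation sits at 0.5 in every coordinate, and the tolerance is a box (each coordinate within 0.05) rather than a Euclidean ball, so that the per-coordinate accept rate is 10% and the predicted joint rate is simply 0.1 to the power of the dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims = 1_000_000
half_width = 0.05  # each coordinate must be within 0.05 of the target: 10% per coordinate

rates = {}
for d in (1, 2, 3, 4):
    # Hypothetical stand-in for simulator output: d independent Uniform(0, 1) coordinates.
    x = rng.uniform(0.0, 1.0, size=(n_sims, d))
    x_obs = np.full(d, 0.5)
    # Box criterion: every coordinate must match to the same per-coordinate precision.
    hit = np.all(np.abs(x - x_obs) < half_width, axis=1)
    rates[d] = hit.mean()
    print(f"d={d}: empirical {rates[d]:.6f}, predicted {0.1 ** d:.6f}")

# Past d = 6, a million simulations are expected to yield less than one hit,
# so only the predicted rate is printed for larger dimensions.
for d in (10, 100):
    print(f"d={d}: predicted accept rate 1e-{d}")
```

A Euclidean ball of matching radius is a subset of this box, so the ball-based accept rate would be smaller still.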

The summary-statistic burden

The standard workaround for high-dimensional $x$ is to compute a low-dimensional summary $s(x)$ and accept based on the distance between $s(x)$ and $s(x_{\text{obs}})$ in $s$-space instead of $x$-space. The mean, the variance, the autocorrelation at lag 1: pick the projections you think carry the information about $\theta$, and let the rest go.

Two problems with this. First, you have to pick $s$. If you pick badly, so that your summaries throw away information that mattered for $\theta$, you get a biased posterior, possibly without realizing it. Second, there's no general recipe for choosing summaries. The "right" ones depend on the model in a way that's only obvious in retrospect, and in practice summary-statistic selection is its own subfield. ABC papers spend a lot of pages on it for good reason.

The deeper problem: if $x$ is genuinely high-dimensional (an image, a long time series, a particle physics event) there might not be a low-dimensional $s(x)$ that retains enough information about $\theta$. In that regime ABC is structurally limited, not by compute but by the fact that you've been asked to hand-engineer compression of a complicated object you don't fully understand.
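To make the "pick badly" failure concrete, here is a minimal sketch under assumed conditions: a hypothetical simulator that emits 100 zero-mean Gaussian draws with scale `theta`, a Uniform(0.5, 3) prior, and a true value of 2. The same rejection loop is run twice, once accepting on the sample mean and once on the sample standard deviation. Only the second summary tracks the scale parameter well.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sims, n_obs = 20_000, 100

def simulate(theta, rng):
    # Hypothetical simulator: n_obs zero-mean Gaussian draws with scale theta.
    return rng.normal(0.0, 1.0, size=(theta.size, n_obs)) * theta[:, None]

def abc_with_summary(summary, x_obs, eps, rng):
    theta = rng.uniform(0.5, 3.0, size=n_sims)   # assumed prior
    s_sim = summary(simulate(theta, rng))        # one summary per simulated dataset
    s_obs = summary(x_obs[None, :])[0]           # summary of the observation
    return theta[np.abs(s_sim - s_obs) < eps]    # accept in s-space

x_obs = simulate(np.array([2.0]), rng)[0]        # "observed" data, true theta = 2

post_mean_summary = abc_with_summary(lambda x: x.mean(axis=1), x_obs, 0.1, rng)
post_std_summary = abc_with_summary(lambda x: x.std(axis=1), x_obs, 0.1, rng)

for name, post in [("mean", post_mean_summary), ("std", post_std_summary)]:
    print(f"summary={name}: {post.size} accepted, "
          f"posterior mean {post.mean():.2f}, posterior sd {post.std():.2f}")
```

Both runs return a perfectly plausible-looking set of samples. Nothing in the output of the first run flags that the summary discarded most of what the data had to say about `theta`, which is the "possibly without realizing it" part.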

No amortization

Every ABC run is for a single $x_{\text{obs}}$. New observation → new tolerance band → new sampling loop. None of the work from the previous run carries over. The simulations you ran for galaxy 1 don't help you with galaxy 2.

This matters whenever you want inference on many observations from the same model. A thousand galaxies, a thousand outbreak time series, a thousand gravitational-wave events — each is a fresh ABC run from scratch. The total cost scales linearly with the number of observations, and most real applications have lots of observations.

What each failure points at

Each failure mode tells you what the next method has to provide.

Wasted simulations. We want every simulation to contribute information, not just the ones that land in the tolerance ball. That means using rejected simulations too — fitting some kind of function to the whole dataset instead of filtering it.

The curse of dimensionality. We want a method whose cost doesn't blow up with the dimension of $x$. The way out is to stop thinking of "match the observation" as a geometric distance check in $x$-space and start thinking of it as learning a function of $x$.

The summary-statistic burden. We want the method to find its own informative summary of $x$, rather than asking us to guess one. That means letting a learnable function play the role of $s$.

No amortization. We want inference to be a function of $x_{\text{obs}}$ (train once, apply at any new observation in a single forward pass) instead of a procedure that restarts from zero each time.

All four point at the same thing: replace the rejection step with a learned function that approximates the posterior directly. Train it on simulated $(\theta, x)$ pairs from anywhere in the prior; query it at any $x_{\text{obs}}$ essentially for free. The cost moves from "per-observation simulator runs" to "one-time training pass on a big simulated dataset".
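The smallest possible version of that idea can be sketched without a neural network at all. The example below is a stand-in, not the method the rest of the series builds: it assumes a N(0, 1) prior and a hypothetical simulator that adds Gaussian noise of scale `sigma`, then fits a linear-Gaussian conditional density `q(theta | x)` to every simulated pair by maximum likelihood. This toy model has a closed-form posterior, so the learned answer can be compared against the exact one.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5  # assumed simulator noise level

# One-time training set: (theta, x) pairs drawn from the prior and the simulator.
theta = rng.normal(0.0, 1.0, size=50_000)            # assumed N(0, 1) prior
x = theta + rng.normal(0.0, sigma, size=theta.size)  # hypothetical simulator

# Fit q(theta | x) = Normal(a * x + b, s2) by maximum likelihood.
# For a Gaussian with constant variance this reduces to least squares on every pair,
# so no simulation is discarded.
A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, theta, rcond=None)
s2 = np.mean((theta - (a * x + b)) ** 2)

def posterior(x_obs):
    # Amortized query: no new simulations, just evaluate the fitted function.
    return a * x_obs + b, s2

for x_obs in (-1.0, 0.3, 2.0):
    mean, var = posterior(x_obs)
    exact_mean = x_obs / (1 + sigma**2)   # closed form for this toy model only
    exact_var = sigma**2 / (1 + sigma**2)
    print(f"x_obs={x_obs}: learned N({mean:.3f}, {var:.3f}), "
          f"exact N({exact_mean:.3f}, {exact_var:.3f})")
```

The three queries at the end reuse the same fitted parameters, which is the amortization. The linear-Gaussian family only works here because the true posterior happens to be Gaussian; anything skewed or multimodal would need a more flexible density estimator.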

The trade isn't free. You're now choosing an architecture, choosing a loss, and trusting a network. The bias source changes from "the well-understood $\epsilon$-kernel" to "whatever the network failed to fit", which is much harder to characterize. This is why the calibration material in a later post is non-optional: neural posteriors need diagnostics that ABC didn't.

Before the network can fit anything useful, though, we need it to be flexible enough to represent the posteriors that real simulators produce: skewed, multimodal, bounded, correlated across dimensions. Plain Gaussian density estimators won't cut it. The right tool is from a different branch of ML, and the next post is about it: normalizing flows.