The Path-Integral Frame

Simulation-Based Inference

The series has covered six methods now — ABC, NPE, NLE, NRE, normalizing flows, the synthetic-likelihood and indirect-inference cluster. They look like a collection of tricks for slightly different problems. They aren't. Every one of them is the same calculation: an approximation to a single integral over latent trajectories. Once the integral is in view, the field stops looking like a grab bag of techniques and starts looking like one idea with multiple discretization schemes.

Acronyms: who doesn't enjoy them? They are the price we pay for brevity, and our SBI expedition has left us with ABC, NPE, NLE, and NRE. This family of methods pushes back against the notion that we need to make parametric assumptions about the process that generates our data. You have some kind of magical simulator whose outputs follow a complicated distribution with no closed-form density, but you can still run it and sample from it. Every method is an approach to estimating what is happening in that distribution without resorting to some nightmare-inducing parametric fitting scheme.

This is the capstone post. It's the angle that pays off if the previous posts landed, and it needs them in place first — if you jumped here cold, the connections will read as abstract. Reading the foundations and coming back is the right move.

The integral

For any stochastic model, the marginal likelihood is an integral over latent state:

$$p(x \mid \theta) = \int p(x, z \mid \theta)\, dz = \int p(x \mid z, \theta)\, p(z \mid \theta)\, dz$$

Here $z$ is whatever latent state lives between the parameters and the observation. For the pinball, $z$ is the sequence of kicks and velocities — the trajectory through the eight stages. For a stochastic differential equation, $z$ is the continuous path. For a GAN, $z$ is the latent code. The integral is taken over every possible $z$ that could have produced the observation.

When you can do this integral analytically, you have a closed-form likelihood and you're not in SBI territory. When you can't — and you usually can't, for any model with state-dependent dynamics — this integral is exactly what every SBI method is trying to approximate. The methods correspond to different ways of doing the integration.
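To make the integral concrete, here is a minimal sketch on a toy model chosen for illustration (an assumption, not one of the series' simulators): latent $z \sim \mathcal{N}(θ, 1)$ and observation $x ∣ z \sim \mathcal{N}(z, σ^2)$, so integrating out $z$ gives $p(x ∣ θ) = \mathcal{N}(x; θ, 1 + σ^2)$ in closed form. The estimator draws latent values from $p(z ∣ θ)$ and averages $p(x ∣ z, θ)$ over them; the function names are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def mc_marginal_likelihood(x_obs, theta, sigma2, n_paths, rng):
    """Plain Monte Carlo for p(x | theta) = integral of p(x | z) p(z | theta) dz.

    Toy model (an assumption for illustration): z ~ N(theta, 1), x | z ~ N(z, sigma2).
    """
    z = rng.normal(theta, 1.0, size=n_paths)    # latent draws from p(z | theta)
    return normal_pdf(x_obs, z, sigma2).mean()  # average of p(x_obs | z) over those draws

rng = np.random.default_rng(0)
x_obs, theta, sigma2 = 0.7, 0.2, 0.25
estimate = mc_marginal_likelihood(x_obs, theta, sigma2, 200_000, rng)
exact = normal_pdf(x_obs, theta, 1.0 + sigma2)  # closed form, available only in this toy model
print(f"Monte Carlo: {estimate:.4f}   exact: {exact:.4f}")
```

With the answer known, the average can be checked directly. For a real simulator the same average exists in principle, but $p(x ∣ z, θ)$ is usually not available to evaluate, which is why the methods that follow take other routes.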

The posterior follows from Bayes:

$$p(\theta \mid x) \propto p(\theta)\, p(x \mid \theta) = p(\theta) \int p(x \mid z, \theta)\, p(z \mid \theta)\, dz$$

So everything we've been doing — sampling from posteriors, training flows, computing rejection rates — is at root about approximating this path integral and feeding the result through Bayes' rule (or its frequentist counterpart, depending on taste). The intractable likelihood is the intractable integral. The rest is implementation.

The methods, as approximation schemes

ABC: Monte Carlo rejection on the integrand

ABC is the most direct approximation. It does the integral by Monte Carlo: draw a latent path $z$ from $p(z ∣ θ)$, push it through the observation map, and check whether the resulting $x$ falls inside the $ε$-ball around $x_{obs}$. The accept/reject mechanism probes the integrand at one sampled path. Across many draws, the acceptance rate is an unbiased estimator of a smoothed version of the integral:

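$$P(\text{accept} \mid \theta) = \iint \mathbf{1}\left[\lVert x - x_{obs} \rVert < \varepsilon\right] p(x \mid z, \theta)\, p(z \mid \theta)\, dx\, dz, \qquad \frac{P(\text{accept} \mid \theta)}{V_\varepsilon} \xrightarrow{\varepsilon \to 0} p(x_{obs} \mid \theta)$$

Here $V_ε$ is the volume of the $ε$-ball; without that normalization the acceptance probability itself shrinks to zero as the ball does.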

ABC = Monte Carlo on the path integral with no variance reduction. Every wasted simulation is a path whose output landed outside the $ε$-ball, so it contributed nothing to the estimate. The bias is the $ε$-blurring of the indicator; the relative variance of the acceptance-rate estimate scales like the inverse of the accept rate. Both are entirely about the integration scheme, not about Bayesian philosophy.
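A minimal rejection-ABC sketch on an assumed toy model ($z \sim \mathcal{N}(θ, 1)$, $x ∣ z \sim \mathcal{N}(z, σ^2)$, chosen because $p(x ∣ θ)$ is then known exactly) shows both halves of the trade. Dividing the acceptance rate by the one-dimensional ball volume $2ε$ turns it into a likelihood estimate; shrinking $ε$ removes blurring bias but lowers the accept rate, so more simulations are wasted. Function names and settings are illustrative.

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def abc_likelihood(x_obs, theta, sigma2, eps, n_sims, rng):
    """Rejection-ABC estimate of p(x_obs | theta) for the assumed toy model."""
    z = rng.normal(theta, 1.0, size=n_sims)         # latent draws from p(z | theta)
    x = rng.normal(z, np.sqrt(sigma2))              # push each through the observation map
    accept_rate = np.mean(np.abs(x - x_obs) < eps)  # fraction landing inside the eps-ball
    return accept_rate / (2.0 * eps), accept_rate   # 1-D ball volume V_eps = 2 * eps

rng = np.random.default_rng(1)
x_obs, theta, sigma2 = 0.7, 0.2, 0.25
exact = normal_pdf(x_obs, theta, 1.0 + sigma2)
results = {}
for eps in (1.0, 0.05):
    results[eps] = abc_likelihood(x_obs, theta, sigma2, eps, 400_000, rng)
    est, rate = results[eps]
    print(f"eps={eps:<5} accept rate {rate:.4f}  estimate {est:.4f}  exact {exact:.4f}")
```

The wide ball accepts far more often but reports a blurred, biased value; the narrow ball is close to the exact likelihood while discarding the vast majority of its simulations.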

NPE: a learned conditional propagator

NPE replaces the per-$θ$ integral approximation with a learned function. Instead of running Monte Carlo at each $θ$, you fit a conditional density $q_{ϕ}(θ ∣ x)$ on simulated $(θ, x)$ pairs. The flow learns the answer to the integral implicitly: the posterior $q_{ϕ}(θ ∣ x)$ approximates the prior times the path-integrated likelihood, divided by the marginal $p(x)$, evaluated at the conditioning $x$.

Said differently: NPE doesn't evaluate the path integral, it learns the result of having evaluated it. The training data provides empirical realizations of the integrand at simulated $(θ, x)$ pairs; the network interpolates. Once trained, "running the integral" at a new $x_{obs}$ is one forward pass — the amortization that ABC couldn't do.
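As a sketch of amortization, assume a toy model with prior $θ \sim \mathcal{N}(0, 1)$, latent $z \sim \mathcal{N}(θ, 1)$, and $x ∣ z \sim \mathcal{N}(z, σ^2)$. In place of a normalizing flow, the conditional density here is a deliberately tiny Gaussian family $q(θ ∣ x) = \mathcal{N}(a x + b, c^2)$ (a hypothetical stand-in, not how NPE is implemented in practice), fit once by maximum likelihood on simulated pairs and then queried at several $x_{obs}$ with no further simulation. Because the true posterior in this toy model is Gaussian with a mean linear in $x$, the fitted family can match it exactly.

```python
import numpy as np

def simulate(n, sigma2, rng):
    theta = rng.normal(0.0, 1.0, size=n)   # prior p(theta) = N(0, 1), an assumption
    z = rng.normal(theta, 1.0)             # latent path given theta
    x = rng.normal(z, np.sqrt(sigma2))     # observation given the latent path
    return theta, x

def fit_linear_gaussian_npe(theta, x):
    """Maximum-likelihood fit of q(theta | x) = N(a*x + b, c2).

    For a Gaussian family with linear mean, maximum likelihood reduces to
    least squares for (a, b) plus the mean squared residual for c2.
    """
    design = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(design, theta, rcond=None)
    c2 = np.mean((theta - (a * x + b)) ** 2)
    return a, b, c2

rng = np.random.default_rng(2)
sigma2 = 0.25
theta_sim, x_sim = simulate(100_000, sigma2, rng)  # the whole simulation budget, spent up front
a, b, c2 = fit_linear_gaussian_npe(theta_sim, x_sim)

s2 = 1.0 + sigma2                                   # p(x | theta) = N(theta, s2) once z is integrated out
for x_obs in (-1.0, 0.7, 2.0):                      # each new query is arithmetic, not simulation
    true_mean, true_var = x_obs / (1.0 + s2), s2 / (1.0 + s2)
    print(f"x_obs={x_obs:+.1f}  q: N({a * x_obs + b:.3f}, {c2:.3f})  exact: N({true_mean:.3f}, {true_var:.3f})")
```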

NLE: a learned kernel

NLE is the same idea, but the network learns $p(x ∣ θ)$ directly: a learned kernel for the integral over latent paths. After training, you evaluate the learned likelihood inside a standard MCMC sampler. The integration over $z$ is implicit in the training data; the explicit integration over $θ$, for posterior sampling, happens in the MCMC step. NLE splits the path integral and the parameter integral into two separately-attacked stages.
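The two-stage structure can be sketched under an assumed toy model (prior $θ \sim \mathcal{N}(0, 1)$, $z \sim \mathcal{N}(θ, 1)$, $x ∣ z \sim \mathcal{N}(z, σ^2)$). Stage one fits a surrogate likelihood $q(x ∣ θ) = \mathcal{N}(αθ + β, τ^2)$ to simulated pairs, a linear-Gaussian stand-in for a neural density estimator; stage two runs random-walk Metropolis over $θ$ using only that surrogate and never calls the simulator again. Names, step size, and chain lengths are illustrative choices.

```python
import numpy as np

def fit_linear_gaussian_nle(theta, x):
    """Fit q(x | theta) = N(alpha*theta + beta, tau2) by least squares (stand-in for a neural estimator)."""
    design = np.column_stack([theta, np.ones_like(theta)])
    (alpha, beta), *_ = np.linalg.lstsq(design, x, rcond=None)
    tau2 = np.mean((x - (alpha * theta + beta)) ** 2)
    return alpha, beta, tau2

def log_post(theta, x_obs, alpha, beta, tau2):
    log_prior = -0.5 * theta**2                                  # N(0, 1) prior, up to a constant
    log_lik = -0.5 * (x_obs - alpha * theta - beta) ** 2 / tau2  # learned likelihood, up to a constant
    return log_prior + log_lik

def metropolis(x_obs, params, n_steps, step, rng):
    """Random-walk Metropolis targeting prior * learned likelihood."""
    theta, lp = 0.0, log_post(0.0, x_obs, *params)
    chain = np.empty(n_steps)
    for i in range(n_steps):
        proposal = theta + step * rng.normal()
        lp_prop = log_post(proposal, x_obs, *params)
        if np.log(rng.uniform()) < lp_prop - lp:                 # accept with prob min(1, ratio)
            theta, lp = proposal, lp_prop
        chain[i] = theta
    return chain

rng = np.random.default_rng(3)
sigma2 = 0.25
theta_sim = rng.normal(0.0, 1.0, size=50_000)                    # stage 1: simulate and fit
x_sim = rng.normal(rng.normal(theta_sim, 1.0), np.sqrt(sigma2))
params = fit_linear_gaussian_nle(theta_sim, x_sim)

x_obs = 0.7                                                      # stage 2: MCMC over theta
chain = metropolis(x_obs, params, 40_000, 1.0, rng)[5_000:]      # discard burn-in
s2 = 1.0 + sigma2
print(f"MCMC mean {chain.mean():.3f} var {chain.var():.3f}; "
      f"exact mean {x_obs / (1 + s2):.3f} var {s2 / (1 + s2):.3f}")
```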

NRE: a learned ratio of integrals

NRE estimates the likelihood-to-marginal ratio:

$$r(x, \theta) = \frac{p(x \mid \theta)}{p(x)} = \frac{\int p(x \mid z, \theta)\, p(z \mid \theta)\, dz}{\int p(\theta') \left[\int p(x \mid z, \theta')\, p(z \mid \theta')\, dz\right] d\theta'}$$

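That's a ratio of two path integrals, learned by binary classification. Train a classifier to tell joint pairs $(θ, x) \sim p(θ)\,p(x ∣ θ)$ from shuffled pairs $(θ, x) \sim p(θ)\,p(x)$, with equal numbers of each, and its optimal logit is exactly $\log r(x, θ)$. The classifier doesn't need either integral's value, only the ratio, which classification gives you cleanly. This is also the cleanest bridge to frequentist inference: $r(x, θ_{1}) / r(x, θ_{0})$ is the likelihood ratio a Neyman-Pearson test needs, no prior needed.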
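A sketch of the classification trick on an assumed Gaussian toy model ($θ \sim \mathcal{N}(0, 1)$, $z \sim \mathcal{N}(θ, 1)$, $x ∣ z \sim \mathcal{N}(z, σ^2)$), where $\log r$ is known in closed form. The classifier is plain logistic regression on quadratic features, fit by Newton's method, standing in for the neural network an NRE implementation would use; in this model $\log r$ is quadratic in $(θ, x)$, so the feature set can represent it exactly. Shuffling $θ$ against $x$ produces the marginal pairs.

```python
import numpy as np

def features(theta, x):
    # Quadratic features: enough to represent log r exactly in this Gaussian toy model
    return np.column_stack([np.ones_like(theta), theta, x, theta**2, x**2, theta * x])

def fit_logistic(F, y, n_iter=30):
    """Logistic regression by Newton's method (IRLS); a small stand-in for a neural classifier."""
    w = np.zeros(F.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(F @ w, -30, 30)))
        grad = F.T @ (y - p)
        hess = F.T @ (F * (p * (1.0 - p))[:, None])
        w += np.linalg.solve(hess, grad)
    return w

rng = np.random.default_rng(4)
n, sigma2 = 100_000, 0.25
theta = rng.normal(0.0, 1.0, size=n)
x = rng.normal(rng.normal(theta, 1.0), np.sqrt(sigma2))  # joint pairs: (theta, x) ~ p(theta) p(x | theta)
theta_shuffled = rng.permutation(theta)                   # break the pairing: (theta', x) ~ p(theta) p(x)

F = np.vstack([features(theta, x), features(theta_shuffled, x)])
y = np.concatenate([np.ones(n), np.zeros(n)])             # label 1 = joint, 0 = marginal; balanced classes
w = fit_logistic(F, y)

def log_r_learned(theta, x):
    return features(np.atleast_1d(theta), np.atleast_1d(x)) @ w  # classifier logit, approximately log r

def log_r_exact(theta, x, s2=1.0 + sigma2):
    # log N(x; theta, s2) - log N(x; 0, 1 + s2)
    return (-0.5 * (x - theta) ** 2 / s2 - 0.5 * np.log(s2)
            + 0.5 * x**2 / (1.0 + s2) + 0.5 * np.log(1.0 + s2))

for th, xv in [(0.5, 0.7), (-1.0, -0.8), (0.0, 1.5)]:
    print(f"theta={th:+.1f} x={xv:+.1f}  learned {log_r_learned(th, xv)[0]:+.3f}  exact {log_r_exact(th, xv):+.3f}")
```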

Particle filters: forward Monte Carlo

For sequential state-space models, particle filters are the textbook approach. They approximate the same path integral but factor it along the time index: $z = (z_{1}, \dots, z_{T})$, and the integral becomes a chain of one-step integrals updated by importance sampling. At each step, a cloud of weighted particles approximates the running marginal posterior over latent state; the next observation reweights and resamples. The product over steps of the average unnormalized weights is an unbiased estimate of the full path integral $p(x_{1:T} ∣ θ)$.

This is what physics calls forward Monte Carlo evaluation: walk the integrand step by step, carrying a particle representation of the running solution. The path integral is never written down in full — it's evaluated incrementally by sampling.
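To see the incremental evaluation at work, here is a minimal bootstrap particle filter on an assumed linear-Gaussian state-space model: a latent AR(1) process $z_t = φ z_{t-1} + \text{noise}$ observed with Gaussian noise. That model is chosen because a Kalman filter computes the same path integral exactly, which gives a reference value for the particle estimate. Parameter values and function names are hypothetical.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def simulate_ssm(T, phi, proc_var, obs_var, rng):
    """Latent AR(1) path z_t, returned as noisy observations x_t."""
    z = np.empty(T)
    z[0] = rng.normal(0.0, np.sqrt(proc_var / (1 - phi**2)))  # stationary initial state
    for t in range(1, T):
        z[t] = phi * z[t - 1] + rng.normal(0.0, np.sqrt(proc_var))
    return z + rng.normal(0.0, np.sqrt(obs_var), size=T)

def bootstrap_pf_loglik(x, phi, proc_var, obs_var, n_particles, rng):
    """Log of the bootstrap particle filter's unbiased estimate of p(x_{1:T} | theta)."""
    z = rng.normal(0.0, np.sqrt(proc_var / (1 - phi**2)), size=n_particles)
    loglik = 0.0
    for t, x_t in enumerate(x):
        if t > 0:  # propagate every particle through one step of the latent dynamics
            z = phi * z + rng.normal(0.0, np.sqrt(proc_var), size=n_particles)
        logw = log_normal_pdf(x_t, z, obs_var)                  # reweight by the new observation
        shift = logw.max()
        w = np.exp(logw - shift)
        loglik += shift + np.log(w.mean())                      # log of this step's average weight
        z = rng.choice(z, size=n_particles, p=w / w.sum())      # multinomial resampling
    return loglik

def kalman_loglik(x, phi, proc_var, obs_var):
    """Exact log-likelihood of the same linear-Gaussian model."""
    m, P = 0.0, proc_var / (1 - phi**2)
    loglik = 0.0
    for t, x_t in enumerate(x):
        if t > 0:
            m, P = phi * m, phi**2 * P + proc_var               # predict
        S = P + obs_var
        loglik += log_normal_pdf(x_t, m, S)                     # one-step predictive density
        K = P / S
        m, P = m + K * (x_t - m), (1 - K) * P                   # update
    return loglik

rng = np.random.default_rng(5)
phi, proc_var, obs_var = 0.9, 0.5, 1.0
x = simulate_ssm(50, phi, proc_var, obs_var, rng)
pf = bootstrap_pf_loglik(x, phi, proc_var, obs_var, 2_000, rng)
exact = kalman_loglik(x, phi, proc_var, obs_var)
print(f"particle filter: {pf:.3f}   Kalman (exact): {exact:.3f}")
```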

Diffusion models: score matching as a workaround

Diffusion-based SBI sidesteps the path integral by learning its score rather than its value. Train a network to predict $\nabla_{θ} \log p(x, θ)$ at every noise level along a forward diffusion process. Then sampling proceeds by integrating a reverse-time stochastic differential equation whose drift is the learned score. The integral itself is never computed; only its log-gradient is, and even that gradient is learned indirectly through denoising.

Geometrically: the score is the local slope of the integrand. Knowing the slope everywhere lets you navigate to high-density regions by gradient flow without ever evaluating the density. This works because of the equivalence between denoising score-matching and the time-reversed SDE — itself a path-integral identity, dressed up in machine-learning notation.
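A minimal sketch of the sampling half, with the learning half assumed away: take a 1-D target posterior $\mathcal{N}(μ, s^2)$ (illustrative values) and the Ornstein-Uhlenbeck forward process $dθ = -θ\,dt + \sqrt{2}\,dW$. The noised marginals then stay Gaussian and their score is available in closed form. That exact score stands in for a trained denoising network, and the reverse-time SDE $dθ = [-θ - 2\nabla_θ \log p_t(θ)]\,dt + \sqrt{2}\,d\bar{W}$ is integrated backward with Euler-Maruyama.

```python
import numpy as np

def ou_score(theta, t, mu, s2):
    """Exact score of the time-t marginal when N(mu, s2) is pushed through
    d theta = -theta dt + sqrt(2) dW. Stands in for a learned score network."""
    decay = np.exp(-t)
    mean_t = mu * decay
    var_t = s2 * decay**2 + 1.0 - decay**2
    return -(theta - mean_t) / var_t

def reverse_sde_sample(n, mu, s2, rng, T=5.0, n_steps=1000):
    """Euler-Maruyama on the reverse-time SDE, stepping from t = T back to t = 0."""
    dt = T / n_steps
    theta = rng.normal(0.0, 1.0, size=n)                   # at t = T the forward marginal is close to N(0, 1)
    for k in range(n_steps):
        t = T - k * dt
        drift = -theta - 2.0 * ou_score(theta, t, mu, s2)  # forward drift minus g^2 * score, with g^2 = 2
        theta = theta - drift * dt + np.sqrt(2.0 * dt) * rng.normal(size=n)
    return theta

samples = reverse_sde_sample(20_000, mu=1.5, s2=0.25, rng=np.random.default_rng(7))
print(f"sample mean {samples.mean():.3f} (target 1.5), sample variance {samples.var():.3f} (target 0.25)")
```

No density is ever evaluated: the sampler only consults the score, yet the samples land on the target distribution up to discretization and Monte Carlo error.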

Connections to adjacent fields

The path-integral frame isn't just convenient organization. Path integrals are load-bearing objects in several adjacent fields, and the connections are deep enough that techniques from those fields transfer.

Operator splitting and BCH

A path integral in continuous time is what you get when you take a generator of motion, exponentiate it, and propagate. When the operators that generate different parts of the motion don't commute — positions and momenta in mechanics, different reaction channels in chemistry, drift and diffusion in an SDE — the propagation has to be split. The Baker-Campbell-Hausdorff formula governs the splitting error:

$$e^{A} e^{B} = e^{A + B + \frac{1}{2}[A, B] + \frac{1}{12}\left([A, [A, B]] - [B, [A, B]]\right) + \cdots}$$

Higher-order commutators are exactly what we're throwing away when we discretize a path integral into time steps. SBI methods that step through a simulator are doing operator splitting whether they call it that or not: each simulator step exponentiates the local generator, and the time-discretized trajectory is the discretized exponential. The pinball's velocity persistence $v \leftarrow αv + \text{kick}$ is a first-order BCH splitting of a two-operator system. Recognizing this changes nothing operationally, but it tells you exactly where the discretization bias comes from, and how to reduce it by going to higher order.
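The BCH error terms can be checked numerically with a pair of non-commuting $2 \times 2$ generators. The sketch below picks nilpotent matrices (an illustrative assumption) so that each factor's exponential is exact and the only error comes from splitting. First-order (Lie-Trotter) splitting $e^{hA}e^{hB}$ should show a one-step error matching $\tfrac{1}{2}h^2 [A, B]$; the symmetric (Strang) splitting cancels that term and should be third order locally.

```python
import numpy as np

# Two non-commuting generators, both nilpotent, so their exponentials are exact:
# exp(hA) = I + hA and exp(hB) = I + hB.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])
I = np.eye(2)

def exact_step(h):
    """exp(h(A + B)); here A + B = [[0, 1], [1, 0]], whose exponential is built from cosh and sinh."""
    return np.array([[np.cosh(h), np.sinh(h)], [np.sinh(h), np.cosh(h)]])

def lie_trotter(h):
    """First-order splitting e^{hA} e^{hB}; BCH says its leading error is (h^2 / 2) [A, B]."""
    return (I + h * A) @ (I + h * B)

def strang(h):
    """Symmetric splitting e^{hA/2} e^{hB} e^{hA/2}; the [A, B] term cancels, leaving O(h^3)."""
    return (I + 0.5 * h * A) @ (I + h * B) @ (I + 0.5 * h * A)

commutator = A @ B - B @ A
errors = {}
for h in (0.02, 0.01):
    errors[h] = (np.linalg.norm(lie_trotter(h) - exact_step(h)),
                 np.linalg.norm(strang(h) - exact_step(h)))
    bch_pred = 0.5 * h**2 * np.linalg.norm(commutator)
    print(f"h={h}: Lie {errors[h][0]:.3e} (BCH leading term {bch_pred:.3e}), Strang {errors[h][1]:.3e}")

# Halving h should cut the one-step error by about 4x (first order) and 8x (second order)
print(f"Lie ratio {errors[0.02][0] / errors[0.01][0]:.2f}, Strang ratio {errors[0.02][1] / errors[0.01][1]:.2f}")
```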

Itô versus Stratonovich

For continuous SDEs, the path integral has two flavors depending on whether you discretize time forward (Itô) or symmetrically (Stratonovich). The two conventions give different drift coefficients for the same SDE. SBI for SDEs has to be honest about which convention the simulator uses — a synthetic-likelihood method that assumes one and fits a trajectory generated under the other will systematically misestimate the drift.
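The convention gap can be sketched with geometric Brownian motion $dX = σX\,dW$, an illustrative choice because both readings have known means: under Itô, $E[X_T] = X_0$, while under Stratonovich the same equation carries an extra Itô drift $σ^2 X / 2$, so $E[X_T] = X_0 e^{σ^2 T/2}$. Euler-Maruyama converges to the Itô solution and the stochastic Heun scheme to the Stratonovich one; feeding both the same Brownian increments isolates the convention as the only difference.

```python
import numpy as np

def simulate_gbm(sigma, T, n_steps, n_paths, convention, rng):
    """Integrate dX = sigma * X dW from X_0 = 1 under the Ito or Stratonovich reading."""
    dt = T / n_steps
    x = np.ones(n_paths)
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        if convention == "ito":
            x = x + sigma * x * dW                   # Euler-Maruyama: integrand at the left endpoint
        elif convention == "stratonovich":
            x_pred = x + sigma * x * dW              # Heun predictor
            x = x + 0.5 * sigma * (x + x_pred) * dW  # average of both endpoints: Stratonovich
        else:
            raise ValueError(f"unknown convention: {convention}")
    return x

sigma, T = 0.5, 1.0
# Same seed, so both conventions see identical Brownian increments
ito = simulate_gbm(sigma, T, 200, 50_000, "ito", np.random.default_rng(6))
strat = simulate_gbm(sigma, T, 200, 50_000, "stratonovich", np.random.default_rng(6))

# Read through an Ito lens, E[X_T] = exp(mu * T), so log(mean) / T is the implied drift mu
for name, paths in (("Ito", ito), ("Stratonovich", strat)):
    print(f"{name:>12}: E[X_T] = {paths.mean():.3f}, implied Ito drift = {np.log(paths.mean()) / T:.3f}")
print(f"theory: Ito drift 0, Stratonovich adds sigma^2 / 2 = {sigma**2 / 2:.3f}")
```

An inference method that assumed the Itô reading but was fed the Stratonovich trajectories would attribute that extra $σ^2/2$ to the drift parameter.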

This is a place where the path-integral frame pays off concretely. "Which stochastic-calculus convention does your network use?" is not a question you can meaningfully ask. "Which convention does your simulator use?" is a question with a definite answer, and the answer is part of the model specification. SBI inherits the question from its underlying simulator; the inference machinery is downstream of it.

Exchangeability and de Finetti

De Finetti's theorem says that any infinitely exchangeable sequence of observations admits a conditionally iid representation given some latent $θ$. Concretely: when observations are exchangeable, there is a mixing distribution $π$ such that $p(x_{1}, \dots, x_{n}) = \int \prod_{i} p(x_{i} ∣ θ)\, π(dθ)$, so conditional on $θ$ the observations factor as $p(x_{1}, \dots, x_{n} ∣ θ) = \prod_{i} p(x_{i} ∣ θ)$. This is the theoretical scaffold for why factoring the joint into prior-times-likelihood is the right move for SBI: the integral runs over latent $θ$, and the observations factor.
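The representation can be checked on the simplest exchangeable process, a Pólya urn over binary outcomes (an illustrative choice). The urn's probabilities are computed step by step in a specific order, yet any reordering with the same counts gets the same probability, and that probability equals the mixture $\int θ^{k}(1-θ)^{n-k}\,\mathrm{Beta}(θ; a, b)\,dθ$: a conditionally iid Bernoulli model integrated over a latent $θ$. The pseudo-counts $a, b$ and the sequences are hypothetical.

```python
import math
import numpy as np

def polya_urn_prob(seq, a, b):
    """Probability of a binary sequence drawn from a Polya urn with initial
    pseudo-counts a (ones) and b (zeros); each draw adds one ball of its color."""
    prob, ones, zeros = 1.0, 0, 0
    for s in seq:
        p_one = (a + ones) / (a + b + ones + zeros)  # predictive probability of a 1
        prob *= p_one if s == 1 else 1.0 - p_one
        ones, zeros = ones + s, zeros + (1 - s)
    return prob

def mixture_prob(seq, a, b, n_grid=200_000):
    """De Finetti form: integral of prod_i Bernoulli(x_i | theta) against a Beta(a, b) prior."""
    k, n = sum(seq), len(seq)
    theta = (np.arange(n_grid) + 0.5) / n_grid       # midpoints of a uniform grid on (0, 1)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    prior = np.exp(log_norm + (a - 1) * np.log(theta) + (b - 1) * np.log1p(-theta))
    return np.mean(theta**k * (1 - theta) ** (n - k) * prior)  # midpoint rule on an interval of length 1

a, b = 2.0, 3.0
seq = [1, 0, 1, 1, 0]
perm = [0, 1, 1, 0, 1]                               # same counts, different order
p_seq, p_perm = polya_urn_prob(seq, a, b), polya_urn_prob(perm, a, b)
p_mix = mixture_prob(seq, a, b)
print(f"urn (original order): {p_seq:.6f}  urn (permuted): {p_perm:.6f}  iid mixture: {p_mix:.6f}")
```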

Most realistic SBI applications have either exchangeable observations (Lotka-Volterra runs over independent realizations of a population) or weakly-exchangeable structure (stationary time series). The de Finetti decomposition over latent parameters and the path integral over latent dynamics are two facets of the same factorization — one across replicate runs of the simulator, the other across stages of a single run.

Feynman-Kac

The Feynman-Kac formula is the explicit statement that solutions to certain PDEs can be written as expectations over stochastic paths. In one direction it's a tool for solving PDEs by sampling. In the other direction it's a tool for computing expectations by solving PDEs. SBI uses it implicitly: the posterior at $x_{obs}$ is the solution to a backward PDE whose dynamics are defined by the simulator, and the simulator computes that solution by stochastic forward sampling.
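A minimal Feynman-Kac check, under illustrative assumptions: take Brownian motion $dX = σ\,dW$, terminal condition $f(x) = \cos x$, and a constant potential $V = λ$, so the PDE $\partial_t u + \tfrac{σ^2}{2}\partial_{xx} u - λ u = 0$ has the closed-form solution $u(0, x) = e^{-(λ + σ^2/2)T}\cos x$. Averaging the potential-weighted terminal value over simulated paths should reproduce it. This illustrates the identity itself, not an SBI posterior computation, and the function names are hypothetical.

```python
import numpy as np

def feynman_kac_mc(x0, T, sigma, lam, n_paths, n_steps, rng):
    """Path-expectation side of Feynman-Kac:
    u(0, x0) = E[exp(-integral_0^T V(X_s) ds) * cos(X_T)], dX = sigma dW, X_0 = x0,
    with a constant potential V = lam (an assumption that keeps a closed form)."""
    dt = T / n_steps
    x = np.full(n_paths, x0, dtype=float)
    log_weight = np.zeros(n_paths)
    for _ in range(n_steps):
        log_weight -= lam * dt                                  # accumulate -V(X_s) ds along each path
        x = x + sigma * np.sqrt(dt) * rng.normal(size=n_paths)  # Brownian increment
    return np.mean(np.exp(log_weight) * np.cos(x))

def pde_solution(x0, T, sigma, lam):
    """Closed-form u(0, x0) for u_t + (sigma^2 / 2) u_xx - lam * u = 0 with u(T, x) = cos(x)."""
    return np.exp(-(lam + 0.5 * sigma**2) * T) * np.cos(x0)

rng = np.random.default_rng(9)
estimate_fk = feynman_kac_mc(0.3, 1.0, 0.8, 0.1, 100_000, 50, rng)
exact_fk = pde_solution(0.3, 1.0, 0.8, 0.1)
print(f"path expectation: {estimate_fk:.4f}   PDE solution: {exact_fk:.4f}")
```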

When you train an NPE network, you're effectively learning the Feynman-Kac solution operator for that PDE: a map from boundary conditions ($x_{obs}$) to PDE solutions (posteriors). This is why amortized neural SBI works at all: it's not learning one solution, it's learning the solution operator, which is a single object regardless of how many boundary conditions you'll query. The PDE-solver framing is the right way to think about why amortization is possible.

Why this frame matters

Three practical payoffs from seeing the field this way.

First, the choice of SBI method becomes a choice of approximation scheme for an integral. That's a much more familiar question than "which neural architecture should I pick": every field that does path integrals has built up intuitions about which approximations are valid in which regimes. ABC: consistent as $ε \to 0$ but high variance, and biased by the $ε$-blurring at any finite tolerance. NPE: lower variance, but committed to a function class. Particle filters: an unbiased likelihood estimate, but finite-particle bias in filtered expectations. Diffusion: bias from approximating the score field. Pick the bias-variance trade you want, knowing what trade you're picking.

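Second, the cross-field connections give you transfer tools. Stochastic integration techniques developed for quantum field theory work for SBI: Hamiltonian (originally "hybrid") Monte Carlo came out of lattice field theory and is now a workhorse sampler for the parameter-space MCMC that methods like NLE rely on. Operator-splitting analysis from numerical analysis is the right way to think about simulator discretization error. And the Fisher information, which sets the Cramér-Rao bound, can itself be written as an expectation over latent paths; if it is singular, your simulator cannot locally identify the parameters, and you can find that out before you train anything.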
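The Fisher-information claim can be made concrete with Fisher's identity, $\nabla_θ \log p(x ∣ θ) = \mathbb{E}_{p(z ∣ x, θ)}[\nabla_θ \log p(x, z ∣ θ)]$: the score of the intractable likelihood is an expectation over latent paths, and the Fisher information $I(θ) = \mathbb{E}_x[(\nabla_θ \log p(x ∣ θ))^2]$ follows by averaging its square. The sketch assumes the toy model $z \sim \mathcal{N}(θ, 1)$, $x ∣ z \sim \mathcal{N}(z, σ^2)$, where the exact answers are $(x - θ)/(1 + σ^2)$ and $1/(1 + σ^2)$, and approximates the latent posterior by self-normalized importance sampling; names and sample sizes are illustrative.

```python
import numpy as np

def normal_logpdf(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def score_via_latent_paths(x, theta, sigma2, n_paths, rng):
    """Fisher's identity for the assumed toy model z ~ N(theta, 1), x | z ~ N(z, sigma2).

    The complete-data score is d/dtheta log p(z | theta) = z - theta; its posterior
    expectation is approximated by self-normalized importance sampling from p(z | theta).
    """
    z = rng.normal(theta, 1.0, size=n_paths)    # latent paths from the prior dynamics
    logw = normal_logpdf(x, z, sigma2)          # importance weights p(x | z)
    w = np.exp(logw - logw.max())
    return np.sum(w * (z - theta)) / np.sum(w)  # weighted average of complete-data scores

rng = np.random.default_rng(8)
theta, sigma2 = 0.2, 0.25
scores = {x: score_via_latent_paths(x, theta, sigma2, 20_000, rng) for x in (-1.0, 0.7, 2.0)}
for x, s in scores.items():
    print(f"x={x:+.1f}  path-expectation score {s:+.3f}  exact {(x - theta) / (1 + sigma2):+.3f}")

# Fisher information I(theta) = E_x[score^2]; 1 / I(theta) is the Cramer-Rao bound for one observation
x_draws = rng.normal(rng.normal(theta, 1.0, size=2_000), np.sqrt(sigma2))
fisher_est = np.mean([score_via_latent_paths(x, theta, sigma2, 2_000, rng) ** 2 for x in x_draws])
print(f"Fisher information: {fisher_est:.3f}   exact: {1 / (1 + sigma2):.3f}")
```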

Third, it gives you the right unit of explanation. "What's the path integral here?" has a definite answer for any specific model — it's the marginalization over latent dynamics that produces the likelihood. Once you've written that down, you can see what's hard about the integral (high dimension, non-trivial measure, non-commuting operators) and pick a method whose approximation matches the hard part.

What's still open

Most of these connections are well-developed mathematically, but the SBI literature hasn't systematically harvested them yet. A few visible open problems.

Adaptive approximation. Different regions of the path integral are hard for different reasons — some methods nail the bulk of the integrand, others nail the tails. Combining methods at the level of the integral, rather than at the level of the output posterior, is mostly unstudied.

Calibrated transfer. A network trained on one path integral can sometimes be re-purposed for a related one, but only if the change between integrals is small in some appropriate metric. What that metric should be in path-integral terms — Wasserstein on path-space measures? Operator distance between generators? — is unsettled.

Operator-aware architectures. Coupling and autoregressive flow layers correspond to particular factorizations of the integration measure. A more direct approach would design the flow architecture to match the operator structure of the model being inferred — analogous to how convolutional networks match translation-invariant priors. Early work along these lines exists in lattice quantum field theory applications, but it isn't standard practice in SBI.

The path-integral frame is the right level of abstraction for thinking about questions like these. The field has spent most of its first decade in "introduce a method, benchmark it" mode; the next decade looks more like one where method choice becomes principled, and the principles will come from the integral side, not from the network side.

Closing

SBI started out looking like a collection of tricks for different kinds of intractable models. After this series, it should look like one trick, replacing an intractable integral with a tractable approximation, implemented in a few different ways. Sampling, for ABC. Density estimation, for NPE and NLE. Ratio estimation, for NRE. Score matching, for diffusion. Operator splitting in time, for particle filters. All of these are concrete answers to the same abstract question: how do you compute an integral you can't write down, given a sampler from its integrand?

That question has a forty-year history outside of SBI — in quantum field theory, in stochastic analysis, in numerical PDE — and SBI is still a young field reaching back across the boundary for tools that were already there. The path-integral frame is the bridge. The methods will keep evolving; the integral was always what they were approximating.