Value at Risk and Expected Shortfall


Value at Risk (VaR) is the single most-used risk-management number in finance. It answers "How much can I lose in a normal day?" by quoting a quantile of the loss distribution: $VaR_{α}$ is the loss level such that the probability of a worse loss in the chosen time horizon is at most $α$ . Regulators (Basel framework) use 99%-VaR over a 10-day horizon to size bank capital reserves. Hedge funds and prop desks use it as an internal risk metric. Despite well-known limitations — most importantly, that it says NOTHING about how bad the loss is WHEN it does breach — VaR is the lingua franca, and any practical risk system must compute it.

Formally, given a P&L random variable $X$ (positive = gain, negative = loss), the Value at Risk at confidence level $1 - α$ is:

$$VaR_{α} = -\inf \{x : P(X \leq x) \geq α\}.$$

In plain English: the $α$-quantile of the P&L distribution, sign-flipped to be reported as a positive number. The 95%-VaR corresponds to $α = 0.05$; the 99%-VaR to $α = 0.01$. "Loss greater than VaR happens once in $1/α$ periods on average."
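
To make the definition concrete, here is a minimal sketch on a toy discrete P&L distribution. The outcomes and probabilities are invented for illustration, and `var_from_pmf` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def var_from_pmf(outcomes, probs, alpha):
    """VaR_alpha = -inf{x : P(X <= x) >= alpha} for a discrete P&L variable."""
    order = np.argsort(outcomes)
    x = np.asarray(outcomes, dtype=float)[order]          # worst outcome first
    cdf = np.cumsum(np.asarray(probs, dtype=float)[order])
    # First (smallest) outcome whose cumulative probability reaches alpha;
    # the small tolerance protects against floating-point round-off in cumsum.
    idx = int(np.argmax(cdf >= alpha - 1e-12))
    return 0.0 - x[idx]

# Toy P&L (assumed numbers): a 2% chance of losing 10, a 6% chance of losing 4, etc.
outcomes = [-10, -4, -1, 2, 5]
probs = [0.02, 0.06, 0.22, 0.50, 0.20]

print("95% VaR:", var_from_pmf(outcomes, probs, 0.05))
print("99% VaR:", var_from_pmf(outcomes, probs, 0.01))
```

The cumulative probabilities are 0.02, 0.08, 0.30, 0.80 and 1.00, so the 95% VaR lands on the outcome -4 and the 99% VaR on the outcome -10.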

Three computational approaches

Each has different assumptions, costs, and failure modes.

1. Historical VaR

Take the empirical $α$ -quantile of historical returns: order the returns from worst to best, pick the one at position $α N$ . NO distributional assumptions. Captures fat tails and skewness in the historical sample. But: only as good as the SAMPLE — if the historical window doesn't contain a regime as bad as the one coming, you miss it. Standard window: 250 trading days (one year). Lookback choice is the only parameter.
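
The "sort and pick position $α N$" recipe can be sketched directly. This version takes the $\lceil α N \rceil$-th worst return with no interpolation, which is one common convention among several; `var_historical_order_stat` is a hypothetical name and the sample data are made up.

```python
import numpy as np

def var_historical_order_stat(returns, alpha=0.05):
    """Historical VaR as an order statistic: the ceil(alpha*N)-th worst return."""
    r = np.sort(np.asarray(returns, dtype=float))        # worst to best
    # 1-based position ceil(alpha*N), converted to a 0-based index.
    # The tiny offset avoids ceil() jumping up on floating-point noise.
    k = max(int(np.ceil(alpha * len(r) - 1e-9)) - 1, 0)
    return 0.0 - r[k]

# Ten made-up daily returns, for illustration only.
sample = [-0.05, -0.03, -0.01, 0.0, 0.01, 0.02, 0.02, 0.03, 0.04, 0.05]

print(var_historical_order_stat(sample, 0.10))   # worst return
print(var_historical_order_stat(sample, 0.25))   # third-worst return
print(-np.quantile(sample, 0.25))                # NumPy default interpolates linearly
```

With short windows the choice between an order statistic and an interpolated quantile can move the estimate noticeably, so it is worth fixing one convention and using it consistently in backtests.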

2. Parametric (Gaussian) VaR

Assume returns are Gaussian with mean $\hat{μ}$ and standard deviation $\hat{σ}$ (estimated from the sample), and use the inverse CDF:

$$VaR_{α} = -\hat{μ} - \hat{σ} \, Φ^{-1}(α).$$

Fast, parsimonious, easy to update incrementally. But misses fat tails: empirical asset returns have far more extreme moves than a Gaussian predicts. The 99% Gaussian VaR is typically an UNDERESTIMATE of the true 99% VaR for equity indices, often by 20-50%.
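
One way to see the fat-tail effect is to compare Gaussian quantiles with those of a Student-t distribution rescaled to the same variance. The sketch below assumes 4 degrees of freedom purely for illustration; it is not a claim about any particular market, and `tail_ratio` is a hypothetical helper.

```python
import numpy as np
from scipy.stats import norm, t

def tail_ratio(alpha, df=4):
    """Ratio of unit-variance Student-t VaR to Gaussian VaR at level alpha."""
    scale = np.sqrt((df - 2) / df)      # rescales t to unit variance (needs df > 2)
    var_t = -scale * t.ppf(alpha, df)   # VaR in units of one standard deviation
    var_gauss = -norm.ppf(alpha)
    return var_t / var_gauss

for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: t/Gaussian VaR ratio = {tail_ratio(alpha):.2f}")
```

Under these assumptions the ratio is below 1 at the 95% level and above 1 at the 99% level and beyond: matching the variance makes the Gaussian slightly conservative in the near tail and too optimistic in the far tail, which is where the capital-relevant quantiles sit.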

3. Monte Carlo VaR

Specify a model (which CAN be richer than Gaussian: t-distribution for fat tails, GARCH for time-varying volatility, jump-diffusion for jumps), simulate $N$ P&L scenarios, take the empirical quantile. Computationally heavy but extremely flexible. For a multi-asset portfolio with nonlinear instruments (options), full revaluation is what gets the answer right: you simulate the underlying risk factors and revalue the portfolio in each scenario, which a linear parametric approximation cannot do.
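
The simulate-and-revalue loop can be sketched for a single short call option. Everything here is an assumption chosen for illustration: the Black-Scholes pricer, the lognormal one-day move, and the parameter values. The delta-only number is included to show what a linear approximation misses.

```python
import numpy as np
from scipy.stats import norm

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

def mc_var_short_call(alpha=0.01, n_sim=200_000, seed=0):
    """One-day Monte Carlo VaR of a short call: full revaluation vs delta-only."""
    S0, K, T, r, sigma = 100.0, 100.0, 0.25, 0.02, 0.20   # assumed parameters
    h = 1.0 / 252.0                                       # one trading day
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_sim)
    # Simulate the risk factor: a one-day lognormal move of the underlying.
    S1 = S0 * np.exp((r - 0.5 * sigma**2) * h + sigma * np.sqrt(h) * z)
    # Full revaluation: reprice the option in every scenario, one day closer to expiry.
    pnl_full = -(bs_call(S1, K, T - h, r, sigma) - bs_call(S0, K, T, r, sigma))
    # Linear approximation: P&L = -delta * dS, ignoring gamma and theta.
    d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    pnl_lin = -norm.cdf(d1) * (S1 - S0)
    return -np.quantile(pnl_full, alpha), -np.quantile(pnl_lin, alpha)

var_full, var_lin = mc_var_short_call()
print(f"99% VaR, full revaluation: {var_full:.3f}")
print(f"99% VaR, delta-only:       {var_lin:.3f}")
```

For a short option position the negative gamma makes large moves in the underlying hurt more than the delta suggests, so the full-revaluation VaR comes out above the delta-only figure in this setup.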

Expected shortfall

VaR has a structural flaw: it tells you the THRESHOLD at which the worst $α$ of losses begin, but says NOTHING about how bad those losses are once you've crossed the threshold. A position with a $VaR_{0.01}$ of \$1M could lose \$1.01M or \$100M when the threshold is breached: same VaR, vastly different consequences. EXPECTED SHORTFALL (also called Conditional VaR / CVaR / Average VaR) fixes this:

$$ES_{α} = -E[X \mid X \leq -VaR_{α}] = \frac{1}{α} \int_{0}^{α} VaR_{u} \, du.$$

The expected loss CONDITIONAL on a tail event. $ES_{α} \geq VaR_{α}$ always. ES is COHERENT in the Artzner-Delbaen-Eber-Heath sense (sub-additive, monotone, positively homogeneous, translation-invariant) while VaR isn't (it fails sub-additivity: a diversified portfolio can have HIGHER VaR than the sum of its parts' VaRs in pathological cases). For these reasons, the Basel III/IV framework has moved toward ES as the regulatory metric (Fundamental Review of the Trading Book, 2019).
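
A standard textbook-style illustration of the sub-additivity failure uses two independent loans that each default with probability just under $α$. The sketch below assumes a 4% default probability and a loss of 100 per default; `var_es_discrete` is a hypothetical helper that computes ES from the tail-average form of the definition.

```python
import numpy as np

def var_es_discrete(outcomes, probs, alpha):
    """VaR and ES of a discrete P&L; ES = (1/alpha) * integral_0^alpha VaR_u du."""
    order = np.argsort(outcomes)
    x = np.asarray(outcomes, dtype=float)[order]
    cdf = np.cumsum(np.asarray(probs, dtype=float)[order])
    var = 0.0 - x[int(np.argmax(cdf >= alpha - 1e-12))]
    # Probability mass of each outcome that falls inside the worst-alpha tail.
    prev = np.concatenate(([0.0], cdf[:-1]))
    tail_mass = np.clip(np.minimum(cdf, alpha) - prev, 0.0, None)
    es = -(tail_mass * x).sum() / alpha
    return var, es

p, alpha = 0.04, 0.05
# One loan: lose 100 with probability p, otherwise break even.
var_one, es_one = var_es_discrete([-100, 0], [p, 1 - p], alpha)
# Two independent loans: 0, 1 or 2 defaults.
var_two, es_two = var_es_discrete(
    [-200, -100, 0], [p * p, 2 * p * (1 - p), (1 - p) ** 2], alpha
)

print(f"VaR: each loan {var_one:.1f}, sum {2 * var_one:.1f}, portfolio {var_two:.1f}")
print(f"ES:  each loan {es_one:.1f}, sum {2 * es_one:.1f}, portfolio {es_two:.1f}")
```

Each loan alone has a 95% VaR of 0 because default is rarer than 5%, yet the two-loan portfolio has a VaR of 100 because at least one default occurs 7.84% of the time. ES gives 80 per loan and 103.2 for the portfolio, which stays below the sum of 160.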

Backtesting: does the VaR estimate actually work?

Any VaR estimator can be tested historically by counting BREACHES: days when the realized loss exceeded the predicted VaR. By construction, breaches should occur at frequency $α$ (5% of days for 95% VaR). The KUPIEC POF (Proportion of Failures) test (1995) checks this statistically:

$$LR = -2 \left[ n \log α + (T - n) \log(1 - α) - n \log(n / T) - (T - n) \log(1 - n / T) \right],$$

where $n$ is the observed number of breaches in $T$ days. Under the null hypothesis (the VaR is correctly calibrated), $LR$ is asymptotically chi-squared with 1 degree of freedom; the 95% critical value is 3.84. Reject if $LR > 3.84$ . The test catches systematic UNDER-estimation (too many breaches) or OVER-estimation (too few). It does NOT catch CLUSTERING of breaches — a more refined test (Christoffersen's) checks that breaches are independent in time as well.
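
The independence check mentioned above can be sketched as a likelihood-ratio test against a first-order Markov alternative, which is the usual form of Christoffersen's independence test. The function name and the two example breach sequences are made up for illustration.

```python
import numpy as np
from scipy.special import xlogy
from scipy.stats import chi2

def christoffersen_independence(hits):
    """LR test of 'breach today is independent of breach yesterday'."""
    h = np.asarray(hits, dtype=int)
    prev, curr = h[:-1], h[1:]
    # Transition counts n_ij: state i yesterday, state j today (1 = breach).
    n00 = np.sum((prev == 0) & (curr == 0))
    n01 = np.sum((prev == 0) & (curr == 1))
    n10 = np.sum((prev == 1) & (curr == 0))
    n11 = np.sum((prev == 1) & (curr == 1))
    pi01 = n01 / max(n00 + n01, 1)      # P(breach | no breach yesterday)
    pi11 = n11 / max(n10 + n11, 1)      # P(breach | breach yesterday)
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)
    # xlogy(k, p) = k*log(p) with the 0*log(0) case defined as 0.
    ll_null = xlogy(n00 + n10, 1 - pi) + xlogy(n01 + n11, pi)
    ll_alt = (xlogy(n00, 1 - pi01) + xlogy(n01, pi01)
              + xlogy(n10, 1 - pi11) + xlogy(n11, pi11))
    lr = max(-2.0 * (ll_null - ll_alt), 0.0)
    return lr, chi2.sf(lr, df=1)

clustered = [0] * 90 + [1] * 10          # ten breaches in a row
spread_out = ([0] * 9 + [1]) * 10        # ten breaches, evenly spaced

lr_c, p_c = christoffersen_independence(clustered)
lr_s, p_s = christoffersen_independence(spread_out)
print(f"clustered:  LR = {lr_c:.2f}, p = {p_c:.4f}")
print(f"spread out: LR = {lr_s:.2f}, p = {p_s:.4f}")
```

Both sequences have the same 10% breach rate, so a pure frequency test cannot tell them apart; only the clustered one is rejected by the independence statistic at the 3.84 threshold.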

Code

# VaR three ways (historical, parametric, Monte Carlo) and Expected
# Shortfall, with a Kupiec backtest on rolling-window historical VaR.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

# Mock daily P&L: 1000 days, slightly positive mean, ~1.5% vol, plus
# two clusters of crisis losses to make the tails realistic.
returns = rng.normal(0.0005, 0.015, 1000)
returns[100:104] -= 0.04
returns[700:703] -= 0.05

def var_historical(returns, alpha=0.05):
    """Historical: -alpha-quantile of empirical P&L distribution."""
    return -np.quantile(returns, alpha)

def var_parametric(returns, alpha=0.05):
    """Parametric: assume Gaussian and use the inverse CDF."""
    mu_r, sd_r = np.mean(returns), np.std(returns, ddof=1)
    return -(mu_r + sd_r * norm.ppf(alpha))

def var_monte_carlo(returns, alpha=0.05, n_sim=1_000_000):
    """Monte Carlo: simulate from fitted Gaussian, take quantile."""
    mu_r, sd_r = np.mean(returns), np.std(returns, ddof=1)
    sim = rng.normal(mu_r, sd_r, n_sim)
    return -np.quantile(sim, alpha)

def expected_shortfall(returns, alpha=0.05):
    """ES = mean loss conditional on exceeding the VaR threshold."""
    thr = np.quantile(returns, alpha)
    return -np.mean(returns[returns <= thr])

for alpha in [0.05, 0.01]:
    print(f"alpha = {alpha:.2f}  ({int(100*(1-alpha))}% confidence VaR):")
    print(f"  Historical VaR:  {var_historical(returns, alpha):.4f}")
    print(f"  Parametric VaR:  {var_parametric(returns, alpha):.4f}")
    print(f"  Monte Carlo VaR: {var_monte_carlo(returns, alpha):.4f}")
    print(f"  Expected Shortfall (CVaR): {expected_shortfall(returns, alpha):.4f}")
    print()

# Kupiec POF (proportion of failures) backtest: estimate VaR from a
# rolling 250-day window, count how often the NEXT day's return exceeds
# the VaR. Expected breach rate = alpha; chi-squared(1) likelihood ratio.
window = 250
alpha = 0.05
breaches, total = 0, 0
for t in range(window, len(returns) - 1):
    v = -np.quantile(returns[t-window:t], alpha)
    if returns[t] < -v:
        breaches += 1
    total += 1
p_obs = breaches / total
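def _klog(k, p):
    """k*log(p), with the 0*log(0) term taken as 0."""
    return k * np.log(p) if k > 0 else 0.0

# Guarding the zero-count terms means zero breaches still yields a valid,
# nonzero LR instead of a silent PASS.
LR = -2 * (breaches*np.log(alpha) + (total-breaches)*np.log(1-alpha)
           - _klog(breaches, p_obs) - _klog(total-breaches, 1-p_obs))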
print(f"Kupiec POF backtest (rolling 250-day, alpha = {alpha}):")
print(f"  Observed breaches: {breaches}/{total} = {p_obs:.3%}")
print(f"  Expected breaches: {alpha:.0%}")
print(f"  LR statistic = {LR:.2f}  "
      f"({'REJECT' if LR > 3.84 else 'PASS'} at 95% level: chi²(1) crit = 3.84)")

Output:

alpha = 0.05  (95% confidence VaR):
  Historical VaR:  0.0248
  Parametric VaR:  0.0248
  Monte Carlo VaR: 0.0247
  Expected Shortfall (CVaR): 0.0321

alpha = 0.01  (99% confidence VaR):
  Historical VaR:  0.0370
  Parametric VaR:  0.0346
  Monte Carlo VaR: 0.0347
  Expected Shortfall (CVaR): 0.0449

Kupiec POF backtest (rolling 250-day, alpha = 0.05):
  Observed breaches: 39/749 = 5.207%
  Expected breaches: 5%
  LR statistic = 0.07  (PASS at 95% level: chi²(1) crit = 3.84)

Three things to read off. (1) At the 95% confidence level, all three methods give essentially the same answer (~2.5%): the data are approximately Gaussian over typical scales, so the parametric estimate is fine. At 99%, the historical estimator (3.7%) is HIGHER than the parametric (3.5%) because of the crisis-day shocks in the sample, fat tails that the Gaussian doesn't see. The Monte Carlo figure tracks the parametric one because this simulation draws from the same fitted Gaussian. (2) Historical expected shortfall is roughly 30% higher than VaR at 95% (3.2% vs 2.5%) and roughly 20% higher than the historical VaR at 99% (4.5% vs 3.7%); because it averages over the whole tail, ES responds to how extreme the crisis days are, not just to where the tail begins. (3) The Kupiec backtest PASSES (LR=0.07 vs critical 3.84): 5.2% observed breaches vs 5% expected, well within sampling noise.

How VaR is used in practice

Regulatory capital. Banks compute 99%-VaR over a 10-day horizon under the Basel framework; the calculated VaR (or ES, under the new rules) is multiplied by a regulatory factor and held as required capital. A model that produces too many backtesting breaches (Basel's "traffic light" exception count, the same idea as the Kupiec test) gets a higher multiplier, a strong incentive to keep the model accurate.
Internal risk limits. Trading desks have VaR limits assigned by management. Going over the limit means the position must be reduced. This is the day-to-day operational use.
Performance evaluation. Risk-adjusted return measures like the $return / VaR$ ratio are tracked alongside Sharpe; they emphasize downside risk over total volatility.
Capital allocation. Across business lines or trading books, allocate capital in proportion to each unit's VaR contribution.
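
For the capital-allocation use, one common way to define a unit's "VaR contribution" is the Euler (component VaR) decomposition under a zero-mean Gaussian model, where the contributions add up exactly to the portfolio VaR. The sketch below assumes that model; the two-desk volatilities, correlation and exposures are invented.

```python
import numpy as np
from scipy.stats import norm

def component_var(weights, cov, alpha=0.01):
    """Gaussian, zero-mean component VaR; contributions sum to portfolio VaR."""
    w = np.asarray(weights, dtype=float)
    cov = np.asarray(cov, dtype=float)
    sigma_p = np.sqrt(w @ cov @ w)            # portfolio standard deviation
    z = -norm.ppf(alpha)                      # e.g. about 2.33 for alpha = 0.01
    marginal = z * (cov @ w) / sigma_p        # d(VaR)/d(w_i)
    return w * marginal, z * sigma_p

# Illustrative two-desk book (assumed daily vols and correlation).
vols = np.array([0.02, 0.01])
rho = 0.3
cov = np.outer(vols, vols) * np.array([[1.0, rho], [rho, 1.0]])
contrib, total = component_var([0.6, 0.4], cov)

print("component VaRs:", np.round(contrib, 5))
print("sum of components:", round(float(contrib.sum()), 5), "portfolio VaR:", round(float(total), 5))
print("capital shares:", np.round(contrib / total, 3))
```

Because the components sum to the total, the shares `contrib / total` can be used directly as allocation weights. Stand-alone VaRs do not have this property.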

The big failure modes

Tail risk underestimation. Gaussian-based VaR systematically undercounts extreme losses. The 2008 crisis, the 2020 COVID crash, the 1987 Black Monday — all generated tail moves an order of magnitude beyond what Gaussian VaR predicted. The Mandelbrot critique (1963 onward) that asset returns follow fat-tailed (Lévy-stable) rather than Gaussian distributions is still the standard caveat.
Volatility regime shifts. VaR estimated in a low-volatility regime massively understates risk just before regime shifts. The right response: GARCH-based estimators that track volatility clustering, or simply switch to a stress-scenario approach during transitions.
Correlation breakdowns. In crises, asset correlations spike toward 1 — diversification benefits that mean-variance and VaR rely on evaporate exactly when needed. Stress tests rather than statistical VaR are the only honest way to capture this.
Non-coherence. VaR is not sub-additive — a portfolio can have higher VaR than the sum of its parts' VaRs. Mathematically pathological; practically rare but documented. ES avoids this.
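
To illustrate the regime-shift problem, the sketch below compares an equal-weight Gaussian VaR with one driven by an exponentially weighted (EWMA) variance. EWMA with decay 0.94 is used here as a simple stand-in for a fitted GARCH model, not as a replacement for one; the data are synthetic and all parameters are assumptions.

```python
import numpy as np
from scipy.stats import norm

def ewma_var(returns, alpha=0.01, lam=0.94, seed_len=20):
    """One-day-ahead Gaussian VaR from an EWMA variance recursion (zero mean)."""
    r = np.asarray(returns, dtype=float)
    var_t = np.mean(r[:seed_len] ** 2)           # seed the variance from early data
    for x in r[seed_len:]:
        var_t = lam * var_t + (1.0 - lam) * x * x
    return -norm.ppf(alpha) * np.sqrt(var_t)

def flat_window_var(returns, alpha=0.01):
    """Equal-weight Gaussian VaR over the whole sample, for comparison."""
    r = np.asarray(returns, dtype=float)
    return -(r.mean() + r.std(ddof=1) * norm.ppf(alpha))

# Synthetic regime shift: 500 calm days at 1% vol, then 50 turbulent days at 3% vol.
rng = np.random.default_rng(1)
r = np.concatenate([rng.normal(0, 0.01, 500), rng.normal(0, 0.03, 50)])

print(f"flat-window 99% VaR: {flat_window_var(r):.4f}")
print(f"EWMA 99% VaR:        {ewma_var(r):.4f}")
```

The flat-window estimate is still dominated by the 500 calm days, while the EWMA estimate has mostly forgotten them and sits close to the new volatility level.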

Related topics

Mean-variance portfolio optimization: the asset-allocation side; both use the same variance-covariance machinery.
Monte Carlo option pricing: same simulation machinery applied to option pricing.
Greeks and delta hedging: the option-book risk that VaR aggregates.
Statistics & inference: the underlying inference theory for quantile estimation and likelihood-ratio tests.