Value at Risk and Expected Shortfall

Value at Risk (VaR) is the most widely used risk-management number in finance. It answers "How much can I lose on a normal day?" by quoting a quantile of the loss distribution: $\mathrm{VaR}_\alpha$ is the loss level such that the probability of a worse loss over the chosen time horizon is at most $\alpha$. Regulators (Basel framework) use 99%-VaR over a 10-day horizon to size bank capital reserves. Hedge funds and prop desks use it as an internal risk metric. Despite well-known limitations (most importantly, that it says NOTHING about how bad the loss is WHEN it does breach), VaR is the lingua franca, and any practical risk system must compute it.

Formally, given a P&L random variable $X$ (positive = gain, negative = loss), the Value at Risk at confidence level $1-\alpha$ is:

$$\mathrm{VaR}_\alpha(X) = -\inf\{x \in \mathbb{R} : P(X \le x) \ge \alpha\}$$

In plain English: the $\alpha$-quantile of the P&L distribution, sign-flipped to be reported as a positive number. The 95%-VaR corresponds to $\alpha = 0.05$; the 99%-VaR to $\alpha = 0.01$. "Loss greater than VaR happens once in $1/\alpha$ periods on average."
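The definition can be checked numerically. The sketch below assumes a toy Gaussian P&L sample (the seed, mean, and volatility are made up for illustration), computes the sign-flipped quantile, and then counts how often the loss is worse than that level.

```python
import numpy as np

# Toy P&L sample; all parameters here are illustrative assumptions.
demo_rng = np.random.default_rng(0)
pnl = demo_rng.normal(0.0, 0.01, 100_000)   # positive = gain, negative = loss

alpha = 0.05
var_95 = -np.quantile(pnl, alpha)           # sign-flipped alpha-quantile of P&L
breach_freq = np.mean(pnl < -var_95)        # share of periods with a loss worse than VaR

print(f"95% VaR:          {var_95:.4f}")
print(f"Breach frequency: {breach_freq:.4f} (target {alpha})")
print(f"About one breach every {1 / alpha:.0f} periods")
```

The breach frequency lands on the tail probability because the VaR was estimated from the same sample; out of sample it only does so if the estimate is well calibrated.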

Three computational approaches

Each has different assumptions, costs, and failure modes.

1. Historical VaR

Take the empirical $\alpha$-quantile of historical returns: order the $N$ returns from worst to best and pick the one at position $\lceil \alpha N \rceil$. NO distributional assumptions. Captures fat tails and skewness in the historical sample. But: only as good as the SAMPLE. If the historical window doesn't contain a regime as bad as the one coming, you miss it. Standard window: 250 trading days (one year). Lookback choice is the only parameter.
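The sort-and-pick recipe can be sketched directly. This is a minimal illustration that assumes the position is taken as k = ceil(alpha * N), which is one common convention; `np.quantile` by default interpolates between neighbouring order statistics instead, so the two can differ slightly. The function name and the toy returns are hypothetical.

```python
import math
import numpy as np

def var_order_statistic(returns, alpha=0.05):
    """Sign-flipped k-th worst return, with k = ceil(alpha * N) (assumed convention)."""
    ordered = np.sort(np.asarray(returns, dtype=float))   # ascending: worst return first
    # Small tolerance guards against floating-point noise in alpha * N.
    k = max(1, math.ceil(alpha * len(ordered) - 1e-9))
    return -ordered[k - 1]

# Twenty made-up daily returns.
toy_returns = np.array([-0.031, 0.004, 0.012, -0.008, 0.001,
                        -0.015, 0.007, 0.020, -0.002, 0.009,
                        0.003, -0.011, 0.006, -0.024, 0.010,
                        0.002, -0.005, 0.014, -0.001, 0.008])

var_95_toy = var_order_statistic(toy_returns, alpha=0.05)  # k = 1: the worst day
var_90_toy = var_order_statistic(toy_returns, alpha=0.10)  # k = 2: the second-worst day
print(var_95_toy, var_90_toy)
```

With only 20 observations the 95% figure is simply the single worst day, which is why short lookback windows give noisy tail estimates.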

2. Parametric (Gaussian) VaR

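Assume returns are Gaussian with mean $\mu$ and standard deviation $\sigma$ (estimated from the sample), and use the inverse CDF $\Phi^{-1}$ of the standard normal:

$$\mathrm{VaR}_\alpha = -\left(\mu + \sigma\,\Phi^{-1}(\alpha)\right)$$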

Fast, parsimonious, easy to update incrementally. But misses fat tails: empirical asset returns have far more extreme moves than a Gaussian predicts. The 99% Gaussian VaR is typically an UNDERESTIMATE of the true 99% VaR for equity indices, often by 20-50%.
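To see where the Gaussian assumption bites, the sketch below compares standard-normal quantiles with those of a Student-t rescaled to unit variance. The choice of 4 degrees of freedom is an illustrative assumption, not a calibrated value, so the size of the gap should not be read as an empirical estimate.

```python
import numpy as np
from scipy.stats import norm, t

nu = 4                              # hypothetical degrees of freedom (fat tails)
scale = np.sqrt((nu - 2) / nu)      # rescales the t-distribution to unit variance

def tail_multiplier_gauss(a):
    """VaR in units of sigma under a Gaussian (zero mean)."""
    return -norm.ppf(a)

def tail_multiplier_t(a):
    """VaR in units of sigma under a unit-variance Student-t (zero mean)."""
    return -t.ppf(a, df=nu) * scale

ratio_95 = tail_multiplier_t(0.05) / tail_multiplier_gauss(0.05)
ratio_99 = tail_multiplier_t(0.01) / tail_multiplier_gauss(0.01)
print(f"t / Gaussian VaR ratio at 95%: {ratio_95:.3f}")
print(f"t / Gaussian VaR ratio at 99%: {ratio_99:.3f}")
```

Under these assumptions the fat-tailed quantile exceeds the Gaussian one at the 99% level but not at 95%: with the variance held fixed, fat tails move probability mass into the far tail and the centre, so the shortfall of the Gaussian estimate grows as the confidence level rises.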

3. Monte Carlo VaR

Specify a model (which CAN be richer than Gaussian: t-distribution for fat tails, GARCH for time-varying volatility, jump-diffusion for jumps), simulate P&L scenarios, and take the empirical quantile. Computationally heavy but extremely flexible. For a multi-asset portfolio, this is the approach best suited to nonlinear instruments (options): you simulate the underlying risk factors and revalue the portfolio in each scenario.
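The simulate-and-revalue loop can be sketched for a single option. The example below assumes a one-day lognormal move in the underlying and Black-Scholes revaluation of one European call; every number (spot, strike, volatility, horizon, seed) is hypothetical. A delta-only linear approximation is computed alongside to show what full revaluation changes.

```python
import numpy as np
from scipy.stats import norm

def bs_d1(spot, strike, vol, rate, tau):
    """Black-Scholes d1 term."""
    return (np.log(spot / strike) + (rate + 0.5 * vol**2) * tau) / (vol * np.sqrt(tau))

def bs_call(spot, strike, vol, rate, tau):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = bs_d1(spot, strike, vol, rate, tau)
    d2 = d1 - vol * np.sqrt(tau)
    return spot * norm.cdf(d1) - strike * np.exp(-rate * tau) * norm.cdf(d2)

# Hypothetical position and market data, for illustration only.
spot0, strike, vol, rate, tau = 100.0, 100.0, 0.20, 0.0, 0.25
dt = 1 / 252            # one trading day
n_sim = 200_000
alpha = 0.01

# 1. Simulate the risk factor: a one-day lognormal move in the underlying.
sim_rng = np.random.default_rng(42)
z = sim_rng.standard_normal(n_sim)
spot1 = spot0 * np.exp((rate - 0.5 * vol**2) * dt + vol * np.sqrt(dt) * z)

# 2. Revalue the option in every scenario (one day closer to expiry).
price0 = bs_call(spot0, strike, vol, rate, tau)
pnl_full = bs_call(spot1, strike, vol, rate, tau - dt) - price0

# 3. Read VaR off the simulated P&L distribution.
var_full = -np.quantile(pnl_full, alpha)

# Comparison: linear (delta-only) P&L, which ignores the option's curvature.
delta0 = norm.cdf(bs_d1(spot0, strike, vol, rate, tau))
var_delta = -np.quantile(delta0 * (spot1 - spot0), alpha)

print(f"Full-revaluation 99% VaR: {var_full:.4f}")
print(f"Delta-only 99% VaR:       {var_delta:.4f}")
```

For a long call the curvature (gamma) cushions large down moves, so in this setup the delta-only number overstates the loss. For a short option position the error runs the other way, which is the dangerous case.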

Expected shortfall

VaR has a structural flaw: it tells you the THRESHOLD at which the worst $\alpha$ fraction of outcomes begins, but says NOTHING about how bad those losses are once you've crossed the threshold. A position with a VaR of $1M could lose $1.01M next month or $100M; same VaR, vastly different consequences. EXPECTED SHORTFALL (also called Conditional VaR / CVaR / Average VaR) fixes this:

$$\mathrm{ES}_\alpha(X) = -\,\mathbb{E}\left[\,X \mid X \le -\mathrm{VaR}_\alpha(X)\,\right]$$

The expected loss CONDITIONAL on a tail event. $\mathrm{ES}_\alpha \ge \mathrm{VaR}_\alpha$ always. ES is COHERENT in the Artzner-Delbaen-Eber-Heath sense (sub-additive, monotone, positively homogeneous, translation-invariant) while VaR isn't (it fails sub-additivity: a diversified portfolio can have HIGHER VaR than the sum of its parts' VaRs in pathological cases). For these reasons, the Basel III/IV framework has moved toward ES as the regulatory metric (Fundamental Review of the Trading Book, 2019).
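A textbook-style illustration of the sub-additivity failure: two independent bonds, each defaulting with probability 4% for a loss of 100. The numbers are hypothetical, and losses are written as positive amounts here for readability. Because the distributions are discrete, the sketch computes ES as the average of the worst alpha probability mass, which coincides with the conditional expectation for continuous distributions.

```python
import numpy as np

def var_discrete(losses, probs, alpha):
    """Smallest loss level l such that P(L > l) <= alpha."""
    losses, probs = np.asarray(losses, float), np.asarray(probs, float)
    order = np.argsort(losses)
    losses, probs = losses[order], probs[order]
    tail = 1.0 - np.cumsum(probs)                   # P(L > l) at each level
    return losses[np.argmax(tail <= alpha + 1e-12)]

def es_discrete(losses, probs, alpha):
    """Average loss over the worst alpha of probability mass."""
    losses, probs = np.asarray(losses, float), np.asarray(probs, float)
    order = np.argsort(losses)[::-1]                # worst outcome first
    remaining, weighted = alpha, 0.0
    for loss, p in zip(losses[order], probs[order]):
        take = min(p, remaining)                    # mass of this outcome inside the tail
        weighted += take * loss
        remaining -= take
        if remaining <= 0:
            break
    return weighted / alpha

alpha, p_default, loss = 0.05, 0.04, 100.0

# One bond: lose 100 with probability 4%, else nothing.
single = ([0.0, loss], [1 - p_default, p_default])
# Two independent bonds held together: 0, 1, or 2 defaults.
pair = ([0.0, loss, 2 * loss],
        [(1 - p_default) ** 2, 2 * p_default * (1 - p_default), p_default ** 2])

var_single, var_pair = var_discrete(*single, alpha), var_discrete(*pair, alpha)
es_single, es_pair = es_discrete(*single, alpha), es_discrete(*pair, alpha)

print(f"VaR: single = {var_single:.1f}, pair = {var_pair:.1f}, sum of parts = {2 * var_single:.1f}")
print(f"ES:  single = {es_single:.1f}, pair = {es_pair:.1f}, sum of parts = {2 * es_single:.1f}")
```

Each bond alone has a 95% VaR of 0 because its default probability sits below 5%, yet the pair has a 7.84% chance of at least one default, so its VaR jumps to 100. ES stays sub-additive: 103.2 for the pair against 160 for the sum of the parts.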

Backtesting: does the VaR estimate actually work?

Any VaR estimator can be tested historically by counting BREACHES: days when the realized loss exceeded the predicted VaR. By construction, breaches should occur at frequency $\alpha$ (5% of days for 95% VaR). The KUPIEC POF (Proportion of Failures) test (1995) checks this statistically:

$$\mathrm{LR}_{\mathrm{POF}} = -2 \ln\left[\frac{(1-\alpha)^{N-x}\,\alpha^{x}}{(1 - x/N)^{N-x}\,(x/N)^{x}}\right]$$

where $x$ is the observed number of breaches in $N$ days. Under the null hypothesis (the VaR is correctly calibrated), $\mathrm{LR}_{\mathrm{POF}}$ is asymptotically chi-squared with 1 degree of freedom; the 95% critical value is 3.84. Reject if $\mathrm{LR}_{\mathrm{POF}} > 3.84$. The test catches systematic UNDER-estimation (too many breaches) or OVER-estimation (too few). It does NOT catch CLUSTERING of breaches; a more refined test (Christoffersen's) checks that breaches are independent in time as well.
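The statistic is easy to wrap in a reusable function. The sketch below is one possible packaging (the name `kupiec_pof` and its signature are hypothetical): it returns the likelihood ratio together with its chi-squared p-value, and uses `scipy.special.xlogy` so that the zero-breach and all-breach cases stay finite.

```python
from scipy.special import xlogy
from scipy.stats import chi2

def kupiec_pof(n_breaches, n_days, alpha=0.05):
    """Kupiec proportion-of-failures test.

    Returns (LR statistic, p-value under chi-squared with 1 degree of freedom).
    """
    x, n = n_breaches, n_days
    p_hat = x / n
    # xlogy(a, b) = a * log(b), with the convention 0 * log(0) = 0.
    loglik_null = xlogy(x, alpha) + xlogy(n - x, 1 - alpha)
    loglik_alt = xlogy(x, p_hat) + xlogy(n - x, 1 - p_hat)
    lr = -2.0 * (loglik_null - loglik_alt)
    return float(lr), float(chi2.sf(lr, df=1))

# Breach counts below are illustrative inputs.
lr_ok, p_ok = kupiec_pof(39, 749, alpha=0.05)      # close to the 5% target
lr_many, p_many = kupiec_pof(60, 749, alpha=0.05)  # too many breaches
lr_none, p_none = kupiec_pof(0, 749, alpha=0.05)   # suspiciously few breaches

for label, lr, p in [("39/749", lr_ok, p_ok), ("60/749", lr_many, p_many), ("0/749", lr_none, p_none)]:
    verdict = "REJECT" if lr > 3.84 else "PASS"
    print(f"{label}: LR = {lr:.2f}, p-value = {p:.4f} -> {verdict}")
```

Reporting the p-value alongside the pass/fail verdict makes it easier to compare backtests run at different significance levels.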

Code

# VaR three ways (historical, parametric, Monte Carlo) and Expected
# Shortfall, with a Kupiec backtest on rolling-window historical VaR.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

# Mock daily P&L: 1000 days, slightly positive mean, ~1.5% vol, plus
# two clusters of crisis losses to make the tails realistic.
returns = rng.normal(0.0005, 0.015, 1000)
returns[100:104] -= 0.04
returns[700:703] -= 0.05

def var_historical(returns, alpha=0.05):
    """Historical: -alpha-quantile of empirical P&L distribution."""
    return -np.quantile(returns, alpha)

def var_parametric(returns, alpha=0.05):
    """Parametric: assume Gaussian and use the inverse CDF."""
    mu_r, sd_r = np.mean(returns), np.std(returns, ddof=1)
    return -(mu_r + sd_r * norm.ppf(alpha))

def var_monte_carlo(returns, alpha=0.05, n_sim=1_000_000):
    """Monte Carlo: simulate from fitted Gaussian, take quantile."""
    mu_r, sd_r = np.mean(returns), np.std(returns, ddof=1)
    sim = rng.normal(mu_r, sd_r, n_sim)
    return -np.quantile(sim, alpha)

def expected_shortfall(returns, alpha=0.05):
    """ES = mean loss conditional on exceeding the VaR threshold."""
    thr = np.quantile(returns, alpha)
    return -np.mean(returns[returns <= thr])

for alpha in [0.05, 0.01]:
    print(f"alpha = {alpha:.2f}  ({int(100*(1-alpha))}% confidence VaR):")
    print(f"  Historical VaR:  {var_historical(returns, alpha):.4f}")
    print(f"  Parametric VaR:  {var_parametric(returns, alpha):.4f}")
    print(f"  Monte Carlo VaR: {var_monte_carlo(returns, alpha):.4f}")
    print(f"  Expected Shortfall (CVaR): {expected_shortfall(returns, alpha):.4f}")
    print()

# Kupiec POF (proportion of failures) backtest: estimate VaR from a
# rolling 250-day window, count how often the NEXT day's return exceeds
# the VaR. Expected breach rate = alpha; chi-squared(1) likelihood ratio.
window = 250
alpha = 0.05
breaches, total = 0, 0
for t in range(window, len(returns) - 1):
    v = -np.quantile(returns[t-window:t], alpha)
    if returns[t] < -v:
        breaches += 1
    total += 1
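p_obs = breaches / total

def binom_loglik(p):
    """Binomial log-likelihood of the breach count, treating 0*log(0) as 0."""
    ll = 0.0
    if breaches > 0:
        ll += breaches * np.log(p)
    if total - breaches > 0:
        ll += (total - breaches) * np.log(1 - p)
    return ll

# Likelihood ratio: null (breach rate = alpha) vs. observed breach rate.
# Zero breaches now yields a large LR (over-estimated VaR) instead of 0.
LR = -2 * (binom_loglik(alpha) - binom_loglik(p_obs))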
print(f"Kupiec POF backtest (rolling 250-day, alpha = {alpha}):")
print(f"  Observed breaches: {breaches}/{total} = {p_obs:.3%}")
print(f"  Expected breaches: {alpha:.0%}")
print(f"  LR statistic = {LR:.2f}  "
      f"({'REJECT' if LR > 3.84 else 'PASS'} at 95% level: chi²(1) crit = 3.84)")

Output:

alpha = 0.05  (95% confidence VaR):
  Historical VaR:  0.0248
  Parametric VaR:  0.0248
  Monte Carlo VaR: 0.0247
  Expected Shortfall (CVaR): 0.0321

alpha = 0.01  (99% confidence VaR):
  Historical VaR:  0.0370
  Parametric VaR:  0.0346
  Monte Carlo VaR: 0.0347
  Expected Shortfall (CVaR): 0.0449

Kupiec POF backtest (rolling 250-day, alpha = 0.05):
  Observed breaches: 39/749 = 5.207%
  Expected breaches: 5%
  LR statistic = 0.07  (PASS at 95% level: chi²(1) crit = 3.84)

Three things to read off. (1) At the 95% confidence level, all three methods give essentially the same answer (~2.5%): the data are approximately Gaussian over typical scales, so the parametric estimate is fine. The Monte Carlo figure matches the parametric one by construction, since it simulates from the same fitted Gaussian. At 99%, the historical estimator (3.7%) is HIGHER than the parametric (3.5%) because of the crisis-day shocks in the sample, fat tails that the Gaussian doesn't see. (2) Expected shortfall is roughly 30% higher than historical VaR at 95% (3.2% vs 2.5%) and roughly 20% higher at 99% (4.5% vs 3.7%): ES averages over the whole tail rather than reading off its edge. (3) The Kupiec backtest PASSES (LR=0.07 vs critical 3.84): 5.2% observed breaches vs 5% expected, well within sampling noise.

How VaR is used in practice

The big failure modes
