Chapter 14: Stochastic Volatility

This chapter develops the theory of stochastic volatility models, in which the diffusion coefficient is itself a random process driven by its own source of uncertainty. It starts from the empirical failures of constant and local volatility (volatility clustering, mean reversion, the leverage effect, heavy tails, and market incompleteness) and builds the general two-factor diffusion framework. The Heston and SABR models are then studied in detail, including the SABR SDE, Hagan approximation, CEV backbone, heat kernel density, arbitrage-free extensions, Greeks, calibration to swaption smiles, forward smile management, Monte Carlo simulation, finite difference methods, and end-to-end worked examples. The chapter also analyzes the consequences of market incompleteness for pricing and hedging, develops calibration methods that jointly fit the implied volatility surface across strikes and maturities, and surveys extensions including multi-factor volatility models and rough volatility.

Key Concepts

Empirical Failures of Constant Volatility

Constant-volatility models fail to explain several robust empirical facts documented through rigorous analysis of financial data. Volatility clustering---large price changes tend to be followed by large changes---manifests as significant positive autocorrelation in squared returns \(\text{Corr}(r_t^2, r_{t+k}^2) > 0\) with half-lives of 20--40 days for indices, formalized in the ARCH/GARCH literature via \(\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \beta\sigma_{t-1}^2\) with persistence \(\alpha + \beta \approx 0.95\)--\(0.99\). The leverage effect---negative correlation between returns and subsequent volatility, \(\text{Corr}(r_t, \sigma_{t+1}^2 - \sigma_t^2) \approx -0.4\) to \(-0.7\) for equity indices---generates the characteristic negative skew. Distributional failures include heavy tails (daily kurtosis 5--10 vs. Gaussian 3), negative skewness (\(-0.3\) to \(-1.0\) for equities), and extreme event probabilities far exceeding Gaussian predictions (3\(\sigma\) daily moves occur at 1--2% vs. predicted 0.27%). Rolling-window realized volatility analysis confirms time-varying volatility with ranges from 6% (calm) to 80%+ (stressed) and a coefficient of variation around 40%.
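
The persistence and tail figures above can be sanity-checked in a few lines. The following is a minimal sketch using only the standard library; the helper names `garch_half_life` and `gaussian_two_sided_tail` are illustrative and not taken from the chapter, and the half-life is taken to be the number of days for a variance shock to decay by half under GARCH(1,1).

```python
import math


def garch_half_life(alpha: float, beta: float) -> float:
    """Days for a variance shock to halve when persistence is alpha + beta."""
    persistence = alpha + beta
    if not 0.0 < persistence < 1.0:
        raise ValueError("stationary GARCH(1,1) needs 0 < alpha + beta < 1")
    return math.log(0.5) / math.log(persistence)


def gaussian_two_sided_tail(n_sigma: float) -> float:
    """P(|Z| > n_sigma) for a standard normal Z."""
    return math.erfc(n_sigma / math.sqrt(2.0))


if __name__ == "__main__":
    # Illustrative (alpha, beta) pairs spanning the quoted persistence range.
    for alpha, beta in ((0.05, 0.90), (0.05, 0.92), (0.05, 0.94)):
        hl = garch_half_life(alpha, beta)
        print(f"persistence {alpha + beta:.2f}: half-life {hl:.1f} days")
    print(f"Gaussian P(|Z| > 3) = {gaussian_two_sided_tail(3.0):.4%}")
```

Under this definition, a half-life of 20--40 days corresponds to persistence of roughly 0.966--0.983, inside the quoted 0.95--0.99 range.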

Smile and Term Structure Revisited

The implied volatility surface \(\sigma_{\text{impl}}(K,T)\) exhibits systematic patterns that constant-volatility models cannot produce. Equity indices show negative skew with 25-delta put vol exceeding ATM by 3--8%, reflecting crash risk, the leverage effect, and demand for downside protection. The term structure varies systematically: upward-sloping in calm markets (uncertainty increasing with horizon), downward-sloping during stress (high current vol expected to mean-revert), and humped around event dates. Short-maturity smiles are steeper than long-maturity smiles, scaling approximately as \(\text{Skew}(T) \propto 1/\sqrt{T}\). Under a mean-reverting volatility process, the ATM term structure follows \(\sigma_{\text{impl}}^2(T) \approx V_0(1-e^{-\kappa T})/(\kappa T) + \bar{\sigma}^2[1 - (1-e^{-\kappa T})/(\kappa T)]\). The variance risk premium \(VRP = \mathbb{E}^{\mathbb{Q}}[\sigma^2] - \mathbb{E}^{\mathbb{P}}[\sigma^2]\) is persistently positive for equities.
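
The mean-reverting ATM term-structure formula above is easy to evaluate directly. The sketch below is illustrative only: the function name and the parameter values are assumptions, and `sigma_bar_sq` stands for the long-run variance \(\bar{\sigma}^2\).

```python
import math


def atm_implied_variance(T: float, v0: float, sigma_bar_sq: float, kappa: float) -> float:
    """Average expected variance over [0, T] under mean-reverting variance."""
    if T <= 0.0:
        return v0
    x = kappa * T
    # Weight (1 - exp(-kappa*T)) / (kappa*T); expm1 keeps precision for small x.
    w = -math.expm1(-x) / x if x > 1e-12 else 1.0
    return v0 * w + sigma_bar_sq * (1.0 - w)


if __name__ == "__main__":
    # Hypothetical stressed market: spot variance well above the long-run level.
    for T in (1 / 12, 0.5, 1.0, 5.0):
        vol = math.sqrt(atm_implied_variance(T, v0=0.09, sigma_bar_sq=0.04, kappa=2.0))
        print(f"T = {T:5.2f}y  ATM vol ~ {vol:.2%}")
```

With spot variance above the long-run level the curve slopes downward, and it slopes upward in the opposite case, matching the calm and stressed regimes described above.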

Incompleteness of Equity Markets

Stochastic volatility markets are incomplete: the volatility factor \(v_t\) introduces risk that cannot be hedged by trading the underlying alone. With two sources of randomness (\(W^S\), \(W^V\)) but only one tradeable risky asset, the risk-neutral measure \(\mathbb{Q}\) is not unique---it depends on the market price of volatility risk \(\lambda_V\), yielding a family of equivalent martingale measures \(\{\mathbb{Q}_\lambda\}\) with no-arbitrage price bounds \(\underline{C} \leq C \leq \overline{C}\). Additional sources of incompleteness include jumps, discrete trading (stale delta hedges accumulating error), transaction costs, and trading constraints. Quantifying incompleteness involves the variance-optimal hedge \(\min_{\Delta}\mathbb{E}[(H - V_T^{\Delta})^2]\), the minimal entropy martingale measure, and utility indifference pricing.

General Two-Factor Diffusion Framework

The generic stochastic volatility model specifies

\[dS_t = (r-q)S_t\,dt + \sqrt{v_t}\,S_t\,dW_t^{(1)}, \qquad dv_t = \alpha(t, v_t)\,dt + \eta(t, v_t)\,dZ_t\]

with \(d\langle W^{(1)}, Z \rangle_t = \rho\,dt\) and \(|\rho| \leq 1\). The pair \((S_t, v_t)\) is a two-dimensional Markov process with infinitesimal generator \(\mathcal{L} = \frac{1}{2}vS^2\partial_{SS} + \rho\sqrt{v}\,\eta S\partial_{Sv} + \frac{1}{2}\eta^2\partial_{vv} + (r-q)S\partial_S + \alpha\partial_v\), where \(\alpha\) and \(\eta\) are the drift and diffusion coefficients of \(v_t\). Prominent models include: Heston (CIR variance, affine structure), Hull-White (lognormal volatility, no closed-form CF), Stein-Stein (OU volatility, can go negative), the 3/2 model (power-law dynamics, heavier tails), and SABR (CEV backbone, asymptotic IV formula). Conditional on the variance path, log-returns are Gaussian, so their unconditional distribution is a mixture of normals, producing heavy tails and (when \(\rho \neq 0\)) skewness.

Correlation and the Leverage Effect

The correlation \(\rho\) between price and volatility shocks is the primary determinant of implied volatility skew. For the Heston model, the first-order short-maturity approximation to the ATM skew is \(\partial\sigma_{\text{impl}}/\partial k|_{k=0} \approx \rho\xi/(4\sigma_{\text{ATM}})\), with \(k\) the log-moneyness, showing skew is linear in \(\rho\) and \(\xi\) and inversely proportional to ATM vol. Empirically, equity index correlations range from \(-0.6\) to \(-0.8\) (strongest at daily/weekly frequencies, weakening at monthly). Economic explanations include balance sheet leverage effects, volatility feedback (higher expected vol raises required returns, depressing prices), behavioral asymmetry, and the risk premium channel. The correlation affects hedging through the stochastic volatility delta \(\Delta^{\text{SV}} = \partial C/\partial S + (\rho\xi/S)\,\partial C/\partial v\) (the minimum-variance delta under Heston dynamics, with \(v\) the variance) and creates vanna exposure measuring sensitivity to joint price-volatility moves.

Risk-Neutral vs. Physical Measure and the Volatility Risk Premium

The Girsanov transformation changes Brownian motions via \(dW_t^{\mathbb{Q}} = dW_t^{\mathbb{P}} + \lambda_t\,dt\) while leaving diffusion coefficients unchanged. For the Heston model with \(\lambda^V = \lambda\sqrt{V}\), the risk-neutral parameters become \(\kappa^{\mathbb{Q}} = \kappa^{\mathbb{P}} + \lambda\xi\) and \(\theta^{\mathbb{Q}} = \kappa^{\mathbb{P}}\theta^{\mathbb{P}}/\kappa^{\mathbb{Q}}\). The market price of volatility risk is typically negative for equity markets (\(\lambda < 0\), so that \(\kappa^{\mathbb{Q}} < \kappa^{\mathbb{P}}\) and \(\theta^{\mathbb{Q}} > \theta^{\mathbb{P}}\)): implied variance exceeds realized variance by 2--5% annualized, reflecting the premium investors pay for protection against volatility spikes during market stress. Option calibration recovers \(\mathbb{Q}\)-parameters only; mixing \(\mathbb{P}\) and \(\mathbb{Q}\) parameters leads to inconsistent dynamics. Joint modeling approaches include separate estimation, joint maximum likelihood, and Bayesian methods.
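
The Heston parameter mapping above can be written as a small helper. This is a sketch under the stated specification \(\lambda^V = \lambda\sqrt{V}\); the function name and the numerical inputs are hypothetical.

```python
def heston_p_to_q(kappa_p: float, theta_p: float, xi: float, lam: float):
    """Map physical (P) Heston parameters to risk-neutral (Q) ones.

    Assumes the market price of volatility risk is lam * sqrt(V), so that
    kappa_Q = kappa_P + lam * xi and kappa_Q * theta_Q = kappa_P * theta_P.
    """
    kappa_q = kappa_p + lam * xi
    if kappa_q <= 0.0:
        raise ValueError("kappa_Q must stay positive for mean reversion under Q")
    theta_q = kappa_p * theta_p / kappa_q
    return kappa_q, theta_q


if __name__ == "__main__":
    # Illustrative inputs only; a negative lam lifts long-run variance under Q.
    kappa_q, theta_q = heston_p_to_q(kappa_p=3.0, theta_p=0.04, xi=0.5, lam=-2.0)
    print(f"kappa_Q = {kappa_q:.2f}, theta_Q = {theta_q:.4f}")
```

The product \(\kappa\theta\) is the same under both measures, so a negative \(\lambda\) slows mean reversion and raises the long-run variance level under \(\mathbb{Q}\).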

The Heston Model

The Heston (1993) model specifies a CIR process for variance:

\[dv_t = \kappa(\theta - v_t)\,dt + \xi\sqrt{v_t}\,dW_t^{(2)}\]

with five risk-neutral parameters: mean-reversion speed \(\kappa\) (controlling term structure, half-life \(t_{1/2} = \ln 2/\kappa\)), long-run variance \(\theta\) (controlling long-maturity IV level, \(\sigma_{\text{impl}}^2(T\to\infty) \approx \theta\)), vol-of-vol \(\xi\) (controlling smile curvature, \(\partial^2\sigma_{\text{impl}}/\partial k^2 \propto \xi^2\)), correlation \(\rho\) (controlling skew, \(\partial\sigma_{\text{impl}}/\partial k|_{k=0} \propto \rho\)), and initial variance \(V_0\) (controlling short-maturity ATM level). The Feller condition \(2\kappa\theta \geq \xi^2\) ensures strict positivity of variance; when violated (common in calibration with typical \(\nu = 2\kappa\theta/\xi^2 \in [0.3, 1.5]\)), the boundary is attainable but reflecting, and option pricing formulas remain valid. Simulation under Feller violation requires careful schemes: the Quadratic-Exponential (QE) method (Andersen) matches moments of the exact non-central \(\chi^2\) transition density and is the industry standard.
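
The sketch below checks the Feller ratio and simulates terminal prices when the condition is violated. It deliberately uses a full-truncation Euler scheme rather than the QE method mentioned above, because it is shorter to write down. The function names, seed, and parameter values are illustrative assumptions, and only the standard library is used.

```python
import math
import random


def feller_ratio(kappa: float, theta: float, xi: float) -> float:
    """2*kappa*theta / xi^2; values below 1 mean variance can touch zero."""
    return 2.0 * kappa * theta / xi ** 2


def simulate_heston_terminal(s0, v0, r, q, kappa, theta, xi, rho,
                             T, n_steps, n_paths, seed=0):
    """Terminal spot prices from a full-truncation Euler discretization."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    rho_c = math.sqrt(1.0 - rho * rho)
    terminal = []
    for _ in range(n_paths):
        log_s, v = math.log(s0), v0
        for _ in range(n_steps):
            z1 = rng.gauss(0.0, 1.0)
            z2 = rho * z1 + rho_c * rng.gauss(0.0, 1.0)  # correlated shock
            v_pos = max(v, 0.0)  # truncate wherever the variance enters
            log_s += (r - q - 0.5 * v_pos) * dt + math.sqrt(v_pos) * sqrt_dt * z1
            v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos) * sqrt_dt * z2
        terminal.append(math.exp(log_s))
    return terminal


if __name__ == "__main__":
    print("Feller ratio:", feller_ratio(1.5, 0.04, 0.5))  # below 1: violated
    paths = simulate_heston_terminal(100.0, 0.04, 0.03, 0.0, 1.5, 0.04, 0.5,
                                     -0.7, T=1.0, n_steps=50, n_paths=2000)
    print("mean S_T:", sum(paths) / len(paths), "forward:", 100.0 * math.exp(0.03))
```

Stepping the log-price keeps simulated prices positive, and the sample mean of the terminal price should sit close to the forward. Full truncation is biased for coarse time steps, which is one reason moment-matching schemes such as QE are preferred in practice.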

Affine Structure and Characteristic Function

The Heston model's affine structure yields a closed-form characteristic function \(\varphi(\tau, u) = \exp(C(\tau,u) + D(\tau,u)V_0 + iu\ln S_0)\) where \(C\) and \(D\) satisfy Riccati ODEs. In the numerically stable form, the explicit solution is \(D(\tau,u) = [(\kappa - \rho\xi iu - d)/\xi^2] \cdot (1 - e^{-d\tau})/(1 - ge^{-d\tau})\) with \(d = \sqrt{(\kappa - \rho\xi iu)^2 + \xi^2(iu + u^2)}\) and \(g = (\kappa - \rho\xi iu - d)/(\kappa - \rho\xi iu + d)\). This formulation, analyzed by Albrecher et al. and Lord-Kahl, avoids the "little Heston trap" (branch cut issues, cancellation errors, overflow) that affects the algebraically equivalent form written with \(e^{+d\tau}\). The martingale condition \(\varphi(\tau, -i) = S_0e^{(r-q)\tau}\) should be verified numerically as an implementation check. Moment existence is limited by the Andersen-Piterbarg critical moment \(n^*(T)\), which governs the strip of regularity for analytic continuation and constrains Fourier pricing damping parameters.
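
A minimal sketch of the characteristic function of \(\ln S_\tau\) follows, written with the decaying exponential \(e^{-d\tau}\) and the principal square root. The function name and test parameters are assumptions. The expression used for \(C\) is the standard closed form for this convention and is not spelled out in the text above.

```python
import cmath
import math


def heston_cf(u, tau, s0, v0, r, q, kappa, theta, xi, rho):
    """E^Q[exp(i*u*ln S_tau)] for Heston; u may be real or complex."""
    iu = 1j * u
    b = kappa - rho * xi * iu
    d = cmath.sqrt(b * b + xi * xi * (iu + u * u))
    g = (b - d) / (b + d)
    e = cmath.exp(-d * tau)  # decaying exponential keeps the formula stable
    D = (b - d) / xi ** 2 * (1.0 - e) / (1.0 - g * e)
    C = (r - q) * iu * tau + kappa * theta / xi ** 2 * (
        (b - d) * tau - 2.0 * cmath.log((1.0 - g * e) / (1.0 - g))
    )
    return cmath.exp(C + D * v0 + iu * math.log(s0))


if __name__ == "__main__":
    args = dict(tau=1.0, s0=100.0, v0=0.04, r=0.03, q=0.01,
                kappa=1.5, theta=0.04, xi=0.5, rho=-0.7)
    # Martingale check: phi(-i) should equal the forward S0 * exp((r - q) * tau).
    print(heston_cf(-1j, **args).real, 100.0 * math.exp(0.02))
    # Normalization check: phi(0) should equal 1.
    print(heston_cf(0.0, **args))
```

The two printed checks are the normalization \(\varphi(\tau, 0) = 1\) and the martingale condition at \(u = -i\), both inexpensive tests to run before the function is used inside a pricer.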

Pricing Under Stochastic Volatility: PDE and Fourier Methods

The two-dimensional pricing PDE is \(\partial_t V + (r-q)S\partial_S V + \kappa(\theta-v)\partial_v V + \frac{1}{2}vS^2\partial_{SS}V + \rho\xi vS\partial_{Sv}V + \frac{1}{2}\xi^2 v\partial_{vv}V - rV = 0\), with degenerate boundary conditions at \(v = 0\) depending on the Feller condition. Numerical solution uses ADI schemes (Douglas-Rachford, Hundsdorfer-Verwer) that split the operator into tridiagonal systems, with the mixed derivative \(\partial_{Sv}V\) treated explicitly or via Craig-Sneyd modifications. PDE methods excel for American options and barriers. Fourier pricing converts option pricing into numerical integration using the characteristic function. The Carr-Madan method introduces a damping factor \(e^{\alpha k}\) and uses FFT to price many strikes simultaneously with \(O(N\log N)\) complexity. The COS method (Fang-Oosterlee) expands the density in Fourier-cosine series with exponential convergence, requiring only \(N = 64\)--\(256\) terms. The Lewis formula provides a symmetric representation requiring no damping parameter. These Fourier methods are essential for calibration across many strikes.
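
To illustrate how a characteristic function turns pricing into numerical integration, the sketch below uses a plain Gil-Pelaez-style inversion with a midpoint rule. It is not the Carr-Madan, COS, or Lewis method, which are faster and more robust for production use. The function names, the integration cutoff, and the Black-Scholes characteristic function used as a check are all assumptions made for this example.

```python
import cmath
import math


def call_price_from_cf(cf, s0, strike, r, q, T, u_max=200.0, n=4000):
    """European call from cf(u) = E^Q[exp(i*u*ln S_T)] (cf must accept complex u)."""
    log_k = math.log(strike)
    du = u_max / n
    forward = cf(-1j).real  # E^Q[S_T], used to normalize the share measure
    p1 = p2 = 0.0
    for j in range(n):
        u = (j + 0.5) * du  # midpoint rule avoids evaluating at u = 0
        phase = cmath.exp(-1j * u * log_k)
        p1 += (phase * cf(u - 1j) / (1j * u * forward)).real
        p2 += (phase * cf(u) / (1j * u)).real
    p1 = 0.5 + p1 * du / math.pi
    p2 = 0.5 + p2 * du / math.pi
    return s0 * math.exp(-q * T) * p1 - strike * math.exp(-r * T) * p2


def black_scholes_cf(s0, sigma, r, q, T):
    """Characteristic function of ln S_T under constant volatility."""
    m = math.log(s0) + (r - q - 0.5 * sigma ** 2) * T
    return lambda u: cmath.exp(1j * u * m - 0.5 * sigma ** 2 * u * u * T)


if __name__ == "__main__":
    cf = black_scholes_cf(100.0, 0.25, 0.03, 0.01, 0.5)
    print(call_price_from_cf(cf, 100.0, 110.0, 0.03, 0.01, 0.5))
```

Any characteristic function of \(\ln S_T\) that accepts complex arguments, including a Heston one, can be passed in place of the Black-Scholes stand-in. Checking the routine against the closed-form Black-Scholes price is a convenient regression test.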

Hedging Under Stochastic Volatility

Delta hedging with the Black-Scholes delta leaves residual vega risk proportional to the vol-of-vol and the option's vega. Under stochastic volatility, higher-order Greeks become important: volga (vomma, sensitivity of vega to volatility) and vanna (cross-sensitivity between spot and volatility) are typically large for long-dated or exotic options. In incomplete markets, hedging strategies are not unique: under Heston dynamics the minimum-variance hedge modifies the delta to \(\Delta_{\text{MV}} = \partial_S V + \rho\xi\,\partial_v V / S\), where the correction is \(\partial_v V\) times the regression coefficient of variance moves on spot moves, \(\rho\xi\sqrt{v_t}/(S\sqrt{v_t}) = \rho\xi/S\). Full hedging of vega risk requires trading a second option or a variance swap. Optimal hedging criteria include mean-variance hedging \(\min_{\pi}\mathbb{E}[(H - V_T^{\pi})^2]\), utility-based hedging \(\max_{\pi}\mathbb{E}[U(V_T^{\pi})]\), and robust hedging approaches that minimize worst-case losses. Hedging error should be viewed as comprising model error plus unhedgeable residual risk.
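
The minimum-variance delta follows from regressing option value changes on spot changes. The sketch below assumes Heston dynamics; the function name and the sensitivities passed to it are hypothetical inputs, not model output.

```python
def minimum_variance_delta(dV_dS: float, dV_dv: float, spot: float,
                           rho: float, xi: float) -> float:
    """Hedge ratio Cov(dV, dS) / Var(dS) under Heston dynamics.

    With dS = sqrt(v)*S*dW1 and dv = xi*sqrt(v)*dW2 (correlation rho):
      Cov(dV, dS) = (dV_dS * v * S**2 + dV_dv * rho * xi * v * S) dt
      Var(dS)     = v * S**2 dt
    so the variance level v cancels from the ratio.
    """
    return dV_dS + rho * xi * dV_dv / spot


if __name__ == "__main__":
    # Hypothetical sensitivities for an at-the-money call on an equity index.
    delta = minimum_variance_delta(dV_dS=0.55, dV_dv=40.0, spot=100.0,
                                   rho=-0.7, xi=0.5)
    print(f"minimum-variance delta = {delta:.3f}")
```

For negative \(\rho\) and positive sensitivity to variance, the adjustment lowers the delta, because a fall in spot tends to coincide with a rise in variance that partly offsets the loss on a long call.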

The SABR Model: Comprehensive Treatment

The SABR model (Hagan et al., 2002) specifies \(dF_t = \alpha_t F_t^{\beta}\,dW_t^{(1)}\), \(d\alpha_t = \nu\alpha_t\,dW_t^{(2)}\) with \(d\langle W^{(1)}, W^{(2)}\rangle_t = \rho\,dt\), where \(F_t\) is the forward price, \(\beta \in [0,1]\) controls the CEV backbone (interpolating between normal \(\beta=0\) and lognormal \(\beta=1\)), \(\alpha_t\) is the stochastic volatility with lognormal dynamics (no mean reversion), and \(\nu\) is the vol-of-vol. The Hagan implied volatility approximation provides a closed-form formula:

\[\sigma_{\text{imp}}(K) \approx \frac{\alpha}{(FK)^{(1-\beta)/2}\left[1 + \frac{(1-\beta)^2}{24}\ln^2(F/K) + \frac{(1-\beta)^4}{1920}\ln^4(F/K)\right]}\cdot\frac{z}{x(z)}\cdot\left[1 + \left(\frac{(1-\beta)^2\alpha^2}{24(FK)^{1-\beta}} + \frac{\rho\beta\nu\alpha}{4(FK)^{(1-\beta)/2}} + \frac{2-3\rho^2}{24}\nu^2\right)T\right]\]

where \(z = (\nu/\alpha)(FK)^{(1-\beta)/2}\ln(F/K)\) and \(x(z) = \ln((\sqrt{1-2\rho z+z^2}+z-\rho)/(1-\rho))\); at the money, \(z \to 0\) and \(z/x(z) \to 1\). The CEV exponent \(\beta\) controls how ATM volatility changes with the forward level, and boundary behavior depends on \(\beta\): for \(\beta < 1\), the forward can reach zero, so the behavior at the boundary (absorption vs. reflection) must be addressed. The probability density is analyzed through heat kernel expansion on the underlying Riemannian geometry. Normal SABR (\(\beta = 0\)) naturally handles negative rates, while shifted SABR (\(dF_t = \alpha_t(F_t + s)^{\beta}\,dW_t^{(1)}\) with shift \(s > 0\)) extends the model to rates as low as \(-s\). Exact solutions exist for special cases of the parameters.
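
A minimal sketch of the Hagan lognormal formula follows. It includes the \(\ln(F/K)\) correction terms in the denominator from the published formula and a series fallback for small \(z\). The function name, tolerance, and example parameters are assumptions, and this is not an arbitrage-free implementation.

```python
import math


def hagan_lognormal_vol(F, K, T, alpha, beta, rho, nu):
    """Hagan et al. (2002) lognormal implied vol approximation for SABR."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly inside (-1, 1)")
    omb = 1.0 - beta
    log_fk = math.log(F / K)
    fk_pow = (F * K) ** (0.5 * omb)  # (FK)^((1 - beta) / 2)
    z = (nu / alpha) * fk_pow * log_fk
    if abs(z) < 1e-8:
        # Series x(z) = z + rho*z^2/2 + ... avoids 0/0 at the money.
        z_over_x = 1.0 - 0.5 * rho * z
    else:
        x = math.log((math.sqrt(1.0 - 2.0 * rho * z + z * z) + z - rho) / (1.0 - rho))
        z_over_x = z / x
    denom = fk_pow * (1.0 + omb ** 2 / 24.0 * log_fk ** 2
                      + omb ** 4 / 1920.0 * log_fk ** 4)
    correction = (omb ** 2 * alpha ** 2 / (24.0 * fk_pow ** 2)
                  + rho * beta * nu * alpha / (4.0 * fk_pow)
                  + (2.0 - 3.0 * rho ** 2) * nu ** 2 / 24.0)
    return alpha / denom * z_over_x * (1.0 + correction * T)


if __name__ == "__main__":
    # Hypothetical swaption-style inputs: 3% forward, 2-year expiry.
    for K in (0.02, 0.025, 0.03, 0.035, 0.04):
        vol = hagan_lognormal_vol(0.03, K, 2.0, alpha=0.035, beta=0.5, rho=-0.3, nu=0.4)
        print(f"K = {K:.3f}  implied vol ~ {vol:.2%}")
```

The small-\(z\) branch matters in practice because ATM quotes are the most liquid, and the direct evaluation of \(z/x(z)\) there is a 0/0 expression.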

Arbitrage-Free SABR extensions (Hagan et al., 2014; Antonov et al., 2015) address the density leakage and arbitrage violations in the wings of the Hagan approximation. SABR calibration to swaption smiles follows a standard procedure: fix \(\beta\) by convention, set \(\alpha\) from ATM implied volatility, and fit \(\rho\) and \(\nu\) to OTM options. Managing the forward smile addresses the dynamic inconsistency arising from the absence of mean reversion. SABR Greeks are computed analytically by differentiating the Hagan formula. Numerical methods include Monte Carlo simulation of SABR paths and finite difference methods for the associated PDE. End-to-end worked examples demonstrate the complete calibration and pricing pipeline. Python implementations provide practical code for the Hagan formula with numerical stability considerations for small \(z\) and extreme \(\rho\).

Calibration of Stochastic Volatility Models

Stochastic volatility models are calibrated by minimizing the distance between model and market implied volatilities across the \((K,T)\) surface:

\[\min_{\Theta} \sum_{i,j} w_{ij}\left(\sigma_{\text{imp}}^{\text{model}}(K_i, T_j; \Theta) - \sigma_{\text{imp}}^{\text{market}}(K_i, T_j)\right)^2\]

Joint calibration across strikes and maturities enforces parameter consistency and coherent term-structure behavior, with maturity balancing, liquidity-based weights, and exclusion of unreliable wings. Identifiability challenges arise because different parameter combinations can produce similar surfaces: \(\kappa\) and \(\theta\) are partially degenerate for short maturities, and \(\rho\) and \(\xi\) interact in the skew-curvature decomposition. Weak identifiability manifests as flat loss surfaces, multiple local minima, and large day-to-day parameter swings. Stability of calibration, meaning smooth day-to-day parameter evolution, is often more important than the tightest possible fit; it is promoted through regularization, parameter anchoring, and reduced parameterization. The volatility risk premium must be treated consistently: option calibration recovers \(\mathbb{Q}\)-dynamics only, and mixing \(\mathbb{P}\) and \(\mathbb{Q}\) parameters without explicit VRP modeling leads to inconsistent dynamics and degraded calibration stability.
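
The weighted objective, with an optional anchoring penalty toward a reference parameter set (for example, yesterday's calibration), can be sketched as follows. The toy skew model, the quote layout `(strike, maturity, market_vol, weight)`, and all names are hypothetical stand-ins for a real pricer and market data.

```python
import math


def calibration_loss(model_vol, params, quotes, anchor=None, strength=0.0):
    """Weighted squared implied-vol error plus an optional anchoring penalty."""
    fit = sum(w * (model_vol(k, t, params) - mkt) ** 2 for k, t, mkt, w in quotes)
    if anchor is None:
        return fit
    penalty = sum((p - a) ** 2 for p, a in zip(params, anchor))
    return fit + strength * penalty


def toy_skew_model(strike, maturity, params, forward=100.0):
    """Hypothetical smile: ATM level plus a skew that decays like 1/sqrt(T)."""
    level, slope = params
    return level + slope * math.log(strike / forward) / math.sqrt(maturity)


if __name__ == "__main__":
    true_params = (0.20, -0.10)
    # Synthetic quotes generated from the toy model itself, with unit weights.
    quotes = [(k, t, toy_skew_model(k, t, true_params), 1.0)
              for k in (80.0, 90.0, 100.0, 110.0, 120.0) for t in (0.25, 1.0, 2.0)]
    print(calibration_loss(toy_skew_model, true_params, quotes))
    print(calibration_loss(toy_skew_model, (0.22, -0.05), quotes))
    print(calibration_loss(toy_skew_model, true_params, quotes,
                           anchor=(0.21, -0.08), strength=0.1))
```

In a real calibration this function would be handed to a numerical optimizer. The `strength` parameter trades fit quality against day-to-day parameter stability.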

Extensions: Multi-Factor and Rough Volatility

Multi-factor volatility models introduce additional latent factors \(V_t = \sum_{i=1}^n V_t^{(i)}\) with different mean-reversion speeds to capture both short-term fluctuations and long-term persistence, improving fit and stability across maturities at the cost of additional identifiability challenges. Rough volatility models address the empirical finding that volatility paths are rougher than Brownian motion, with very low Hölder regularity. They specify \(\sigma_t = \sigma_0 + \int_0^t K(t-s)\,dW_s\) where \(K(t) \sim t^{H-1/2}\) for Hurst parameter \(H \in (0, 1/2)\). These models naturally explain steep short-maturity smiles and the power-law decay of ATM skew with maturity (roughly \(T^{H-1/2}\)), but their non-Markovian dynamics introduce higher computational cost, calibration complexity, and limited closed-form pricing results.

Role in the Book

Stochastic volatility models address the key limitations of local volatility (Chapter 13) and constant volatility (Chapter 6) by introducing a separate variance factor. The Heston model's affine structure connects to Chapter 15 (affine processes) and is developed in full detail in Chapter 16. The SABR model provides the standard framework for interest rate smile modeling (Chapters 18-19). Calibration methods connect to Chapter 17, and the incompleteness framework motivates the hedging strategies of Chapter 11. Fourier pricing methods build on the characteristic function theory of Chapter 9. The empirical failures documented here---volatility clustering, leverage effect, heavy tails---provide the motivation for all subsequent modeling choices.