Connection to the Bates Model
Neither the Merton jump-diffusion model nor the Heston stochastic volatility model alone can reproduce the full implied volatility surface across all strikes and maturities. The Merton model captures short-maturity skew through jumps but fails at long maturities where the jump effect fades. The Heston model generates persistent long-maturity smiles through stochastic volatility but cannot produce the steep short-maturity skew observed in practice. The Bates model (1996) combines both mechanisms, adding compound Poisson jumps to the Heston stochastic volatility framework, yielding a model that fits the entire surface with high accuracy.
Learning Objectives
By the end of this section, you will be able to:
- Write down the Bates (SVJ) SDE and identify each component
- Compare the implied volatility surface generated by Merton, Heston, and Bates
- Derive the characteristic function of the Bates model by combining Heston and Merton components
- Determine when to use each model based on the application and available data
Motivation
The Complementary Weaknesses
| Feature | Merton | Heston | Bates |
|---|---|---|---|
| Short-maturity skew | Strong | Weak | Strong |
| Long-maturity smile | Weak (fades as \(1/\sqrt{T}\)) | Strong | Strong |
| Volatility clustering | No | Yes | Yes |
| Leverage effect (\(\rho < 0\)) | No | Yes | Yes |
| Excess kurtosis (short \(T\)) | High | Moderate | High |
| Number of parameters | 4 | 5 | 8 |
The Bates model inherits the strengths of both parents while adding only three jump parameters \((\lambda, \mu_J, \sigma_J)\) to the five Heston parameters.
The Bates SDE
Stochastic Volatility with Jumps (SVJ)
Definition: Bates Model
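Under the risk-neutral measure \(\mathbb{Q}\), the Bates model specifies:
\[
\frac{dS_t}{S_{t^-}} = (r - \lambda\bar{k})\,dt + \sqrt{v_t}\,dW_t^{(1)} + (Y - 1)\,dN_t,
\]
\[
dv_t = \kappa(\theta - v_t)\,dt + \xi\sqrt{v_t}\,dW_t^{(2)}, \qquad d\langle W^{(1)}, W^{(2)}\rangle_t = \rho\,dt,
\]
where:
- \(S_t\) is the asset price and \(r\) is the risk-free rate
- \(W^{(1)}\) and \(W^{(2)}\) are standard Brownian motions with correlation \(\rho\)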
- \(v_t\) is the instantaneous variance (stochastic)
- \(\kappa > 0\) is the mean-reversion speed of variance
- \(\theta > 0\) is the long-run variance level
- \(\xi > 0\) is the volatility of variance (vol-of-vol)
- \(\rho \in [-1, 1]\) is the correlation between asset returns and variance
- \(N_t\) is a Poisson process with intensity \(\lambda\), independent of \(W^{(1)}\) and \(W^{(2)}\)
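- \(Y\) is the jump multiplier (i.i.d. across jumps and independent of \(N_t\), \(W^{(1)}\), \(W^{(2)}\)) with \(\ln Y \sim N(\mu_J, \sigma_J^2)\)
- \(\bar{k} = \mathbb{E}[Y] - 1 = e^{\mu_J + \sigma_J^2/2} - 1\) is the mean relative jump size; the drift term \(-\lambda\bar{k}\) compensates the jumps so that \(e^{-rt}S_t\) is a \(\mathbb{Q}\)-martingale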
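To make the dynamics concrete, the SDE can be simulated directly. The Python sketch below is a minimal illustration, not a production scheme: it assumes a log-Euler step for \(\ln S_t\) with the compensated drift \(r - \lambda\bar{k}\), a full-truncation Euler step for \(v_t\) (negative values are floored at zero inside the coefficients), and an exact Poisson draw for the number of jumps in each step. The function name `simulate_bates_terminal` is hypothetical.

```python
import math
import random


def simulate_bates_terminal(s0, v0, kappa, theta, xi, rho, lam, mu_j, sigma_j,
                            r, T, n_steps, n_paths, seed=0):
    """Simulate terminal prices S_T under the Bates model (illustrative scheme).

    ln S : log-Euler step with the compensated drift r - lam * kbar.
    v    : full-truncation Euler step (negative v is floored at 0 in the coefficients).
    jumps: number of jumps per step ~ Poisson(lam * dt), log-jump sizes ~ N(mu_j, sigma_j^2).
    """
    rng = random.Random(seed)
    dt = T / n_steps
    kbar = math.exp(mu_j + 0.5 * sigma_j ** 2) - 1.0   # E[Y] - 1
    rho_perp = math.sqrt(1.0 - rho ** 2)
    no_jump_prob = math.exp(-lam * dt)                  # P(no jump in one step)
    terminal = []
    for _ in range(n_paths):
        x, v = math.log(s0), v0
        for _ in range(n_steps):
            v_pos = max(v, 0.0)
            z1 = rng.gauss(0.0, 1.0)
            z2 = rho * z1 + rho_perp * rng.gauss(0.0, 1.0)  # corr(z1, z2) = rho
            # Knuth's multiplication method for a Poisson(lam * dt) draw
            n_jumps, prod = 0, rng.random()
            while prod > no_jump_prob:
                n_jumps += 1
                prod *= rng.random()
            log_jump = 0.0
            for _ in range(n_jumps):
                log_jump += rng.gauss(mu_j, sigma_j)
            x += (r - lam * kbar - 0.5 * v_pos) * dt + math.sqrt(v_pos * dt) * z1 + log_jump
            v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos * dt) * z2
        terminal.append(math.exp(x))
    return terminal
```

As a usage check, with the Exercise 4 parameters plus the assumed values \(S_0 = 100\), \(r = 0.05\), \(T = 1\), 50 steps and 10,000 paths, the sample mean of the returned prices should sit near \(S_0 e^{rT} \approx 105.13\) up to Monte Carlo error: conditional on the current variance, each discrete step of \(e^{-rt}S_t\) has expectation one, because the compensator exactly offsets the mean jump. Full truncation is only one of several common fixes for negative variance.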
The Eight Parameters
| Parameter | Source | Role |
|---|---|---|
| \(v_0\) | Heston | Initial variance |
| \(\kappa\) | Heston | Variance mean-reversion speed |
| \(\theta\) | Heston | Long-run variance |
| \(\xi\) | Heston | Vol-of-vol |
| \(\rho\) | Heston | Return-variance correlation |
| \(\lambda\) | Merton | Jump intensity |
| \(\mu_J\) | Merton | Mean log-jump |
| \(\sigma_J\) | Merton | Jump size dispersion |
Special Cases
The Bates model nests three important models:
- \(\lambda = 0\): Reduces to the Heston model (pure stochastic volatility)
- \(\xi = 0\), \(v_0 = \theta = \sigma^2\): Reduces to the Merton model (the variance stays at \(\sigma^2\), giving a jump-diffusion with constant volatility)
- \(\lambda = 0\), \(\xi = 0\), \(v_0 = \theta = \sigma^2\): Reduces to Black-Scholes
Characteristic Function
Modularity of the Levy-Khintchine Structure
The log-return \(x_T = \ln(S_T/S_0)\) in the Bates model has a characteristic function that factorizes into a Heston component and a Merton jump component.
Theorem: Bates Characteristic Function
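The characteristic function of the log-return \(x_T = \ln(S_T/S_0)\) is
\[
\phi_{\text{Bates}}(u) = \mathbb{E}^{\mathbb{Q}}\left[e^{iux_T}\right] = \phi_{\text{Heston}}(u)\,\phi_{\text{Jump}}(u),
\]
where the Heston part is:
\[
\phi_{\text{Heston}}(u) = \exp\left(iurT + C(u,T) + D(u,T)\,v_0\right)
\]
with
\[
d = \sqrt{(\rho\xi iu - \kappa)^2 + \xi^2(iu + u^2)}, \qquad g = \frac{\kappa - \rho\xi iu - d}{\kappa - \rho\xi iu + d},
\]
\[
C(u,T) = \frac{\kappa\theta}{\xi^2}\left[(\kappa - \rho\xi iu - d)T - 2\ln\frac{1 - g e^{-dT}}{1 - g}\right], \qquad D(u,T) = \frac{\kappa - \rho\xi iu - d}{\xi^2}\cdot\frac{1 - e^{-dT}}{1 - g e^{-dT}},
\]
and the jump part is:
\[
\phi_{\text{Jump}}(u) = \exp\left[\lambda T\left(e^{iu\mu_J - \frac{1}{2}\sigma_J^2u^2} - 1\right) - iu\lambda\bar{k}T\right].
\]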
Proof sketch. The independence of \((W^{(1)}, W^{(2)})\) from \(N_t\) implies that the characteristic function factorizes. The Heston component follows from the affine structure of the CIR variance process. The jump component is the same as in the Merton model. \(\square\)
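The factorization is easy to check numerically. The Python sketch below (standard library only) codes the Heston factor in the common "little Heston trap" parametrization, which is an assumption about the exact algebraic form (equivalent forms give the same values), and multiplies it by the jump factor. The function names are illustrative. Two checks follow from the definitions: \(\phi(0) = 1\), and \(\phi(-i) = \mathbb{E}^{\mathbb{Q}}[S_T/S_0] = e^{rT}\), which holds only because of the \(-iu\lambda\bar{k}T\) compensator.

```python
import cmath
import math


def heston_cf(u, T, r, v0, kappa, theta, xi, rho):
    """Heston CF of ln(S_T/S_0), written in the 'little Heston trap' form (assumed)."""
    iu = 1j * u
    b = kappa - rho * xi * iu
    d = cmath.sqrt(b * b + xi ** 2 * (iu + u * u))
    g = (b - d) / (b + d)
    e = cmath.exp(-d * T)
    C = kappa * theta / xi ** 2 * ((b - d) * T - 2.0 * cmath.log((1.0 - g * e) / (1.0 - g)))
    D = (b - d) / xi ** 2 * (1.0 - e) / (1.0 - g * e)
    return cmath.exp(iu * r * T + C + D * v0)


def jump_cf(u, T, lam, mu_j, sigma_j):
    """Merton jump factor, including the -iu*lam*kbar*T compensator."""
    kbar = math.exp(mu_j + 0.5 * sigma_j ** 2) - 1.0
    return cmath.exp(lam * T * (cmath.exp(1j * u * mu_j - 0.5 * sigma_j ** 2 * u * u) - 1.0)
                     - 1j * u * lam * kbar * T)


def bates_cf(u, T, r, v0, kappa, theta, xi, rho, lam, mu_j, sigma_j):
    """Bates CF = Heston factor x jump factor (jumps independent of the Brownian motions)."""
    return heston_cf(u, T, r, v0, kappa, theta, xi, rho) * jump_cf(u, T, lam, mu_j, sigma_j)


params = dict(T=1.0, r=0.05, v0=0.04, kappa=2.0, theta=0.04, xi=0.3, rho=-0.7,
              lam=0.5, mu_j=-0.10, sigma_j=0.15)
print(bates_cf(0.0, **params))    # normalization: should be 1
print(bates_cf(-1j, **params))    # martingale check: should be exp(rT)
```

Because only `jump_cf` depends on \((\lambda, \mu_J, \sigma_J)\), setting `lam=0.0` returns the Heston value unchanged, mirroring the special case \(\lambda = 0\).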
This factorization means that the same Fourier pricing machinery (Carr-Madan, COS method) used for Heston immediately extends to Bates by simply multiplying the characteristic function by the jump factor.
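A minimal sketch of this modularity in Python, with two stated substitutions: it uses Gil-Pelaez inversion (a simpler relative of Carr-Madan, chosen only for brevity) with a plain midpoint rule on a truncated range, and, to keep the block short, it takes the Black-Scholes characteristic function as the diffusive factor. Multiplying that factor by the jump factor turns the Black-Scholes pricer into a Merton pricer without touching the inversion code, which is checked against Merton's series formula; substituting a Heston factor would give a Bates pricer in the same way. All function names and grid settings are illustrative assumptions.

```python
import cmath
import math


def bs_cf(u, r, T, sigma):
    """CF of ln(S_T/S_0) under Black-Scholes (the diffusive factor in this sketch)."""
    return cmath.exp(1j * u * (r - 0.5 * sigma ** 2) * T - 0.5 * sigma ** 2 * u * u * T)


def jump_cf(u, T, lam, mu_j, sigma_j):
    """Compensated log-normal jump factor phi_Jump."""
    kbar = math.exp(mu_j + 0.5 * sigma_j ** 2) - 1.0
    return cmath.exp(lam * T * (cmath.exp(1j * u * mu_j - 0.5 * sigma_j ** 2 * u * u) - 1.0)
                     - 1j * u * lam * kbar * T)


def gil_pelaez_call(cf, s0, K, r, T, u_max=100.0, n=5000):
    """European call from any CF of ln(S_T/S_0) by Gil-Pelaez inversion.

    A midpoint rule on [0, u_max] avoids evaluating the integrand at u = 0.
    """
    k = math.log(K / s0)
    h = u_max / n
    cf_mi = cf(-1j)  # E[S_T/S_0] = exp(rT) under Q
    p1 = p2 = 0.0
    for j in range(n):
        u = (j + 0.5) * h
        w = cmath.exp(-1j * u * k) / (1j * u)
        p1 += (w * cf(u - 1j) / cf_mi).real
        p2 += (w * cf(u)).real
    p1 = 0.5 + p1 * h / math.pi   # P(S_T > K) under the stock measure
    p2 = 0.5 + p2 * h / math.pi   # P(S_T > K) under Q
    return s0 * p1 - K * math.exp(-r * T) * p2


def bs_call(s0, K, r, T, sigma):
    """Closed-form Black-Scholes call, used as a reference."""
    def ncdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    d1 = (math.log(s0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return s0 * ncdf(d1) - K * math.exp(-r * T) * ncdf(d1 - sigma * math.sqrt(T))


def merton_series_call(s0, K, r, T, sigma, lam, mu_j, sigma_j, n_terms=40):
    """Merton's series: Poisson-weighted Black-Scholes prices, used as a reference."""
    kbar = math.exp(mu_j + 0.5 * sigma_j ** 2) - 1.0
    lam_p = lam * (1.0 + kbar)
    total = 0.0
    for m in range(n_terms):
        weight = math.exp(-lam_p * T) * (lam_p * T) ** m / math.factorial(m)
        sigma_m = math.sqrt(sigma ** 2 + m * sigma_j ** 2 / T)
        r_m = r - lam * kbar + m * (mu_j + 0.5 * sigma_j ** 2) / T
        total += weight * bs_call(s0, K, r_m, T, sigma_m)
    return total


S0, K, r, T, sigma = 100.0, 100.0, 0.05, 1.0, 0.2
lam, mu_j, sigma_j = 0.5, -0.10, 0.15


def merton_cf(u):
    # Modularity: multiply the diffusive CF by the jump factor; nothing else changes.
    return bs_cf(u, r, T, sigma) * jump_cf(u, T, lam, mu_j, sigma_j)


print("Fourier, Black-Scholes:", gil_pelaez_call(lambda u: bs_cf(u, r, T, sigma), S0, K, r, T))
print("Fourier, Merton:       ", gil_pelaez_call(merton_cf, S0, K, r, T))
print("Series,  Merton:       ", merton_series_call(S0, K, r, T, sigma, lam, mu_j, sigma_j))
```

The pricer only ever calls `cf`, so the model enters through a single function argument; this is the property that lets Heston code be reused for Bates.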
When to Use Which Model
Decision Criteria
Model Selection Guide
| Criterion | Recommended model |
|---|---|
| Only short-maturity options (< 3 months) | Merton |
| Only long-maturity options (> 1 year) | Heston |
| Full term structure needed | Bates |
| Fast calibration (few parameters) | Merton or Heston |
| Maximum accuracy across surface | Bates |
| Exotic pricing (path-dependent) | Heston or Bates (MC simulation) |
| American option pricing | Heston (2D PDE) or Bates (PIDE) |
Calibration Complexity
| Model | Parameters | Typical calibration time | IV RMSE |
|---|---|---|---|
| Black-Scholes | 1 (\(\sigma\)) | Instant | 2--5% |
| Merton | 4 | Seconds | 0.5--1.5% |
| Heston | 5 | Seconds--minutes | 0.3--1.0% |
| Bates | 8 | Minutes | 0.1--0.5% |
The Bates model achieves the best fit but at the cost of a higher-dimensional optimization with more local minima. A common strategy is to first calibrate Heston, then add jumps as a perturbation.
Implied Volatility Surface Comparison
Short Maturity (\(T = 1\) month)
At very short maturities, the Heston model produces a nearly flat smile (stochastic volatility has not had time to act), while the Merton model produces steep skew through jumps. The Bates model matches the market by combining both effects.
Medium Maturity (\(T = 6\) months)
Both Heston and Bates fit well. The Merton model begins to underperform because its smile amplitude is decaying.
Long Maturity (\(T = 2\) years)
The Merton smile has almost vanished (the skewness generated by jumps decays as \(1/\sqrt{T}\)). The Heston model, through its persistent stochastic volatility, maintains the smile. The Bates model provides essentially the same long-maturity fit as Heston, since the jump effect on the smile shape is small at this horizon.
Summary Table
| Maturity | Market feature | Merton | Heston | Bates |
|---|---|---|---|---|
| 1 month | Steep skew | Good | Poor | Good |
| 3 months | Moderate skew | Good | Fair | Good |
| 1 year | Moderate smile | Fair | Good | Good |
| 2 years | Flat smile/mild skew | Poor | Good | Good |
Extensions Beyond Bates
Double-Exponential Jumps (Kou-Bates)
Replacing the log-normal jump with Kou's double-exponential distribution provides asymmetric up and down jump rates, improving the fit to equity skew while maintaining analytical tractability.
Jumps in Volatility (SVJJ)
The SVJJ model adds jumps to the variance process:
\[
dv_t = \kappa(\theta - v_t)\,dt + \xi\sqrt{v_t}\,dW_t^{(2)} + J_v\,dN_t^v,
\]
where \(J_v > 0\) is the variance jump size and \(N_t^v\) may be correlated with \(N_t\) (the price jump process). This captures the observation that large price drops are accompanied by spikes in implied volatility (the "fear" effect).
Levy Models
Infinite-activity Levy processes (Variance Gamma, CGMY, NIG) replace the compound Poisson process with processes that have infinitely many small jumps per unit time. These models can produce smoother implied volatility surfaces but lose some analytical tractability.
Summary
The Bates model combines Heston stochastic volatility with Merton jumps, inheriting the short-maturity skew from jumps and the long-maturity smile persistence from stochastic volatility. Its characteristic function factorizes as \(\phi_{\text{Bates}} = \phi_{\text{Heston}} \cdot \phi_{\text{Jump}}\), allowing direct application of Fourier pricing methods. The model nests Black-Scholes, Merton, and Heston as special cases, and its eight parameters provide enough flexibility to fit the full implied volatility surface across strikes and maturities. The choice between Merton, Heston, and Bates depends on the maturity range of interest, the required accuracy, and the computational budget available for calibration.
Exercises
Exercise 1. Show that the Bates model reduces to the Heston model when \(\lambda = 0\). Specifically, verify that the Bates SDE simplifies to the standard Heston SDE and that \(\phi_{\text{Bates}}(u) = \phi_{\text{Heston}}(u)\) when the jump intensity vanishes.
Solution to Exercise 1
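When \(\lambda = 0\), the jump component vanishes entirely. In the Bates SDE
\[
\frac{dS_t}{S_{t^-}} = (r - \lambda\bar{k})\,dt + \sqrt{v_t}\,dW_t^{(1)} + (Y-1)\,dN_t, \qquad dv_t = \kappa(\theta - v_t)\,dt + \xi\sqrt{v_t}\,dW_t^{(2)},
\]
setting \(\lambda = 0\) makes the compensator \(\lambda\bar{k} = 0\), so the drift becomes \(r\,dt\). The term \((Y-1)\,dN_t\) vanishes because \(N_t\) has intensity \(0\) (no jumps occur). The SDE reduces to:
\[
\frac{dS_t}{S_t} = r\,dt + \sqrt{v_t}\,dW_t^{(1)}, \qquad dv_t = \kappa(\theta - v_t)\,dt + \xi\sqrt{v_t}\,dW_t^{(2)}, \qquad d\langle W^{(1)}, W^{(2)}\rangle_t = \rho\,dt,
\]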
which is the standard Heston SDE.
For the characteristic function, \(\phi_{\text{Jump}}(u) = \exp[\lambda T(e^{iu\mu_J - \frac{1}{2}\sigma_J^2 u^2} - 1) - iu\lambda\bar{k}T]\). When \(\lambda = 0\), the exponent is identically zero, so \(\phi_{\text{Jump}}(u) = e^0 = 1\). Therefore:
\[
\phi_{\text{Bates}}(u) = \phi_{\text{Heston}}(u)\cdot\phi_{\text{Jump}}(u) = \phi_{\text{Heston}}(u).
\]
Exercise 2. Show that the Bates model reduces to the Merton model when \(\xi = 0\) and \(v_t = \sigma^2\) (constant variance). What happens to the Heston characteristic function \(\phi_{\text{Heston}}(u)\) in this limit, and what does the product \(\phi_{\text{Bates}}(u)\) simplify to?
Solution to Exercise 2
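When \(\xi = 0\) and \(v_0 = \theta = \sigma^2\), the variance process becomes deterministic and stays at its initial value:
\[
dv_t = \kappa(\theta - v_t)\,dt = 0
\]
(since \(v_t = \sigma^2 = \theta\) is constant). The Bates SDE reduces to:
\[
\frac{dS_t}{S_{t^-}} = (r - \lambda\bar{k})\,dt + \sigma\,dW_t^{(1)} + (Y-1)\,dN_t,
\]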
which is exactly the Merton SDE.
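For the Heston characteristic function with \(v_0 = \theta = \sigma^2\), the formulas cannot simply be evaluated at \(\xi = 0\): the discriminant becomes \(d = \sqrt{(\rho \cdot 0 \cdot iu - \kappa)^2 + 0} = \kappa\) and \(g = (\kappa - \kappa)/(\kappa + \kappa) = 0\), but \(D(u,T) = \frac{\kappa - \rho\xi iu - d}{\xi^2}\cdot\frac{1 - e^{-dT}}{1 - g e^{-dT}}\) and \(C(u,T)\) both take the indeterminate form \(0/0\). Taking the limit \(\xi \to 0\) instead, write \(b = \kappa - \rho\xi iu\); then \(b - d = -\xi^2(iu + u^2)/(b + d) \approx -\xi^2(iu + u^2)/(2\kappa)\) and \(g \approx (b - d)/(2\kappa)\), which gives
\[
D(u,T) \to -\frac{(iu + u^2)(1 - e^{-\kappa T})}{2\kappa}, \qquad C(u,T) \to -\frac{\theta(iu + u^2)}{2}\,T + \frac{\theta(iu + u^2)(1 - e^{-\kappa T})}{2\kappa}.
\]
With \(v_0 = \theta = \sigma^2\), the \(1 - e^{-\kappa T}\) terms cancel in \(C + D v_0\), leaving \(C + D v_0 \to -\tfrac{1}{2}\sigma^2(iu + u^2)T\). Hence \(\phi_{\text{Heston}}(u)\) reduces to the characteristic function of a log-normal process with constant volatility \(\sigma\):
\[
\phi_{\text{Heston}}(u) \to \exp\left[iu\left(r - \tfrac{1}{2}\sigma^2\right)T - \tfrac{1}{2}\sigma^2u^2T\right].
\]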
Multiplying by the jump factor gives \(\phi_{\text{Bates}}(u) = \exp[iu(r - \tfrac{1}{2}\sigma^2)T - \tfrac{1}{2}\sigma^2 u^2 T + \lambda T(e^{iu\mu_J - \frac{1}{2}\sigma_J^2 u^2} - 1) - iu\lambda\bar{k}T]\), which is the Merton characteristic function.
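A quick numerical check of this limit, as a sketch: the Python code below evaluates the Heston characteristic function with a very small \(\xi\) (exactly \(\xi = 0\) would hit the \(0/0\) form) and \(v_0 = \theta = \sigma^2\), and compares it with the log-normal characteristic function. It assumes the same \(d\), \(g\), \(D\) expressions quoted in this solution, with \(C\) in the standard "little Heston trap" form; the parameter values and function names are illustrative.

```python
import cmath


def heston_cf(u, T, r, v0, kappa, theta, xi, rho):
    """Heston CF of ln(S_T/S_0) in the 'little Heston trap' form (assumed convention)."""
    iu = 1j * u
    b = kappa - rho * xi * iu
    d = cmath.sqrt(b * b + xi ** 2 * (iu + u * u))
    g = (b - d) / (b + d)
    e = cmath.exp(-d * T)
    C = kappa * theta / xi ** 2 * ((b - d) * T - 2.0 * cmath.log((1.0 - g * e) / (1.0 - g)))
    D = (b - d) / xi ** 2 * (1.0 - e) / (1.0 - g * e)
    return cmath.exp(iu * r * T + C + D * v0)


def lognormal_cf(u, T, r, sigma):
    """Black-Scholes CF of ln(S_T/S_0): the claimed xi -> 0 limit."""
    return cmath.exp(1j * u * (r - 0.5 * sigma ** 2) * T - 0.5 * sigma ** 2 * u * u * T)


sigma, T, r = 0.2, 1.0, 0.05
gaps = []
for u in (0.5, 1.0, 3.0, 10.0):
    # v0 = theta = sigma^2, kappa = 2, rho = -0.7, tiny vol-of-vol xi = 1e-4
    gap = abs(heston_cf(u, T, r, sigma ** 2, 2.0, sigma ** 2, 1e-4, -0.7) - lognormal_cf(u, T, r, sigma))
    gaps.append(gap)
    print(f"u = {u:4.1f}   |phi_Heston - phi_lognormal| = {gap:.1e}")
```

The remaining gap is of order \(\xi\), consistent with the limit above rather than with \(D = 0\) at finite \(\xi\).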
Exercise 3. The jump component of the characteristic function is \(\phi_{\text{Jump}}(u) = \exp[\lambda T(e^{iu\mu_J - \frac{1}{2}\sigma_J^2 u^2} - 1) - iu\lambda\bar{k}T]\). Verify that \(\phi_{\text{Jump}}(0) = 1\) and compute \(-i\phi_{\text{Jump}}'(0)\) to find \(\mathbb{E}[\ln(S_T/S_0)]\) contributed by the jump component alone.
Solution to Exercise 3
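Verification that \(\phi_{\text{Jump}}(0) = 1\): Substituting \(u = 0\):
\[
\phi_{\text{Jump}}(0) = \exp\left[\lambda T(e^0 - 1) - 0\right] = e^0 = 1.
\]
Computing the mean contribution: The mean of the log-return from the jump component is obtained from \(-i\phi_{\text{Jump}}'(0)\). Differentiate with respect to \(u\):
\[
\phi_{\text{Jump}}'(u) = \phi_{\text{Jump}}(u)\left[\lambda T(i\mu_J - \sigma_J^2u)\,e^{iu\mu_J - \frac{1}{2}\sigma_J^2u^2} - i\lambda\bar{k}T\right].
\]
Evaluating at \(u = 0\):
\[
\phi_{\text{Jump}}'(0) = 1\cdot\left[i\lambda T\mu_J - i\lambda\bar{k}T\right] = i\lambda T(\mu_J - \bar{k}).
\]
Therefore:
\[
-i\,\phi_{\text{Jump}}'(0) = -i\cdot i\lambda T(\mu_J - \bar{k}) = \lambda T(\mu_J - \bar{k}).
\]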
This is the contribution of the jump component to \(\mathbb{E}[\ln(S_T/S_0)]\). Since \(\bar{k} = e^{\mu_J + \sigma_J^2/2} - 1\), this equals \(\lambda T(\mu_J - e^{\mu_J + \sigma_J^2/2} + 1)\).
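The same result can be confirmed numerically, as a small sketch: a central finite difference of \(\phi_{\text{Jump}}\) at \(u = 0\) approximates \(\phi_{\text{Jump}}'(0)\), and multiplying by \(-i\) should reproduce \(\lambda T(\mu_J - \bar{k})\). The step size `h` and the parameter values (taken from Exercise 4, with \(T = 1\) assumed) are illustrative.

```python
import cmath
import math


def jump_cf(u, T, lam, mu_j, sigma_j):
    """phi_Jump(u) = exp[lam*T*(exp(iu*mu_j - sigma_j^2 u^2 / 2) - 1) - iu*lam*kbar*T]."""
    kbar = math.exp(mu_j + 0.5 * sigma_j ** 2) - 1.0
    return cmath.exp(lam * T * (cmath.exp(1j * u * mu_j - 0.5 * sigma_j ** 2 * u * u) - 1.0)
                     - 1j * u * lam * kbar * T)


T, lam, mu_j, sigma_j = 1.0, 0.5, -0.10, 0.15
h = 1e-5
# E[X] = -i * phi'(0), with phi'(0) from a central difference
derivative = (jump_cf(h, T, lam, mu_j, sigma_j) - jump_cf(-h, T, lam, mu_j, sigma_j)) / (2 * h)
numeric_mean = (-1j * derivative).real
kbar = math.exp(mu_j + 0.5 * sigma_j ** 2) - 1.0
closed_form = lam * T * (mu_j - kbar)
print(numeric_mean, closed_form)
```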
Exercise 4. Consider the Bates model with parameters \(v_0 = 0.04\), \(\kappa = 2\), \(\theta = 0.04\), \(\xi = 0.3\), \(\rho = -0.7\), \(\lambda = 0.5\), \(\mu_J = -0.10\), \(\sigma_J = 0.15\). Compute the compensator \(\bar{k} = e^{\mu_J + \sigma_J^2/2} - 1\) and the adjusted drift \(r - \lambda\bar{k}\) for \(r = 0.05\). Explain the economic role of the drift adjustment.
Solution to Exercise 4
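With \(\mu_J = -0.10\) and \(\sigma_J = 0.15\):
\[
\bar{k} = e^{-0.10 + 0.15^2/2} - 1 = e^{-0.08875} - 1 \approx 0.915074 - 1 = -0.084926.
\]
The adjusted drift with \(r = 0.05\) and \(\lambda = 0.5\):
\[
r - \lambda\bar{k} \approx 0.05 - 0.5 \times (-0.084926) = 0.05 + 0.042463 = 0.092463.
\]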
The economic role of the drift adjustment is to ensure the discounted stock price \(e^{-rt}S_t\) is a martingale under \(\mathbb{Q}\). Without the \(-\lambda\bar{k}\) adjustment, jumps would on average lower the stock price (since \(\bar{k} < 0\) for these parameters), giving the stock an expected return below the risk-free rate. The positive correction \(-\lambda\bar{k} \approx +0.04246\) raises the continuous drift to offset the average downward drag from jumps, maintaining the no-arbitrage condition \(\mathbb{E}^{\mathbb{Q}}[S_T] = S_0 e^{rT}\).
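The arithmetic above can be reproduced in a few lines of Python; only \(\lambda\), \(\mu_J\), \(\sigma_J\) and \(r\) enter, so the Heston parameters of this exercise play no role here.

```python
import math

lam, mu_j, sigma_j, r = 0.5, -0.10, 0.15, 0.05
kbar = math.exp(mu_j + 0.5 * sigma_j ** 2) - 1.0   # E[Y] - 1
drift = r - lam * kbar                              # compensated drift of dS/S
print(f"kbar = {kbar:.6f}, -lam*kbar = {-lam * kbar:.6f}, drift = {drift:.6f}")
# prints: kbar = -0.084926, -lam*kbar = 0.042463, drift = 0.092463
```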
Exercise 5. The factorization \(\phi_{\text{Bates}} = \phi_{\text{Heston}} \cdot \phi_{\text{Jump}}\) relies on the independence of the Brownian motions \((W^{(1)}, W^{(2)})\) from the Poisson process \(N_t\). Give a counterexample scenario where the jump process and the variance process are correlated (as in the SVJJ model), and explain why the characteristic function would no longer factorize in this case.
Solution to Exercise 5
In the SVJJ model, the jump process \(N_t\) simultaneously affects both the price and the variance. At a jump time \(T_i\), the price jumps by \((Y_i - 1)\) and the variance jumps by \(J_v^{(i)}\), where the jump sizes may be correlated (e.g., \(J_v^{(i)}\) could depend on \(Y_i\), or \(N_t\) could trigger both jumps simultaneously).
Counterexample: Suppose \(N_t\) is a single Poisson process, and at each jump time the variance jumps by \(J_v = -\beta\ln Y\) with \(\beta > 0\) (variance spikes up when the price drops, since \(\ln Y < 0\) for a crash). Then \(v_{T_i} = v_{T_i^-} + J_v^{(i)}\) and \(S_{T_i} = S_{T_i^-} \cdot Y_i\), where \(J_v\) and \(\ln Y\) are dependent.
In the Bates model, jumps move only the price, so their contribution to \(\ln S_T\) is independent of the diffusive part and the characteristic function factorizes. In the scenario above, each jump also shifts \(v_t\), which changes the integrated variance \(\int_0^T v_s\,ds\) that drives the diffusive part of the return. Because the price and variance jump sizes are coupled, the conditional characteristic function of the diffusive part depends on the price-jump history, and vice versa, so the joint characteristic function can no longer be written as a product of a Heston factor and a price-jump factor. The model is still affine, however: one solves a system of Riccati ODEs whose jump terms involve the joint transform of \((\ln Y, J_v)\), yielding an exponential-affine but non-factorized characteristic function.
Exercise 6. The Merton model's implied volatility smile amplitude decays as \(1/\sqrt{T}\) for long maturities. Starting from the CLT applied to \(N_T \sim \text{Poisson}(\lambda T)\) jumps, explain why the per-unit-time jump contribution to total variance becomes negligible relative to the diffusion component as \(T \to \infty\). Contrast this with the Heston model, where the stochastic volatility generates a persistent smile.
Solution to Exercise 6
At time \(T\), the number of jumps is \(N_T \sim \text{Poisson}(\lambda T)\), and by the law of large numbers \(N_T/T \to \lambda\) as \(T \to \infty\). The compound Poisson sum \(\sum_{i=1}^{N_T}\ln Y_i\) has variance \(\lambda T\,\mathbb{E}[(\ln Y)^2] = \lambda T(\sigma_J^2 + \mu_J^2)\), which grows linearly with \(T\), just like the diffusion variance \(\sigma^2 T\).
The total variance per unit time is:
\[
\frac{\operatorname{Var}[\ln(S_T/S_0)]}{T} = \sigma^2 + \lambda(\sigma_J^2 + \mu_J^2).
\]
This is constant in \(T\), meaning the jump and diffusion contributions to per-unit-time variance are both constant. However, the higher cumulants (skewness, kurtosis) that generate the smile decay: the skewness scales as \(T^{-1/2}\) and excess kurtosis as \(T^{-1}\). As \(T \to \infty\), the CLT applied to the sum of \(N_T \approx \lambda T\) independent log-normal jumps makes the total jump contribution approximately Gaussian. A Gaussian plus a Gaussian (diffusion) is Gaussian, which produces a flat implied volatility smile. Thus the Merton smile amplitude vanishes for long maturities.
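The scaling can be made explicit with the cumulants of the Merton log-return. The sketch below uses the standard facts that the diffusion adds \(\sigma^2 T\) to the second cumulant and that a compound Poisson sum adds \(\lambda T\,\mathbb{E}[(\ln Y)^n]\) to the \(n\)-th cumulant, with Gaussian raw moments for \(\ln Y\). The diffusion volatility \(\sigma = 0.2\) is an assumed value and the jump parameters are borrowed from Exercise 4.

```python
def merton_log_return_cumulants(T, sigma, lam, mu_j, sigma_j):
    """Cumulants 2 to 4 of ln(S_T/S_0) in the Merton model."""
    m2 = mu_j ** 2 + sigma_j ** 2                                   # E[(ln Y)^2]
    m3 = mu_j ** 3 + 3 * mu_j * sigma_j ** 2                        # E[(ln Y)^3]
    m4 = mu_j ** 4 + 6 * mu_j ** 2 * sigma_j ** 2 + 3 * sigma_j ** 4  # E[(ln Y)^4]
    c2 = sigma ** 2 * T + lam * T * m2
    c3 = lam * T * m3
    c4 = lam * T * m4
    return c2, c3, c4


sigma, lam, mu_j, sigma_j = 0.2, 0.5, -0.10, 0.15
for T in (1 / 12, 0.5, 2.0, 8.0):
    c2, c3, c4 = merton_log_return_cumulants(T, sigma, lam, mu_j, sigma_j)
    skew = c3 / c2 ** 1.5      # scales like T^(-1/2)
    ex_kurt = c4 / c2 ** 2     # scales like T^(-1)
    print(f"T = {T:6.3f}  var/T = {c2 / T:.5f}  skew = {skew:+.4f}  excess kurt = {ex_kurt:.4f}")
```

Every cumulant is linear in \(T\), so `var/T` stays fixed while the standardized skewness and excess kurtosis shrink, which is the mechanism behind the fading Merton smile.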
In contrast, the Heston model generates a persistent smile because the stochastic volatility process \(v_t\) does not average out over time. The instantaneous variance \(v_t\) fluctuates around its long-run mean \(\theta\), and the correlation \(\rho < 0\) between returns and variance creates a leverage effect that produces skew at all maturities. The vol-of-vol \(\xi\) ensures that the variance of variance remains positive indefinitely, sustaining the smile.
Exercise 7. A practitioner calibrates both the Heston model (5 parameters) and the Bates model (8 parameters) to the same set of 30 market option prices. The Heston fit achieves an implied volatility RMSE of 0.8%, while the Bates fit achieves 0.3%. Discuss the trade-offs: is the improved fit worth the additional parameters? Address overfitting risk, identifiability of parameters, and how you would use out-of-sample testing to evaluate the models.
Solution to Exercise 7
The Bates fit (RMSE 0.3%) is substantially better than Heston (0.8%), but several trade-offs must be considered:
Overfitting risk: The Bates model has 8 parameters vs. 5 for Heston, calibrated to 30 data points. The ratio of data points to parameters is 30/8 = 3.75 for Bates vs. 30/5 = 6.0 for Heston, leaving 22 versus 25 residual degrees of freedom. With fewer data points per parameter, the Bates model is more prone to overfitting noise in the market quotes, so the improved in-sample RMSE may not translate into better out-of-sample performance.
Identifiability: The three additional jump parameters \((\lambda, \mu_J, \sigma_J)\) may be poorly identified when the option data spans mostly medium and long maturities, where jump effects are weak. Correlated parameters (e.g., \(\lambda\) and \(\sigma_J\) producing similar smile shapes) can lead to unstable calibrations.
Out-of-sample testing: The proper evaluation would split the 30 options into training and test sets (e.g., calibrate to 20 options, test on the remaining 10), or use time-series cross-validation (calibrate to today's data, test on tomorrow's prices). If the Bates model consistently outperforms Heston out-of-sample, the additional parameters are justified. If the advantage disappears, Heston's parsimony is preferred.
Practical recommendation: The Bates improvement is most valuable when the data includes short-maturity options (where jumps matter most). If the data is concentrated at maturities above 6 months, Heston may suffice. A sequential calibration strategy (calibrate Heston first, then add jumps) can help ensure the jump parameters capture genuine market features rather than noise.