Heston with Jumps (Bates Model)
Introduction
The Heston model captures the diffusive component of equity volatility dynamics --- mean-reverting stochastic variance, the leverage effect, and heavy tails --- but it cannot reproduce the steep short-maturity implied volatility smiles observed in equity markets. Short-dated options are highly sensitive to sudden large moves (jumps) in the underlying, which the continuous Heston dynamics smooth out. The diffusive skew generated by Heston is approximately proportional to \(\sqrt{T}\), while market smiles for \(T < 1\) month are much steeper.
The Bates model (Bates, 1996) resolves this by combining the Heston stochastic volatility process with Merton-type lognormal jumps in the stock price. This stochastic volatility jump-diffusion (SVJD) model retains the affine structure, so the characteristic function is available in closed form and all Fourier pricing methods apply unchanged. The jump parameters provide the extra degrees of freedom needed to fit short-maturity smiles.
Prerequisites
- Heston SDE and Parameters (Heston dynamics)
- Closed-Form Characteristic Function (Heston CF)
- Affine Structure and Riccati (affine framework)
Learning Objectives
By the end of this section, you will be able to:
- Write down the Bates SVJD model and explain the role of each component
- Derive the characteristic function by combining Heston and Merton jump contributions
- Explain how jumps steepen the short-maturity implied volatility smile
- Compare Bates and Heston implied volatility surfaces
- Discuss calibration considerations for the extended parameter set
Model Specification
The Bates SDE
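Under the risk-neutral measure \(\mathbb{Q}\), the Bates model specifies:
\[
\begin{aligned}
\frac{dS_t}{S_{t^-}} &= (r - q - \lambda\bar{k})\,dt + \sqrt{v_t}\,dW_t^{(1)} + (e^{J} - 1)\,dN_t, \\
dv_t &= \kappa(\theta - v_t)\,dt + \xi\sqrt{v_t}\,dW_t^{(2)},
\end{aligned}
\]
with \(d\langle W^{(1)}, W^{(2)} \rangle_t = \rho \, dt\), where: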
- \(N_t\) is a Poisson process with intensity \(\lambda > 0\) (average number of jumps per year)
- \(J \sim \mathcal{N}(\mu_J, \sigma_J^2)\) is the log-jump size, independent of \(W^{(1)}, W^{(2)}, N\)
- \(\bar{k} = \mathbb{E}[e^J - 1] = e^{\mu_J + \sigma_J^2/2} - 1\) is the mean relative jump size; the drift term \(-\lambda\bar{k}\) compensates the jumps so that the discounted, dividend-adjusted stock price is a \(\mathbb{Q}\)-martingale
The compensator \(-\lambda\bar{k}\) in the drift ensures that \(\mathbb{E}^{\mathbb{Q}}[dS_t / S_{t^-}] = (r - q) \, dt\), preserving risk-neutral pricing consistency.
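As a quick numerical illustration of the compensator, the following minimal sketch evaluates \(\bar{k}\) and the compensated drift \(r - q - \lambda\bar{k}\). The function names and the sample inputs (\(r = 3\%\), \(q = 0\), \(\lambda = 2\), \(\mu_J = -0.05\), \(\sigma_J = 0.10\)) are illustrative choices, not part of the model definition.

```python
import math


def jump_compensator(mu_j: float, sigma_j: float) -> float:
    """Mean relative jump size k_bar = E[e^J - 1] for J ~ N(mu_j, sigma_j^2)."""
    return math.exp(mu_j + 0.5 * sigma_j ** 2) - 1.0


def compensated_drift(r: float, q: float, lam: float,
                      mu_j: float, sigma_j: float) -> float:
    """Drift of dS/S between jumps: r - q - lam * k_bar."""
    return r - q - lam * jump_compensator(mu_j, sigma_j)


# Illustrative inputs (assumed for this example only).
k_bar = jump_compensator(-0.05, 0.10)
drift = compensated_drift(0.03, 0.0, 2.0, -0.05, 0.10)
print(f"k_bar = {k_bar:.4f}")   # k_bar = -0.0440
print(f"drift = {drift:.4f}")   # drift = 0.1180
```

With a negative \(\bar{k}\) the drift between jumps is raised above \(r - q\), which offsets the expected loss from jumps.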
Parameters
The Bates model adds three jump parameters to the five Heston parameters:
| Parameter | Symbol | Description | Typical Range |
|---|---|---|---|
| Jump intensity | \(\lambda\) | Expected jumps per year | 0.1--5 |
| Mean log-jump | \(\mu_J\) | Average log-return per jump | \(-0.1\) to \(0\) |
| Jump volatility | \(\sigma_J\) | Std dev of log-jump size | 0.05--0.3 |
| Compensator | \(\bar{k}\) | \(e^{\mu_J + \sigma_J^2/2} - 1\) | Derived |
Jump Interpretation
With \(\lambda = 1\) and \(\mu_J = -0.05\), the model expects on average one jump per year with a mean log-return of \(-5\%\) per jump (a mean price impact of \(\bar{k} \approx -4.4\%\) when \(\sigma_J = 0.10\)). The jump volatility \(\sigma_J = 0.10\) means individual log-jumps range from approximately \(-25\%\) to \(+15\%\) (two standard deviations around \(\mu_J\)). Negative \(\mu_J\) makes jumps downward on average, which steepens the put skew.
Characteristic Function
Decomposition
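The affine structure of the Bates model means the characteristic function separates into a Heston component and a jump component. For the log-price \(X_T = \ln S_T\):
\[
\phi_{\text{Bates}}(u, \tau) = \mathbb{E}^{\mathbb{Q}}\!\left[e^{iuX_T} \mid \mathcal{F}_t\right] = \phi_{\text{Heston}}(u, \tau) \times \phi_{\text{Jump}}(u, \tau)
\]
where \(\tau = T - t\).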
Heston Component
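The Heston component is the standard characteristic function:
\[
\phi_{\text{Heston}}(u, \tau) = \exp\!\big(C(\tau, u) + D(\tau, u)\,v_t + iuX_t\big)
\]
where \(C\) and \(D\) are the standard Heston Riccati solutions (with the drift adjusted by \(-\lambda\bar{k}\), so that \(r - q\) is replaced by \(r - q - \lambda\bar{k}\) in \(C\)).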
Jump Component
The Merton jump component adds a Poisson-lognormal contribution. During \([t, T]\), the number of jumps \(N_T - N_t \sim \text{Poisson}(\lambda\tau)\), and each jump adds \(J_k \sim \mathcal{N}(\mu_J, \sigma_J^2)\) to \(\ln S\). The characteristic function of the total jump contribution is:
\[
\phi_{\text{Jump}}(u, \tau) = \exp\!\left(\lambda\tau\left[e^{iu\mu_J - u^2\sigma_J^2/2} - 1\right]\right)
\]
Theorem (Bates Characteristic Function)
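The characteristic function of \(X_T = \ln S_T\) under the Bates model is:
\[
\phi_{\text{Bates}}(u, \tau) = \exp\!\big(C(\tau, u) + D(\tau, u)\,v_t + iuX_t\big) \times \exp\!\left(\lambda\tau\left[e^{iu\mu_J - u^2\sigma_J^2/2} - 1\right]\right)
\]
where \(C(\tau, u)\) and \(D(\tau, u)\) are the Heston Riccati solutions (Albrecher formulation).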
Proof
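Given the variance path \(\{v_s\}_{s \in [t,T]}\), the log-price is:
\[
X_T = X_t + \int_t^T \left(r - q - \lambda\bar{k} - \tfrac{1}{2}v_s\right) ds + \int_t^T \sqrt{v_s}\,dW_s^{(1)} + \sum_{k=N_t+1}^{N_T} J_k
\]
The jumps \(\{J_k\}\) and the Poisson process are independent of the Brownian motions and the variance process. Therefore:
\[
\mathbb{E}^{\mathbb{Q}}\!\left[e^{iuX_T} \mid \mathcal{F}_t\right] = \mathbb{E}^{\mathbb{Q}}\!\left[\exp\!\left(iu\left[X_t + \int_t^T \left(r - q - \lambda\bar{k} - \tfrac{1}{2}v_s\right) ds + \int_t^T \sqrt{v_s}\,dW_s^{(1)}\right]\right) \Big|\, \mathcal{F}_t\right] \times \mathbb{E}^{\mathbb{Q}}\!\left[\exp\!\left(iu \sum_{k=N_t+1}^{N_T} J_k\right)\right]
\]
The first factor is \(\phi_{\text{Heston}}\) (with drift adjustment), and the second is computed using the Poisson compound distribution:
\[
\mathbb{E}^{\mathbb{Q}}\!\left[\exp\!\left(iu \sum_{k=N_t+1}^{N_T} J_k\right)\right] = \sum_{n=0}^{\infty} \frac{e^{-\lambda\tau}(\lambda\tau)^n}{n!} \left(e^{iu\mu_J - u^2\sigma_J^2/2}\right)^n = \exp\!\left(\lambda\tau\left[e^{iu\mu_J - u^2\sigma_J^2/2} - 1\right]\right)
\]
\(\square\)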
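The factorization can be checked numerically. The sketch below is one possible implementation, assuming the Heston factor is written in the numerically stable form with \(e^{-d\tau}\) (often associated with Albrecher et al.); the names `bates_cf`, `beta`, `d`, `g` and the split of the drift out of `C` are conventions chosen for this example and may differ from the \(C\), \(D\) convention used elsewhere in this chapter.

```python
import cmath
import math


def bates_cf(u, tau, s0, v0, kappa, theta, xi, rho,
             lam, mu_j, sigma_j, r=0.0, q=0.0):
    """Sketch of E[exp(i*u*ln S_T)] under Bates: Heston factor times jump factor."""
    iu = 1j * u
    k_bar = math.exp(mu_j + 0.5 * sigma_j ** 2) - 1.0

    # Heston factor (stable form using exp(-d*tau)).
    beta = kappa - rho * xi * iu
    d = cmath.sqrt(beta * beta + xi ** 2 * (iu + u * u))
    g = (beta - d) / (beta + d)
    e = cmath.exp(-d * tau)
    C = (kappa * theta / xi ** 2) * (
        (beta - d) * tau - 2.0 * cmath.log((1.0 - g * e) / (1.0 - g))
    )
    D = ((beta - d) / xi ** 2) * (1.0 - e) / (1.0 - g * e)
    # Drift term kept separate here; it carries the -lam*k_bar adjustment.
    drift = iu * (math.log(s0) + (r - q - lam * k_bar) * tau)
    heston = cmath.exp(drift + C + D * v0)

    # Merton compound-Poisson factor.
    jump = cmath.exp(
        lam * tau * (cmath.exp(iu * mu_j - 0.5 * sigma_j ** 2 * u * u) - 1.0)
    )
    return heston * jump


# Illustrative parameters (the worked-example values, with r = 3% assumed).
params = dict(s0=100.0, v0=0.04, kappa=2.0, theta=0.04, xi=0.5, rho=-0.7,
              lam=2.0, mu_j=-0.05, sigma_j=0.10, r=0.03, q=0.0)
print(bates_cf(0.0, 0.5, **params))    # should equal 1
print(bates_cf(-1j, 0.5, **params))    # should equal the forward S0*exp((r-q)*tau)
```

Evaluating at \(u = -i\) gives \(\mathbb{E}^{\mathbb{Q}}[S_T]\), so the second check confirms that the compensator restores the forward price.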
Effect of Jumps on Implied Volatility
Short-Maturity Smile Steepening
The key advantage of jumps is their effect on short-maturity options. For small \(T\):
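- Diffusive skew (from \(\rho < 0\)): measured as the implied-volatility spread across a fixed delta range, it scales as \(\rho \xi \sqrt{T}\) and vanishes as \(T \to 0\), even though the ATM slope in log-strike tends to the finite limit \(\rho\xi/(4\sqrt{v_0})\) discussed in Exercise 4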
- Jump skew (from \(\mu_J < 0\)) remains finite as \(T \to 0\), because even a single jump can move the stock significantly relative to a short-dated option's premium
This means jumps produce a finite short-maturity skew that diffusive stochastic volatility alone cannot generate.
Wings and Tail Behavior
Jumps fatten the tails of the log-return distribution. The implied volatility at extreme strikes behaves as:
for fixed \(T\). This produces steeper wings than the Heston model, which has only moderately fat tails from the stochastic variance.
Decomposition of the Smile
The Bates implied volatility smile can be approximately decomposed:
\[
\sigma_{\text{imp}}^{\text{Bates}}(\alpha, T) \approx \sigma_{\text{imp}}^{\text{Heston}}(\alpha, T) + \Delta\sigma^{\text{Jump}}(\alpha, T)
\]
where \(\alpha = \ln(K/F)\) is the log-moneyness and \(\Delta\sigma^{\text{Jump}}\) is the incremental smile contribution from jumps. For \(T > 1\) year, \(\Delta\sigma^{\text{Jump}}\) becomes small relative to the Heston contribution (the diffusive dynamics dominate over long horizons). For \(T < 3\) months, jumps can be the dominant source of smile curvature.
Calibration Trade-off
Adding jumps improves short-maturity fit but introduces a risk: the eight Bates parameters (\(v_0, \kappa, \theta, \xi, \rho, \lambda, \mu_J, \sigma_J\)) are partially degenerate. The diffusive skew (from \(\rho, \xi\)) and the jump skew (from \(\lambda, \mu_J, \sigma_J\)) can partially substitute for each other. Regularization or fixing some parameters is often necessary.
Monte Carlo Simulation
Jump-Diffusion Discretization
Simulating the Bates model requires augmenting the Heston QE scheme with jumps:
Step 1. Simulate \(v_{t+\Delta t}\) using the QE scheme (same as pure Heston).
Step 2. Simulate the number of jumps: \(N_{\Delta t} \sim \text{Poisson}(\lambda \Delta t)\).
Step 3. Simulate jump sizes: \(J_k \sim \mathcal{N}(\mu_J, \sigma_J^2)\) for \(k = 1, \ldots, N_{\Delta t}\).
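Step 4. Update the log-stock price:
\[
\ln S_{t+\Delta t} = \ln S_t + \left(r - q - \lambda\bar{k} - \tfrac{1}{2}\hat{v}\right)\Delta t + \sqrt{\hat{v}\,\Delta t}\;Z + \sum_{k=1}^{N_{\Delta t}} J_k
\]
where \(\hat{v}\) is the effective variance from the QE scheme and \(Z \sim \mathcal{N}(0, 1)\) is the diffusive increment.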
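Steps 2 to 4 can be sketched as follows. This is a minimal illustration, not a full QE implementation: the variance update of Step 1 is not shown, so `v_hat` and `z` are assumed to be supplied by whatever variance scheme is in use, and the helper names `sample_jump_sum` and `bates_log_price_step` are hypothetical.

```python
import math
import random


def sample_jump_sum(rng, lam, dt, mu_j, sigma_j):
    """Steps 2-3: draw N ~ Poisson(lam*dt) by CDF inversion, return the sum of N log-jumps."""
    threshold = rng.random()
    n = 0
    p = math.exp(-lam * dt)          # P(N = 0)
    cdf = p
    while threshold > cdf and p > 0.0:
        n += 1
        p *= lam * dt / n            # P(N = n) from P(N = n - 1)
        cdf += p
    return math.fsum(rng.gauss(mu_j, sigma_j) for _ in range(n))


def bates_log_price_step(x, v_hat, dt, z, jump_sum, r, q, lam, mu_j, sigma_j):
    """Step 4: advance x = ln S over dt given the diffusive normal z and the jump sum."""
    k_bar = math.exp(mu_j + 0.5 * sigma_j ** 2) - 1.0
    drift = (r - q - lam * k_bar - 0.5 * v_hat) * dt
    return x + drift + math.sqrt(v_hat * dt) * z + jump_sum


# One illustrative daily step with assumed inputs.
rng = random.Random(42)
dt = 1.0 / 252.0
jumps = sample_jump_sum(rng, lam=2.0, dt=dt, mu_j=-0.05, sigma_j=0.10)
x_next = bates_log_price_step(math.log(100.0), 0.04, dt, rng.gauss(0.0, 1.0),
                              jumps, r=0.03, q=0.0, lam=2.0, mu_j=-0.05, sigma_j=0.10)
print(math.exp(x_next))
```

Keeping the jump draw separate from the log-price update means the jump logic can be added to an existing Heston path generator without touching the variance scheme.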
Efficiency
For typical equity parameters (\(\lambda \leq 5\), \(\Delta t = 1/252\)), the expected number of jumps per step is \(\lambda \Delta t \leq 0.02\). On most steps, \(N_{\Delta t} = 0\) and the simulation reduces to pure Heston. The computational overhead of jumps is negligible.
Worked Example
Parameters
| Parameter | Value | Model component |
|---|---|---|
| \(v_0\) | 0.04 | Heston |
| \(\kappa\) | 2.0 | Heston |
| \(\theta\) | 0.04 | Heston |
| \(\xi\) | 0.5 | Heston |
| \(\rho\) | \(-0.7\) | Heston |
| \(\lambda\) | 2.0 | Bates (additional) |
| \(\mu_J\) | \(-0.05\) | Bates (additional) |
| \(\sigma_J\) | 0.10 | Bates (additional) |
Implied Volatility Comparison
| Maturity | Moneyness | Heston IV | Bates IV | Difference |
|---|---|---|---|---|
| 1 week | 95% | 22.1% | 28.4% | +6.3% |
| 1 week | 100% | 20.2% | 21.8% | +1.6% |
| 1 week | 105% | 18.9% | 18.5% | \(-0.4\)% |
| 1 month | 95% | 22.8% | 26.1% | +3.3% |
| 1 month | 100% | 20.3% | 21.2% | +0.9% |
| 1 year | 95% | 22.5% | 23.0% | +0.5% |
| 1 year | 100% | 20.1% | 20.3% | +0.2% |
Observations
- The jump effect is strongest at short maturities and low strikes: the 1-week 95% moneyness point shows an IV increase of 6.3 percentage points from Heston to Bates.
- At 1-year maturity, the jump contribution is only 0.2--0.5 percentage points, so the diffusive dynamics dominate.
- The negative mean jump (\(\mu_J = -0.05\)) asymmetrically steepens the put side of the smile, consistent with the observation that equity markets exhibit crash fear.
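- At high strikes (105%, 1 week), the Bates IV is actually slightly below Heston. The compensator does not explain this: here \(\bar{k} = e^{-0.045} - 1 \approx -0.044\), so \(-\lambda\bar{k} \approx +0.088\) raises the drift between jumps and leaves the forward unchanged. A more plausible reading is that the negatively skewed jump distribution moves risk-neutral probability mass from the right tail to the left tail, marginally lowering OTM call values.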
Summary
| Concept | Formula / Description |
|---|---|
| Bates SDE | \(dS/S = (r-q-\lambda\bar{k})dt + \sqrt{v}\,dW^{(1)} + (e^J - 1)dN\) |
| Jump distribution | \(J \sim \mathcal{N}(\mu_J, \sigma_J^2)\), \(N_t \sim \text{Poisson}(\lambda t)\) |
| CF factorization | \(\phi_{\text{Bates}} = \phi_{\text{Heston}} \times \exp(\lambda\tau[e^{iu\mu_J - u^2\sigma_J^2/2} - 1])\) |
| Parameters | 8 total: 5 Heston + 3 jump (\(\lambda, \mu_J, \sigma_J\)) |
| Key advantage | Finite short-maturity skew from jump component |
Key Takeaways
- Heston + Merton jumps: The Bates model combines stochastic volatility with lognormal jumps, addressing the Heston model's inability to fit steep short-maturity smiles.
- CF is multiplicative: The Bates CF is the product of the Heston CF and the Merton jump CF, so existing Fourier pricing infrastructure (COS, FFT, Gil-Pelaez) applies unchanged.
- Short-maturity effect: Jumps produce a finite skew as \(T \to 0\), while the diffusive skew from \(\rho\) vanishes. This is the key empirical motivation for including jumps.
- Long-maturity convergence: For \(T > 1\) year, the Bates and Heston surfaces are nearly identical; jumps matter primarily for \(T < 3\) months.
- Calibration challenges: The eight parameters are partially degenerate between diffusive and jump contributions to the smile. Regularization is recommended.
What's Next
| Section | Topic |
|---|---|
| Double Heston Model | Two-factor variance alternative |
| Rough Heston (Overview) | Fractional kernel approach |
| Time-Dependent Parameters | Piecewise-constant calibration |
Exercises
Exercise 1. Compute the jump compensator \(\bar{k} = e^{\mu_J + \sigma_J^2/2} - 1\) for \(\mu_J = -0.08\) and \(\sigma_J = 0.15\). Verify that \(\bar{k} < 0\) (the drift adjustment is positive, compensating for the expected downward jumps). What happens to \(\bar{k}\) if \(\mu_J = 0\) and \(\sigma_J > 0\)? Explain the financial interpretation.
Solution to Exercise 1
We compute the jump compensator \(\bar{k} = e^{\mu_J + \sigma_J^2/2} - 1\) for \(\mu_J = -0.08\) and \(\sigma_J = 0.15\).
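First, evaluate the exponent:
\[
\mu_J + \frac{\sigma_J^2}{2} = -0.08 + \frac{0.0225}{2} = -0.08 + 0.01125 = -0.06875
\]
Therefore:
\[
\bar{k} = e^{-0.06875} - 1 \approx 0.93356 - 1 = -0.06644
\]
Since \(\bar{k} < 0\), the drift adjustment \(-\lambda\bar{k} > 0\) is positive, which increases the drift to compensate for the expected downward jumps. The stock's conditional expected return remains \(r - q\) under \(\mathbb{Q}\) despite the negative average jump.
Case \(\mu_J = 0\), \(\sigma_J > 0\): When the mean log-jump is zero, we get:
\[
\bar{k} = e^{\sigma_J^2/2} - 1 > 0
\]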
This is strictly positive because \(e^{\sigma_J^2/2} > 1\) for any \(\sigma_J > 0\). The financial interpretation is as follows. The compensator \(\bar{k}\) measures \(\mathbb{E}[e^J - 1]\), which is the expected proportional price change per jump. Even when the log-jump has zero mean (\(\mu_J = 0\)), the price jump \(e^J - 1\) has a positive mean due to Jensen's inequality: \(\mathbb{E}[e^J] = e^{\mu_J + \sigma_J^2/2} > e^{\mu_J} = 1\). The convexity of the exponential function means symmetric log-jumps produce asymmetric price jumps that are biased upward. The drift correction \(-\lambda\bar{k}\) is therefore negative, reducing the drift to offset this convexity bias.
Exercise 2. The Bates CF is the product \(\phi_{\text{Bates}} = \phi_{\text{Heston}} \times \phi_{\text{Jump}}\) where \(\phi_{\text{Jump}}(u, \tau) = \exp(\lambda\tau[e^{iu\mu_J - u^2\sigma_J^2/2} - 1])\). Show that \(|\phi_{\text{Jump}}(u, \tau)| = \exp(\lambda\tau[e^{-u^2\sigma_J^2/2}\cos(u\mu_J) - 1])\). For \(u = 10\), \(\lambda = 2\), \(\mu_J = -0.05\), \(\sigma_J = 0.10\), \(\tau = 0.5\), compute \(|\phi_{\text{Jump}}|\) and determine whether the jump component decays faster or slower than the Heston component for large \(u\).
Solution to Exercise 2
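Starting from the jump characteristic function:
\[
\phi_{\text{Jump}}(u, \tau) = \exp\!\left(\lambda\tau\left[e^{iu\mu_J - u^2\sigma_J^2/2} - 1\right]\right)
\]
we compute the modulus. Write the exponential inside the brackets as:
\[
e^{iu\mu_J - u^2\sigma_J^2/2} = e^{-u^2\sigma_J^2/2}\left[\cos(u\mu_J) + i\sin(u\mu_J)\right]
\]
Therefore:
\[
e^{iu\mu_J - u^2\sigma_J^2/2} - 1 = \left[e^{-u^2\sigma_J^2/2}\cos(u\mu_J) - 1\right] + i\,e^{-u^2\sigma_J^2/2}\sin(u\mu_J)
\]
The real part of the full exponent is:
\[
\text{Re}\!\left(\lambda\tau\left[e^{iu\mu_J - u^2\sigma_J^2/2} - 1\right]\right) = \lambda\tau\left[e^{-u^2\sigma_J^2/2}\cos(u\mu_J) - 1\right]
\]
Since \(|\phi_{\text{Jump}}| = \exp(\text{Re}(\cdot))\):
\[
|\phi_{\text{Jump}}(u, \tau)| = \exp\!\left(\lambda\tau\left[e^{-u^2\sigma_J^2/2}\cos(u\mu_J) - 1\right]\right)
\]
Numerical evaluation with \(u = 10\), \(\lambda = 2\), \(\mu_J = -0.05\), \(\sigma_J = 0.10\), \(\tau = 0.5\):
\[
\lambda\tau = 1, \qquad e^{-u^2\sigma_J^2/2} = e^{-0.5} \approx 0.6065, \qquad \cos(u\mu_J) = \cos(-0.5) \approx 0.8776
\]
\[
|\phi_{\text{Jump}}| = \exp(0.6065 \times 0.8776 - 1) = \exp(-0.4677) \approx 0.626
\]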
The jump component decays to about \(0.63\) at \(u = 10\). For comparison, the modulus of the Heston characteristic function decays roughly as \(\exp(-\text{const} \cdot |u|)\) for large \(u\). The jump component instead tends to a constant: \(\exp(-\lambda\tau(1 - e^{-u^2\sigma_J^2/2}\cos(u\mu_J))) \to \exp(-\lambda\tau)\) for large \(u\), which is the probability that no jump occurs in \([t, T]\). Therefore the jump component decays more slowly than the Heston component for large \(u\), and the large-\(u\) decay of \(\phi_{\text{Bates}}\), which determines how far a Fourier integral must be carried, is governed by the Heston factor.
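The modulus formula and its large-\(u\) limit can be checked with a few lines of code. This is a minimal sketch; the function names are chosen for this example, and the parameter values are those given in the exercise.

```python
import cmath
import math


def jump_cf(u, tau, lam, mu_j, sigma_j):
    """Merton jump factor exp(lam*tau*(exp(i*u*mu_j - u^2*sigma_j^2/2) - 1))."""
    inner = cmath.exp(1j * u * mu_j - 0.5 * (u * sigma_j) ** 2)
    return cmath.exp(lam * tau * (inner - 1.0))


def jump_cf_modulus(u, tau, lam, mu_j, sigma_j):
    """Closed-form modulus derived in the solution."""
    damp = math.exp(-0.5 * (u * sigma_j) ** 2)
    return math.exp(lam * tau * (damp * math.cos(u * mu_j) - 1.0))


tau, lam, mu_j, sigma_j = 0.5, 2.0, -0.05, 0.10
for u in (1.0, 10.0, 50.0, 200.0):
    direct = abs(jump_cf(u, tau, lam, mu_j, sigma_j))
    closed = jump_cf_modulus(u, tau, lam, mu_j, sigma_j)
    print(f"u = {u:6.1f}  |phi| direct = {direct:.6f}  closed form = {closed:.6f}")
print(f"limit exp(-lam*tau) = {math.exp(-lam * tau):.6f}")
```

The printed values level off at \(e^{-\lambda\tau}\) rather than decaying to zero.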
Exercise 3. From the worked example, the Bates IV at 1-week 95% moneyness is 28.4% versus Heston's 22.1%. Convert both to Black-Scholes call prices for \(S_0 = 100\), \(K = 95\), \(T = 1/52\), \(r = 3\%\), \(q = 0\), and obtain the corresponding put prices by put-call parity. What is the price difference in dollar terms? For a market maker selling the put, discuss why the Bates correction is critical.
Solution to Exercise 3
We use the Black-Scholes formula with \(S_0 = 100\), \(K = 95\), \(T = 1/52 \approx 0.01923\), \(r = 0.03\), \(q = 0\).
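Heston IV = 22.1%: Compute \(d_1\) and \(d_2\):
\[
d_1 = \frac{\ln(100/95) + (0.03 + 0.221^2/2)(1/52)}{0.221\sqrt{1/52}} = \frac{0.051293 + 0.001047}{0.030647} \approx 1.7078, \qquad d_2 = d_1 - 0.030647 \approx 1.6772
\]
Using standard normal CDF values: \(N(1.7078) \approx 0.95617\) and \(N(1.6772) \approx 0.95325\).
\[
C_{\text{Heston}} = 100 \times 0.95617 - 95\,e^{-0.03/52} \times 0.95325 \approx 95.617 - 90.506 \approx 5.11
\]
Bates IV = 28.4%: Repeating with \(\sigma = 0.284\):
\[
d_1 = \frac{0.051293 + 0.001352}{0.039384} \approx 1.3367, \qquad d_2 = d_1 - 0.039384 \approx 1.2974
\]
\(N(1.3367) \approx 0.90935\), \(N(1.2974) \approx 0.90275\).
\[
C_{\text{Bates}} = 100 \times 0.90935 - 95\,e^{-0.03/52} \times 0.90275 \approx 90.935 - 85.712 \approx 5.22
\]
Price difference: \(C_{\text{Bates}} - C_{\text{Heston}} \approx 5.22 - 5.11 = \$0.11\) per call.
For the corresponding put prices (via put-call parity \(P = C - S + Ke^{-rT}\), with \(Ke^{-rT} \approx 94.945\)):
\[
P_{\text{Heston}} \approx 5.110 - 100 + 94.945 \approx 0.056, \qquad P_{\text{Bates}} \approx 5.223 - 100 + 94.945 \approx 0.168
\]
The put price difference is also about \(\$0.11\), but in relative terms the Bates put is approximately \(0.168 / 0.056 \approx 3.0\) times the Heston put. Because the put is a small difference of large numbers, the CDF values need five digits here; four-digit table values shift the put price by more than a cent. For a market maker selling this short-dated OTM put, the Bates correction is critical because:
- The Heston model prices the put at about $0.06, while the Bates model prices it at about $0.17, roughly a factor of 3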
- Underpricing short-dated puts by using pure Heston exposes the market maker to significant crash risk: if a jump occurs, the payoff can be many multiples of the collected premium
- The jump component captures exactly this tail risk, and ignoring it means systematically undercharging for downside protection at short maturities
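The conversion from implied volatility to price can be reproduced with a short Black-Scholes routine. The sketch below uses only the standard library (`math.erf` for the normal CDF); the function names are chosen for this example, and the inputs are those stated in the exercise.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_put(s0, k, t, r, q, sigma):
    """Black-Scholes call price, with the put obtained from put-call parity."""
    sqrt_t = math.sqrt(t)
    d1 = (math.log(s0 / k) + (r - q + 0.5 * sigma ** 2) * t) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    call = s0 * math.exp(-q * t) * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)
    put = call - s0 * math.exp(-q * t) + k * math.exp(-r * t)
    return call, put


for label, iv in (("Heston", 0.221), ("Bates", 0.284)):
    call, put = bs_call_put(100.0, 95.0, 1.0 / 52.0, 0.03, 0.0, iv)
    print(f"{label:6s} IV = {iv:.1%}  call = {call:.4f}  put = {put:.4f}")
```

Because the put is the small difference of two large numbers, computing it with full floating-point precision avoids the rounding error that four-digit CDF tables introduce.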
Exercise 4. Show that in the limit \(T \to 0\), the Bates ATM skew approaches \(\lambda\mu_J / (2\sqrt{v_0})\), which is finite and proportional to \(\lambda\mu_J\). Compare this with the Heston ATM skew \(\rho\xi/(4\sqrt{v_0})\), which also has a finite short-maturity limit. Why is the jump skew more effective at matching steep market skews?
Solution to Exercise 4
The ATM implied volatility skew is defined as \(\mathcal{S}(T) = \partial\sigma_{\text{imp}} / \partial\ln K \big|_{K=F}\). We analyze the short-maturity limit for both models.
Heston ATM skew: In the standard Heston model, the short-maturity ATM skew converges to a finite constant:
\[
\lim_{T \to 0} \mathcal{S}_{\text{Heston}}(T) = \frac{\rho\xi}{4\sqrt{v_0}}
\]
That is, \(\mathcal{S}_{\text{Heston}}(T) \approx \rho\xi / (4\sqrt{v_0})\) for small \(T\).
Bates ATM skew (jump contribution): The jump component adds a skew contribution. For small \(T\), the leading-order jump contribution to the skew comes from the asymmetry of the jump distribution. Using the expansion of the Bates CF near ATM, the jump-induced skew for small \(T\) is approximately:
\[
\mathcal{S}_{\text{Jump}}(T) \approx \frac{\lambda\mu_J}{2\sqrt{v_0}}
\]
The total Bates skew for small \(T\) is therefore:
\[
\mathcal{S}_{\text{Bates}}(T) \approx \frac{\rho\xi}{4\sqrt{v_0}} + \frac{\lambda\mu_J}{2\sqrt{v_0}}
\]
Both terms are finite as \(T \to 0\), so the total skew has a finite short-maturity limit.
Why the jump skew is more effective: The jump skew component \(\lambda\mu_J / (2\sqrt{v_0})\) is more effective at matching steep market skews for several reasons:
- Magnitude: The diffusive skew is bounded by \(|\rho\xi| / (4\sqrt{v_0})\), which for typical values (\(|\rho| \leq 1\), \(\xi \leq 1\), \(\sqrt{v_0} \approx 0.2\)) gives at most about \(1.25\). The jump skew \(\lambda\mu_J / (2\sqrt{v_0})\) has no such bound: it can be made larger in magnitude by increasing \(\lambda\), with comparatively little effect on the long-maturity smile.
- Independence: The diffusive skew parameters \(\rho\) and \(\xi\) also control the curvature and long-maturity behavior of the smile. Increasing \(|\rho|\) or \(\xi\) to steepen the short-maturity skew distorts the smile at other maturities. The jump parameters \(\lambda\) and \(\mu_J\) affect primarily the short-maturity region, providing a more targeted adjustment.
- Empirical skew levels: Market short-maturity equity skews (e.g., 1-week S&P 500 skews) are often much steeper than what the Heston diffusive component alone can produce, even with \(\rho = -1\). Jumps provide the additional degrees of freedom needed.
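To get a feel for the magnitudes, the two short-maturity limits quoted in the exercise can be evaluated for the worked-example parameters. This is a small sketch using those leading-order formulas as given; the function names are chosen for this example.

```python
import math


def heston_short_skew(rho: float, xi: float, v0: float) -> float:
    """Leading-order diffusive ATM skew limit rho*xi / (4*sqrt(v0))."""
    return rho * xi / (4.0 * math.sqrt(v0))


def jump_short_skew(lam: float, mu_j: float, v0: float) -> float:
    """Leading-order jump ATM skew limit lam*mu_j / (2*sqrt(v0)), as quoted in the exercise."""
    return lam * mu_j / (2.0 * math.sqrt(v0))


diffusive = heston_short_skew(rho=-0.7, xi=0.5, v0=0.04)
jump = jump_short_skew(lam=2.0, mu_j=-0.05, v0=0.04)
print(f"diffusive skew = {diffusive:.4f}")
print(f"jump skew      = {jump:.4f}")
print(f"total skew     = {diffusive + jump:.4f}")
```

For these parameters the jump term adds a little over half as much skew again on top of the diffusive term, and doubling \(\lambda\) would double the jump term while leaving the diffusive term unchanged.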
Exercise 5. The Bates model has a partial degeneracy between the diffusive skew parameters \((\rho, \xi)\) and the jump skew parameters \((\lambda, \mu_J)\). Design a calibration strategy that resolves this degeneracy: calibrate \((\rho, \xi)\) primarily from long-maturity smile data (where jumps are negligible) and \((\lambda, \mu_J, \sigma_J)\) from short-maturity data (where jumps dominate). Describe the two-stage procedure.
Solution to Exercise 5
The proposed two-stage calibration strategy exploits the maturity-dependent contributions of diffusive and jump components.
Stage 1: Calibrate diffusive parameters \((\rho, \xi)\) from long-maturity data.
Select options with \(T \geq 1\) year. At these maturities, the jump contribution to the smile is small (as shown in the worked example, the Bates-Heston IV difference is only 0.2--0.5% at 1 year). Therefore, these options are approximately insensitive to the jump parameters.
- Fix preliminary values \(\lambda = 0\), \(\mu_J = 0\), \(\sigma_J = 0\) (pure Heston)
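- Calibrate \(\{v_0, \kappa, \theta, \xi, \rho\}\) to the long-maturity surface by minimizing, for example, an unweighted least-squares objective in implied volatility:
\[
\min_{v_0, \kappa, \theta, \xi, \rho} \; \sum_{i:\, T_i \geq 1} \left[\sigma_{\text{imp}}^{\text{Heston}}(K_i, T_i) - \sigma_{\text{imp}}^{\text{mkt}}(K_i, T_i)\right]^2
\]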
This is a standard 5-parameter Heston calibration on a reduced dataset.
Stage 2: Calibrate jump parameters \((\lambda, \mu_J, \sigma_J)\) from short-maturity data.
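Fix \(\{v_0, \kappa, \theta, \xi, \rho\}\) from Stage 1. Now calibrate the three jump parameters using options with \(T \leq 3\) months, for example with the same least-squares form:
\[
\min_{\lambda, \mu_J, \sigma_J} \; \sum_{i:\, T_i \leq 0.25} \left[\sigma_{\text{imp}}^{\text{Bates}}(K_i, T_i) - \sigma_{\text{imp}}^{\text{mkt}}(K_i, T_i)\right]^2
\]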
This is a 3-dimensional optimization, which is much easier than the full 8-parameter joint calibration.
Optional refinement (Stage 3): Use the Stage 1 + Stage 2 parameters as the initial guess for a joint 8-parameter optimization across all maturities. The good starting point from the two-stage procedure typically leads to rapid convergence and avoids the local minima that plague random initialization of the full problem.
Why this resolves the degeneracy: The key insight is maturity separation. At long maturities, the two contributions can be separated because the jump contribution to the smile fades with maturity (in the worked example it is only 0.2--0.5 volatility points at one year) while the diffusive skew persists. By calibrating \((\rho, \xi)\) in a regime where jumps are negligible, we obtain stable estimates uncontaminated by jump-diffusive cross-talk. The jump parameters are then identified from the short-maturity residuals that \((\rho, \xi)\) alone cannot explain.
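The control flow of the two-stage procedure can be sketched as below. This is only a skeleton under stated assumptions: `model_iv` is a hypothetical user-supplied callable returning the model implied volatility for given Heston and jump parameter tuples, quotes are `(T, log_moneyness, market_iv)` triples, and a search over candidate lists stands in for a proper numerical optimizer.

```python
def sum_sq_error(quotes, model_iv, heston, jumps):
    """Sum of squared implied-vol errors over (T, log_moneyness, market_iv) quotes."""
    return sum((model_iv(t, k, heston, jumps) - iv) ** 2 for t, k, iv in quotes)


def two_stage_calibrate(quotes, model_iv, heston_candidates, jump_candidates,
                        long_cut=1.0, short_cut=0.25):
    """Stage 1: fit Heston parameters on T >= long_cut with jumps switched off.
    Stage 2: fit jump parameters on T <= short_cut with Heston parameters frozen."""
    no_jumps = (0.0, 0.0, 0.0)  # (lambda, mu_J, sigma_J) = pure Heston
    long_quotes = [qt for qt in quotes if qt[0] >= long_cut]
    short_quotes = [qt for qt in quotes if qt[0] <= short_cut]

    # Stage 1: candidate search used here as a stand-in for an optimizer.
    heston = min(heston_candidates,
                 key=lambda h: sum_sq_error(long_quotes, model_iv, h, no_jumps))

    # Stage 2: only the three jump parameters vary.
    jumps = min(jump_candidates,
                key=lambda j: sum_sq_error(short_quotes, model_iv, heston, j))
    return heston, jumps
```

The returned pair would then serve as the starting point for the optional joint refinement in Stage 3.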
Exercise 6. For the Monte Carlo simulation of the Bates model, the expected number of jumps per daily time step is \(\lambda\Delta t = 2.0/252 \approx 0.008\). Compute the probability of zero, one, and two jumps in a single step. Argue that the computational overhead of adding jumps to the QE scheme is negligible because \(P(N_{\Delta t} = 0) > 99\%\) for typical parameters.
Solution to Exercise 6
The number of jumps per step follows a Poisson distribution with parameter \(\mu = \lambda\Delta t = 2.0/252 \approx 0.007937\).
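Probability of zero jumps:
\[
P(N_{\Delta t} = 0) = e^{-\mu} = e^{-0.007937} \approx 0.99209
\]
Probability of one jump:
\[
P(N_{\Delta t} = 1) = \mu\,e^{-\mu} \approx 0.007937 \times 0.99209 \approx 0.00787
\]
Probability of two jumps:
\[
P(N_{\Delta t} = 2) = \frac{\mu^2}{2}\,e^{-\mu} \approx 3.12 \times 10^{-5}
\]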
Summary:
| \(N_{\Delta t}\) | Probability |
|---|---|
| 0 | 99.209% |
| 1 | 0.787% |
| 2 | 0.003% |
| \(\geq 3\) | \(< 0.00001\)% |
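These probabilities can be reproduced directly from the Poisson mass function. The sketch below is a minimal check; `poisson_pmf` is a helper written for this example.

```python
import math


def poisson_pmf(n: int, mean: float) -> float:
    """P(N = n) for N ~ Poisson(mean)."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


mean = 2.0 / 252.0  # lambda * dt for lambda = 2 and daily steps
probs = [poisson_pmf(n, mean) for n in range(3)]
for n, p in enumerate(probs):
    print(f"P(N = {n}) = {p:.6%}")
print(f"P(N >= 3) = {1.0 - sum(probs):.2e}")
```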
The computational overhead argument follows directly:
- On 99.2% of steps, \(N_{\Delta t} = 0\) and the simulation is identical to pure Heston QE. No jump sizes need to be generated, and the log-price update is the standard QE formula.
- On 0.8% of steps, exactly one jump occurs. The only additional operations are: (a) sampling one Gaussian \(J \sim \mathcal{N}(\mu_J, \sigma_J^2)\) and (b) adding \(J\) to the log-price. This is \(\mathcal{O}(1)\) additional work.
- Two or more jumps occur on fewer than 0.004% of steps and can be neglected in the cost analysis.
The total additional cost per path of \(N_{\text{steps}} = 252\) daily steps is approximately \(252 \times 0.008 = 2\) jump events per year, each requiring one Gaussian sample and one addition. Compared to the \(252\) QE variance updates and \(252\) log-price updates in pure Heston, the jump overhead is approximately \(2/504 < 0.4\%\) additional computation. The overhead is indeed negligible.