Black-Karasinski Python Implementation Guide¶
This guide describes the design and usage of the Black-Karasinski (BK) trinomial tree implementation in the companion python_implementation.py module. The BK model is non-affine and has no closed-form bond or option prices, so a trinomial tree is the standard pricing engine. The code constructs the tree in log-rate space \(x = \ln r\), calibrates the time-dependent drift \(\theta(t)\) to the market yield curve by forward induction, and prices derivatives by backward induction. Each method is mapped to its mathematical formula from the preceding theory sections.
Prerequisites
- Log-Normal Short Rate SDE (BK dynamics: \(d(\ln r) = [\theta(t) - a\ln r]\,dt + \sigma\,dW\))
- Trinomial Tree Implementation (tree construction theory)
- No Closed-Form Bond Prices (why numerical methods are required)
Learning Objectives
By the end of this guide, you will be able to:
- Explain the class architecture and its inputs (yield curve, mean reversion, volatility)
- Map the trinomial tree construction to the branching probability formulas
- Describe how \(\theta(t_k)\) is calibrated at each time step via forward induction
- Price zero-coupon bonds and caplets by backward induction on the tree
- Identify numerical parameters (time steps, \(j_{\max}\)) and their impact on accuracy
Class Overview¶
The BlackKarasinski class encapsulates the model and the tree:
```python
class BlackKarasinski:
    def __init__(self, a, sigma, P, n_steps=100):
        self.a = a                # mean-reversion speed
        self.sigma = sigma        # log-rate volatility
        self.P = P                # market discount curve P^M(0, T)
        self.n_steps = n_steps
```
The market curve P is a callable \(P^M: [0, \infty) \to (0, 1]\) satisfying \(P^M(0) = 1\). All tree construction and pricing methods derive from these three inputs.
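Any callable with these properties will do. A minimal sketch (the flat 3% curve here is an illustrative assumption, not part of the module):

```python
import math

def flat_curve(T, r=0.03):
    """Hypothetical flat discount curve: P^M(0, T) = exp(-r * T)."""
    return math.exp(-r * T)

# Any valid curve input must satisfy P^M(0) = 1 and map into (0, 1]
assert flat_curve(0.0) == 1.0
assert 0.0 < flat_curve(30.0) <= 1.0

# model = BlackKarasinski(a=0.10, sigma=0.20, P=flat_curve, n_steps=100)
```

In practice the callable would interpolate a bootstrapped market curve rather than assume a flat rate.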
Tree Construction¶
Grid parameters¶
The tree discretizes the log-rate \(x = \ln r\) on a uniform grid:
\[
x_j = j\,\Delta x, \qquad \Delta x = \sigma\sqrt{3\,\Delta t}, \qquad j = -j_{\max}, \dots, j_{\max},
\]
with time step \(\Delta t = T_{\max}/N\), where \(N\) is n_steps and \(T_{\max}\) is the longest maturity to be priced.
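The grid parameters follow directly from the inputs. A standalone sketch with illustrative values for \(a\), \(\sigma\), \(T_{\max}\), and \(N\):

```python
import math

# Illustrative inputs (assumed for the example, not the module's defaults)
a, sigma, T_max, N = 0.10, 0.20, 5.0, 100

dt = T_max / N                        # time step
dx = sigma * math.sqrt(3 * dt)        # log-rate spacing
j_max = math.ceil(0.184 / (a * dt))   # truncation index

print(dt, round(dx, 6), j_max)
```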
Branching probabilities¶
At node \((t_k, x_j)\) with conditional drift \(\mu_j = \theta(t_k) - a\,x_j\), the standard branching probabilities are
\[
p_u = \frac{1}{6} + \frac{(\mu_j\Delta t)^2 + \mu_j\Delta t\,\Delta x}{2\,\Delta x^2}, \qquad
p_m = \frac{2}{3} - \frac{(\mu_j\Delta t)^2}{\Delta x^2}, \qquad
p_d = \frac{1}{6} + \frac{(\mu_j\Delta t)^2 - \mu_j\Delta t\,\Delta x}{2\,\Delta x^2}.
\]
When \(|j| > j_{\max}\), the code switches to up-branching or down-branching geometry to keep all probabilities in \([0, 1]\).
```python
def _branching_probs(self, j, theta_k, dt, dx):
    mu = (theta_k - self.a * j * dx) * dt  # drift displacement over one step
    if abs(j) <= self.j_max:
        # Standard branching: successors j+1, j, j-1
        pu = 1/6 + (mu**2 + mu * dx) / (2 * dx**2)
        pm = 2/3 - mu**2 / dx**2
        pd = 1/6 + (mu**2 - mu * dx) / (2 * dx**2)
    elif j > 0:
        # Down branching for high rates: successors j, j-1, j-2
        pu = 7/6 + (mu**2 + 3 * mu * dx) / (2 * dx**2)
        pm = -1/3 - (mu**2 + 2 * mu * dx) / dx**2
        pd = 1/6 + (mu**2 + mu * dx) / (2 * dx**2)
    else:
        # Up branching for low rates: successors j+2, j+1, j
        pu = 1/6 + (mu**2 - mu * dx) / (2 * dx**2)
        pm = -1/3 - (mu**2 - 2 * mu * dx) / dx**2
        pd = 7/6 + (mu**2 - 3 * mu * dx) / (2 * dx**2)
    return pu, pm, pd
```
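As a standalone sanity check (not part of the module), the standard-branching formulas can be verified to reproduce the one-step conditional mean \(\mu_j\Delta t\) and variance \(\sigma^2\Delta t\) of the log-rate; the drift displacement used below is an arbitrary illustrative value:

```python
import math

# Moment-matching check for the standard branching probabilities
def standard_probs(mu_dt, dx):
    pu = 1/6 + (mu_dt**2 + mu_dt * dx) / (2 * dx**2)
    pm = 2/3 - mu_dt**2 / dx**2
    pd = 1/6 + (mu_dt**2 - mu_dt * dx) / (2 * dx**2)
    return pu, pm, pd

sigma, dt = 0.20, 0.01
dx = sigma * math.sqrt(3 * dt)
mu_dt = -0.0026                        # illustrative drift displacement
pu, pm, pd = standard_probs(mu_dt, dx)

mean = (pu - pd) * dx                  # should equal mu_dt
var = (pu + pd) * dx**2 - mean**2      # should equal sigma^2 * dt
print(round(mean, 6), round(var, 6))
```

The match is exact by construction: the three probabilities solve the mean, variance, and normalization equations.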
Calibration of Theta¶
Forward induction¶
The drift \(\theta(t_k)\) at each time step is determined by matching the market discount factor \(P^M(0, t_{k+1})\). The procedure is:
- At time \(t_0 = 0\), the tree has a single node at \(x_0 = \ln r(0)\) with Arrow-Debreu price \(Q(0, 0) = 1\)
- At each subsequent step \(t_k\), the Arrow-Debreu prices propagate forward:
\[
Q(k+1, j) = \sum_i Q(k, i)\, p_{i \to j}(\theta(t_k))\, e^{-e^{x_i}\Delta t}
\]
- Choose \(\theta(t_k)\) so that
\[
\sum_j Q(k+1, j) = P^M(0, t_{k+1})
\]
This is a one-dimensional root-finding problem (bisection or Newton) at each time step because \(\theta(t_k)\) affects the branching probabilities and hence the forward propagation.
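A single propagation step can be sketched on a toy three-node slice; the log-rates, Arrow-Debreu prices, and flat probabilities below are illustrative assumptions, not output of the module:

```python
import math

# One forward-induction step on a toy slice (illustrative numbers)
dt = 0.01
x = {-1: -3.03, 0: -3.00, 1: -2.97}       # log-rates at step k
Q = {-1: 0.2, 0: 0.6, 1: 0.2}             # Arrow-Debreu prices at step k
probs = {j: (1/6, 2/3, 1/6) for j in Q}   # (pu, pm, pd); drift ignored here

Q_next = {}
for i, q in Q.items():
    disc = math.exp(-math.exp(x[i]) * dt)  # discount at the source node
    pu, pm, pd = probs[i]
    for offset, p in zip((1, 0, -1), (pu, pm, pd)):
        Q_next[i + offset] = Q_next.get(i + offset, 0.0) + q * p * disc

# Sum of the new Arrow-Debreu prices = model discount factor to step k+1
print(round(sum(Q_next.values()), 6))
```

In the calibration loop, this sum is the quantity matched against \(P^M(0, t_{k+1})\).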
```python
def calibrate_theta(self, T_max):
    """Calibrate theta(t_k) at each time step."""
    # requires: import numpy as np; from scipy.optimize import brentq
    dt = T_max / self.n_steps
    dx = self.sigma * np.sqrt(3 * dt)
    ...
    for k in range(self.n_steps):
        # Solve for theta[k] matching P^M(0, t_{k+1})
        target = self.P((k + 1) * dt)
        theta_k = brentq(
            lambda th: self._forward_step(Q, th, k, dt, dx) - target, -2, 2
        )
        self.theta[k] = theta_k
        Q = self._propagate(Q, theta_k, k, dt, dx)
```
Bond and Derivative Pricing¶
Zero-coupon bond¶
To price a bond maturing at \(T\):
- Set terminal values: \(V(T, x_j) = 1\) for all \(j\)
- Roll backward: \(V(t_k, x_j) = e^{-e^{x_j}\Delta t}[p_u V(t_{k+1}, j_u) + p_m V(t_{k+1}, j_m) + p_d V(t_{k+1}, j_d)]\)
- Read \(V(0, x_0)\)
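One rollback step at a single node can be written out directly; the log-rate and probabilities below are illustrative assumptions:

```python
import math

# One backward-induction step for a zero-coupon bond at a single node
dt = 0.01
x_j = -3.00                          # log-rate at the current node (assumed)
pu, pm, pd = 1/6, 2/3, 1/6           # branching probabilities (assumed)
V_next = {1: 1.0, 0: 1.0, -1: 1.0}   # bond values at the successor nodes

cont = pu * V_next[1] + pm * V_next[0] + pd * V_next[-1]
V = math.exp(-math.exp(x_j) * dt) * cont   # discount at r = e^{x_j}
print(round(V, 8))
```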
Caplet¶
A caplet with reset \(T_i\) and payment \(T_{i+1}\):
- Compute the LIBOR rate at each node at \(T_i\): \(L_j = \frac{1}{\delta}(1/P_{\text{tree}}(T_i, T_{i+1}; x_j) - 1)\)
- Set payoff at \(T_{i+1}\): \(V(T_{i+1}, x_j) = \delta[L_j - K]^+\)
- Roll backward to \(t = 0\)
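The payoff computation at a single reset-date node looks like this; \(\delta\), \(K\), and the node's log-rate are illustrative assumptions:

```python
import math

# Caplet payoff at one reset-date node (toy numbers)
delta, K = 0.25, 0.03
x_j = -3.00
P_one = math.exp(-math.exp(x_j) * delta)   # one-period bond price at the node
L = (1.0 / P_one - 1.0) / delta            # simple rate over [T_i, T_{i+1}]
payoff = delta * max(L - K, 0.0)
print(round(L, 6), round(payoff, 6))
```

This node-level payoff is then rolled back to \(t = 0\) exactly as for the bond.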
European swaption¶
Swaptions are priced similarly: compute the swap value at the exercise date at each node, set the payoff as \([\text{swap value}]^+\), and roll backward.
Numerical Parameters¶
| Parameter | Default | Effect |
|---|---|---|
| n_steps | 100 | Higher = more accurate, \(O(N^2)\) cost |
| \(j_{\max}\) | \(\lceil 0.184/(a\Delta t)\rceil\) | Controls tree width |
| \(\Delta x\) | \(\sigma\sqrt{3\Delta t}\) | Log-rate spacing |
Convergence Testing
Run pricing at 50, 100, and 200 steps. If the 100-step and 200-step prices agree to 4 decimal places, 100 steps is sufficient. For swaptions with long tenors (10Y+), 200+ steps may be needed.
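The agreement check can be sketched as a small helper (a standalone convenience, not part of the module; the prices are illustrative):

```python
def converged(p_coarse, p_fine, decimals=4):
    """True if two tree prices agree to `decimals` decimal places."""
    return abs(p_coarse - p_fine) < 0.5 * 10 ** (-decimals)

# Prices from a hypothetical convergence run
assert not converged(0.7695, 0.7700)   # 50 vs 100 steps: still moving
assert not converged(0.7700, 0.7701)   # 100 vs 200 steps: 1e-4 apart
assert converged(0.77012, 0.77014)     # would count as converged
```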
Summary¶
| Component | Method | Mathematical basis |
|---|---|---|
| Tree grid | __init__ | \(\Delta x = \sigma\sqrt{3\Delta t}\), \(j_{\max}\) cutoff |
| Probabilities | _branching_probs | Mean/variance matching for OU in log-rate |
| \(\theta\) calibration | calibrate_theta | Forward induction matching \(P^M(0, t_k)\) |
| Bond pricing | price_bond | Backward induction with \(e^{x_j}\) discounting |
| Caplet pricing | price_caplet | LIBOR payoff + backward induction |
For calibration of the model parameters \((a, \sigma)\) to market cap volatilities using this tree, see Calibration to Cap Volatilities. For a comparison with the Hull-White tree implementation, see Comparison with Hull-White.
Exercises¶
Exercise 1. Given \(a = 0.10\), \(\sigma = 0.20\), and \(\Delta t = 0.01\) year, compute the grid spacing \(\Delta x = \sigma\sqrt{3\Delta t}\) and the maximum node index \(j_{\max} = \lceil 0.184/(a\Delta t) \rceil\). How many nodes exist at a given time step (in terms of \(j_{\max}\))?
Solution to Exercise 1
With \(a = 0.10\), \(\sigma = 0.20\), \(\Delta t = 0.01\):
Grid spacing:
\[
\Delta x = \sigma\sqrt{3\,\Delta t} = 0.20 \times \sqrt{0.03} \approx 0.20 \times 0.17321 = 0.034641
\]
Maximum node index:
\[
j_{\max} = \left\lceil \frac{0.184}{0.10 \times 0.01} \right\rceil = \lceil 184 \rceil = 184
\]
Number of nodes at a given time step: The nodes run from \(j = -j_{\max}\) to \(j = +j_{\max}\), giving
\[
2 j_{\max} + 1 = 2 \times 184 + 1 = 369.
\]
Note that in practice, not all nodes may be reachable at every time step (the tree grows gradually), but at later time steps, up to 369 nodes can be active.
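These values can be reproduced in a few lines (standalone sketch):

```python
import math

# Exercise 1 inputs
a, sigma, dt = 0.10, 0.20, 0.01

dx = sigma * math.sqrt(3 * dt)        # grid spacing
j_max = math.ceil(0.184 / (a * dt))   # maximum node index
n_nodes = 2 * j_max + 1               # nodes j = -j_max, ..., +j_max

print(round(dx, 6), j_max, n_nodes)
```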
Exercise 2. At node \((t_k, x_j)\) with \(j = 3\), \(\Delta x = 0.03464\), \(\theta(t_k) = -0.25\), \(a = 0.10\), and \(\Delta t = 0.01\), compute the conditional drift \(\mu_j = \theta(t_k) - a \cdot j \cdot \Delta x\) and then the standard branching probabilities \(p_u\), \(p_m\), \(p_d\). Verify that \(p_u + p_m + p_d = 1\) and all probabilities are non-negative.
Solution to Exercise 2
At node \((t_k, x_j)\) with \(j = 3\), \(\Delta x = 0.03464\), \(\theta(t_k) = -0.25\), \(a = 0.10\), \(\Delta t = 0.01\):
Conditional drift:
\[
\mu_j = \theta(t_k) - a \cdot j \cdot \Delta x = -0.25 - 0.10 \times 3 \times 0.03464 = -0.26039
\]
Drift times time step:
\[
\mu_j \Delta t = -0.26039 \times 0.01 = -0.0026039
\]
Intermediate quantities:
\[
\frac{\mu_j \Delta t}{\Delta x} = -0.07517, \qquad \frac{(\mu_j \Delta t)^2}{\Delta x^2} = 0.00565
\]
Branching probabilities:
\[
p_u = \frac{1}{6} + \frac{0.00565 - 0.07517}{2} \approx 0.13191, \qquad
p_m = \frac{2}{3} - 0.00565 \approx 0.66102, \qquad
p_d = \frac{1}{6} + \frac{0.00565 + 0.07517}{2} \approx 0.20707
\]
Verification: \(p_u + p_m + p_d = 0.13191 + 0.66102 + 0.20707 = 1.00000\). All probabilities are non-negative: \(p_u = 0.132 > 0\), \(p_m = 0.661 > 0\), \(p_d = 0.207 > 0\).
The negative drift (\(\mu_j < 0\)) shifts probability toward the down branch (\(p_d > p_u\)), reflecting mean reversion pulling the rate back toward the center when \(j > 0\) (rate above the long-run mean).
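The arithmetic can be checked directly with the standard-branching formulas (standalone sketch):

```python
# Exercise 2 inputs
j, dx, theta_k, a, dt = 3, 0.03464, -0.25, 0.10, 0.01

mu_dt = (theta_k - a * j * dx) * dt   # drift displacement mu_j * dt
pu = 1/6 + (mu_dt**2 + mu_dt * dx) / (2 * dx**2)
pm = 2/3 - mu_dt**2 / dx**2
pd = 1/6 + (mu_dt**2 - mu_dt * dx) / (2 * dx**2)

print(round(pu, 5), round(pm, 5), round(pd, 5))
```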
Exercise 3. Explain the role of the Arrow-Debreu prices \(Q(k, j)\) in the forward induction procedure. If at time step \(k\) there are 5 active nodes with Arrow-Debreu prices \(Q(k, -2) = 0.02\), \(Q(k, -1) = 0.15\), \(Q(k, 0) = 0.55\), \(Q(k, 1) = 0.20\), \(Q(k, 2) = 0.03\), what is the model discount factor \(P^{\text{model}}(0, t_k)\)? How does this relate to the calibration condition?
Solution to Exercise 3
Arrow-Debreu prices \(Q(k, j)\) represent the present value at time \(0\) of a security that pays $1 if and only if node \((t_k, j)\) is reached. They aggregate the probability-weighted discounting across all paths from the root to that node.
Role in forward induction: The Arrow-Debreu prices carry the combined information of path probabilities and accumulated discounting. They propagate forward through the tree via
\[
Q(k+1, j) = \sum_i Q(k, i)\, p_{i \to j}\, e^{-e^{x_i}\Delta t}.
\]
This eliminates the need for backward induction when calibrating \(\theta(t_k)\).
Model discount factor: The model discount factor to time \(t_k\) is the sum of all Arrow-Debreu prices at that time step:
\[
P^{\text{model}}(0, t_k) = \sum_j Q(k, j).
\]
This is because a zero-coupon bond paying $1 at \(t_k\) pays $1 at every node at \(t_k\), so its value is the sum of all state prices.
With the given values:
\[
P^{\text{model}}(0, t_k) = 0.02 + 0.15 + 0.55 + 0.20 + 0.03 = 0.95.
\]
Relation to calibration: The calibration condition requires \(P^{\text{model}}(0, t_k) = P^{\text{mkt}}(0, t_k)\). If the market discount factor to \(t_k\) is 0.95, then the model is correctly calibrated at this step. The forward induction calibration adjusts \(\theta(t_{k-1})\) (the drift at the previous step) to ensure this equality holds at each time step.
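In code, the model discount factor is just a sum over the state prices (standalone sketch using the exercise's numbers):

```python
# Arrow-Debreu prices at step k from the exercise
Q = {-2: 0.02, -1: 0.15, 0: 0.55, 1: 0.20, 2: 0.03}

# A zero-coupon bond maturing at t_k pays 1 at every node,
# so its model price is the sum of the state prices
P_model = sum(Q.values())
print(round(P_model, 10))
```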
Exercise 4. The calibrate_theta method uses brentq to solve for \(\theta(t_k)\) at each time step. Explain why the calibration equation \(\sum_j Q(k+1, j) = P^{\text{mkt}}(0, t_{k+1})\) is monotone in \(\theta(t_k)\) (i.e., why the root is unique). What is the economic intuition for why increasing \(\theta(t_k)\) changes the sum of Arrow-Debreu prices?
Solution to Exercise 4
The calibration equation is \(\sum_j Q(k+1, j) = P^{\text{mkt}}(0, t_{k+1})\), where
\[
Q(k+1, j) = \sum_i Q(k, i)\, p_{i \to j}(\theta_k)\, e^{-e^{x_i}\Delta t}.
\]
Why it is nonlinear in \(\theta_k\): The branching probabilities \(p_u\), \(p_m\), \(p_d\) depend on \(\theta_k\) through the conditional drift \(\mu_j = \theta_k - ax_j\). Specifically, the probabilities are quadratic polynomials in \(\mu_j \Delta t\) (and hence in \(\theta_k\)). When these probabilities multiply the Arrow-Debreu prices \(Q(k,j)\) and sum over \(j\), the resulting function of \(\theta_k\) is a sum of quadratics in \(\theta_k\), making the overall equation nonlinear (quadratic) in \(\theta_k\).
Economic intuition for monotonicity: Increasing \(\theta_k\) raises the drift of the log rate \(x_{t_{k+1}}\), which shifts the short rate distribution at \(t_{k+1}\) upward (higher rates). Higher short rates at \(t_{k+1}\) mean higher discounting, which reduces the one-period discount factors \(e^{-e^{x_j}\Delta t}\). This in turn reduces the Arrow-Debreu prices \(Q(k+1, j)\) and hence their sum. Therefore:
The function \(\sum_j Q(k+1, j)\) is monotonically decreasing in \(\theta_k\), guaranteeing a unique root for any target \(P^{\text{mkt}}(0, t_{k+1}) \in (0, 1)\).
Robustness of bisection vs. Newton: Bisection is more robust because it only requires monotonicity (guaranteed) and bounded brackets. Newton's method is faster (quadratic convergence vs. linear) but requires computing the derivative of \(\sum_j Q(k+1, j)\) with respect to \(\theta_k\), which involves differentiating the branching probabilities. In practice, both work well, but bisection (or Brent's method, which combines bisection with interpolation) is preferred for its guaranteed convergence without derivative computation.
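The guaranteed-convergence argument for a monotone objective can be illustrated with a minimal bisection sketch; the function g below is an illustrative stand-in for the calibration residual, not the module's actual objective:

```python
import math

def bisect_decreasing(f, lo, hi, tol=1e-12):
    """Bisection for a monotonically decreasing f with f(lo) > 0 > f(hi)."""
    assert f(lo) > 0 > f(hi)          # valid bracket
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

g = lambda th: math.exp(-th) - 0.95   # decreasing; root at th = -ln(0.95)
root = bisect_decreasing(g, -1.0, 1.0)
print(round(root, 6))
```

Brent's method (scipy's brentq, used in calibrate_theta) accepts the same bracket but converges faster by mixing bisection with interpolation.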
Exercise 5. Describe how the backward induction for a caplet differs from that for a zero-coupon bond. Specifically, for a caplet with reset \(T_i\) and payment \(T_{i+1}\), explain: (i) what terminal condition is set at \(T_{i+1}\), (ii) how the LIBOR rate \(L_j\) is computed at each node at \(T_i\), and (iii) why a separate bond price computation is needed at \(T_i\) nodes.
Solution to Exercise 5
Zero-coupon bond: Terminal condition \(V(T, x_j) = 1\) for all \(j\). Backward induction rolls the constant terminal value back through the tree, with discounting at each node. No payoff computation is needed at intermediate nodes --- only discounting.
Caplet: The differences are:
(i) Terminal condition at \(T_{i+1}\): Instead of \(V = 1\), the caplet payoff is \(V(T_{i+1}, x_j) = \delta[L_j - K]^+\), where \(L_j\) is the LIBOR rate at node \(j\) at the reset date \(T_i\). Strictly speaking, the payoff at \(T_{i+1}\) depends on the LIBOR rate observed at \(T_i\), so the payoff must be set based on the rate information at \(T_i\).
(ii) LIBOR rate computation at \(T_i\): At each node \((T_i, x_j)\), the one-period bond price \(P_{\text{tree}}(T_i, T_{i+1}; x_j)\) is computed by a single-step backward induction:
\[
P_{\text{tree}}(T_i, T_{i+1}; x_j) = e^{-e^{x_j}\Delta t}\,(p_u \cdot 1 + p_m \cdot 1 + p_d \cdot 1) = e^{-e^{x_j}\Delta t}
\]
(since the bond pays 1 at all successor nodes). Then the LIBOR rate is
\[
L_j = \frac{1}{\delta}\left(\frac{1}{P_{\text{tree}}(T_i, T_{i+1}; x_j)} - 1\right).
\]
(iii) Why a separate bond price is needed: The LIBOR rate \(L_j\) is not directly available from the log-rate \(x_j\); it depends on the one-period discount factor, which requires knowing how the tree discounts between \(T_i\) and \(T_{i+1}\). This is a bond price computation (albeit a simple one-step one). Without this bond price, the caplet payoff cannot be evaluated, since the caplet is an option on the LIBOR rate, not on the short rate directly. The relationship \(L_j \approx e^{x_j}\) holds only approximately for small \(\delta\); for accurate pricing, the exact tree-based bond price must be used.
Exercise 6. A practitioner runs the BK tree with n_steps = 50 and gets a 5-year zero-coupon bond price of 0.7695. With n_steps = 100, the price is 0.7700, and with n_steps = 200, it is 0.7701. Use Richardson extrapolation to estimate the converged bond price from the 50-step and 100-step results. How does your extrapolated value compare to the 200-step result?
Solution to Exercise 6
The tree convergence is first order: the error is approximately \(E(N) = c/N + O(1/N^2)\) for some constant \(c\), where \(N\) is n_steps.
Given:
- \(P_{50} = 0.7695\) (50 steps)
- \(P_{100} = 0.7700\) (100 steps)
- \(P_{200} = 0.7701\) (200 steps)
For first-order convergence with \(\Delta t \propto 1/N\), the error scales as \(E_N \approx c \cdot \Delta t = c'/N\). With two resolutions \(N\) and \(2N\):
\[
P_N \approx P_\infty + \frac{c'}{N}, \qquad P_{2N} \approx P_\infty + \frac{c'}{2N}.
\]
Subtracting: \(P_N - P_{2N} = \frac{c'}{2N}\), so \(c' = 2N(P_N - P_{2N})\).
The Richardson extrapolation combines the two to cancel the leading error term:
\[
P_{\text{Rich}} = 2P_{2N} - P_N \approx P_\infty.
\]
Using \(N = 50\) and \(2N = 100\):
\[
P_{\text{Rich}} = 2 \times 0.7700 - 0.7695 = 0.7705.
\]
Comparing with the 200-step result of \(0.7701\), the Richardson extrapolated value (\(0.7705\)) overshoots slightly, suggesting that the convergence is not purely first-order (there are higher-order terms). Nevertheless, the Richardson estimate (\(0.7705\)) is closer to the converged value than either the 50-step (\(0.7695\)) or 100-step (\(0.7700\)) results alone.
A more refined approach would use 100 and 200 steps for Richardson extrapolation: \(P_{\text{Rich}} = 2 \times 0.7701 - 0.7700 = 0.7702\), which would be expected to be more accurate since both base estimates are closer to convergence.
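Both extrapolations are one-line computations on the exercise's prices (standalone sketch):

```python
# Richardson extrapolation under a first-order error model
P_50, P_100, P_200 = 0.7695, 0.7700, 0.7701

P_rich_coarse = 2 * P_100 - P_50    # from N = 50 and 2N = 100
P_rich_fine = 2 * P_200 - P_100     # from N = 100 and 2N = 200

print(round(P_rich_coarse, 4), round(P_rich_fine, 4))
```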
Exercise 7. The code switches from standard branching to up/down branching when \(|j| > j_{\max}\). Explain why standard branching produces negative probabilities for large \(|j|\) by examining the formula for \(p_m\) as \(|\mu_j \Delta t|\) grows relative to \(\Delta x\). What is the economic scenario (in terms of the short rate level) that triggers the branching switch?
Solution to Exercise 7
The standard branching middle probability is
\[
p_m = \frac{2}{3} - \frac{(\mu_j \Delta t)^2}{\Delta x^2},
\]
where \(\mu_j = \theta(t_k) - a \cdot j \cdot \Delta x\). As \(|j|\) increases, \(|a \cdot j \cdot \Delta x|\) grows, making \(|\mu_j|\) larger (the mean-reverting drift becomes stronger at nodes far from the center).
With \(\Delta x = \sigma\sqrt{3\Delta t}\), we have \(\Delta x^2 = 3\sigma^2 \Delta t\), so
\[
\frac{(\mu_j \Delta t)^2}{\Delta x^2} = \frac{\mu_j^2\,\Delta t}{3\sigma^2}.
\]
For \(p_m\) to become negative, we need
\[
\frac{(\mu_j \Delta t)^2}{\Delta x^2} > \frac{2}{3}
\quad\Longleftrightarrow\quad
|\mu_j \Delta t| > \sqrt{2/3}\,\Delta x \approx 0.816\,\Delta x.
\]
Since \(\mu_j \approx -a \cdot j \cdot \Delta x\) for large \(|j|\) (the \(\theta\) contribution is bounded), the condition becomes approximately
\[
a\,|j|\,\Delta t > 0.816
\quad\Longleftrightarrow\quad
|j| > \frac{0.816}{a\,\Delta t}.
\]
The threshold \(j_{\max} = \lceil 0.184/(a\Delta t) \rceil\) (note \(0.184 \approx 1 - \sqrt{2/3}\)) is set conservatively below this bound to also ensure \(p_u \geq 0\) and \(p_d \geq 0\).
Economic scenario: Large \(|j|\) corresponds to the log rate \(x_j = j \cdot \Delta x\) being far from the center. For \(j \gg 0\), the short rate \(r = e^{x_j}\) is extremely high; for \(j \ll 0\), the rate is extremely low. In both cases, the mean-reverting drift is very strong (pulling rates back to normal levels), and the drift displacement \(|\mu_j \Delta t|\) exceeds the grid spacing \(\Delta x\). This means the expected move in one time step is larger than one grid step, making the standard three-node branching unable to allocate non-negative probabilities. The up/down branching shifts the successor nodes to accommodate the large drift, ensuring valid probabilities while preserving the mean and variance matching conditions.
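A quick numerical scan, using the approximation \(\mu_j \approx -a j \Delta x\) (i.e., the \(\theta\) contribution is dropped, an assumption of this sketch), confirms where \(p_m\) first goes negative and how conservative the 0.184 threshold is:

```python
import math

# Find the first node level at which standard branching breaks down
a, sigma, dt = 0.10, 0.20, 0.01
dx = sigma * math.sqrt(3 * dt)

def p_mid(j):
    mu_dt = (-a * j * dx) * dt        # theta term dropped for the sketch
    return 2/3 - mu_dt**2 / dx**2

j_break = next(j for j in range(1, 20000) if p_mid(j) < 0)
print(j_break, math.ceil(0.184 / (a * dt)))
```

With these inputs \(p_m\) first goes negative far beyond \(j_{\max} = 184\), consistent with the bound \(|j| > 0.816/(a\Delta t) \approx 816\).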