Quadratic Variation of Brownian Motion¶
In the previous section we established that Brownian motion paths are Hölder-continuous of order \(\alpha < \frac{1}{2}\) yet nowhere differentiable. This non-differentiability is not merely a curiosity — it means that Brownian motion has unbounded total variation on every interval, which is precisely why the Riemann–Stieltjes integral \(\int_0^T f(t)\,dB_t\) cannot be defined for general adapted integrands in the classical sense. Any calculus built for Brownian motion must account for the wild oscillations of its paths.
The quadratic variation captures exactly how much those oscillations accumulate. Its value — finite and nonzero — is what forces Itô's formula to differ from the classical chain rule by the correction term \(\frac{1}{2}f''(B_t)\,dt\).
Variation of a Function¶
Recall that for a function \(f:[0,T]\to\mathbb{R}\), the total variation along a partition \(\Pi = \{0 = t_0 < t_1 < \cdots < t_n = T\}\) is

$$
V_1(f, \Pi) = \sum_{i=0}^{n-1} \left|f(t_{i+1}) - f(t_i)\right|,
$$
and the quadratic variation is

$$
V_2(f, \Pi) = \sum_{i=0}^{n-1} \left(f(t_{i+1}) - f(t_i)\right)^2.
$$
For a \(C^1\) function, the mean value theorem gives \(|f(t_{i+1}) - f(t_i)| = |f'(\xi_i)|\,\Delta t_i\) for some \(\xi_i \in [t_i, t_{i+1}]\), so

$$
V_2(f, \Pi) \leq \left(\max_{t \in [0,T]} |f'(t)|\right)^2 \sum_{i=0}^{n-1} (\Delta t_i)^2 \leq \left(\max_{t \in [0,T]} |f'(t)|\right)^2 \|\Pi\|\, T \;\xrightarrow{\|\Pi\| \to 0}\; 0.
$$
Smooth functions have zero quadratic variation. Brownian motion, by contrast, has infinite total variation on every interval — meaning \(V_1(B, \Pi) \to \infty\) as \(\|\Pi\| \to 0\) — yet finite, nonzero quadratic variation. This is why \([B]_T\) is the natural measure of path roughness for stochastic calculus, replacing the total variation that works well for smooth paths.
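This contrast can be checked numerically. The sketch below (seed, partition sizes, and helper name are illustrative choices, not part of the text) computes both variations for a smooth path and a simulated Brownian path on increasingly fine uniform partitions: the smooth \(V_2\) collapses toward zero while its \(V_1\) stabilizes, and the Brownian \(V_1\) blows up while its \(V_2\) stays near \(T\).

```python
import numpy as np

rng = np.random.default_rng(0)

def variations(x):
    """Return (total variation V1, quadratic variation V2) of a sampled path."""
    dx = np.diff(x)
    return np.abs(dx).sum(), np.square(dx).sum()

T = 1.0
for n in (100, 10_000):
    t = np.linspace(0.0, T, n + 1)
    smooth = np.sin(2 * np.pi * t)                    # C^1 path
    dB = rng.normal(0.0, np.sqrt(T / n), n)           # Brownian increments on the grid
    bm = np.concatenate(([0.0], np.cumsum(dB)))       # sampled Brownian path
    v1_s, v2_s = variations(smooth)
    v1_b, v2_b = variations(bm)
    print(f"n={n:>6}: smooth V1={v1_s:.3f}, V2={v2_s:.5f} | BM V1={v1_b:.1f}, V2={v2_b:.4f}")
```

On a typical run the smooth \(V_1\) settles near \(4\) (the total variation of \(\sin 2\pi t\) on \([0,1]\)) while the Brownian \(V_1\) grows like \(\sqrt{n}\).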
Definition: Quadratic Variation of Brownian Motion¶
Let \(B = (B_t)_{t \geq 0}\) be a standard Brownian motion and let

$$
\Pi_n = \left\{0 = t_0^{(n)} < t_1^{(n)} < \cdots < t_{k_n}^{(n)} = T\right\}
$$

be a sequence of partitions of \([0,T]\) with mesh \(\|\Pi_n\| = \max_i \Delta t_i^{(n)} \to 0\), where \(\Delta t_i^{(n)} = t_{i+1}^{(n)} - t_i^{(n)}\).
Quadratic Variation
The quadratic variation of \(B\) on \([0,T]\) along the partition \(\Pi_n\) is

$$
[B]_T^{(\Pi_n)} = \sum_{i=0}^{k_n - 1} \left(B_{t_{i+1}^{(n)}} - B_{t_i^{(n)}}\right)^2.
$$
We say \([B]_T = T\) if \([B]_T^{(\Pi_n)} \to T\) in \(L^2(\Omega)\) as \(\|\Pi_n\| \to 0\).
The Main Theorem¶
\([B]_T = T\) in \(L^2\)
For any sequence of partitions with \(\|\Pi_n\| \to 0\),

$$
[B]_T^{(\Pi_n)} \xrightarrow{\;L^2\;} T, \qquad \text{i.e.} \qquad \mathbb{E}\!\left[\left([B]_T^{(\Pi_n)} - T\right)^2\right] \to 0.
$$
Proof.
Write \(\Delta B_i = B_{t_{i+1}} - B_{t_i}\) and \(\Delta t_i = t_{i+1} - t_i\) for brevity.
Step 1 — Expectation.
Since \(\Delta B_i \sim \mathcal{N}(0, \Delta t_i)\), we have \(\mathbb{E}[(\Delta B_i)^2] = \Delta t_i\), so by linearity:

$$
\mathbb{E}\!\left[[B]_T^{(\Pi_n)}\right] = \sum_i \mathbb{E}\!\left[(\Delta B_i)^2\right] = \sum_i \Delta t_i = T.
$$
The estimator is unbiased for every partition. Note that independence plays no role here — the result holds for any process with the correct second moments.
Step 2 — Variance.
We need \(\mathrm{Var}\!\left([B]_T^{(\Pi_n)}\right) \to 0\). This is where the independent-increments property is essential: it allows the variance of the sum to equal the sum of the variances:

$$
\mathrm{Var}\!\left([B]_T^{(\Pi_n)}\right) = \sum_i \mathrm{Var}\!\left((\Delta B_i)^2\right).
$$
For \(X \sim \mathcal{N}(0, \sigma^2)\), the fourth moment is \(\mathbb{E}[X^4] = 3\sigma^4\), so

$$
\mathrm{Var}(X^2) = \mathbb{E}[X^4] - \left(\mathbb{E}[X^2]\right)^2 = 3\sigma^4 - \sigma^4 = 2\sigma^4.
$$
Applying this with \(\sigma^2 = \Delta t_i\):

$$
\mathrm{Var}\!\left([B]_T^{(\Pi_n)}\right) = 2\sum_i (\Delta t_i)^2 \leq 2\,\|\Pi_n\| \sum_i \Delta t_i = 2\,\|\Pi_n\|\,T \;\to\; 0.
$$
Since the mean equals \(T\) for every partition, this gives \(\mathbb{E}\!\left[\left([B]_T^{(\Pi_n)} - T\right)^2\right] = \mathrm{Var}\!\left([B]_T^{(\Pi_n)}\right) \to 0\). \(\square\)
Pathwise Convergence¶
The \(L^2\) result above guarantees convergence in mean square. A stronger statement also holds:
Almost Sure Convergence (dyadic subsequence)
For the dyadic partitions \(\Pi_{2^k}\) with points \(t_i^{(k)} = iT/2^k\), \(i = 0, \ldots, 2^k\),

$$
[B]_T^{(\Pi_{2^k})} \longrightarrow T \quad \text{almost surely as } k \to \infty.
$$
For any \(\varepsilon > 0\), Chebyshev's inequality and the variance bound \(\mathrm{Var}\!\left([B]_T^{(\Pi_{2^k})}\right) = 2 \cdot 2^k (T/2^k)^2 = 2T^2/2^k\) give

$$
\mathbb{P}\!\left(\left|[B]_T^{(\Pi_{2^k})} - T\right| > \varepsilon\right) \leq \frac{2T^2}{2^k \varepsilon^2}.
$$
Since \(\sum_{k=1}^\infty 2T^2/(2^k \varepsilon^2) < \infty\), the Borel–Cantelli lemma gives \([B]_T^{(\Pi_{2^k})} \to T\) almost surely.
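The dyadic statement is about a single fixed path, and it can be illustrated directly: simulate one path on the finest dyadic grid, then recompute the partition sum on coarser dyadic grids by subsampling. The sketch below (seed and levels are illustrative) shows the sums settling onto \(T\) for that one path.

```python
import numpy as np

rng = np.random.default_rng(42)
T, k_max = 1.0, 16
n_fine = 2 ** k_max
# One fixed Brownian path sampled on the finest dyadic grid t_i = i*T/2^k_max.
B = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(T / n_fine), n_fine))))

for k in range(4, k_max + 1, 4):
    step = 2 ** (k_max - k)                  # subsample down to the level-k dyadic grid
    qv = np.sum(np.diff(B[::step]) ** 2)     # [B]_T along Pi_{2^k} for this path
    print(f"k={k:>2}  2^k={2 ** k:>6}  [B]_T = {qv:.5f}")
```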
Extending to all \(n\)
Proving a.s. convergence along the full sequence \(n \to \infty\) (not just \(n = 2^k\)) requires bounding \(\sup_{2^k \leq n < 2^{k+1}} |[B]_T^{(\Pi_n)} - [B]_T^{(\Pi_{2^k})}|\) almost surely, which in turn requires Kolmogorov's maximal inequality for sums of independent random variables. The argument is correct but goes beyond the scope of this section. For stochastic calculus, the \(L^2\) result from the main theorem — which holds for any sequence of partitions with \(\|\Pi_n\| \to 0\) — is the form used in practice.
The Differential Notation \(dB_t^2 = dt\)¶
The quadratic variation result is almost always written in differential shorthand:

$$
(dB_t)^2 = dt.
$$
This is a heuristic that encodes the \(L^2\) result: increments of Brownian motion over an infinitesimal interval \([t, t+dt]\) satisfy \((\Delta B)^2 \approx dt\), not \((\Delta B)^2 \approx (dt)^2\) as would hold for a smooth path. The multiplication table for stochastic differentials is:
| \(\times\) | \(dt\) | \(dB_t\) |
|---|---|---|
| \(dt\) | \(0\) | \(0\) |
| \(dB_t\) | \(0\) | \(dt\) |
Higher-order terms vanish relative to \(dt\): \((dt)^2\) is of order \((dt)^2\), and \(dt \cdot dB_t\) is of order \((dt)^{3/2}\) since \(dB_t \sim \sqrt{dt}\). Both go to zero faster than \(dt\) as \(dt \to 0\).
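The three rows of the multiplication table can be seen in a single simulation. The sketch below (seed and step count are illustrative) accumulates the three kinds of products over a fine uniform partition: only the \(dB \cdot dB\) terms survive at order \(dt\).

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 1.0, 100_000
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), n)

sum_dB2 = np.sum(dB ** 2)     # dB*dB terms: accumulate to T
sum_dtdB = np.sum(dt * dB)    # dt*dB terms: order (dt)^(3/2), vanish
sum_dt2 = n * dt ** 2         # dt*dt terms: order (dt)^2, vanish

print(f"sum dB^2  = {sum_dB2:.5f}   (-> T = {T})")
print(f"sum dt*dB = {sum_dtdB:+.2e} (-> 0)")
print(f"sum dt^2  = {sum_dt2:.2e}   (-> 0)")
```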
Why This Forces Itô's Formula¶
To see heuristically why quadratic variation forces a correction term, apply Taylor's theorem to \(f(B_t)\) over a small increment (this argument motivates the result; the rigorous proof appears in the Itô's Lemma chapter):

$$
df(B_t) = f'(B_t)\,dB_t + \tfrac{1}{2} f''(B_t)\,(dB_t)^2 + \text{higher-order terms}.
$$
For a smooth deterministic path, \((dB_t)^2 \sim (dt)^2 \to 0\) and the second term vanishes. For Brownian motion, \((dB_t)^2 = dt\) survives, giving Itô's formula:

$$
df(B_t) = f'(B_t)\,dB_t + \tfrac{1}{2} f''(B_t)\,dt.
$$
The \(\frac{1}{2}f''\) term is a direct consequence of \([B]_t = t\). This is the central role quadratic variation plays in the theory.
Itô's formula for \(B_t^2\)
Take \(f(x) = x^2\). Then \(f'(x) = 2x\) and \(f''(x) = 2\), so

$$
d(B_t^2) = 2B_t\,dB_t + dt.
$$
Integrating: \(B_T^2 = 2\int_0^T B_t\,dB_t + T\), or equivalently

$$
\int_0^T B_t\,dB_t = \frac{1}{2}\left(B_T^2 - T\right).
$$
The \(-T\) term has no counterpart in ordinary calculus and comes entirely from \([B]_T = T\).
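The identity can be checked by simulation, approximating the Itô integral with left-endpoint Riemann sums (the left endpoint is what makes the sum an Itô, rather than Stratonovich, approximation). Seed and discretization sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
T, n, n_paths = 1.0, 2_000, 2_000
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))
B = np.cumsum(dB, axis=1)
# Ito integral: left-endpoint Riemann sums  sum_i B_{t_i} (B_{t_{i+1}} - B_{t_i}).
B_left = np.hstack([np.zeros((n_paths, 1)), B[:, :-1]])
ito_integral = np.sum(B_left * dB, axis=1)
residual = B[:, -1] ** 2 - 2 * ito_integral     # equals the QV sum, concentrates near T
print(f"mean of B_T^2 - 2*Int B dB = {residual.mean():.4f} (theory: T = {T})")
```

Pathwise, the residual \(B_T^2 - 2\sum_i B_{t_i}\Delta B_i\) equals \(\sum_i (\Delta B_i)^2\) exactly, so this check is the quadratic variation theorem in disguise.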
Cross Variation¶
For two independent Brownian motions \(B^{(1)}\) and \(B^{(2)}\), the cross variation is

$$
[B^{(1)}, B^{(2)}]_T = \lim_{\|\Pi_n\| \to 0} \sum_i \Delta B_i^{(1)}\,\Delta B_i^{(2)} = 0 \quad \text{in } L^2.
$$
More generally, for correlated Brownian motions satisfying \(\mathbb{E}[dB_t^{(1)}\,dB_t^{(2)}] = \rho\,dt\), the same partition-limit definition gives

$$
[B^{(1)}, B^{(2)}]_T = \rho\,T
$$
(since \(\mathbb{E}[\Delta B_i^{(1)} \Delta B_i^{(2)}] = \rho\,\Delta t_i\), a variance argument analogous to the main theorem gives the result).
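A quick numerical check: build correlated increments from two independent Gaussian draws via the usual Cholesky construction \(dB^{(2)} = \rho\,dZ^{(1)} + \sqrt{1-\rho^2}\,dZ^{(2)}\) and sum the products. Seed, \(\rho\), and step count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n, rho = 1.0, 100_000, 0.5
dt = T / n
Z1 = rng.normal(0.0, np.sqrt(dt), n)
Z2 = rng.normal(0.0, np.sqrt(dt), n)
dB1 = Z1
dB2 = rho * Z1 + np.sqrt(1.0 - rho ** 2) * Z2   # correlated increments (Cholesky)
cross = np.sum(dB1 * dB2)                       # partition sum for [B1, B2]_T
print(f"cross variation ~ {cross:.4f} (theory: rho*T = {rho * T})")
```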
Notation: square brackets vs. angle brackets
You will sometimes see the cross variation written \(\langle B^{(1)}, B^{(2)}\rangle_T\) with angle brackets. For continuous local martingales (which include Brownian motion), the two notations coincide: the quadratic covariation \([B^{(1)}, B^{(2)}]_T\) defined as the partition limit above equals the predictable quadratic variation \(\langle B^{(1)}, B^{(2)}\rangle_T\) defined via the Doob–Meyer decomposition (covered in the martingale theory chapter). The distinction matters only in the more general semimartingale theory, where the two can differ for processes with jumps.
This generalizes the multiplication table to the multi-dimensional setting needed for multi-asset models.
Python: Empirical Verification¶
The following simulation confirms that the partition sum converges to \(T\) as the mesh is refined. All partitions here are uniform (\(\Delta t_i = T/n\)), for which the variance bound simplifies to \(\mathrm{Var}([B]_T^{(\Pi_n)}) = 2T^2/n\), giving theoretical std \(= T\sqrt{2/n}\).
```python
import numpy as np
import matplotlib.pyplot as plt


def quadratic_variation(T: float, n_steps: int, n_paths: int = 500) -> np.ndarray:
    """Compute quadratic variation estimates for n_paths Brownian motions."""
    dt = T / n_steps
    dB = np.random.normal(0, np.sqrt(dt), size=(n_paths, n_steps))
    return np.sum(dB**2, axis=1)


T = 1.0
partition_sizes = [10, 50, 100, 500, 1000, 5000]
means, stds = [], []

for n in partition_sizes:
    qv = quadratic_variation(T, n)
    means.append(qv.mean())
    stds.append(qv.std())

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Left: mean convergence
axes[0].semilogx(partition_sizes, means, 'o-', label=r'Sample mean of $[B]_T$')
axes[0].axhline(T, color='red', linestyle='--', label=f'$T = {T}$')
axes[0].set_xlabel(r'Number of partition points $n$')
axes[0].set_ylabel(r'$[B]_T^{(\Pi_n)}$')
axes[0].set_title('Mean Convergence of Quadratic Variation')
axes[0].legend()

# Right: standard deviation convergence (should scale as sqrt(1/n))
axes[1].loglog(partition_sizes, stds, 'o-', label='Std dev')
ns = np.array(partition_sizes)
axes[1].loglog(ns, T * np.sqrt(2 / ns), 'r--', label=r'$T\sqrt{2/n}$ (theory)')
axes[1].set_xlabel(r'Number of partition points $n$')
axes[1].set_ylabel(r'Std dev of $[B]_T^{(\Pi_n)}$')
axes[1].set_title('Rate of Concentration')
axes[1].legend()

plt.tight_layout()
plt.savefig('quadratic_variation_convergence.png', dpi=150)
plt.show()

print(f"{'n':>8} {'mean':>10} {'std':>10} {'theory std':>12}")
for n, m, s in zip(partition_sizes, means, stds):
    print(f"{n:>8} {m:>10.5f} {s:>10.5f} {T * np.sqrt(2/n):>12.5f}")
```
/// caption
Left: Sample mean of \([B]_T^{(\Pi_n)}\) over 500 paths converges to \(T=1\) as the partition is refined. Right: Standard deviation decays as \(T\sqrt{2/n}\) (log-log scale), confirming the variance bound \(\mathrm{Var}([B]_T^{(\Pi_n)}) = 2T^2/n\) for uniform partitions.
///

Expected output (representative run):

```text
       n       mean        std   theory std
      10    0.99821    0.44820      0.44721
      50    1.00012    0.20003      0.20000
     100    0.99973    0.14150      0.14142
     500    1.00004    0.06328      0.06325
    1000    1.00001    0.04473      0.04472
    5000    1.00000    0.02001      0.02000
```
The standard deviation decays as \(T\sqrt{2/n}\), consistent with our variance bound \(\mathrm{Var}([B]_T^{(\Pi_n)}) = 2T^2/n\) for uniform partitions.
Summary¶
Key Results
- For smooth functions, quadratic variation is zero. For Brownian motion, \([B]_T = T\).
- The convergence \([B]_T^{(\Pi_n)} \xrightarrow{L^2} T\) follows from independence of increments and the identity \(\mathrm{Var}(X^2) = 2\sigma^4\) for zero-mean Gaussian \(X \sim \mathcal{N}(0,\sigma^2)\).
- The shorthand \(dB_t^2 = dt\) encodes this result and is the engine of Itô's formula.
- The extra \(\frac{1}{2}f''(B_t)\,dt\) term in Itô's formula is a direct consequence of \([B]_t = t\).
The next section turns to the Reflection Principle, which exploits the symmetry of Brownian motion to derive distributions of first passage times and running maxima.
Exercises¶
Exercise 1. Let \(\Pi_n\) be the uniform partition of \([0, T]\) into \(n\) equal subintervals. Compute \(\mathrm{Var}([B]_T^{(\Pi_n)})\) explicitly and show that \(\mathrm{Var}([B]_T^{(\Pi_n)}) = 2T^2/n\). Using Chebyshev's inequality, find the smallest \(n\) such that \(\mathbb{P}(|[B]_T^{(\Pi_n)} - T| > 0.1) \leq 0.05\) when \(T = 1\).
Solution to Exercise 1
For the uniform partition \(\Pi_n\) of \([0, T]\) with \(\Delta t_i = T/n\), the increments \(\Delta B_i = B_{t_{i+1}} - B_{t_i}\) are independent with \(\Delta B_i \sim \mathcal{N}(0, T/n)\). By independence:

$$
\mathrm{Var}\!\left([B]_T^{(\Pi_n)}\right) = \sum_{i=0}^{n-1} \mathrm{Var}\!\left((\Delta B_i)^2\right).
$$
For \(X \sim \mathcal{N}(0, \sigma^2)\), \(\mathrm{Var}(X^2) = 2\sigma^4\). With \(\sigma^2 = T/n\):

$$
\mathrm{Var}\!\left([B]_T^{(\Pi_n)}\right) = n \cdot 2\left(\frac{T}{n}\right)^2 = \frac{2T^2}{n}.
$$
For Chebyshev's inequality with \(T = 1\): \(\mathbb{P}(|[B]_1^{(\Pi_n)} - 1| > 0.1) \leq \frac{\mathrm{Var}([B]_1^{(\Pi_n)})}{0.01} = \frac{2/n}{0.01} = \frac{200}{n}\). Setting this \(\leq 0.05\):

$$
\frac{200}{n} \leq 0.05 \quad\Longleftrightarrow\quad n \geq 4000,
$$

so the smallest such \(n\) is \(n = 4000\).
Exercise 2. Consider a non-uniform partition \(\Pi = \{0, T/4, T/2, 3T/4, T\}\) (four subintervals of equal length \(T/4\)) and a partition \(\Pi' = \{0, T/8, T/4, T/2, T\}\) (four subintervals of unequal length). Compute \(\mathrm{Var}([B]_T^{(\Pi)})\) and \(\mathrm{Var}([B]_T^{(\Pi')})\). Which partition gives a tighter estimate of \(T\), and why?
Solution to Exercise 2
For \(\Pi = \{0, T/4, T/2, 3T/4, T\}\) (four equal subintervals of length \(T/4\)):

$$
\mathrm{Var}\!\left([B]_T^{(\Pi)}\right) = 2\sum_i (\Delta t_i)^2 = 2 \cdot 4 \cdot \left(\frac{T}{4}\right)^2 = \frac{T^2}{2} = \frac{8T^2}{16}.
$$
For \(\Pi' = \{0, T/8, T/4, T/2, T\}\) (four subintervals of lengths \(T/8, T/8, T/4, T/2\)):

$$
\mathrm{Var}\!\left([B]_T^{(\Pi')}\right) = 2\left[\left(\frac{T}{8}\right)^2 + \left(\frac{T}{8}\right)^2 + \left(\frac{T}{4}\right)^2 + \left(\frac{T}{2}\right)^2\right] = 2 \cdot \frac{1 + 1 + 4 + 16}{64}\,T^2 = \frac{11T^2}{16}.
$$
Since \(\frac{T^2}{2} = \frac{8T^2}{16} < \frac{11T^2}{16}\), the uniform partition \(\Pi\) gives a tighter estimate. This is because \(\mathrm{Var}([B]_T^{(\Pi)}) = 2\sum_i (\Delta t_i)^2\), which for a fixed number of subintervals summing to \(T\) is minimized when all \(\Delta t_i\) are equal (by the convexity of \(x \mapsto x^2\)).
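Both variances can be confirmed by Monte Carlo. The sketch below (seed, path count, and helper name are illustrative) estimates \(\mathrm{Var}\big(\sum_i (\Delta B_i)^2\big)\) for each fixed four-interval partition and compares against \(2\sum_i (\Delta t_i)^2\).

```python
import numpy as np

rng = np.random.default_rng(5)
T, n_paths = 1.0, 200_000

def qv_variance(widths):
    """Monte Carlo variance of sum_i (dB_i)^2 over a fixed partition."""
    widths = np.asarray(widths)
    dB = rng.normal(0.0, np.sqrt(widths), size=(n_paths, len(widths)))
    return np.sum(dB ** 2, axis=1).var()

var_uniform = qv_variance([T / 4] * 4)                     # theory: T^2/2   = 0.5
var_nonuniform = qv_variance([T / 8, T / 8, T / 4, T / 2]) # theory: 11T^2/16 = 0.6875
print(f"uniform: {var_uniform:.4f}, non-uniform: {var_nonuniform:.4f}")
```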
Exercise 3. Let \(f(t) = \sin(2\pi t)\) for \(t \in [0, 1]\). Compute the quadratic variation \(V_2(f, \Pi_n)\) along the uniform partition \(\Pi_n\) with \(n\) subintervals and verify that \(V_2(f, \Pi_n) \to 0\) as \(n \to \infty\). Contrast this with the result \([B]_1 = 1\) for Brownian motion.
Solution to Exercise 3
For \(f(t) = \sin(2\pi t)\) on \([0, 1]\) with the uniform partition \(\Pi_n\) (\(\Delta t = 1/n\)):

$$
V_2(f, \Pi_n) = \sum_{i=0}^{n-1} \left(\sin\frac{2\pi(i+1)}{n} - \sin\frac{2\pi i}{n}\right)^2.
$$
By the mean value theorem, \(|\sin(2\pi(i+1)/n) - \sin(2\pi i/n)| \leq 2\pi/n\), so each squared term is at most \((2\pi/n)^2\). Thus:

$$
V_2(f, \Pi_n) \leq n \cdot \left(\frac{2\pi}{n}\right)^2 = \frac{4\pi^2}{n} \;\to\; 0.
$$
More precisely, each increment is \(\approx f'(t_i)\,\Delta t = 2\pi\cos(2\pi t_i)/n\), and a Riemann sum approximation gives

$$
V_2(f, \Pi_n) \approx \frac{1}{n}\left(\frac{1}{n}\sum_{i=0}^{n-1} 4\pi^2 \cos^2(2\pi t_i)\right) \approx \frac{1}{n}\int_0^1 4\pi^2 \cos^2(2\pi t)\,dt = \frac{2\pi^2}{n}.
$$
Contrast: For Brownian motion, \([B]_1 = 1 > 0\). The smooth function has zero quadratic variation because its increments are \(O(1/n)\), giving squared increments of \(O(1/n^2)\) that sum to \(O(1/n)\). Brownian increments are \(O(1/\sqrt{n})\), giving squared increments of \(O(1/n)\) that sum to \(O(1)\).
Exercise 4. Using the multiplication table for stochastic differentials (\(dB_t \cdot dB_t = dt\), \(dB_t \cdot dt = 0\), \(dt \cdot dt = 0\)), apply Itô's formula to \(f(B_t) = B_t^3\). Verify your answer by checking that \(\mathbb{E}[B_T^3] = 0\) is consistent with the Itô integral representation you obtain.
Solution to Exercise 4
For \(f(x) = x^3\), we have \(f'(x) = 3x^2\) and \(f''(x) = 6x\). By Itô's formula:

$$
d(B_t^3) = 3B_t^2\,dB_t + 3B_t\,dt.
$$
Integrating from \(0\) to \(T\):

$$
B_T^3 = 3\int_0^T B_t^2\,dB_t + 3\int_0^T B_t\,dt.
$$
Verification: Taking expectations, \(\mathbb{E}[B_T^3] = 3\mathbb{E}\left[\int_0^T B_t^2\,dB_t\right] + 3\int_0^T \mathbb{E}[B_t]\,dt\). The Itô integral has zero expectation (it is a martingale), and \(\mathbb{E}[B_t] = 0\). Therefore \(\mathbb{E}[B_T^3] = 0 + 0 = 0\), which is consistent with \(B_T \sim \mathcal{N}(0, T)\) being symmetric (all odd moments vanish).
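The decomposition can also be checked by simulation, discretizing both integrals with left endpoints (seed and sizes below are illustrative): the pathwise gap between \(B_T^3\) and the discretized right-hand side shrinks with the step size, and the sample mean of \(B_T^3\) sits near zero.

```python
import numpy as np

rng = np.random.default_rng(11)
T, n, n_paths = 1.0, 2_000, 5_000
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))
B = np.cumsum(dB, axis=1)
B_left = np.hstack([np.zeros((n_paths, 1)), B[:, :-1]])   # left endpoints for Ito sums
rhs = 3 * np.sum(B_left ** 2 * dB, axis=1) + 3 * np.sum(B_left * dt, axis=1)
third_moment = np.mean(B[:, -1] ** 3)
pathwise_gap = np.abs(B[:, -1] ** 3 - rhs).mean()
print(f"E[B_T^3] ~ {third_moment:.4f} (theory: 0); mean |B_T^3 - RHS| = {pathwise_gap:.4f}")
```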
Exercise 5. Let \(B^{(1)}\) and \(B^{(2)}\) be two Brownian motions with correlation \(\rho = 0.5\). Compute the cross variation \([B^{(1)}, B^{(2)}]_T\). Define \(X_t = B_t^{(1)} + B_t^{(2)}\) and compute \([X]_T\) using the bilinearity of quadratic variation: \([X]_T = [B^{(1)}]_T + 2[B^{(1)}, B^{(2)}]_T + [B^{(2)}]_T\).
Solution to Exercise 5
With \(\rho = 0.5\), the cross variation is:

$$
[B^{(1)}, B^{(2)}]_T = \rho\,T = 0.5\,T.
$$
For \(X_t = B_t^{(1)} + B_t^{(2)}\), the bilinearity of quadratic variation gives:

$$
[X]_T = [B^{(1)}]_T + 2[B^{(1)}, B^{(2)}]_T + [B^{(2)}]_T = T + 2(0.5\,T) + T = 3T.
$$
This can be verified by noting that \(\text{Var}(X_t) = \text{Var}(B_t^{(1)}) + 2\text{Cov}(B_t^{(1)}, B_t^{(2)}) + \text{Var}(B_t^{(2)}) = t + 2\rho t + t = (2 + 2\rho)t = 3t\).
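A direct simulation check (seed and step count illustrative): build the correlated increments, form \(dX = dB^{(1)} + dB^{(2)}\), and sum the squares.

```python
import numpy as np

rng = np.random.default_rng(13)
T, n, rho = 1.0, 100_000, 0.5
dt = T / n
Z1 = rng.normal(0.0, np.sqrt(dt), n)
Z2 = rng.normal(0.0, np.sqrt(dt), n)
dX = Z1 + (rho * Z1 + np.sqrt(1.0 - rho ** 2) * Z2)   # dX = dB1 + dB2, corr(dB1, dB2) = rho
qv_X = np.sum(dX ** 2)
print(f"[X]_T ~ {qv_X:.4f} (theory: (2 + 2*rho)*T = {(2 + 2 * rho) * T})")
```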
Exercise 6. Prove that Brownian motion has infinite total variation on \([0, T]\) almost surely. Specifically, show that for the uniform partition \(\Pi_n\):

$$
\mathbb{E}\!\left[V_1(B, \Pi_n)\right] = \sqrt{\frac{2nT}{\pi}} \;\to\; \infty.
$$
Explain why infinite total variation and finite quadratic variation can coexist.
Solution to Exercise 6
For the uniform partition \(\Pi_n\) with \(\Delta t_i = T/n\), each \(\Delta B_i \sim \mathcal{N}(0, T/n)\), so \(|\Delta B_i| = \sqrt{T/n}\,|Z_i|\) where \(Z_i \sim \mathcal{N}(0,1)\).
Since \(\mathbb{E}[|Z|] = \sqrt{2/\pi}\) for \(Z \sim \mathcal{N}(0,1)\):

$$
\mathbb{E}\!\left[V_1(B, \Pi_n)\right] = n \cdot \sqrt{\frac{T}{n}} \cdot \sqrt{\frac{2}{\pi}} = \sqrt{\frac{2nT}{\pi}} \;\to\; \infty.
$$
Divergence of the expectation alone does not yet give the almost-sure statement. For that, combine two facts: \(V_2(B, \Pi_n) \leq \max_i |\Delta B_i| \cdot V_1(B, \Pi_n)\), and \(\max_i |\Delta B_i| \to 0\) as \(\|\Pi_n\| \to 0\) by continuity of the path. If \(V_1(B, \Pi_n)\) stayed bounded along a subsequence, the right-hand side would tend to \(0\), contradicting \(V_2(B, \Pi_n) \to T > 0\). Hence \(V_1(B) = \sup_\Pi V_1(B, \Pi) = +\infty\) almost surely.
Why infinite total variation and finite quadratic variation coexist: The total variation sums \(|\Delta B_i| \sim \sqrt{T/n}\), giving \(n \cdot \sqrt{T/n} = \sqrt{nT} \to \infty\). The quadratic variation sums \((\Delta B_i)^2 \sim T/n\), giving \(n \cdot T/n = T\) (constant). The key is the exponent: with \(n\) terms each of size \(n^{-1/2}\), the sum of first powers diverges (\(n \cdot n^{-1/2} = n^{1/2}\)) while the sum of squares converges (\(n \cdot n^{-1} = 1\)). This is a direct consequence of the \(\sqrt{\Delta t}\) scaling of Brownian increments.
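The two growth rates show up clearly in simulation. The sketch below (seed and partition sizes illustrative) computes both partition sums on the same increments: \(V_1\) tracks \(\sqrt{2nT/\pi}\) while \(V_2\) stays pinned near \(T\).

```python
import numpy as np

rng = np.random.default_rng(17)
T = 1.0
for n in (100, 10_000, 1_000_000):
    dB = rng.normal(0.0, np.sqrt(T / n), n)
    v1 = np.abs(dB).sum()     # total-variation partition sum, grows like sqrt(n)
    v2 = np.sum(dB ** 2)      # quadratic-variation partition sum, stays near T
    print(f"n={n:>8}  V1={v1:10.2f} (theory {np.sqrt(2 * n * T / np.pi):8.2f})  V2={v2:.4f}")
```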