Moment Analysis of SDEs
In many stochastic differential equations the full probability distribution of the solution is difficult or impossible to obtain explicitly. Moment analysis provides an alternative: instead of solving for the entire distribution, we track a small set of summary statistics — expectations, variances, and correlations — that already capture many economically and physically relevant properties such as growth rates, volatility, persistence, and equilibrium behavior.
For Gaussian processes such as Brownian motion and the Ornstein–Uhlenbeck process, the distribution is fully determined by the first two moments. For non-Gaussian diffusion models such as geometric Brownian motion and the CIR process, higher moments reveal how multiplicative or state-dependent noise affects skewness, tail behavior, and long-run variability.
The models in this section illustrate three fundamental types of stochastic behavior: pure diffusion (Brownian motion), multiplicative noise (geometric Brownian motion), and mean reversion (Vasicek and CIR). For each model we develop moment formulas using three main techniques: direct computation from explicit solutions, Itô's lemma applied to powers, and moment ODEs derived from the infinitesimal generator.
Learning Goals
After completing this section you should be able to:
- compute expectations, variances, and higher moments for Gaussian and log-normal SDE models
- use Itô isometry to evaluate variances of stochastic integrals
- derive moment ODEs using Itô's lemma applied to powers of the process
- understand how mean reversion produces stationary distributions
- recognize the effect of state-dependent volatility on variance formulas
1. Brownian Motion
SDE
Standard Brownian motion \(W_t\) is characterized by \(W_0 = 0\), continuous sample paths, independent increments, and \(W_t - W_s \sim \mathcal{N}(0, t - s)\) for \(s < t\). It is the canonical example of a process with no drift: all randomness, no systematic direction.
Moments
By definition of standard Brownian motion, \(W_t \sim \mathcal{N}(0, t)\), so
\[
\mathbb{E}[W_t] = 0, \qquad \operatorname{Var}(W_t) = \mathbb{E}[W_t^2] = t.
\]
Higher Moments
For a centered Gaussian random variable \(X \sim \mathcal{N}(0, \sigma^2)\), all odd moments vanish by symmetry. The even moments are
\[
\mathbb{E}[X^{2k}] = (2k-1)!!\,\sigma^{2k}, \qquad k = 1, 2, \dots,
\]
where \((2k-1)!! = 1 \cdot 3 \cdot 5 \cdots (2k-1)\) is the double factorial.
For Brownian motion (\(\sigma^2 = t\)) this gives \(\mathbb{E}[W_t^{2k}] = (2k-1)!!\,t^k\):
| \(k\) | \(\mathbb{E}[W_t^{2k}]\) |
|---|---|
| 1 | \(t\) |
| 2 | \(3t^2\) |
| 3 | \(15t^3\) |
| 4 | \(105t^4\) |
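The table entries can be reproduced with a few lines of Python. This is a minimal sketch; the function names `double_factorial_odd` and `bm_even_moment` are illustrative choices, not part of the text.

```python
def double_factorial_odd(k: int) -> int:
    """Return (2k-1)!! = 1 * 3 * 5 * ... * (2k-1)."""
    result = 1
    for j in range(1, 2 * k, 2):
        result *= j
    return result


def bm_even_moment(k: int, t: float) -> float:
    """E[W_t^(2k)] = (2k-1)!! * t^k for standard Brownian motion."""
    return double_factorial_odd(k) * t ** k


# Coefficients multiplying t^k for k = 1, ..., 4, matching the table rows.
coefficients = [double_factorial_odd(k) for k in range(1, 5)]
print(coefficients)  # [1, 3, 15, 105]
```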
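Moment Generating Function
For every real \(u\), the moment generating function of \(W_t\) is
\[
M_{W_t}(u) = \mathbb{E}\!\left[e^{uW_t}\right] = e^{u^2 t/2}.
\]
This follows from the Gaussian integral, after completing the square in the exponent:
\[
\mathbb{E}\!\left[e^{uW_t}\right] = \int_{-\infty}^{\infty} e^{ux}\,\frac{1}{\sqrt{2\pi t}}\,e^{-x^2/(2t)}\,dx = e^{u^2 t/2}.
\]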
An equivalent and widely used form is the exponential martingale identity:
\[
\mathbb{E}\!\left[e^{uW_t - u^2 t/2}\right] = 1.
\]
This identity connects moment computations to the martingale techniques used in mathematical finance.
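As a numerical sanity check, the standard Gaussian identity \(\mathbb{E}[e^{uW_t}] = e^{u^2 t/2}\) can be compared against a direct quadrature of \(e^{ux}\) against the \(\mathcal{N}(0, t)\) density. The sketch below uses a plain midpoint rule; the truncation width, step count, and the values of `u` and `t` are illustrative assumptions.

```python
import math


def bm_mgf_numeric(u, t, half_width=12.0, steps=20_000):
    """Midpoint-rule approximation of E[exp(u * W_t)] for W_t ~ N(0, t)."""
    sd = math.sqrt(t)
    lo, hi = -half_width * sd, half_width * sd
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        density = math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)
        total += math.exp(u * x) * density
    return total * h


u, t = 1.0, 2.0  # illustrative values, not taken from the text
numeric = bm_mgf_numeric(u, t)
closed_form = math.exp(u * u * t / 2.0)
print(numeric, closed_form)
```

The two printed numbers should agree to many digits, since the integrand is smooth and decays rapidly inside the truncated range.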
Interpretation
Brownian motion has zero mean at all times, so there is no systematic drift. The variance grows linearly in time, reflecting the diffusive spreading of paths. Because \(W_t\) is Gaussian, its distribution is completely determined by the first two moments — all higher moments follow from the variance alone.
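2. Brownian Motion with Drift
SDE and Solution
Brownian motion with drift \(\mu\) and volatility \(\sigma > 0\) solves
\[
dX_t = \mu\,dt + \sigma\,dW_t, \qquad X_0 \text{ given}.
\]
Integrating directly:
\[
X_t = X_0 + \mu t + \sigma W_t.
\]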
First Two Moments
Since \(W_t \sim \mathcal{N}(0, t)\):
\[
\mathbb{E}[X_t] = X_0 + \mu t, \qquad \operatorname{Var}(X_t) = \sigma^2 t.
\]
Distribution: \(X_t \sim \mathcal{N}(X_0 + \mu t,\; \sigma^2 t)\)
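Higher Moments and MGF
The central moments follow the Gaussian formula:
\[
\mathbb{E}\!\left[(X_t - \mathbb{E}[X_t])^{2k}\right] = (2k-1)!!\,(\sigma^2 t)^k, \qquad \mathbb{E}\!\left[(X_t - \mathbb{E}[X_t])^{2k+1}\right] = 0.
\]
The moment generating function is
\[
\mathbb{E}\!\left[e^{uX_t}\right] = \exp\!\left(u(X_0 + \mu t) + \tfrac{1}{2}u^2\sigma^2 t\right).
\]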
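Conditional Moments
For \(s < t\), the increment \(X_t - X_s = \mu(t - s) + \sigma(W_t - W_s)\) is independent of \(\mathcal{F}_s\) because Brownian motion has independent increments:
\[
\mathbb{E}[X_t \mid \mathcal{F}_s] = X_s + \mu(t - s), \qquad \operatorname{Var}(X_t \mid \mathcal{F}_s) = \sigma^2 (t - s).
\]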
Interpretation
The drift \(\mu\) shifts the mean linearly in time, while the variance grows at rate \(\sigma^2\) regardless of \(\mu\). The process has no memory of its past beyond the current value, a direct consequence of independent increments.
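3. Geometric Brownian Motion
SDE and Solution
Geometric Brownian motion solves
\[
dS_t = \mu S_t\,dt + \sigma S_t\,dW_t, \qquad S_0 > 0.
\]
Applying Itô's lemma to \(\log S_t\):
\[
d(\log S_t) = \left(\mu - \tfrac{1}{2}\sigma^2\right)dt + \sigma\,dW_t, \qquad\text{so}\qquad S_t = S_0 \exp\!\left[\left(\mu - \tfrac{1}{2}\sigma^2\right)t + \sigma W_t\right].
\]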
Since \(\log S_t \sim \mathcal{N}\!\left(\log S_0 + (\mu - \sigma^2/2)t,\; \sigma^2 t\right)\), the process \(S_t\) follows a log-normal distribution.
First Two Moments
For a log-normal variable, if \(Z \sim \mathcal{N}(m, v^2)\) then \(\mathbb{E}[e^Z] = e^{m + v^2/2}\).
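Expectation:
\[
\mathbb{E}[S_t] = S_0\,e^{(\mu - \sigma^2/2)t}\,\mathbb{E}\!\left[e^{\sigma W_t}\right] = S_0\,e^{\mu t}.
\]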
The Itô correction \(-\sigma^2/2\) in the exponent is exactly cancelled by the factor \(\mathbb{E}[e^{\sigma W_t}] = e^{\sigma^2 t/2}\).
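Second moment:
\[
\mathbb{E}[S_t^2] = S_0^2\,e^{(2\mu - \sigma^2)t}\,\mathbb{E}\!\left[e^{2\sigma W_t}\right] = S_0^2\,e^{(2\mu + \sigma^2)t}.
\]
Variance:
\[
\operatorname{Var}(S_t) = \mathbb{E}[S_t^2] - (\mathbb{E}[S_t])^2 = S_0^2\,e^{2\mu t}\left(e^{\sigma^2 t} - 1\right).
\]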
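General Power Moments
For any real \(n\), we apply the Gaussian MGF \(\mathbb{E}[e^{uW_t}] = e^{u^2 t/2}\) with parameter \(u = n\sigma\):
\[
\mathbb{E}[S_t^n] = S_0^n\,e^{n(\mu - \sigma^2/2)t}\,\mathbb{E}\!\left[e^{n\sigma W_t}\right] = S_0^n\,e^{n(\mu - \sigma^2/2)t}\,e^{n^2\sigma^2 t/2}.
\]
Combining the exponents gives
\[
\mathbb{E}[S_t^n] = S_0^n \exp\!\left[n\mu t + \frac{n(n-1)}{2}\sigma^2 t\right].
\]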
Verification: Setting \(n = 1\) gives \(S_0 e^{\mu t}\) and \(n = 2\) gives \(S_0^2 e^{2\mu t + \sigma^2 t}\), consistent with the first two moments above.
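The same verification can be done numerically. The sketch below codes the power-moment formula \(\mathbb{E}[S_t^n] = S_0^n \exp[n\mu t + \tfrac{n(n-1)}{2}\sigma^2 t]\) and checks that the second moment minus the squared mean reproduces the variance formula; the function names are illustrative, and the parameter values are borrowed from Exercise 2.

```python
import math


def gbm_power_moment(n, s0, mu, sigma, t):
    """E[S_t^n] = S_0^n * exp(n*mu*t + n*(n-1)/2 * sigma^2 * t)."""
    return s0 ** n * math.exp(n * mu * t + 0.5 * n * (n - 1) * sigma ** 2 * t)


def gbm_variance(s0, mu, sigma, t):
    """Var(S_t) = S_0^2 * exp(2*mu*t) * (exp(sigma^2 * t) - 1)."""
    return s0 ** 2 * math.exp(2.0 * mu * t) * math.expm1(sigma ** 2 * t)


s0, mu, sigma, t = 50.0, 0.08, 0.3, 2.0  # parameters from Exercise 2
mean = gbm_power_moment(1, s0, mu, sigma, t)
second = gbm_power_moment(2, s0, mu, sigma, t)
variance = gbm_variance(s0, mu, sigma, t)
print(mean, second - mean ** 2, variance)
```

Using `math.expm1` for \(e^{\sigma^2 t} - 1\) is a small design choice that keeps the variance accurate when \(\sigma^2 t\) is close to zero.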
Note: this is the \(n\)-th power moment, not the moment generating function. The ordinary MGF \(\mathbb{E}[e^{uS_t}]\) of a log-normal random variable is infinite for every \(u > 0\) and has no closed-form expression for \(u < 0\).
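Conditional Moments
For \(s < t\), using \(S_t = S_s \exp[(\mu - \sigma^2/2)(t-s) + \sigma(W_t - W_s)]\) and the independence of the increment \(W_t - W_s\) from \(\mathcal{F}_s\):
\[
\mathbb{E}[S_t \mid \mathcal{F}_s] = S_s\,e^{\mu(t-s)}, \qquad \operatorname{Var}(S_t \mid \mathcal{F}_s) = S_s^2\,e^{2\mu(t-s)}\left(e^{\sigma^2(t-s)} - 1\right).
\]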
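Skewness and Kurtosis
The log-normal distribution is right-skewed and heavy-tailed:
\[
\operatorname{Skew}(S_t) = \left(e^{\sigma^2 t} + 2\right)\sqrt{e^{\sigma^2 t} - 1}, \qquad \operatorname{ExKurt}(S_t) = e^{4\sigma^2 t} + 2e^{3\sigma^2 t} + 3e^{2\sigma^2 t} - 6,
\]
where \(\operatorname{ExKurt}\) denotes the excess kurtosis (kurtosis minus 3).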
Both increase exponentially with \(\sigma^2 t\): as time grows or volatility rises, the distribution becomes increasingly right-skewed and heavy-tailed. This is a direct consequence of the multiplicative noise structure.
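To see how quickly the shape statistics grow, the sketch below evaluates the skewness formula \((e^{\sigma^2 t} + 2)\sqrt{e^{\sigma^2 t} - 1}\) used in this section together with the standard log-normal excess kurtosis \(e^{4\sigma^2 t} + 2e^{3\sigma^2 t} + 3e^{2\sigma^2 t} - 6\). The function names and the choice \(\sigma = 0.2\) are illustrative.

```python
import math


def gbm_skewness(sigma, t):
    """Skewness of S_t; depends only on the product sigma^2 * t."""
    w = math.exp(sigma ** 2 * t)
    return (w + 2.0) * math.sqrt(w - 1.0)


def gbm_excess_kurtosis(sigma, t):
    """Excess kurtosis of S_t (standard log-normal formula)."""
    w = math.exp(sigma ** 2 * t)
    return w ** 4 + 2.0 * w ** 3 + 3.0 * w ** 2 - 6.0


for horizon in (1.0, 5.0, 10.0):
    print(horizon, gbm_skewness(0.2, horizon), gbm_excess_kurtosis(0.2, horizon))
```

Neither quantity depends on \(\mu\) or \(S_0\): the shape of the log-normal law is governed by \(\sigma^2 t\) alone.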
Interpretation
GBM has the property that the expected value grows exponentially at rate \(\mu\), independent of \(\sigma\). However, the variance also grows exponentially, and the distribution becomes increasingly right-skewed over time. The median \(S_0 e^{(\mu - \sigma^2/2)t}\) grows more slowly than the mean — a direct consequence of the Itô correction. The gap between mean and median reflects the fact that mean growth is driven by rare large outcomes in the right tail.
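4. Vasicek Model (Ornstein–Uhlenbeck)
SDE and Solution
\[
dr_t = a(\theta - r_t)\,dt + \sigma\,dW_t, \qquad r_0 \text{ given},
\]
where \(a > 0\) is the mean reversion speed, \(\theta\) is the long-term mean, and \(\sigma > 0\) is the volatility.
Using the integrating factor \(e^{at}\):
\[
r_t = r_0\,e^{-at} + \theta\left(1 - e^{-at}\right) + \sigma\int_0^t e^{-a(t-s)}\,dW_s.
\]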
Expectation
The stochastic integral has zero expectation, giving
\[
\mathbb{E}[r_t] = r_0\,e^{-at} + \theta\left(1 - e^{-at}\right).
\]
As \(t \to \infty\), \(\mathbb{E}[r_t] \to \theta\): the process mean-reverts to the long-term level.
Variance via Itô Isometry
The variance comes entirely from the stochastic integral term \(I_t = \sigma\int_0^t e^{-a(t-s)}\,dW_s\).
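By Itô isometry, \(\mathbb{E}\!\left[\left(\int_0^t f(s)\,dW_s\right)^2\right] = \int_0^t f(s)^2\,ds\), so we square the kernel and integrate:
\[
\operatorname{Var}(r_t) = \sigma^2\int_0^t e^{-2a(t-s)}\,ds = \frac{\sigma^2}{2a}\left(1 - e^{-2at}\right).
\]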
As \(t \to \infty\), \(\operatorname{Var}(r_t) \to \dfrac{\sigma^2}{2a}\).
Stationary distribution: \(r_\infty \sim \mathcal{N}\!\left(\theta,\; \dfrac{\sigma^2}{2a}\right)\)
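The isometry step can be checked numerically: integrating the squared kernel \(\sigma^2 e^{-2a(t-s)}\) over \([0, t]\) should reproduce \(\frac{\sigma^2}{2a}(1 - e^{-2at})\). The sketch below uses a midpoint rule; the function names are illustrative and the parameters are those of Exercise 3.

```python
import math


def vasicek_mean(r0, a, theta, t):
    """E[r_t] = r0 * exp(-a t) + theta * (1 - exp(-a t))."""
    return r0 * math.exp(-a * t) + theta * (1.0 - math.exp(-a * t))


def vasicek_variance(a, sigma, t):
    """Var(r_t) = sigma^2 / (2a) * (1 - exp(-2 a t))."""
    return sigma ** 2 / (2.0 * a) * (1.0 - math.exp(-2.0 * a * t))


def isometry_integral(a, sigma, t, steps=10_000):
    """Midpoint-rule value of sigma^2 * int_0^t exp(-2a(t - s)) ds."""
    h = t / steps
    total = sum(math.exp(-2.0 * a * (t - (i + 0.5) * h)) for i in range(steps))
    return sigma ** 2 * total * h


a, theta, sigma, r0 = 0.8, 0.05, 0.02, 0.03  # parameters from Exercise 3
print(vasicek_mean(r0, a, theta, 1.0))
print(vasicek_variance(a, sigma, 1.0), isometry_integral(a, sigma, 1.0))
print(vasicek_variance(a, sigma, 50.0), sigma ** 2 / (2.0 * a))  # near-stationary vs limit
```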
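Conditional Moments
For \(s < t\), restarting the process at time \(s\):
\[
\mathbb{E}[r_t \mid \mathcal{F}_s] = r_s\,e^{-a(t-s)} + \theta\left(1 - e^{-a(t-s)}\right), \qquad \operatorname{Var}(r_t \mid \mathcal{F}_s) = \frac{\sigma^2}{2a}\left(1 - e^{-2a(t-s)}\right).
\]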
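Covariance and Correlation
For \(s < t\):
\[
\operatorname{Cov}(r_s, r_t) = \frac{\sigma^2}{2a}\,e^{-a(t-s)}\left(1 - e^{-2as}\right) = \frac{\sigma^2}{2a}\left(e^{-a(t-s)} - e^{-a(t+s)}\right).
\]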
This follows from the tower property: \(\operatorname{Cov}(r_s, r_t) = \operatorname{Cov}(r_s, \mathbb{E}[r_t \mid r_s]) = e^{-a(t-s)} \operatorname{Var}(r_s)\), since \(\mathbb{E}[r_t \mid r_s]\) is an affine function of \(r_s\).
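In the stationary regime where \(\operatorname{Var}(r_t) = \sigma^2/(2a)\) for all relevant times, the correlation simplifies to
\[
\rho(s, t) = \frac{\operatorname{Cov}(r_s, r_t)}{\sqrt{\operatorname{Var}(r_s)\operatorname{Var}(r_t)}} = e^{-a|t-s|}.
\]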
This exponentially decaying autocorrelation — depending only on the time difference \(|t - s|\) — is a hallmark of second-order stationarity and is characteristic of Ornstein–Uhlenbeck processes.
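The approach to the stationary autocorrelation can be illustrated numerically. Starting from a deterministic \(r_0\), the relation \(\operatorname{Cov}(r_s, r_t) = e^{-a(t-s)}\operatorname{Var}(r_s)\) gives a correlation that depends on both times and only approaches \(e^{-a|t-s|}\) once \(s\) is large. The sketch below assumes that relation; the function name and the sample times are illustrative.

```python
import math


def vasicek_corr(a, s, t):
    """Correlation of (r_s, r_t) for deterministic r_0 and 0 < s <= t.

    Uses Cov(r_s, r_t) = exp(-a (t - s)) * Var(r_s); sigma cancels in the ratio.
    """
    s, t = min(s, t), max(s, t)
    var_ratio = (1.0 - math.exp(-2.0 * a * s)) / (1.0 - math.exp(-2.0 * a * t))
    return math.exp(-a * (t - s)) * math.sqrt(var_ratio)


a = 0.8  # mean reversion speed from Exercise 3
lag = 2.0
for start in (0.1, 1.0, 5.0, 20.0):
    print(start, vasicek_corr(a, start, start + lag), math.exp(-a * lag))
```

For small `start` the correlation sits below the stationary value because \(\operatorname{Var}(r_s)\) has not yet built up; the gap closes on the timescale \(1/a\).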
Centered Ornstein–Uhlenbeck Process
Setting \(\theta = 0\) gives the centered OU process \(dX_t = -aX_t\,dt + \sigma\,dW_t\) with \(\mathbb{E}[X_t] = x_0\,e^{-at}\) and \(\operatorname{Var}(X_t) = \frac{\sigma^2}{2a}(1 - e^{-2at})\). All formulas above apply with \(\theta = 0\).
Interpretation
Mean reversion creates a fundamental tension between the deterministic pull toward \(\theta\) and the random shocks that push the process away. The stationary variance \(\sigma^2/(2a)\) reflects this balance: stronger mean reversion (larger \(a\)) reduces the equilibrium spread, while higher volatility increases it. The exponential memory decay means that the process effectively forgets its past on a timescale of \(1/a\).
Structural Insight: Mean Reversion vs Diffusion
Brownian motion and geometric Brownian motion have variances that grow without bound over time. The OU/Vasicek process behaves differently: mean reversion counteracts the accumulation of random shocks, so the variance converges to a finite limit \(\sigma^2/(2a)\), producing a stationary distribution. This balance between deterministic pull (\(a\)) and stochastic forcing (\(\sigma\)) is a central theme in many stochastic models, including the CIR and Heston models used in finance.
5. CIR Model
SDE
\[
dr_t = a(\theta - r_t)\,dt + \sigma\sqrt{r_t}\,dW_t, \qquad r_0 \geq 0,
\]
where \(a, \theta, \sigma > 0\). The Feller condition \(2a\theta \geq \sigma^2\) ensures that the boundary at zero is unattainable, so \(r_t > 0\) for all \(t > 0\). Without this condition, the process may reach zero but remains nonnegative.
The CIR model shares the mean-reverting drift of the Vasicek model, but its diffusion coefficient \(\sigma\sqrt{r_t}\) depends on the state. This state dependence prevents the process from becoming negative and makes the moment analysis more involved.
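Expectation via Moment ODE
To derive \(\mathbb{E}[r_t]\), we take expectations of both sides of the integral form of the SDE, \(r_t = r_0 + \int_0^t a(\theta - r_s)\,ds + \int_0^t \sigma\sqrt{r_s}\,dW_s\):
\[
\mathbb{E}[r_t] = r_0 + \int_0^t a\left(\theta - \mathbb{E}[r_s]\right)ds + \mathbb{E}\!\left[\int_0^t \sigma\sqrt{r_s}\,dW_s\right].
\]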
The stochastic integral \(\int_0^t \sigma\sqrt{r_s}\,dW_s\) is a martingale with zero expectation, provided the integrand is square-integrable. For the CIR process, \(\mathbb{E}\!\left[\int_0^t r_s\,ds\right] < \infty\), so this condition is satisfied.
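Differentiating with respect to \(t\) gives the ODE for \(m(t) = \mathbb{E}[r_t]\):
\[
\frac{dm}{dt} = a\left(\theta - m(t)\right), \qquad m(0) = r_0.
\]
This is a linear first-order ODE with solution
\[
m(t) = r_0\,e^{-at} + \theta\left(1 - e^{-at}\right).
\]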
The expectation is identical to the Vasicek model: the state-dependent volatility does not affect the mean because the diffusion term contributes zero expected drift.
Variance via Second Moment ODE
To find the variance, we first derive an ODE for \(\mathbb{E}[r_t^2]\).
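Apply Itô's lemma to \(f(r) = r^2\) with \(f'(r) = 2r\) and \(f''(r) = 2\):
\[
d(r_t^2) = 2r_t\,dr_t + (dr_t)^2 = \left[(2a\theta + \sigma^2)\,r_t - 2a\,r_t^2\right]dt + 2\sigma\,r_t^{3/2}\,dW_t.
\]
Taking expectations (the stochastic integral term vanishes):
\[
\frac{dm_2}{dt} = (2a\theta + \sigma^2)\,m(t) - 2a\,m_2(t), \qquad m_2(0) = r_0^2.
\]
This is a linear first-order ODE for \(m_2(t) = \mathbb{E}[r_t^2]\) with known forcing from \(m(t) = \mathbb{E}[r_t]\). Using the integrating factor \(e^{2at}\):
\[
m_2(t) = r_0^2\,e^{-2at} + (2a\theta + \sigma^2)\int_0^t e^{-2a(t-s)}\,m(s)\,ds.
\]
Substituting \(m(s) = r_0\,e^{-as} + \theta(1 - e^{-as})\) and integrating:
\[
m_2(t) = r_0^2\,e^{-2at} + (2a\theta + \sigma^2)\left[\frac{\theta}{2a}\left(1 - e^{-2at}\right) + \frac{r_0 - \theta}{a}\left(e^{-at} - e^{-2at}\right)\right].
\]
Computing \(\operatorname{Var}(r_t) = m_2(t) - m(t)^2\) and simplifying (the terms proportional to \(2a\theta\) cancel against \(m(t)^2\)):
\[
\operatorname{Var}(r_t) = \frac{r_0\,\sigma^2}{a}\left(e^{-at} - e^{-2at}\right) + \frac{\theta\sigma^2}{2a}\left(1 - e^{-at}\right)^2.
\]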
The first term captures the contribution of the initial condition (which decays over time), while the second term captures the contribution of the equilibrium level \(\theta\).
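The closed-form variance can be cross-checked by integrating the two moment ODEs directly, \(m' = a(\theta - m)\) and \(m_2' = (2a\theta + \sigma^2)m - 2a\,m_2\), and forming \(m_2 - m^2\). The sketch below uses a hand-written classical Runge–Kutta step; the function names, the step count, and the starting value \(r_0 = 0.03\) are illustrative assumptions, while \(a\), \(\theta\), and \(\sigma\) are taken from Exercise 4.

```python
import math


def cir_mean(r0, a, theta, t):
    """Closed-form E[r_t] for the CIR model."""
    return r0 * math.exp(-a * t) + theta * (1.0 - math.exp(-a * t))


def cir_variance(r0, a, theta, sigma, t):
    """Closed-form Var(r_t) for the CIR model."""
    e = math.exp(-a * t)
    return r0 * sigma ** 2 / a * (e - e * e) + theta * sigma ** 2 / (2.0 * a) * (1.0 - e) ** 2


def cir_moments_ode(r0, a, theta, sigma, t, steps=2000):
    """Integrate the first two moment ODEs with RK4; return (mean, variance)."""
    def rhs(m, m2):
        return a * (theta - m), (2.0 * a * theta + sigma ** 2) * m - 2.0 * a * m2

    h = t / steps
    m, m2 = r0, r0 * r0
    for _ in range(steps):
        k1 = rhs(m, m2)
        k2 = rhs(m + 0.5 * h * k1[0], m2 + 0.5 * h * k1[1])
        k3 = rhs(m + 0.5 * h * k2[0], m2 + 0.5 * h * k2[1])
        k4 = rhs(m + h * k3[0], m2 + h * k3[1])
        m += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        m2 += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return m, m2 - m * m


a, theta, sigma = 0.5, 0.04, 0.1  # parameters from Exercise 4
r0 = 0.03                          # hypothetical start away from theta
ode_mean, ode_var = cir_moments_ode(r0, a, theta, sigma, 2.0)
print(ode_mean, cir_mean(r0, a, theta, 2.0))
print(ode_var, cir_variance(r0, a, theta, sigma, 2.0))
```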
Long-Term Behavior
As \(t \to \infty\),
\[
\mathbb{E}[r_t] \to \theta, \qquad \operatorname{Var}(r_t) \to \frac{\theta\sigma^2}{2a}.
\]
Compare with the Vasicek long-term variance \(\sigma^2/(2a)\): the CIR variance has an extra factor of \(\theta\) because the diffusion magnitude \(\sigma\sqrt{r_t}\) grows with the level of the process, so higher equilibrium levels produce larger fluctuations.
The stationary distribution of the CIR process is a Gamma distribution with mean \(\theta\) and variance \(\theta\sigma^2/(2a)\). The transition distribution (the conditional distribution of \(r_t\) given \(r_0\) at finite times) follows a scaled noncentral chi-square distribution.
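Conditional Moments
For \(s < t\):
\[
\mathbb{E}[r_t \mid \mathcal{F}_s] = r_s\,e^{-a(t-s)} + \theta\left(1 - e^{-a(t-s)}\right),
\]
\[
\operatorname{Var}(r_t \mid \mathcal{F}_s) = \frac{r_s\,\sigma^2}{a}\left(e^{-a(t-s)} - e^{-2a(t-s)}\right) + \frac{\theta\sigma^2}{2a}\left(1 - e^{-a(t-s)}\right)^2.
\]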
The conditional variance depends on the current level \(r_s\), which is a direct consequence of state-dependent volatility. Higher rates produce larger fluctuations.
Interpretation
The CIR model demonstrates why state-dependent volatility matters. The expectation follows the same formula as Vasicek, but the variance has a fundamentally different structure: it depends on the initial condition \(r_0\) in a way that reflects the \(\sqrt{r_t}\) diffusion. Near zero the volatility vanishes, which keeps the process nonnegative; under the Feller condition it remains strictly positive. This makes CIR suitable for modeling interest rates and other inherently non-negative quantities.
6. General Techniques for Moment Computation
Four main techniques are used to compute moments of SDE solutions.
Method 1: Direct Computation from Explicit Solutions
When an explicit solution exists, moments can be computed directly using known distributions. This approach gives the most transparent formulas but only works for a limited class of models.
Examples in this chapter: Brownian motion and BM with drift use Gaussian moments; GBM uses log-normal power moments; Vasicek uses the Gaussian stochastic integral representation.
Method 2: Itô's Lemma Applied to Powers
To find \(\mathbb{E}[X_t^n]\), apply Itô's lemma to \(f(x) = x^n\):
\[
d(X_t^n) = n X_t^{n-1}\,dX_t + \tfrac{1}{2}n(n-1)X_t^{n-2}\,(dX_t)^2.
\]
Taking expectations eliminates the stochastic integral and yields an ODE for the \(n\)-th moment. This technique works even when explicit solutions are unavailable.
Example in this chapter: in the CIR model, applying Itô's lemma to \(r_t^2\) produced the second-moment ODE used to derive the variance.
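Method 3: Moment ODEs from the Infinitesimal Generator
For an SDE \(dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t\), the infinitesimal generator is
\[
\mathcal{L}f(x) = b(x)\,f'(x) + \tfrac{1}{2}\sigma^2(x)\,f''(x).
\]
The evolution of expectations is governed by
\[
\frac{d}{dt}\,\mathbb{E}[f(X_t)] = \mathbb{E}[\mathcal{L}f(X_t)].
\]
Setting \(f(x) = x^n\) produces moment ODEs. For processes whose drift \(b(x)\) is affine and whose squared diffusion coefficient \(\sigma^2(x)\) is a polynomial of degree at most two (such as OU, CIR, and GBM), \(\mathcal{L}x^n\) is a polynomial of degree at most \(n\), so the hierarchy closes: each moment ODE involves only moments of the same or lower order.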
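As a small illustration of the closed hierarchy, applying the generator to \(x^n\) for the CIR model gives \(\frac{d}{dt}\mathbb{E}[r_t^n] = \left(na\theta + \tfrac{1}{2}n(n-1)\sigma^2\right)\mathbb{E}[r_t^{n-1}] - na\,\mathbb{E}[r_t^n]\). Setting the left side to zero yields a recursion for the stationary moments, which the sketch below compares against the moments of a Gamma law with shape \(2a\theta/\sigma^2\) and scale \(\sigma^2/(2a)\) (the shape and scale are inferred here from the stationary mean and variance stated in this section). Function names are illustrative.

```python
import math


def cir_stationary_moments(a, theta, sigma, order):
    """Stationary moments E[r^n], n = 0..order, from the generator recursion."""
    moments = [1.0]  # E[r^0] = 1
    for n in range(1, order + 1):
        # 0 = (n a theta + n(n-1)/2 sigma^2) m_{n-1} - n a m_n, solved for m_n
        moments.append((theta + 0.5 * (n - 1) * sigma ** 2 / a) * moments[n - 1])
    return moments


def gamma_moment(shape, scale, n):
    """E[X^n] = scale^n * shape * (shape + 1) * ... * (shape + n - 1)."""
    result = 1.0
    for j in range(n):
        result *= scale * (shape + j)
    return result


a, theta, sigma = 0.5, 0.04, 0.1  # parameters from Exercise 4
shape, scale = 2.0 * a * theta / sigma ** 2, sigma ** 2 / (2.0 * a)
moments = cir_stationary_moments(a, theta, sigma, 4)
for n in range(1, 5):
    print(n, moments[n], gamma_moment(shape, scale, n))
```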
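Method 4: Characteristic Functions
The characteristic function \(\phi(u, t) = \mathbb{E}[e^{iuX_t}]\) encodes all moments via
\[
\mathbb{E}[X_t^n] = (-i)^n \left.\frac{\partial^n \phi}{\partial u^n}(u, t)\right|_{u=0},
\]
provided the \(n\)-th moment is finite.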
Vasicek and CIR are affine diffusions, meaning their characteristic functions have exponential-affine form and satisfy systems of Riccati ODEs. This structure extends to the Heston stochastic volatility model and affine term-structure models, making the characteristic function approach the most powerful method when direct moment computation becomes intractable.
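A minimal sketch of recovering moments from a characteristic function: for the Vasicek model \(r_t\) is Gaussian, so \(\phi(u, t) = \exp(iu\,m - \tfrac{1}{2}u^2 v)\) with \(m = \mathbb{E}[r_t]\) and \(v = \operatorname{Var}(r_t)\). The code differentiates \(\phi\) at \(u = 0\) by central finite differences; the step size `h` and the function names are illustrative assumptions, and the parameters are those of Exercise 3.

```python
import cmath
import math


def gaussian_cf(u, mean, var):
    """Characteristic function of N(mean, var): exp(i u mean - u^2 var / 2)."""
    return cmath.exp(1j * u * mean - 0.5 * u * u * var)


def moments_from_cf(cf, h=1e-2):
    """First two raw moments from central differences of cf at u = 0."""
    d1 = (cf(h) - cf(-h)) / (2.0 * h)
    d2 = (cf(h) - 2.0 * cf(0.0) + cf(-h)) / (h * h)
    first = (d1 / 1j).real   # E[X]   = phi'(0) / i
    second = (-d2).real      # E[X^2] = -phi''(0)
    return first, second


a, theta, sigma, r0, t = 0.8, 0.05, 0.02, 0.03, 1.0  # parameters from Exercise 3
mean = r0 * math.exp(-a * t) + theta * (1.0 - math.exp(-a * t))
var = sigma ** 2 / (2.0 * a) * (1.0 - math.exp(-2.0 * a * t))

first, second = moments_from_cf(lambda u: gaussian_cf(u, mean, var))
print(first, mean)
print(second - first ** 2, var)
```

Finite differences are used here only to keep the example dependency-free; in practice the derivatives of an exponential-affine characteristic function are usually taken analytically.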
7. Comparison Table
| Model | \(\mathbb{E}[X_t]\) | \(\operatorname{Var}(X_t)\) | Distribution | Volatility |
|---|---|---|---|---|
| BM | \(0\) | \(t\) | Gaussian | constant |
| BM with drift | \(X_0 + \mu t\) | \(\sigma^2 t\) | Gaussian | constant |
| GBM | \(S_0 e^{\mu t}\) | \(S_0^2 e^{2\mu t}(e^{\sigma^2 t} - 1)\) | log-normal | multiplicative |
| Vasicek | \(r_0 e^{-at} + \theta(1-e^{-at})\) | \(\frac{\sigma^2}{2a}(1-e^{-2at})\) | Gaussian | constant |
| CIR | \(r_0 e^{-at} + \theta(1-e^{-at})\) | \(\frac{\sigma^2}{a}[r_0 e^{-at}(1-e^{-at}) + \frac{\theta}{2}(1-e^{-at})^2]\) | noncentral \(\chi^2\) / Gamma | state-dependent |
Key observations:
- The mean of CIR and Vasicek is identical — state-dependent volatility does not affect the expected value
- Variance grows linearly for BM models but converges to a finite limit for mean-reverting models
- GBM variance grows exponentially, reflecting the compounding effect of multiplicative noise
8. Computational Strategy
When computing moments for a new SDE:
- Check for an explicit solution. If one exists, use the distribution directly.
- Apply Itô's lemma to powers of the process. This produces moment ODEs.
- Solve the ODE hierarchy. For many models, the first and second moment ODEs are linear and solvable in closed form.
- Verify via simulation for models where analytical results are complex. For processes with state-dependent diffusion (such as CIR), standard Euler schemes can introduce discretization bias near boundaries — exact simulation or higher-order methods may be needed for reliable verification. See the Simulation page for implementation details.
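The verification step can be sketched with a small Monte Carlo experiment. The example below applies a plain Euler–Maruyama scheme to the Vasicek model, where the constant diffusion coefficient avoids the boundary issues mentioned for CIR, and compares sample moments with the closed-form mean and variance. The function names, path count, step count, and seed are illustrative assumptions; this is not the implementation from the Simulation page.

```python
import math
import random


def simulate_vasicek_euler(r0, a, theta, sigma, t, steps, paths, seed=0):
    """Return terminal values r_t from Euler-Maruyama paths of the Vasicek SDE."""
    rng = random.Random(seed)
    h = t / steps
    sqrt_h = math.sqrt(h)
    terminal = []
    for _ in range(paths):
        r = r0
        for _ in range(steps):
            r += a * (theta - r) * h + sigma * sqrt_h * rng.gauss(0.0, 1.0)
        terminal.append(r)
    return terminal


def sample_mean_var(xs):
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var


a, theta, sigma, r0, t = 0.8, 0.05, 0.02, 0.03, 1.0  # parameters from Exercise 3
samples = simulate_vasicek_euler(r0, a, theta, sigma, t, steps=50, paths=10_000)
mc_mean, mc_var = sample_mean_var(samples)

exact_mean = r0 * math.exp(-a * t) + theta * (1.0 - math.exp(-a * t))
exact_var = sigma ** 2 / (2.0 * a) * (1.0 - math.exp(-2.0 * a * t))
print(mc_mean, exact_mean)
print(mc_var, exact_var)
```

Agreement is only up to Monte Carlo noise plus a small discretization bias, so any comparison should use a tolerance of a few standard errors rather than exact equality.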
Key Takeaway
Moment analysis extracts quantitative information from SDEs without solving for the full distribution. Gaussian processes (Brownian motion, Vasicek) have moments determined entirely by the mean and variance. Log-normal processes (GBM) require power-moment techniques. State-dependent volatility (CIR) couples the moment hierarchy, requiring ODE methods. In all cases, Itô isometry and Itô's lemma are the essential computational tools.
Exercises
Exercise 1. For Brownian motion with drift \(dX_t = 2\,dt + 3\,dW_t\) with \(X_0 = 1\):
(a) Compute \(\mathbb{E}[X_5]\) and \(\operatorname{Var}[X_5]\).
(b) Find \(\mathbb{E}[X_5^2]\).
(c) Compute the fourth central moment \(\mathbb{E}[(X_5 - \mathbb{E}[X_5])^4]\).
Solution to Exercise 1
The SDE is \(dX_t = 2\,dt + 3\,dW_t\) with \(X_0 = 1\), so \(X_t = 1 + 2t + 3W_t\).
At \(t = 5\): \(X_5 = 1 + 10 + 3W_5\), where \(W_5 \sim \mathcal{N}(0, 5)\).
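(a)
\[
\mathbb{E}[X_5] = 1 + 2 \cdot 5 = 11, \qquad \operatorname{Var}[X_5] = 3^2 \cdot 5 = 45.
\]
(b) Using \(\mathbb{E}[X_5^2] = \operatorname{Var}[X_5] + (\mathbb{E}[X_5])^2\):
\[
\mathbb{E}[X_5^2] = 45 + 11^2 = 166.
\]
(c) Since \(X_5 - \mathbb{E}[X_5] = 3W_5 \sim \mathcal{N}(0, 45)\), this is a centered Gaussian with variance \(\sigma^2 = 45\). The fourth central moment of a Gaussian is \(3\sigma^4\):
\[
\mathbb{E}\!\left[(X_5 - \mathbb{E}[X_5])^4\right] = 3 \cdot 45^2 = 6075.
\]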
Exercise 2. For GBM with \(\mu = 0.08\), \(\sigma = 0.3\), and \(S_0 = 50\):
(a) Compute \(\mathbb{E}[S_2]\) and \(\operatorname{Var}[S_2]\).
(b) Compute \(\mathbb{E}[S_2^3]\) using the general power moment formula.
(c) Is \(\mathbb{E}[S_t]\) increasing or decreasing in \(\sigma\)? Is the median \(S_0 e^{(\mu - \sigma^2/2)t}\) increasing or decreasing in \(\sigma\)? Explain the difference.
Solution to Exercise 2
GBM with \(\mu = 0.08\), \(\sigma = 0.3\), \(S_0 = 50\).
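(a) At \(t = 2\):
\[
\mathbb{E}[S_2] = 50\,e^{0.16} \approx 58.68, \qquad \operatorname{Var}[S_2] = 50^2\,e^{0.32}\left(e^{0.18} - 1\right) \approx 678.98.
\]
(b) Using the general power moment formula \(\mathbb{E}[S_t^n] = S_0^n \exp[n\mu t + \frac{n(n-1)}{2}\sigma^2 t]\) with \(n = 3\):
\[
\mathbb{E}[S_2^3] = 50^3 \exp\!\left[3(0.08)(2) + 3(0.09)(2)\right] = 125000\,e^{1.02} \approx 3.466 \times 10^5.
\]
(c) The mean \(\mathbb{E}[S_t] = S_0 e^{\mu t}\) is independent of \(\sigma\); it does not change as volatility increases. However, the median \(S_0 e^{(\mu - \sigma^2/2)t}\) is decreasing in \(\sigma\) because higher volatility increases the Itô correction \(\sigma^2/2\), which lowers the median growth rate.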
The difference arises because GBM has multiplicative noise. Higher volatility makes the distribution more right-skewed: rare large upward moves (which drive the mean) become more extreme, while the typical outcome (the median) actually decreases. The mean remains constant because the increased probability mass in the right tail exactly compensates the reduction in the center of the distribution.
Exercise 3. For the Vasicek model \(dr_t = 0.8(0.05 - r_t)\,dt + 0.02\,dW_t\) with \(r_0 = 0.03\):
(a) Compute \(\mathbb{E}[r_1]\) and \(\operatorname{Var}[r_1]\).
(b) Find the stationary mean and variance.
(c) Compute the autocorrelation \(\rho(s, t)\) in the stationary regime for \(|t - s| = 2\).
Solution to Exercise 3
Vasicek model: \(a = 0.8\), \(\theta = 0.05\), \(\sigma = 0.02\), \(r_0 = 0.03\).
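(a) At \(t = 1\):
\[
\mathbb{E}[r_1] = 0.03\,e^{-0.8} + 0.05\left(1 - e^{-0.8}\right) \approx 0.0410, \qquad \operatorname{Var}[r_1] = \frac{0.0004}{1.6}\left(1 - e^{-1.6}\right) \approx 1.995 \times 10^{-4}.
\]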
(b) Stationary mean: \(\theta = 0.05\).
Stationary variance: \(\frac{\sigma^2}{2a} = \frac{0.0004}{1.6} = 2.5 \times 10^{-4}\).
(c) In the stationary regime, \(\rho(s, t) = e^{-a|t-s|}\). For \(|t - s| = 2\):
\[
\rho = e^{-0.8 \cdot 2} = e^{-1.6} \approx 0.2019.
\]
Exercise 4. For the CIR model \(dr_t = 0.5(0.04 - r_t)\,dt + 0.1\sqrt{r_t}\,dW_t\) with \(r_0 = 0.04\):
(a) Verify whether the Feller condition \(2a\theta \geq \sigma^2\) is satisfied.
(b) Compute \(\mathbb{E}[r_t]\) for general \(t\).
(c) Compute the long-term variance \(\lim_{t \to \infty} \operatorname{Var}[r_t]\) and compare it with the Vasicek long-term variance using the same \(a\), \(\theta\), and \(\sigma\).
Solution to Exercise 4
CIR model: \(a = 0.5\), \(\theta = 0.04\), \(\sigma = 0.1\), \(r_0 = 0.04\).
(a) The Feller condition requires \(2a\theta \geq \sigma^2\):
\[
2a\theta = 2(0.5)(0.04) = 0.04, \qquad \sigma^2 = (0.1)^2 = 0.01.
\]
Since \(0.04 \geq 0.01\), the Feller condition is satisfied, so \(r_t > 0\) for all \(t > 0\).
(b) The expectation has the same formula as Vasicek:
\[
\mathbb{E}[r_t] = r_0\,e^{-0.5t} + 0.04\left(1 - e^{-0.5t}\right).
\]
Since \(r_0 = \theta = 0.04\), the mean is constant: \(\mathbb{E}[r_t] = 0.04\) for all \(t\).
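(c) The CIR long-term variance is:
\[
\frac{\theta\sigma^2}{2a} = \frac{0.04 \times 0.01}{1} = 4 \times 10^{-4}.
\]
The Vasicek long-term variance with the same parameters would be:
\[
\frac{\sigma^2}{2a} = \frac{0.01}{1} = 0.01.
\]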
The CIR variance (\(4 \times 10^{-4}\)) is smaller than the Vasicek variance (\(0.01\)) by a factor of \(\theta = 0.04\). This reflects the state-dependent volatility: when \(r_t\) is near \(\theta = 0.04\), the CIR diffusion coefficient \(\sigma\sqrt{r_t} = 0.1 \times 0.2 = 0.02\) is much smaller than the constant Vasicek diffusion \(\sigma = 0.1\), producing significantly less variability.
Exercise 5. Use Itô isometry to compute the variance of the stochastic integral
Solution to Exercise 5
By Itô isometry, for a deterministic integrand \(f(s)\):
Here \(f(s) = s\,e^{-s}\), so \(f(s)^2 = s^2 e^{-2s}\):
Integrating by parts twice (or using the formula \(\int s^2 e^{-2s}\,ds = -e^{-2s}(\frac{s^2}{2} + \frac{s}{2} + \frac{1}{4})\)):
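The antiderivative quoted in this solution can be checked numerically before it is used. The sketch below compares a midpoint-rule integral of \(s^2 e^{-2s}\) with the difference of the antiderivative at the endpoints. The upper limit `T = 1.0` is a hypothetical value chosen only for this check and is not taken from the exercise statement; the function names are likewise illustrative.

```python
import math


def integrand(s):
    """Squared kernel f(s)^2 = s^2 * exp(-2 s) for f(s) = s * exp(-s)."""
    return s * s * math.exp(-2.0 * s)


def antiderivative(s):
    """Candidate antiderivative: -exp(-2 s) * (s^2/2 + s/2 + 1/4)."""
    return -math.exp(-2.0 * s) * (s * s / 2.0 + s / 2.0 + 0.25)


def midpoint(f, lo, hi, steps=20_000):
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


T = 1.0  # hypothetical upper limit, for illustration only
numeric = midpoint(integrand, 0.0, T)
closed = antiderivative(T) - antiderivative(0.0)
print(numeric, closed)
```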
Exercise 6. Consider the SDE \(dX_t = -\alpha X_t\,dt + \sigma X_t\,dW_t\) with \(X_0 > 0\).
(a) Derive an ODE for \(m(t) = \mathbb{E}[X_t]\) by taking expectations of the SDE.
(b) Solve the ODE to find \(m(t)\).
(c) Apply Itô's lemma to \(X_t^2\) to derive an ODE for \(\mathbb{E}[X_t^2]\), then compute \(\operatorname{Var}[X_t]\).
Solution to Exercise 6
The SDE is \(dX_t = -\alpha X_t\,dt + \sigma X_t\,dW_t\) with \(X_0 > 0\).
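(a) Taking expectations of the SDE (the stochastic integral has zero expectation):
\[
\frac{dm}{dt} = -\alpha\,m(t), \qquad m(0) = X_0.
\]
(b) This is a linear ODE with solution:
\[
m(t) = X_0\,e^{-\alpha t}.
\]
(c) Apply Itô's lemma to \(f(x) = x^2\) with \(f'(x) = 2x\) and \(f''(x) = 2\):
\[
d(X_t^2) = 2X_t\,dX_t + (dX_t)^2 = (\sigma^2 - 2\alpha)\,X_t^2\,dt + 2\sigma X_t^2\,dW_t.
\]
Taking expectations:
\[
\frac{d}{dt}\,\mathbb{E}[X_t^2] = (\sigma^2 - 2\alpha)\,\mathbb{E}[X_t^2], \qquad \mathbb{E}[X_0^2] = X_0^2.
\]
Solving: \(\mathbb{E}[X_t^2] = X_0^2\,e^{(-2\alpha + \sigma^2)t}\).
The variance is:
\[
\operatorname{Var}[X_t] = X_0^2\,e^{(\sigma^2 - 2\alpha)t} - X_0^2\,e^{-2\alpha t} = X_0^2\,e^{-2\alpha t}\left(e^{\sigma^2 t} - 1\right).
\]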
Exercise 7. The skewness of GBM at time \(t\) is given by \((e^{\sigma^2 t} + 2)\sqrt{e^{\sigma^2 t} - 1}\). Compute the skewness for \(\sigma = 0.2\) at \(t = 1\) and \(t = 10\). Explain why the distribution becomes more skewed over longer time horizons in terms of the multiplicative noise structure.
Solution to Exercise 7
The skewness formula is \(\operatorname{Skew}(S_t) = (e^{\sigma^2 t} + 2)\sqrt{e^{\sigma^2 t} - 1}\).
With \(\sigma = 0.2\), we have \(\sigma^2 = 0.04\).
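At \(t = 1\): \(\sigma^2 t = 0.04\), so \(e^{0.04} \approx 1.04081\):
\[
\operatorname{Skew}(S_1) \approx (1.04081 + 2)\sqrt{0.04081} \approx 3.04081 \times 0.20202 \approx 0.614.
\]
At \(t = 10\): \(\sigma^2 t = 0.4\), so \(e^{0.4} \approx 1.49182\):
\[
\operatorname{Skew}(S_{10}) \approx (1.49182 + 2)\sqrt{0.49182} \approx 3.49182 \times 0.70130 \approx 2.449.
\]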
The skewness increases from approximately \(0.614\) at \(t = 1\) to \(2.449\) at \(t = 10\).
The distribution becomes more skewed over longer horizons because of the multiplicative noise structure. In GBM, random shocks multiply the current price level. Over long periods, this compounding effect causes the distribution of \(S_t\) to become increasingly right-skewed: most paths cluster near or below the median (which grows at rate \(\mu - \sigma^2/2\)), while a small fraction of paths experience sustained positive shocks that push them far to the right. The mean is pulled upward by these rare extreme outcomes. As \(t\) grows, the cumulative effect of multiplicative compounding amplifies the asymmetry, causing both skewness and kurtosis to increase exponentially in \(\sigma^2 t\).