t-Distribution
The t-distribution (Student's t-distribution) arises when estimating the mean of a normally distributed population using a small sample with unknown variance. It has heavier tails than the standard normal distribution, which accounts for the additional uncertainty introduced by estimating the variance from the data. The t-distribution is the basis for t-tests and confidence intervals in classical statistics.
Mental Model
The t-distribution is a "cautious normal." When you estimate variance from a small sample, there is extra uncertainty, so the distribution has heavier tails: extreme values are more probable than a normal would predict. As the sample size grows, and with it the degrees of freedom, this extra uncertainty vanishes and the t-distribution converges to the standard normal.
Mathematical Definition
If \(Z \sim N(0, 1)\) and \(V \sim \chi^2(\nu)\) are independent, then the random variable
\[
T = \frac{Z}{\sqrt{V/\nu}}
\]
follows a t-distribution with \(\nu\) degrees of freedom, written \(T \sim t(\nu)\).
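The construction can be checked by simulation. The sketch below is illustrative only: the seed, the sample size, and the choice \(\nu = 5\) are arbitrary assumptions. It builds samples from independent normal and chi-square draws and compares their empirical quantiles with those of scipy.stats.t.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)      # fixed seed, chosen arbitrarily
nu = 5                              # degrees of freedom (illustrative)
n = 100_000                         # number of simulated draws

z = rng.standard_normal(n)          # Z ~ N(0, 1)
v = rng.chisquare(nu, size=n)       # V ~ chi^2(nu), drawn independently of Z
t_samples = z / np.sqrt(v / nu)     # T = Z / sqrt(V / nu)

# Compare empirical quantiles with the theoretical t(nu) quantiles
probs = np.array([0.05, 0.25, 0.5, 0.75, 0.95])
empirical = np.quantile(t_samples, probs)
theoretical = stats.t(nu).ppf(probs)
for p, e, th in zip(probs, empirical, theoretical):
    print(f"p={p:.2f}  empirical={e:+.3f}  theoretical={th:+.3f}")
```

The two columns should agree up to Monte Carlo noise, which shrinks as n grows.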
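The probability density function is:
\[
f(x) = \frac{\Gamma\!\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu + 1}{2}}
\]
where \(\Gamma(\cdot)\) is the gamma function and \(\nu > 0\) is the degrees of freedom parameter.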
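As a sanity check, the density can be evaluated directly from the gamma-function form and compared with scipy. This is a minimal sketch: the helper name t_pdf is hypothetical, and working on the log scale with gammaln is a choice made here for numerical stability, not something the formula requires.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def t_pdf(x, nu):
    """Hypothetical helper: Student's t density from the closed form
    Gamma((nu+1)/2) / (sqrt(nu*pi) * Gamma(nu/2)) * (1 + x^2/nu)^(-(nu+1)/2)."""
    x = np.asarray(x, dtype=float)
    # Log of the normalizing constant; avoids overflow of Gamma for large nu
    log_norm = gammaln((nu + 1) / 2) - gammaln(nu / 2) - 0.5 * np.log(nu * np.pi)
    return np.exp(log_norm - (nu + 1) / 2 * np.log1p(x**2 / nu))

x = np.linspace(-5, 5, 11)
for nu in [1, 2.5, 10, 200]:
    max_diff = np.max(np.abs(t_pdf(x, nu) - stats.t.pdf(x, nu)))
    print(f"nu={nu}: max |hand-written - scipy| = {max_diff:.2e}")
```

The differences should be at the level of floating-point rounding, including for non-integer \(\nu\).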
Usage in scipy.stats
The scipy.stats.t distribution object takes the degrees of freedom df (\(\nu\)):
```python
import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt

df = 5
a = stats.t(df)  # frozen t-distribution with ν = 5

print(f"Mean: {a.mean():.4f}")      # 0 (for ν > 1)
print(f"Variance: {a.var():.4f}")   # ν/(ν-2) = 5/3

# Evaluate the PDF and CDF on a grid
x = np.linspace(-5, 5, 200)
y_pdf = a.pdf(x)
y_cdf = a.cdf(x)

plt.plot(x, y_pdf, label='PDF')
plt.plot(x, y_cdf, label='CDF')
plt.legend()
plt.title(f't-Distribution (ν={df})')
plt.xlabel('x')
plt.ylabel('Density / Probability')
plt.show()
```
Comparison with the Normal Distribution
The t-distribution is symmetric about zero like the standard normal, but it has heavier tails. As \(\nu\) increases, the t-distribution converges to the standard normal:
```python
import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 200)

# Standard normal as the dashed reference curve
plt.plot(x, stats.norm.pdf(x), 'k--', linewidth=2, label='N(0,1)')

# t-distributions with increasing degrees of freedom
for df in [1, 3, 5, 10, 30]:
    plt.plot(x, stats.t(df).pdf(x), label=f'ν={df}')

plt.legend()
plt.title('t-Distribution PDFs vs Standard Normal')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.show()
```
The case \(\nu = 1\) is the Cauchy distribution, which has such heavy tails that neither the mean nor the variance exists.
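Both ends of the family can also be checked numerically rather than by eye. The sketch below, on an arbitrarily chosen grid over \([-5, 5]\), confirms that the \(\nu = 1\) density matches scipy.stats.cauchy and measures how the largest pointwise gap to the standard normal density shrinks as \(\nu\) grows.

```python
import numpy as np
from scipy import stats

x = np.linspace(-5, 5, 1001)  # evaluation grid (arbitrary choice)

# nu = 1 coincides with the standard Cauchy density 1 / (pi * (1 + x^2))
cauchy_gap = np.max(np.abs(stats.t.pdf(x, 1) - stats.cauchy.pdf(x)))
print(f"max |t(1) - Cauchy| = {cauchy_gap:.2e}")

# The largest pointwise gap to N(0, 1) on this grid shrinks as nu grows
normal_gaps = {}
for df in [1, 3, 5, 10, 30, 100]:
    normal_gaps[df] = np.max(np.abs(stats.t.pdf(x, df) - stats.norm.pdf(x)))
    print(f"nu={df:3d}: max |t - normal| = {normal_gaps[df]:.4f}")
```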
Key Properties
- Mean: \(E[T] = 0\) for \(\nu > 1\) (undefined for \(\nu \le 1\))
- Variance: \(\text{Var}(T) = \dfrac{\nu}{\nu - 2}\) for \(\nu > 2\) (infinite for \(1 < \nu \le 2\))
- Symmetry: The distribution is symmetric about 0
- Heavy tails: For any finite \(\nu\), the density decays in the tails as a power law \(|x|^{-(\nu+1)}\), far slower than the \(e^{-x^2/2}\) decay of the normal density
- Convergence: As \(\nu \to \infty\), \(t(\nu) \to N(0, 1)\)
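Several of these properties can be spot-checked numerically. The sketch below uses arbitrary illustrative values: it compares scipy's variance with \(\nu/(\nu-2)\), tests the power-law tail by doubling a point far out in the tail, and checks symmetry through the CDF.

```python
import numpy as np
from scipy import stats

# Variance: compare scipy's value with nu / (nu - 2) for nu > 2
for nu in [3, 5, 10, 30]:
    print(f"nu={nu:2d}: var={stats.t(nu).var():.4f}  nu/(nu-2)={nu / (nu - 2):.4f}")

# Power-law tails: doubling x far out in the tail should scale the
# density by roughly 2**-(nu + 1)
nu = 3
x = 1000.0
tail_ratio = stats.t.pdf(2 * x, nu) / stats.t.pdf(x, nu)
print(f"pdf(2x)/pdf(x) = {tail_ratio:.6f}  vs  2**-(nu+1) = {2.0 ** -(nu + 1):.6f}")

# Symmetry about 0: the CDF satisfies F(-x) = 1 - F(x)
left = stats.t.cdf(-1.5, nu)
right = 1 - stats.t.cdf(1.5, nu)
print(f"F(-1.5) = {left:.6f}, 1 - F(1.5) = {right:.6f}")
```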
Special Cases
- \(\nu = 1\): Cauchy distribution (no finite moments of order 1 or higher, so neither mean nor variance exists)
- \(\nu \to \infty\): standard normal distribution (limiting case)
Parameters in scipy.stats
| Parameter | Symbol | scipy.stats keyword | Default |
|---|---|---|---|
| Degrees of freedom | \(\nu\) | df | (required) |
| Location | (none) | loc | 0 |
| Scale | (none) | scale | 1 |
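The loc and scale keywords shift and stretch the standard t-distribution. A minimal sketch, with parameter values picked purely for illustration, of how the shifted and scaled density relates to the standard one:

```python
import numpy as np
from scipy import stats

df, loc, scale = 4, 1.5, 2.0   # illustrative values, not from the text
shifted = stats.t(df, loc=loc, scale=scale)
standard = stats.t(df)

x = np.linspace(-6, 9, 7)
# Location-scale relationship: f(x; loc, scale) = f((x - loc) / scale) / scale
lhs = shifted.pdf(x)
rhs = standard.pdf((x - loc) / scale) / scale
print(np.allclose(lhs, rhs))

# The mean moves to loc; the variance becomes scale**2 * df / (df - 2)
print(shifted.mean(), shifted.var())
```

Note that scale is not the standard deviation: for finite df the standard deviation is larger than scale by the factor \(\sqrt{\nu/(\nu-2)}\).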
Applications in Hypothesis Testing
The t-distribution is used whenever a test statistic involves an estimated standard deviation:
- One-sample t-test: Testing \(H_0\colon \mu = \mu_0\) using \(T = (\bar{X} - \mu_0)/(S/\sqrt{n})\), which follows \(t(n-1)\) under \(H_0\)
- Two-sample t-test: Comparing means of two independent groups
- Confidence intervals: A 95% confidence interval for the mean is \(\bar{X} \pm t_{0.025,\,n-1} \cdot S/\sqrt{n}\)
Critical values are obtained using the PPF:
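```python
import scipy.stats as stats

alpha = 0.05   # two-sided significance level
df = 20        # degrees of freedom

# Upper alpha/2 quantile of t(df); the lower critical value is its negative
t_critical = stats.t(df).ppf(1 - alpha / 2)
print(f"Two-sided critical value: ±{t_critical:.4f}")
```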
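The one-sample test statistic and the confidence interval formula above can be worked through by hand and cross-checked against scipy.stats.ttest_1samp. The data below are made up for illustration, and the null value mu0 = 5.0 is an assumption of this sketch.

```python
import numpy as np
from scipy import stats

# Made-up sample, for illustration only
sample = np.array([5.1, 4.9, 5.6, 5.8, 5.2, 5.5, 4.7, 5.9, 5.3, 5.4])
mu0 = 5.0                                  # hypothesized mean under H0
n = sample.size

xbar = sample.mean()
s = sample.std(ddof=1)                     # sample standard deviation S
se = s / np.sqrt(n)                        # standard error S / sqrt(n)

t_stat = (xbar - mu0) / se                 # T = (xbar - mu0) / (S / sqrt(n))
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)   # two-sided p-value

t_crit = stats.t.ppf(0.975, df=n - 1)      # t_{0.025, n-1}
ci = (xbar - t_crit * se, xbar + t_crit * se)

# Cross-check against scipy's built-in one-sample test
result = stats.ttest_1samp(sample, popmean=mu0)
print(f"by hand: t={t_stat:.4f}, p={p_value:.4f}")
print(f"scipy:   t={result.statistic:.4f}, p={result.pvalue:.4f}")
print(f"95% CI for the mean: ({ci[0]:.3f}, {ci[1]:.3f})")
```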
Financial Applications
In finance, the t-distribution is used in modeling asset returns when normal tails are too thin to capture extreme events. The Student-t copula models joint tail dependence between assets. Risk measures such as Value at Risk use the t-distribution for more conservative tail estimates, and GARCH models with t-distributed innovations provide better fits to financial return series.
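To make the Value at Risk point concrete, the sketch below compares a 99% parametric VaR under a normal and under a t-distribution rescaled to the same standard deviation. The volatility, the degrees of freedom, and the zero-mean assumption are all hypothetical choices for illustration, not calibrated values.

```python
import numpy as np
from scipy import stats

sigma = 0.02      # hypothetical daily return volatility
nu = 4            # hypothetical degrees of freedom
level = 0.99      # VaR confidence level

# Rescale the t so its standard deviation equals sigma:
# Var = scale**2 * nu / (nu - 2)  =>  scale = sigma * sqrt((nu - 2) / nu)
t_scale = sigma * np.sqrt((nu - 2) / nu)

# VaR is the negated lower-tail quantile of the return distribution
var_normal = -stats.norm.ppf(1 - level, loc=0.0, scale=sigma)
var_t = -stats.t.ppf(1 - level, nu, loc=0.0, scale=t_scale)

print(f"99% VaR, normal: {var_normal:.4f}")
print(f"99% VaR, t({nu}):   {var_t:.4f}")
```

With these settings the t-based figure is the larger of the two. The ordering depends on the confidence level: once the variances are matched, the t quantile can fall below the normal one at less extreme levels such as 95%, so the "more conservative" reading applies to the far tail.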
Summary
The t-distribution arises as the ratio of a standard normal variable to the square root of an independent chi-square variable divided by its degrees of freedom, and it has heavier tails than the normal. In scipy.stats, use stats.t(df) to create a frozen distribution for computing PDFs, CDFs, and critical values and for generating random samples.
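Random sampling from a frozen distribution uses the rvs method. A short sketch, with the degrees of freedom, seed, and sample size chosen arbitrarily:

```python
import numpy as np
from scipy import stats

dist = stats.t(8)                       # frozen t-distribution, df = 8 (arbitrary)
rng = np.random.default_rng(42)         # seed chosen only for reproducibility

samples = dist.rvs(size=50_000, random_state=rng)
print(f"sample mean:     {samples.mean():+.4f}  (theory {dist.mean():+.4f})")
print(f"sample variance: {samples.var(ddof=1):.4f}  (theory {dist.var():.4f})")

# ppf and cdf are inverses of each other
q = dist.ppf(0.9)
print(f"ppf(0.9) = {q:.4f}, cdf(ppf(0.9)) = {dist.cdf(q):.4f}")
```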
Exercises
Exercise 1. Plot the PDF of the \(t\)-distribution for \(\nu = 2, 5, 30\) and the standard normal on the same axes over \([-4, 4]\). Observe how the \(t\)-distribution approaches the normal as \(\nu\) increases.
Solution to Exercise 1
```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

x = np.linspace(-4, 4, 200)

# One curve per degrees-of-freedom value
for df in [2, 5, 30]:
    plt.plot(x, stats.t.pdf(x, df), label=f't(df={df})')

# Standard normal as the dashed reference curve
plt.plot(x, stats.norm.pdf(x), 'k--', label='Normal')
plt.legend()
plt.title('t-distribution vs Normal')
plt.show()
```
Exercise 2. For a \(t\)-distribution with 10 degrees of freedom, compute the probability \(P(|T| > 2)\). Compare this with the corresponding probability under the standard normal.
Solution to Exercise 2
```python
from scipy import stats

# By symmetry, P(|T| > 2) is twice the upper-tail probability
p_t = 2 * stats.t.sf(2, df=10)
p_norm = 2 * stats.norm.sf(2)
print(f"P(|T|>2), t(10): {p_t:.4f}")
print(f"P(|Z|>2), Normal: {p_norm:.4f}")
```
Exercise 3. Compute the 95th percentile of the \(t\)-distribution for \(\nu = 5, 10, 30, 100\) and the standard normal. Show that the \(t\) percentile converges toward the normal percentile as \(\nu\) grows.
Solution to Exercise 3
```python
from scipy import stats

# 95th percentile of t(df) for increasing degrees of freedom
for df in [5, 10, 30, 100]:
    print(f"t(df={df:3d}) 95th percentile: {stats.t.ppf(0.95, df):.4f}")

print(f"Normal 95th percentile: {stats.norm.ppf(0.95):.4f}")
```