Student's t Distribution¶
Overview¶
The Student's \(t\) distribution arises when estimating the mean of a normally distributed population using the sample standard deviation \(S\) instead of the known population standard deviation \(\sigma\). It accounts for the additional uncertainty introduced by estimating \(\sigma\).
Definition¶
Let \(Z \sim N(0,1)\) and \(V \sim \chi^2_d\) be independent. Then the ratio:

\[
T = \frac{Z}{\sqrt{V/d}}
\]

follows the Student's \(t\) distribution with \(d\) degrees of freedom.
Degrees of Freedom¶
The degrees of freedom \(d = n - 1\) reflects the number of independent pieces of information used to estimate the sample variance.
- Small \(d\): Heavier tails than the normal, reflecting greater uncertainty.
- Large \(d\) (\(> 30\)): Virtually indistinguishable from \(N(0, 1)\).
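The effect of \(d\) on the tails can be seen directly in tail probabilities (a quick sketch using scipy.stats; the threshold 3 is arbitrary):

```python
import scipy.stats as stats

# P(T > 3) shrinks toward the normal tail probability as df grows
for df in [2, 10, 30]:
    print(f"df={df:2d}: P(T > 3) = {stats.t(df).sf(3):.4f}")
print(f"normal: P(Z > 3) = {stats.norm.sf(3):.4f}")
```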
Properties¶
The distribution is symmetric about zero, with mean \(0\) for \(d > 1\) and variance \(\frac{d}{d-2}\) for \(d > 2\). As \(d \to \infty\), the variance approaches 1 and \(t_d \to N(0, 1)\).
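As a quick numerical check (using scipy.stats; the closed form \(d/(d-2)\), valid for \(d > 2\), is a standard fact):

```python
import scipy.stats as stats

# Var(t_d) = d / (d - 2) for d > 2; it approaches 1 as d grows
for d in [3, 10, 100]:
    print(f"d={d:3d}: var = {stats.t(d).var():.4f}, d/(d-2) = {d/(d-2):.4f}")
```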
PDF¶
The density of \(t_d\) is:

\[
f(t) = \frac{1}{\sqrt{d}\, B\left(\frac{1}{2}, \frac{d}{2}\right)} \left(1 + \frac{t^2}{d}\right)^{-\frac{d+1}{2}}
\]

where \(B(\cdot, \cdot)\) is the Beta function.
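The Beta-function form can be verified numerically against scipy.stats (a sketch; the grid and \(d = 7\) are arbitrary choices):

```python
import numpy as np
from scipy import stats
from scipy.special import beta

def t_pdf(t, d):
    """Student's t density written directly with the Beta function."""
    return (1 + t**2 / d) ** (-(d + 1) / 2) / (np.sqrt(d) * beta(0.5, d / 2))

x = np.linspace(-4, 4, 201)
diff = np.max(np.abs(t_pdf(x, 7) - stats.t(7).pdf(x)))
print(f"max |difference| vs scipy: {diff:.2e}")
```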
Proof Sketch¶
With \(T = Z / \sqrt{V/d}\), use the change-of-variables technique on the joint density of \((Z, V)\). The Jacobian factor is \(\sqrt{v/d}\), and after integrating out the \(\chi^2\) variable, the marginal density of \(T\) takes the form above. The conditional distribution \(V | T = t\) turns out to be Gamma.
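In symbols (a sketch, with \(\phi\) denoting the standard normal density and \(f_{\chi^2_d}\) the chi-square density):

\[
f_T(t) = \int_0^\infty \phi\!\left(t\sqrt{v/d}\right) \sqrt{v/d}\; f_{\chi^2_d}(v)\, dv
\]

Viewed as a function of \(v\), the integrand is an unnormalized Gamma density with shape \(\frac{d+1}{2}\) and rate \(\frac{1}{2}\left(1 + \frac{t^2}{d}\right)\); its normalizing constant yields the Beta-function form of the PDF.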
Fat Tails¶
The \(t\) distribution has heavier tails than the normal distribution, meaning extreme values are more likely:
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
fig, (ax_full, ax_tail) = plt.subplots(1, 2, figsize=(12, 3))
x = np.linspace(-4, 4, 200)
ax_full.plot(x, stats.norm().pdf(x), label='Normal')
ax_full.plot(x, stats.t(df=10).pdf(x), label='t(10)')
ax_full.set_title('Full PDF')
ax_full.legend()
ax_tail.plot(x[-50:], stats.norm().pdf(x[-50:]), label='Normal')
ax_tail.plot(x[-50:], stats.t(df=10).pdf(x[-50:]), label='t(10)')
ax_tail.set_title('Right Tail (zoomed)')
ax_tail.legend()
plt.tight_layout()
plt.show()
Convergence to Normal¶
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
fig, ax = plt.subplots(figsize=(12, 3))
x = np.linspace(-3, 3, 200)
for df in [1, 2, 5, 10, 20]:
    ax.plot(x, stats.t(df).pdf(x), label=f'df={df}')
ax.plot(x, stats.norm().pdf(x), 'r--', lw=2, label='Normal')
ax.legend()
ax.set_title('t-Distribution Converges to Normal as df Increases')
plt.show()
Why t?¶
When the population is normal and \(\sigma\) is unknown, replacing \(\sigma\) with \(S\) yields:

\[
T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}
\]
This arises because:
- \(\bar{X} \sim N(\mu, \sigma^2/n)\), so \(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)\).
- \(\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}\).
- \(\bar{X}\) and \(S^2\) are independent (a special property of the normal distribution).
- The ratio \(\frac{N(0,1)}{\sqrt{\chi^2_{n-1}/(n-1)}}\) is by definition \(t_{n-1}\).
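The second bullet can also be checked by simulation (a sketch with an arbitrary seed; the KS test compares the simulated statistic against the \(\chi^2_{n-1}\) CDF):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, sigma = 10, 2.0
samples = rng.normal(0.0, sigma, (n, 20_000))
s2 = samples.var(axis=0, ddof=1)
stat = (n - 1) * s2 / sigma**2               # should follow chi^2 with n-1 df
ks = stats.kstest(stat, stats.chi2(n - 1).cdf)
print(f"KS p-value: {ks.pvalue:.3f}")
```

A p-value that is not tiny is consistent with the claimed \(\chi^2_{n-1}\) distribution.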
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
np.random.seed(0)
n, mu, sigma = 10, 0, 10                   # sample size and population parameters
n_sim = 10_000                             # number of simulated samples
samples = np.random.normal(mu, sigma, (n, n_sim))
x_bar = samples.mean(axis=0)               # sample mean of each simulated sample
s = samples.std(axis=0, ddof=1)            # unbiased sample standard deviation
t_stats = (x_bar - mu) / (s / np.sqrt(n))  # t statistic for each sample
fig, ax = plt.subplots(figsize=(12, 3))
bins = np.arange(-6, 6, 0.1)
ax.hist(t_stats, bins=bins, density=True, alpha=0.7, label=f'Simulated $t_{{{n-1}}}$')
ax.plot(bins, stats.t(n-1).pdf(bins), '--r', lw=2, label=f'$t_{{{n-1}}}$ PDF')
ax.legend()
ax.spines[['top', 'right']].set_visible(False)
plt.show()
Interpreting the Role of t¶
Large n: CLT Justifies z¶
When \(n\) is large, \(S \approx \sigma\), and the difference between \(t_{n-1}\) and \(N(0,1)\) is negligible. In practice, \(z\) is just as good.
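For example, comparing 97.5% critical values (a scipy.stats sketch; the sample sizes are illustrative):

```python
import scipy.stats as stats

# 97.5% critical values: t_{n-1} vs. the normal z, for increasing n
for n in [5, 30, 100, 1000]:
    print(f"n={n:4d}: t critical value = {stats.t(n - 1).ppf(0.975):.4f}")
print(f"normal z critical value = {stats.norm.ppf(0.975):.4f}")
```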
Small n: Where t Shines — But Only Under Normality¶
The \(t\) distribution matters most for small \(n\). Its heavier tails properly account for the extra variability from using \(S\) instead of \(\sigma\). However, this result is exact only if the population is normal.
Non-Normal Populations¶
If the population is skewed or heavy-tailed, the \(t\) approximation is poor for small \(n\). Neither \(t\) nor \(z\) is trustworthy; robust or nonparametric methods are preferable.
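As an illustration (a simulation sketch with an Exponential(1) population; the seed and sample size are arbitrary choices), the nominal 95% \(t\)-interval falls short of its advertised coverage for small \(n\):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_sim = 5, 20_000
mu = 1.0                                   # true mean of Exponential(1)
samples = rng.exponential(1.0, (n, n_sim))
x_bar = samples.mean(axis=0)
s = samples.std(axis=0, ddof=1)
half = stats.t(n - 1).ppf(0.975) * s / np.sqrt(n)   # interval half-width
coverage = np.mean((x_bar - half <= mu) & (mu <= x_bar + half))
print(f"t-interval coverage: {coverage:.3f} (nominal 0.95)")
```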
Summary¶
| Scenario | Recommendation |
|---|---|
| Large \(n\) | Use \(z\); the \(t\) adjustment is negligible |
| Small \(n\), normal population | \(t\) is exact and appropriate |
| Small \(n\), non-normal population | Use robust/nonparametric methods |
Random Samples¶
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
np.random.seed(0)
df = 5
data = stats.t(df).rvs(10_000)
fig, ax = plt.subplots(figsize=(12, 3))
bins = np.linspace(-5, 5, 101)
ax.hist(data, bins=bins, density=True, histtype='step', label='t Samples')
ax.plot(bins, stats.t(df).pdf(bins), '--b', lw=2, label='t PDF')
ax.plot(bins, stats.norm(data.mean(), data.std()).pdf(bins),
'--r', lw=2, label='Normal Approx')
ax.legend()
plt.show()
Key Takeaways¶
- The \(t\) distribution accounts for the uncertainty of estimating \(\sigma\) with \(S\).
- It has heavier tails than the normal, especially for small degrees of freedom.
- As \(d \to \infty\), the \(t\) distribution converges to \(N(0,1)\).
- The exactness of the \(t\) result depends critically on the normality of the population.