F-Distribution¶
The F-distribution arises naturally when comparing the variances of two independent normal populations. It is the sampling distribution of the ratio of two scaled chi-square random variables and serves as the foundation for analysis of variance (ANOVA) and F-tests in regression analysis.
Mathematical Definition¶
If \(X_1 \sim \chi^2(d_1)\) and \(X_2 \sim \chi^2(d_2)\) are independent chi-square random variables, then the ratio
follows an F-distribution with \(d_1\) (numerator) and \(d_2\) (denominator) degrees of freedom, written \(F \sim F(d_1, d_2)\).
The probability density function is:
where \(B(\cdot,\cdot)\) is the beta function.
Usage in scipy.stats¶
The scipy.stats.f distribution object takes the numerator degrees of freedom dfn (\(d_1\)) and denominator degrees of freedom dfd (\(d_2\)):
import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt
dfn, dfd = 5, 20
a = stats.f(dfn, dfd)
print(f"Mean: {a.mean():.4f}") # d2/(d2-2) = 20/18
print(f"Variance: {a.var():.4f}")
x = np.linspace(0, 5, 200)
y_pdf = a.pdf(x)
y_cdf = a.cdf(x)
plt.plot(x, y_pdf, label='PDF')
plt.plot(x, y_cdf, label='CDF')
plt.legend()
plt.title(f'F-Distribution (d₁={dfn}, d₂={dfd})')
plt.xlabel('x')
plt.ylabel('Density / Probability')
plt.show()
The PDF is right-skewed with support on \([0, \infty)\). The CDF rises from 0 and approaches 1 as \(x\) increases.
Effect of Degrees of Freedom¶
As the degrees of freedom increase, the F-distribution becomes more concentrated around its mean and more symmetric:
import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 5, 200)
params = [(2, 5), (5, 10), (10, 20), (20, 40), (50, 100)]
for (d1, d2) in params:
plt.plot(x, stats.f(d1, d2).pdf(x), label=f'd₁={d1}, d₂={d2}')
plt.legend()
plt.title('F-Distribution PDFs for Various Degrees of Freedom')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.show()
Key Properties¶
- Mean: \(E[F] = \dfrac{d_2}{d_2 - 2}\) for \(d_2 > 2\)
- Variance: \(\text{Var}(F) = \dfrac{2\,d_2^2\,(d_1 + d_2 - 2)}{d_1\,(d_2 - 2)^2\,(d_2 - 4)}\) for \(d_2 > 4\)
- Reciprocal property: If \(F \sim F(d_1, d_2)\), then \(1/F \sim F(d_2, d_1)\)
- Relation to t-distribution: If \(T \sim t(d)\), then \(T^2 \sim F(1, d)\)
Parameters in scipy.stats¶
| Parameter | Symbol | scipy.stats keyword |
Default |
|---|---|---|---|
| Numerator df | \(d_1\) | dfn |
(required) |
| Denominator df | \(d_2\) | dfd |
(required) |
| Location | — | loc |
0 |
| Scale | — | scale |
1 |
Applications in Hypothesis Testing¶
The F-distribution is the basis for several important tests:
- ANOVA: The F-statistic compares between-group variance to within-group variance. Under the null hypothesis that all group means are equal, the test statistic follows \(F(k-1, n-k)\) where \(k\) is the number of groups and \(n\) is the total sample size.
- Regression F-test: Tests whether a set of regression coefficients are jointly zero.
- Variance ratio test: Tests \(H_0\colon \sigma_1^2 = \sigma_2^2\) for two independent normal populations.
Critical values are obtained using the PPF:
alpha = 0.05
dfn, dfd = 3, 20
critical_value = stats.f(dfn, dfd).ppf(1 - alpha)
print(f"Critical value: {critical_value:.4f}")
# Reject H₀ if F-statistic > critical_value
Financial Applications¶
In finance, the F-distribution appears in testing whether portfolio risk models with different numbers of factors provide significantly different fits, in comparing the variances of returns across different market regimes, and in multivariate regression tests for asset pricing models such as the Fama-French framework.
Summary¶
The F-distribution is the sampling distribution of the ratio of two independent chi-square variables scaled by their degrees of freedom. In scipy.stats, use stats.f(dfn, dfd) to create a frozen distribution for computing PDFs, CDFs, critical values, and generating random samples.