Skip to content

F-Distribution

The F-distribution arises naturally when comparing the variances of two independent normal populations. It is the sampling distribution of the ratio of two scaled chi-square random variables and serves as the foundation for analysis of variance (ANOVA) and F-tests in regression analysis.


Mathematical Definition

If \(X_1 \sim \chi^2(d_1)\) and \(X_2 \sim \chi^2(d_2)\) are independent chi-square random variables, then the ratio

\[ F = \frac{X_1 / d_1}{X_2 / d_2} \]

follows an F-distribution with \(d_1\) (numerator) and \(d_2\) (denominator) degrees of freedom, written \(F \sim F(d_1, d_2)\).

The probability density function is:

\[ f(x) = \frac{1}{B\!\left(\frac{d_1}{2},\, \frac{d_2}{2}\right)} \left(\frac{d_1}{d_2}\right)^{d_1/2} \frac{x^{\,d_1/2 - 1}}{\left(1 + \frac{d_1}{d_2}\,x\right)^{(d_1 + d_2)/2}}, \quad x \ge 0 \]

where \(B(\cdot,\cdot)\) is the beta function.

Usage in scipy.stats

The scipy.stats.f distribution object takes the numerator degrees of freedom dfn (\(d_1\)) and denominator degrees of freedom dfd (\(d_2\)):

import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt

dfn, dfd = 5, 20
a = stats.f(dfn, dfd)

print(f"Mean: {a.mean():.4f}")        # d2/(d2-2) = 20/18
print(f"Variance: {a.var():.4f}")

x = np.linspace(0, 5, 200)
y_pdf = a.pdf(x)
y_cdf = a.cdf(x)

plt.plot(x, y_pdf, label='PDF')
plt.plot(x, y_cdf, label='CDF')
plt.legend()
plt.title(f'F-Distribution (d₁={dfn}, d₂={dfd})')
plt.xlabel('x')
plt.ylabel('Density / Probability')
plt.show()

The PDF is right-skewed with support on \([0, \infty)\). The CDF rises from 0 and approaches 1 as \(x\) increases.

Effect of Degrees of Freedom

As the degrees of freedom increase, the F-distribution becomes more concentrated around its mean and more symmetric:

import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 200)
params = [(2, 5), (5, 10), (10, 20), (20, 40), (50, 100)]
for (d1, d2) in params:
    plt.plot(x, stats.f(d1, d2).pdf(x), label=f'd₁={d1}, d₂={d2}')

plt.legend()
plt.title('F-Distribution PDFs for Various Degrees of Freedom')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.show()

Key Properties

  • Mean: \(E[F] = \dfrac{d_2}{d_2 - 2}\) for \(d_2 > 2\)
  • Variance: \(\text{Var}(F) = \dfrac{2\,d_2^2\,(d_1 + d_2 - 2)}{d_1\,(d_2 - 2)^2\,(d_2 - 4)}\) for \(d_2 > 4\)
  • Reciprocal property: If \(F \sim F(d_1, d_2)\), then \(1/F \sim F(d_2, d_1)\)
  • Relation to t-distribution: If \(T \sim t(d)\), then \(T^2 \sim F(1, d)\)

Parameters in scipy.stats

Parameter Symbol scipy.stats keyword Default
Numerator df \(d_1\) dfn (required)
Denominator df \(d_2\) dfd (required)
Location loc 0
Scale scale 1

Applications in Hypothesis Testing

The F-distribution is the basis for several important tests:

  • ANOVA: The F-statistic compares between-group variance to within-group variance. Under the null hypothesis that all group means are equal, the test statistic follows \(F(k-1, n-k)\) where \(k\) is the number of groups and \(n\) is the total sample size.
  • Regression F-test: Tests whether a set of regression coefficients are jointly zero.
  • Variance ratio test: Tests \(H_0\colon \sigma_1^2 = \sigma_2^2\) for two independent normal populations.

Critical values are obtained using the PPF:

alpha = 0.05
dfn, dfd = 3, 20
critical_value = stats.f(dfn, dfd).ppf(1 - alpha)
print(f"Critical value: {critical_value:.4f}")
# Reject H₀ if F-statistic > critical_value

Financial Applications

In finance, the F-distribution appears in testing whether portfolio risk models with different numbers of factors provide significantly different fits, in comparing the variances of returns across different market regimes, and in multivariate regression tests for asset pricing models such as the Fama-French framework.

Summary

The F-distribution is the sampling distribution of the ratio of two independent chi-square variables scaled by their degrees of freedom. In scipy.stats, use stats.f(dfn, dfd) to create a frozen distribution for computing PDFs, CDFs, critical values, and generating random samples.