Skip to content

CI for σ₁² / σ₂²

Confidence Interval for the Ratio of Two Variances

When comparing the variability of two independent populations, we construct a confidence interval for the ratio \(\theta = \sigma_1^2 / \sigma_2^2\). This is based on the F-distribution.

Formula

\[ \left[\frac{s_1^2 / s_2^2}{F_{\alpha/2,\, n_1-1,\, n_2-1}},\;\; \frac{s_1^2 / s_2^2}{F_{1-\alpha/2,\, n_1-1,\, n_2-1}}\right] \]

where

  • \(s_1^2\) and \(s_2^2\) are the sample variances (with Bessel's correction),
  • \(F_{\alpha/2, n_1-1, n_2-1}\) and \(F_{1-\alpha/2, n_1-1, n_2-1}\) are the critical values from the F-distribution with degrees of freedom \(\text{df}_1 = n_1 - 1\) and \(\text{df}_2 = n_2 - 1\).

Sampling Distribution

The pivotal quantity is

\[ \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2} \sim F_{n_1-1, \, n_2-1} \]

This result holds exactly when both populations are normally distributed.

Conditions for Validity

\[ \text{F-interval for } \sigma_1^2/\sigma_2^2 \quad\text{if}\quad \begin{cases} \text{both population distributions are normal} \\ n_i \le 0.1 N_i \text{ for each group (IID approximation)} \end{cases} \]

Normality Requirement

Like the chi-square variance CI, the F-interval is exact only under Normality. Non-Normal data (skewed, heavy-tailed, or with outliers) can cause serious miscoverage. Consider bootstrap or robust alternatives for non-Normal data.

Python Code

import numpy as np
from scipy.stats import f

n1, n2 = 15, 12
alpha = 0.05

# Simulate samples
rng = np.random.default_rng(42)
x = rng.normal(loc=0, scale=1.0, size=n1)
y = rng.normal(loc=0, scale=1.5, size=n2)

s1_sq = x.var(ddof=1)
s2_sq = y.var(ddof=1)
rhat = s1_sq / s2_sq

df1, df2 = n1 - 1, n2 - 1
F_lo = f(dfn=df1, dfd=df2).ppf(alpha / 2.0)
F_hi = f(dfn=df1, dfd=df2).ppf(1 - alpha / 2.0)

ci_lower = rhat / F_hi
ci_upper = rhat / F_lo

print(f"95% CI for σ₁²/σ₂²: ({ci_lower:.4f}, {ci_upper:.4f})")

Simulation: Variance Ratio CI Coverage

#!/usr/bin/env python3
"""
F-interval simulation for θ = σ₁²/σ₂².
"""

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import f

rng_seed = None
n_simulations = 100
n1, n2 = 15, 12
mu1, mu2 = 0.0, 0.0
sigma1, sigma2 = 1.0, 1.5
alpha = 0.05


def main():
    if rng_seed is not None:
        np.random.seed(rng_seed)

    theta_true = (sigma1**2) / (sigma2**2)
    df1, df2 = n1 - 1, n2 - 1

    lowers = np.empty(n_simulations)
    uppers = np.empty(n_simulations)
    centers = np.empty(n_simulations)

    F_lo = f(dfn=df1, dfd=df2).ppf(alpha / 2.0)
    F_hi = f(dfn=df1, dfd=df2).ppf(1 - alpha / 2.0)

    for i in range(n_simulations):
        x = np.random.normal(loc=mu1, scale=sigma1, size=n1)
        y = np.random.normal(loc=mu2, scale=sigma2, size=n2)
        s1_sq = x.var(ddof=1)
        s2_sq = y.var(ddof=1)
        rhat = s1_sq / s2_sq
        lowers[i] = rhat / F_hi
        uppers[i] = rhat / F_lo
        centers[i] = rhat

    covered = (lowers <= theta_true) & (theta_true <= uppers)
    n_fail = (~covered).sum()
    coverage_pct = 100.0 * covered.mean()

    fig, ax = plt.subplots(figsize=(12, 12))
    for i in range(n_simulations):
        color = "k" if covered[i] else "r"
        ax.plot([lowers[i], uppers[i]], [i, i], lw=2, color=color)
        ax.plot(centers[i], i, marker="o", ms=3, color=color)

    ax.axvline(theta_true, linestyle="--", linewidth=1.5, color="r")
    ax.set_title(
        f"{n_simulations} F-intervals for σ₁²/σ₂² | n1={n1}, n2={n2}, "
        f"df=({df1},{df2}), CL={int((1 - alpha) * 100)}% | "
        f"Fail={n_fail} (Coverage ≈ {coverage_pct:.1f}%)")
    ax.set_yticks([])
    for sp in ["left", "right", "top"]:
        ax.spines[sp].set_visible(False)
    ax.set_xlabel("θ = σ₁² / σ₂²")
    plt.tight_layout()
    plt.show()


if __name__ == "__main__":
    main()

Key Points

  • The F-interval for \(\sigma_1^2 / \sigma_2^2\) requires both populations to be Normal.
  • If the CI includes 1, there is no evidence that the two population variances differ.
  • The F-distribution is asymmetric, so the CI is not centered symmetrically around the point estimate.
  • For non-Normal data, consider bootstrap-based alternatives.