Chapter 15: Variance Tests¶
Overview¶
Variance tests evaluate whether observed differences in variability across groups or populations are statistically significant. These tests are fundamental tools in statistical inference, playing a critical role in validating assumptions for other methods such as ANOVA and regression analysis. This chapter covers a comprehensive suite of variance testing methods, ranging from classical parametric approaches (chi-square, F-test, Bartlett's) through robust alternatives (Levene's, Brown-Forsythe, Fligner-Killeen) to advanced computational and Bayesian techniques.
Chapter Structure¶
15.1 Introduction to Variance Testing¶
Motivation and overview of the variance testing landscape:
- Why Test Variances -- Explains the practical importance of variance testing for checking homoscedasticity in ANOVA, validating regression assumptions, and comparing volatility across financial instruments.
- Overview of Variance Tests -- Surveys the major variance tests (chi-square, F-test, Bartlett's, Levene's, Brown-Forsythe, Fligner-Killeen), including their test statistics, distributional assumptions, and appropriate use cases.
- Assumptions Common to Variance Tests -- Discusses the shared requirements of independence, random sampling, and (for parametric tests) normality, and the consequences of violating these assumptions.
15.2 Chi-Square Test for Variance¶
A one-sample test for whether a population variance equals a specified value:
- One-Sample Chi-Square Variance Test -- Tests \(H_0: \sigma^2 = \sigma_0^2\) using the statistic \(\chi^2 = (n-1)s^2 / \sigma_0^2\), with one-tailed and two-tailed formulations and worked examples.
- Derivation and Distribution Theory -- Shows how the test statistic arises from the distribution of the sample variance under normality, connecting to the chi-square distribution with \(n-1\) degrees of freedom.
- Confidence Interval for \(\sigma^2\) -- Inverts the chi-square test to construct confidence intervals for the population variance and standard deviation.
15.3 F-Test for Comparing Two Variances¶
A two-sample test based on the ratio of sample variances:
- Two-Sample F-Test -- Tests \(H_0: \sigma_1^2 = \sigma_2^2\) using the statistic \(F = s_1^2 / s_2^2\), which follows an \(F\)-distribution under the null hypothesis when both populations are normal.
- F-Distribution and Degrees of Freedom -- Details the properties of the \(F\)-distribution and how the numerator and denominator degrees of freedom are determined.
- Sensitivity to Non-Normality -- Warns that the F-test is highly sensitive to departures from normality, often producing inflated Type I error rates with non-normal data.
15.4 Bartlett's Test¶
A multi-group parametric test for homogeneity of variances:
- Bartlett's Test for Equality of Variances -- Tests \(H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2\) using a pooled-variance-based statistic that follows a \(\chi^2_{k-1}\) distribution under normality.
- Derivation and Chi-Square Approximation -- Derives the Bartlett test statistic from the ratio of pooled to individual variances with the correction factor for small samples.
- Limitations Under Non-Normality -- Emphasizes that Bartlett's test is highly sensitive to non-normality, making it unreliable when the normality assumption is questionable.
15.5 Robust Tests¶
Distribution-free and outlier-resistant alternatives for comparing variances:
- Levene's Test -- Tests equality of variances by performing a one-way ANOVA on the absolute deviations of observations from their group means; robust to moderate departures from normality.
- Brown-Forsythe Test -- A variant of Levene's test that uses deviations from group medians instead of means, providing additional robustness to skewed distributions and outliers.
- Fligner-Killeen Test -- A non-parametric rank-based test that uses ranks of absolute deviations from group medians, offering the strongest robustness among the three methods.
- Comparison of Robust Methods -- Side-by-side evaluation of Levene's, Brown-Forsythe, and Fligner-Killeen in terms of Type I error control, power, and robustness across different distribution shapes.
15.6 Advanced Methods¶
Computational and Bayesian approaches for variance comparison:
- Bootstrap Variance Testing -- Uses resampling to construct a null distribution for variance ratios or differences, providing valid inference without distributional assumptions.
- Bayesian Variance Testing -- Employs prior distributions on variance parameters (typically inverse-gamma) to compute posterior probabilities and Bayes factors for variance hypotheses.
- Likelihood Ratio Test for Variances -- Compares the maximized likelihoods under the null and alternative hypotheses, providing an asymptotically chi-squared test statistic.
15.7 Applications¶
Practical use cases where variance testing is essential:
- Pre-Test for ANOVA Homoscedasticity -- Uses Levene's or Bartlett's test to verify the equal-variance assumption before conducting ANOVA, with Welch's ANOVA as a fallback when variances differ.
- Variance Testing in Regression -- Applies variance tests (e.g., Breusch-Pagan) to regression residuals to detect heteroscedasticity, with remedies including weighted least squares and robust standard errors.
- Financial Volatility Comparisons -- Compares the volatility of different assets, portfolios, or time periods to assess risk differences and inform portfolio construction decisions.
15.8 Code¶
Complete Python implementations:
- Chi-Squared Test for Variance -- One-sample variance test with critical values and \(p\)-value computation.
- F-Test of Equality of Variances -- Two-sample F-test implementation with visualization of the rejection region.
- Bartlett's Test -- Multi-group homogeneity test using scipy and manual computation.
- Levene's Test -- Robust variance equality test with mean-based deviations.
- Chi-Square Distribution -- Visualization of the chi-square distribution for different degrees of freedom.
- Robust Variance Tests Comparison -- Side-by-side comparison of Levene, Brown-Forsythe, and Fligner-Killeen on the same data.
- F-Test Tail Region Visualization -- Plots the F-distribution with shaded rejection regions.
- F-Test Normality Sensitivity and Robust Alternatives -- Simulation showing how the F-test's Type I error inflates under non-normality, with robust alternatives performing correctly.
- F-Test Power Simulation -- Monte Carlo study of F-test power as a function of variance ratio and sample size.
- Bartlett Test Non-Normality Sensitivity -- Simulation demonstrating Bartlett's poor performance with skewed or heavy-tailed data.
- Levene Test Normal vs Skewed Simulation -- Compares Levene's test performance across normal and non-normal distributions.
- Brown-Forsythe Test (scipy) -- Implementation using scipy's median-based Levene variant.
- Fligner-Killeen Test (scipy) -- Non-parametric rank-based variance test.
- Bootstrap Variance Test -- Resampling-based approach for comparing variances without distributional assumptions.
- Bayesian Variance Test -- Posterior inference on variance parameters using conjugate priors.
15.9 Exercises¶
Practice problems covering F-test computation and interpretation, Levene's test application, comparison of classical and robust test results, chi-square confidence intervals for variance, and practical decision-making when test results conflict.
Prerequisites¶
This chapter builds on:
- Chapter 5 (Sampling Distributions) -- The chi-square, \(F\), and normal distributions and their roles in inference about variances.
- Chapter 8 (Confidence Intervals) -- Confidence interval construction, particularly intervals for \(\sigma^2\) using the chi-square distribution.
- Chapter 9 (Hypothesis Testing) -- The general framework of null and alternative hypotheses, test statistics, \(p\)-values, and decision rules.
- Chapter 14 (Normality Tests) -- Methods for checking whether the normality assumption required by classical variance tests is satisfied.
Key Takeaways¶
- Classical variance tests (chi-square, F-test, Bartlett's) are powerful under normality but highly sensitive to non-normal data, often producing misleading results when the normality assumption is violated.
- Robust alternatives (Levene's, Brown-Forsythe, Fligner-Killeen) maintain proper Type I error rates across a wider range of distributions and should be preferred when normality is uncertain.
- The Brown-Forsythe test (median-based Levene) offers the best balance of robustness and power for most practical applications, while Fligner-Killeen provides the strongest non-parametric guarantees.
- Bootstrap and Bayesian methods provide modern alternatives that avoid distributional assumptions entirely or incorporate prior information, respectively.
- Variance testing is not just an end in itself -- it serves as a prerequisite check for ANOVA (homoscedasticity), regression (constant error variance), and financial analysis (volatility comparison), making it an essential step in many analysis workflows.