Bessel's Correction¶

Introduction¶

Bessel's correction refers to the use of \(n-1\) instead of \(n\) in the denominator of the sample variance formula, yielding an unbiased estimator of the population variance. Named after Friedrich Bessel, this correction accounts for the fact that estimating the mean from the same data "uses up" one degree of freedom, causing the naive estimator (dividing by \(n\)) to systematically underestimate the true variance.

Definition¶

For observations \(X_1, \dots, X_n\) with sample mean \(\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i\), the Bessel-corrected sample variance is:

\[S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2\]

This is the standard "sample variance" in most statistical software (e.g., R's `var()`, or `numpy.var` when called with `ddof=1`).
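As a minimal sketch of the formula above, the following pure-Python function sums the squared deviations from the sample mean and divides by \(n-1\). The function name `sample_variance` and the data are illustrative assumptions. The result is compared with the standard library's `statistics.variance`, which uses the same divisor.

```python
import statistics


def sample_variance(xs):
    """Bessel-corrected sample variance: sum of squared deviations / (n - 1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values
print(sample_variance(data))      # hand-rolled formula
print(statistics.variance(data))  # standard library, same n - 1 divisor
```

The guard for `n < 2` reflects that the formula is undefined for a single observation, where the divisor would be zero.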

Unbiasedness Proof¶

Main Result¶

\[E[S^2] = \sigma^2\]

Proof¶

Starting from the key identity:

\[\sum_{i=1}^n (X_i - \bar{X})^2 = \sum_{i=1}^n (X_i - \mu)^2 - n(\bar{X} - \mu)^2\]
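This identity can be checked by writing \(X_i - \mu = (X_i - \bar{X}) + (\bar{X} - \mu)\) and expanding the square. The cross term vanishes because \(\sum_{i=1}^n (X_i - \bar{X}) = 0\), and rearranging the result gives the identity:

\[\sum_{i=1}^n (X_i - \mu)^2 = \sum_{i=1}^n (X_i - \bar{X})^2 + 2(\bar{X} - \mu)\sum_{i=1}^n (X_i - \bar{X}) + n(\bar{X} - \mu)^2 = \sum_{i=1}^n (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2\]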

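Taking expectations for i.i.d. observations with mean \(\mu\) and finite variance \(\sigma^2\), and using \(E[(X_i - \mu)^2] = \sigma^2\) and \(E[(\bar{X} - \mu)^2] = \text{Var}(\bar{X}) = \sigma^2/n\):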

\[E\left[\sum_{i=1}^n (X_i - \bar{X})^2\right] = n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n-1)\sigma^2\]

Therefore:

\[E\left[\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2\right] = \frac{(n-1)\sigma^2}{n-1} = \sigma^2\]
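The result can also be checked numerically. The sketch below is a Monte Carlo illustration, not a proof: it draws many small normal samples and averages both estimators. The seed, sample size, and value of \(\sigma^2\) are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)     # fixed seed so the run is repeatable
sigma2, n, reps = 4.0, 5, 200_000  # illustrative choices

# Each row is one sample of size n from N(0, sigma2).
samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))

mean_unbiased = samples.var(axis=1, ddof=1).mean()  # divisor n - 1
mean_naive = samples.var(axis=1, ddof=0).mean()     # divisor n

print(f"average with n-1 divisor: {mean_unbiased:.3f} (target {sigma2})")
print(f"average with n divisor:   {mean_naive:.3f} (target {(n - 1) / n * sigma2})")
```

With these settings the corrected average should land near \(\sigma^2 = 4\), while the naive average should land near \((n-1)\sigma^2/n = 3.2\).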

Why n-1? The Degrees of Freedom Argument¶

The \(n\) deviations \(d_i = X_i - \bar{X}\) are subject to the constraint:

\[\sum_{i=1}^n d_i = \sum_{i=1}^n (X_i - \bar{X}) = 0\]

This means only \(n-1\) of the deviations are free to vary independently. We say there are \(n-1\) degrees of freedom. Dividing by the degrees of freedom (\(n-1\)) instead of the number of observations (\(n\)) corrects the bias.

General principle: When estimating a variance using \(k\) estimated parameters, divide by \(n - k\):

- Mean unknown, variance of \(X\): divide by \(n - 1\)
- Regression with \(p\) coefficients: residual variance uses \(n - p\)
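The constraint on the deviations is easy to see on a small example. In this sketch (the data values are made up), the deviations sum to zero up to rounding, so the last one can be recovered from the other \(n-1\).

```python
import numpy as np

x = np.array([3.0, 7.0, 8.0, 12.0])  # illustrative data
d = x - x.mean()                     # deviations from the sample mean

print(d.sum())                       # zero up to floating-point rounding

# The constraint pins down the last deviation once the others are known.
last_from_rest = -d[:-1].sum()
print(d[-1], last_from_rest)
```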

Distribution of S-squared¶

For Normal Populations¶

If \(X_1, \dots, X_n\) are i.i.d. \(N(\mu, \sigma^2)\), then:

\[\frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{n-1}\]

Because a \(\chi^2_{n-1}\) variable has mean \(n-1\) and variance \(2(n-1)\), this exact distributional result gives:

\[E[S^2] = \sigma^2 \cdot \frac{n-1}{n-1} = \sigma^2 \quad \text{(confirming unbiasedness)}\]

\[\text{Var}(S^2) = \frac{2\sigma^4}{n-1}\]
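A simulation can illustrate these moments. The sketch below, with arbitrarily chosen \(\mu\), \(\sigma\), and \(n\), compares the empirical mean and variance of \((n-1)S^2/\sigma^2\) with the chi-square values \(n-1\) and \(2(n-1)\). It checks only the first two moments, not the full distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 10.0, 2.0, 8, 200_000  # illustrative choices

samples = rng.normal(mu, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)
q = (n - 1) * s2 / sigma**2  # scaled statistic, chi-square with n - 1 df in theory

print(q.mean(), n - 1)                   # empirical vs theoretical mean
print(q.var(), 2 * (n - 1))              # empirical vs theoretical variance
print(s2.var(), 2 * sigma**4 / (n - 1))  # empirical vs theoretical Var(S^2)
```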

Independence of X-bar and S-squared¶

For normal populations, \(\bar{X}\) and \(S^2\) are independent, which follows from Cochran's theorem. The property is also unique to the normal distribution: independence of \(\bar{X}\) and \(S^2\) for i.i.d. samples characterizes normality (a result due to Geary). It is crucial for the derivation of the \(t\)-distribution used in hypothesis testing.
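One observable consequence is that \(\bar{X}\) and \(S^2\) are uncorrelated under normality but generally correlated for skewed distributions. The sketch below compares normal and exponential samples. The helper `corr_mean_var` and the settings are assumptions for illustration. Near-zero correlation is consistent with independence but does not by itself establish it.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 10, 100_000  # illustrative choices


def corr_mean_var(samples):
    """Correlation between the sample mean and S^2 across rows."""
    xbar = samples.mean(axis=1)
    s2 = samples.var(axis=1, ddof=1)
    return np.corrcoef(xbar, s2)[0, 1]


corr_normal = corr_mean_var(rng.normal(size=(reps, n)))
corr_expon = corr_mean_var(rng.exponential(size=(reps, n)))

print(f"normal:      {corr_normal:+.3f}")  # expected to be close to zero
print(f"exponential: {corr_expon:+.3f}")   # expected to be clearly positive
```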

Properties¶

Variance and MSE¶

For normal populations:

\[\text{Var}(S^2) = \frac{2\sigma^4}{n-1}\]

\[\text{MSE}(S^2) = \text{Var}(S^2) = \frac{2\sigma^4}{n-1} \quad \text{(since bias = 0)}\]

Consistency¶

For i.i.d. observations with finite variance, \(S^2\) is consistent for \(\sigma^2\) (by the law of large numbers):

\[S^2 \xrightarrow{p} \sigma^2 \quad \text{as } n \to \infty\]

MSE Comparison with Alternatives¶

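For normal populations, compare \(S^2\) with the naive estimator \(\tilde{S}^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2\):

\[\text{MSE}(S^2) = \frac{2}{n-1}\sigma^4 > \frac{2n-1}{n^2}\sigma^4 = \text{MSE}(\tilde{S}^2)\]

The unbiased estimator has higher MSE than the biased naive estimator. The naive estimator has bias \(-\sigma^2/n\) but a smaller variance, \(2(n-1)\sigma^4/n^2\), and this variance saving outweighs its squared bias \(\sigma^4/n^2\). In MSE terms, unbiasedness comes at the cost of increased variance.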
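The ordering of the two MSEs can be illustrated by simulation. The following sketch, with arbitrary settings, estimates both MSEs for normal data and prints them next to the closed-form values from the inequality above.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2, n, reps = 1.0, 10, 400_000  # illustrative choices

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2_unbiased = samples.var(axis=1, ddof=1)  # divisor n - 1
s2_naive = samples.var(axis=1, ddof=0)     # divisor n

mse_unbiased = np.mean((s2_unbiased - sigma2) ** 2)
mse_naive = np.mean((s2_naive - sigma2) ** 2)

# Empirical MSE next to the closed-form value for normal data.
print(mse_unbiased, 2 / (n - 1) * sigma2**2)
print(mse_naive, (2 * n - 1) / n**2 * sigma2**2)
```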

Practical Significance¶

When Does It Matter?¶

The naive estimate equals \((n-1)/n\) times the corrected one, so it falls short by a fraction \(1/n\):

\(n\)	\((n-1)/n\)	Relative shortfall (\(1/n\))
3	0.667	33.3%
5	0.800	20.0%
10	0.900	10.0%
30	0.967	3.3%
100	0.990	1.0%
1000	0.999	0.1%

For \(n > 30\), the two estimates differ by less than about 3.3%, which is usually negligible in practice. For \(n < 10\), they differ by more than 10% and the correction is substantial.
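The table entries follow directly from the ratio of the two divisors. A short sketch that reproduces them (the function name `divisor_ratio` is an illustrative assumption):

```python
def divisor_ratio(n):
    """Ratio of the naive (divisor n) to the corrected (divisor n - 1) estimate."""
    return (n - 1) / n


for n in (3, 5, 10, 30, 100, 1000):
    ratio = divisor_ratio(n)
    # Columns: n, (n-1)/n, relative shortfall 1/n as a percentage.
    print(f"{n:>5d}  {ratio:.3f}  {1 - ratio:.1%}")
```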

Standard Deviation Bias¶

While \(S^2\) is unbiased for \(\sigma^2\), the sample standard deviation \(S = \sqrt{S^2}\) is not unbiased for \(\sigma\). By Jensen's inequality (since \(\sqrt{\cdot}\) is concave):

\[E[S] = E[\sqrt{S^2}] < \sqrt{E[S^2]} = \sigma\]

For normal populations:

\[E[S] = \sigma \cdot \sqrt{\frac{2}{n-1}} \cdot \frac{\Gamma(n/2)}{\Gamma((n-1)/2)}\]

The correction factor \(c_4 = \sqrt{2/(n-1)} \cdot \Gamma(n/2)/\Gamma((n-1)/2)\) can be used to obtain an unbiased estimator of \(\sigma\): \(\hat{\sigma} = S/c_4\).
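The factor \(c_4\) is straightforward to evaluate. The sketch below uses `math.lgamma` instead of the gamma function directly, a choice made here to avoid overflow for large \(n\). The function name `c4` mirrors the symbol in the text and is otherwise an assumption.

```python
import math


def c4(n):
    """Bias factor E[S] / sigma for a normal sample of size n (n >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    # Work with log-gamma so that Gamma(n/2) does not overflow for large n.
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


for n in (2, 5, 10, 30, 100):
    print(n, round(c4(n), 4))
```

The factor is below 1 for every \(n\) and approaches 1 as \(n\) grows, so dividing \(S\) by `c4(n)` inflates it slightly, most noticeably for small samples.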

Software Implementation¶

Different software has different defaults:

Function	Divisor	`ddof` setting
Python `numpy.var()`	\(n\) (population)	`ddof=0`
Python `numpy.var(ddof=1)`	\(n-1\) (sample)	`ddof=1`
R `var()`	\(n-1\) (sample)	—
Excel `VAR.S()`	\(n-1\) (sample)	—
Excel `VAR.P()`	\(n\) (population)	—
pandas `.var()`	\(n-1\) (sample)	`ddof=1`

Common pitfall: Calling `numpy.var()` without `ddof=1` gives the biased (naive) estimator, and `numpy.std()` has the same `ddof=0` default. Specify `ddof=1` whenever you want the Bessel-corrected sample variance.
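The difference between the NumPy default and the corrected divisor shows up clearly on a tiny array. The data below are illustrative.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # mean 2.5, sum of squared deviations 5.0

var_default = np.var(x)         # ddof=0 by default: 5.0 / 4
var_sample = np.var(x, ddof=1)  # Bessel-corrected: 5.0 / 3
std_sample = np.std(x, ddof=1)  # square root of the corrected variance

print(var_default, var_sample, std_sample)
```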

Generalization: Degrees of Freedom in Regression¶

In linear regression \(Y = X\beta + \epsilon\), the residual variance estimator is:

\[\hat{\sigma}^2 = \frac{1}{n-p}\sum_{i=1}^n (Y_i - \hat{Y}_i)^2 = \frac{\text{RSS}}{n-p}\]

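where \(p\) is the number of estimated coefficients (including the intercept, if the model has one). This generalizes Bessel's correction: we lose one degree of freedom for each estimated parameter, and the case \(p = 1\) with only an intercept recovers the \(n-1\) divisor.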
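A minimal sketch of this estimator follows, assuming a design matrix with full column rank. The helper `residual_variance`, the simulated design, and the coefficient values are all illustrative assumptions. Averaging the estimate over many simulated data sets should give a value close to the true \(\sigma^2\).

```python
import numpy as np


def residual_variance(X, y):
    """Return RSS / (n - p) for a least-squares fit of y on the columns of X."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid) / (n - p)


rng = np.random.default_rng(4)
n, sigma = 50, 1.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 slopes
beta_true = np.array([1.0, 2.0, -0.5])

# Refit on fresh noise many times and average the variance estimates.
estimates = [
    residual_variance(X, X @ beta_true + rng.normal(scale=sigma, size=n))
    for _ in range(2000)
]
print(np.mean(estimates), sigma**2)
```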

Connections to Finance¶

- Realized volatility: When computing daily realized volatility from intraday returns, the choice of \(n\) vs \(n-1\) is often irrelevant because there are many observations. For monthly volatility from daily data (about 21 observations), the two variance estimates differ by roughly 5%, so the correction matters.
- Tracking error: Computing the tracking error of a portfolio against its benchmark applies Bessel's correction to the active returns \(d_t = r_{p,t} - r_{b,t}\): \(S = \sqrt{\frac{1}{n-1}\sum_{t=1}^n (d_t - \bar{d})^2}\).
- Risk budgeting: Variance decomposition in portfolio risk typically uses the sample covariance matrix with divisor \(n-1\), which is unbiased for the population covariance matrix.
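As an illustration of the tracking-error case, the sketch below computes the per-period standard deviation of active returns with the \(n-1\) divisor. The return series are hypothetical numbers, not real data, and no annualization is applied. Note that `std` subtracts the mean active return before squaring.

```python
import numpy as np

# Hypothetical monthly returns for a portfolio and its benchmark.
r_p = np.array([0.012, -0.004, 0.021, 0.007, -0.010, 0.015])
r_b = np.array([0.010, -0.002, 0.018, 0.009, -0.012, 0.011])

active = r_p - r_b                   # active (excess) returns
tracking_error = active.std(ddof=1)  # demeaned, divisor n - 1
naive_te = active.std(ddof=0)        # divisor n, for comparison

print(tracking_error, naive_te)
```

With only six observations the corrected figure is noticeably larger than the naive one, which is the small-sample effect described above.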

Summary¶

Bessel's correction (\(n-1\) in the denominator) produces an unbiased estimator of the population variance. The correction compensates for the "lost" degree of freedom from estimating the mean. For normal populations the unbiased estimator has higher MSE than the naive one, but unbiasedness is often preferred for its theoretical properties, and the \(n-1\) divisor is the standard in most statistical software. For large samples, the choice between \(n\) and \(n-1\) is inconsequential.

Key Formulas¶

Quantity	Formula
Bessel-corrected variance	\(S^2 = \frac{1}{n-1}\sum(X_i - \bar{X})^2\)
Unbiasedness	\(E[S^2] = \sigma^2\)
Distribution (Normal)	\((n-1)S^2/\sigma^2 \sim \chi^2_{n-1}\)
Variance (Normal)	\(\text{Var}(S^2) = 2\sigma^4/(n-1)\)
\(S\) is biased for \(\sigma\)	\(E[S] < \sigma\) (Jensen's inequality)
General regression	\(\hat{\sigma}^2 = \text{RSS}/(n-p)\)