Naive Variance Estimator¶

Introduction¶

The naive variance estimator divides the sum of squared deviations by \(n\) (the sample size) rather than by \(n-1\). While this is the most intuitive approach — simply averaging the squared deviations from the sample mean — it turns out to be biased. Understanding why it is biased provides deep insight into the nature of estimation and motivates Bessel's correction.

Definition¶

Given a random sample \(X_1, X_2, \ldots, X_n\) with sample mean \(\bar{X}\), the naive variance estimator is:

\[\tilde{S}^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2\]

This is also called the population variance formula applied to the sample, the biased sample variance, or the MLE of variance (for normal populations).

Bias Derivation¶

The Key Identity¶

The fundamental identity underlying the bias calculation is:

\[\sum_{i=1}^n (X_i - \bar{X})^2 = \sum_{i=1}^n (X_i - \mu)^2 - n(\bar{X} - \mu)^2\]

Proof: Expand \((X_i - \bar{X})^2 = (X_i - \mu - (\bar{X} - \mu))^2\):

\[\sum_{i=1}^n (X_i - \bar{X})^2 = \sum_{i=1}^n(X_i - \mu)^2 - 2(\bar{X} - \mu)\sum_{i=1}^n(X_i - \mu) + n(\bar{X} - \mu)^2\]

Since \(\sum(X_i - \mu) = n(\bar{X} - \mu)\), the middle term is \(-2n(\bar{X} - \mu)^2\):

\[= \sum_{i=1}^n(X_i - \mu)^2 - n(\bar{X} - \mu)^2\]

Computing the Expectation¶

Taking expectations:

\[E\left[\sum_{i=1}^n (X_i - \bar{X})^2\right] = \sum_{i=1}^n E[(X_i - \mu)^2] - nE[(\bar{X} - \mu)^2]\]

\[= n\sigma^2 - n \cdot \frac{\sigma^2}{n} = n\sigma^2 - \sigma^2 = (n-1)\sigma^2\]

Therefore:

\[E[\tilde{S}^2] = E\left[\frac{1}{n}\sum_{i=1}^n(X_i - \bar{X})^2\right] = \frac{n-1}{n}\sigma^2\]

The Bias¶

\[\text{Bias}(\tilde{S}^2) = E[\tilde{S}^2] - \sigma^2 = \frac{n-1}{n}\sigma^2 - \sigma^2 = -\frac{\sigma^2}{n}\]

The naive estimator underestimates the true variance by a factor of \((n-1)/n\).

Intuition: Why the Bias Exists¶

The bias arises because we use \(\bar{X}\) instead of \(\mu\) in the sum of squares. Since \(\bar{X}\) is the value that minimizes \(\sum(X_i - c)^2\) over all constants \(c\), we have:

\[\sum_{i=1}^n (X_i - \bar{X})^2 \leq \sum_{i=1}^n (X_i - \mu)^2\]

The sum of squared deviations from \(\bar{X}\) is always less than or equal to the sum from the true mean \(\mu\). By using \(\bar{X}\), we systematically undercount the variability, leading to downward bias.

Another way to see it: computing \(\bar{X}\) "uses up" one piece of information from the data. The \(n\) deviations \((X_i - \bar{X})\) satisfy \(\sum(X_i - \bar{X}) = 0\), so only \(n-1\) of them are free to vary. There are only \(n-1\) degrees of freedom, not \(n\).

Properties¶

Variance of S-tilde-squared¶

For normal populations:

\[\text{Var}(\tilde{S}^2) = \frac{2(n-1)}{n^2}\sigma^4\]

MSE of S-tilde-squared¶

\[\text{MSE}(\tilde{S}^2) = \text{Var}(\tilde{S}^2) + [\text{Bias}(\tilde{S}^2)]^2 = \frac{2(n-1)}{n^2}\sigma^4 + \frac{\sigma^4}{n^2} = \frac{2n-1}{n^2}\sigma^4\]

Consistency¶

Despite being biased, \(\tilde{S}^2\) is consistent:

\[\tilde{S}^2 = \frac{n-1}{n} S^2 \xrightarrow{p} \sigma^2\]

since \((n-1)/n \to 1\) and \(S^2 \xrightarrow{p} \sigma^2\).

Asymptotic Equivalence¶

For large \(n\), \(\tilde{S}^2\) and \(S^2\) are practically identical:

\[\tilde{S}^2 = \frac{n-1}{n}S^2 \approx S^2 \quad \text{for large } n\]

The bias \(-\sigma^2/n \to 0\), and the ratio \((n-1)/n \to 1\).

Comparison: Divide by n vs n-1 vs n+1¶

Estimator	Divisor	Bias	MSE (Normal)	Notes
\(\tilde{S}^2\)	\(n\)	\(-\sigma^2/n\)	\(\frac{2n-1}{n^2}\sigma^4\)	MLE; biased
\(S^2\)	\(n-1\)	\(0\)	\(\frac{2}{n-1}\sigma^4\)	Unbiased (Bessel's)
\(\hat{S}^2\)	\(n+1\)	\(-\frac{2}{n+1}\sigma^2\)	\(\frac{2(n-1)}{(n+1)^2}\sigma^4 + \frac{4}{(n+1)^2}\sigma^4\)	MSE-optimal (Normal)

Surprising fact: \(\text{MSE}(\tilde{S}^2) < \text{MSE}(S^2)\) for all \(n\). The biased estimator has lower MSE than the unbiased one! This is a textbook example of the bias-variance tradeoff.

When mu is Known¶

If the true mean \(\mu\) is known (rare in practice), we can use:

\[\hat{\sigma}^2_\mu = \frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2\]

This estimator is unbiased: \(E[\hat{\sigma}^2_\mu] = \sigma^2\), and it has lower variance than \(S^2\):

\[\text{Var}(\hat{\sigma}^2_\mu) = \frac{2\sigma^4}{n} < \frac{2\sigma^4}{n-1} = \text{Var}(S^2)\]

Connection to MLE¶

For normal populations, \(\tilde{S}^2\) is the MLE of \(\sigma^2\). The MLE is biased in finite samples but asymptotically unbiased. This is a common pattern: MLEs are often biased for finite samples but consistent.

Connections to Finance¶

Volatility estimation: The realized variance of daily returns uses the formula \(\frac{1}{n}\sum r_i^2\) (with \(\mu \approx 0\)), which is the naive estimator when the mean is set to zero.
Risk metrics: When computing portfolio variance for risk management with large samples (\(n > 250\) daily observations), the difference between dividing by \(n\) and \(n-1\) is negligible.
Bias correction: For small samples (e.g., monthly data over a few years), the bias can be material and Bessel's correction should be used.

Summary¶

The naive variance estimator \(\tilde{S}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2\) is biased downward by \(\sigma^2/n\) because using the sample mean instead of the true mean systematically underestimates variability. Despite this bias, it has lower MSE than the unbiased \(S^2\) and is consistent. It is also the MLE for normal populations. For large samples, the bias is negligible, but for small samples, Bessel's correction (dividing by \(n-1\)) is standard.

Key Formulas¶

Quantity	Formula
Naive estimator	\(\tilde{S}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2\)
Expectation	\(E[\tilde{S}^2] = \frac{n-1}{n}\sigma^2\)
Bias	\(-\sigma^2/n\)
MSE (Normal)	\(\frac{2n-1}{n^2}\sigma^4\)
Key identity	\(\sum(X_i - \bar{X})^2 = \sum(X_i - \mu)^2 - n(\bar{X}-\mu)^2\)