MLE of μ and σ²

Introduction

The Maximum Likelihood Estimators of the Gaussian (Normal) distribution parameters are among the most important results in statistics. For iid observations \(X_1, \ldots, X_n \sim N(\mu, \sigma^2)\), maximum likelihood yields closed-form estimators for both the mean \(\mu\) and the variance \(\sigma^2\). This section derives these estimators, analyzes their properties, and connects the results to the broader theory of estimation.

The Normal Log-Likelihood

For an iid sample \(x_1, \ldots, x_n\) from \(N(\mu, \sigma^2)\), the log-likelihood is:

\[\ell(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2\]
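
As a quick check on the formula, the following minimal sketch evaluates the three terms directly and compares them with a sum of log-densities. The function name `normal_loglik` and the sample values are illustrative assumptions, not part of the derivation.

```python
import math
from statistics import NormalDist

def normal_loglik(xs, mu, sigma2):
    """Log-likelihood of an iid N(mu, sigma2) sample, term by term."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)  # sum of squared deviations from mu
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * n * math.log(sigma2)
            - ss / (2 * sigma2))

sample = [4.9, 5.6, 4.4, 6.1, 5.0]  # made-up data for illustration

# Same quantity computed as a sum of log-densities (NormalDist takes sigma, not sigma^2)
direct = sum(math.log(NormalDist(5.0, math.sqrt(2.0)).pdf(x)) for x in sample)
closed_form = normal_loglik(sample, 5.0, 2.0)
```

The two values should agree up to floating-point rounding.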

Deriving the MLEs

MLE of mu

Differentiate with respect to \(\mu\):

\[\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^n (x_i - \mu) = \frac{n}{\sigma^2}(\bar{x} - \mu)\]

Setting to zero:

\[\bar{x} - \mu = 0 \implies \boxed{\hat{\mu}_{\text{MLE}} = \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i}\]

The MLE of the mean is the sample mean.

MLE of sigma-squared

Differentiate with respect to \(\sigma^2\) (treating \(\sigma^2\) as a single variable):

\[\frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (x_i - \mu)^2\]

Setting to zero and substituting \(\hat{\mu} = \bar{x}\):

\[-\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (x_i - \bar{x})^2 = 0\]
\[\sigma^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2\]
\[\boxed{\hat{\sigma}^2_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2}\]

The MLE of the variance divides by \(n\), not \(n-1\).
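
A minimal sketch of both estimators in plain Python follows. The helper name `gaussian_mle` and the data are assumptions chosen so the arithmetic is easy to follow by hand.

```python
from statistics import pvariance, variance

def gaussian_mle(xs):
    """Return (mu_hat, sigma2_hat) for an iid normal sample."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n  # divisor n, not n - 1
    return mu_hat, sigma2_hat

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative data
mu_hat, sigma2_hat = gaussian_mle(sample)
print(mu_hat, sigma2_hat)  # 5.0 4.0

# The standard library exposes both divisors:
# pvariance divides by n (the MLE), variance divides by n - 1.
print(pvariance(sample), variance(sample))
```

Here the squared deviations sum to 32, so the MLE is 32/8 = 4 while the \(n-1\) version is 32/7.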

Verification: Second-Order Conditions

The Hessian matrix evaluated at \((\hat{\mu}, \hat{\sigma}^2)\) is:

\[H = \begin{pmatrix} -n/\hat{\sigma}^2 & 0 \\ 0 & -n/(2\hat{\sigma}^4) \end{pmatrix}\]

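This is negative definite (both diagonal entries are negative and the off-diagonal entries vanish), so the stationary point is a local maximum. Provided the observations are not all equal, it is the only stationary point and \(\ell \to -\infty\) as \(\sigma^2 \to 0\), \(\sigma^2 \to \infty\), or \(|\mu| \to \infty\), so it is also the global maximum.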
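
For completeness, the entries of \(H\) can be obtained by differentiating the two first-order conditions once more. The worked steps below are a sketch using only the log-likelihood given earlier:

\[\frac{\partial^2 \ell}{\partial \mu^2} = -\frac{n}{\sigma^2}, \qquad \frac{\partial^2 \ell}{\partial \mu\, \partial \sigma^2} = -\frac{1}{\sigma^4}\sum_{i=1}^n (x_i - \mu), \qquad \frac{\partial^2 \ell}{\partial (\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\sum_{i=1}^n (x_i - \mu)^2\]

At \((\hat{\mu}, \hat{\sigma}^2)\) the cross term vanishes because \(\sum (x_i - \bar{x}) = 0\), and substituting \(\sum (x_i - \bar{x})^2 = n\hat{\sigma}^2\) into the last expression gives

\[\frac{n}{2\hat{\sigma}^4} - \frac{n\hat{\sigma}^2}{\hat{\sigma}^6} = -\frac{n}{2\hat{\sigma}^4}\]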

Properties of the Gaussian MLEs

Properties of mu-hat = X-bar

| Property | Result |
|---|---|
| Bias | \(E[\hat{\mu}] = \mu\) (unbiased) |
| Variance | \(\text{Var}(\hat{\mu}) = \sigma^2/n\) |
| Distribution | \(\hat{\mu} \sim N(\mu, \sigma^2/n)\) exactly |
| Efficiency | Achieves CRLB; MVUE |
| Sufficiency | Sufficient for \(\mu\) (given \(\sigma^2\)) |
| Consistency | \(\hat{\mu} \xrightarrow{p} \mu\) |

Properties of sigma-squared (MLE)

| Property | Result |
|---|---|
| Expectation | \(E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2\) (biased) |
| Bias | \(\text{Bias}(\hat{\sigma}^2) = -\sigma^2/n\) |
| Distribution | \(n\hat{\sigma}^2/\sigma^2 \sim \chi^2_{n-1}\) |
| Variance | \(\text{Var}(\hat{\sigma}^2) = \frac{2(n-1)}{n^2}\sigma^4\) |
| MSE | \(\frac{2n-1}{n^2}\sigma^4\) |
| Consistency | \(\hat{\sigma}^2 \xrightarrow{p} \sigma^2\) |
| Asymptotically unbiased | \(E[\hat{\sigma}^2] \to \sigma^2\) as \(n \to \infty\) |
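
The downward bias is easy to see in simulation. The sketch below is a small Monte Carlo experiment under assumed settings (\(n = 5\), \(\sigma = 2\), 20,000 replications, fixed seed); the function name is hypothetical.

```python
import random

def simulate_mean_of_mle_variance(n, sigma, reps, seed=0):
    """Average the divisor-n variance estimate over many simulated normal samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        total += sum((x - xbar) ** 2 for x in xs) / n
    return total / reps

n, sigma = 5, 2.0
estimate = simulate_mean_of_mle_variance(n, sigma, reps=20000)
theory = (n - 1) / n * sigma ** 2  # (n-1)/n * sigma^2 = 3.2
print(estimate, theory)
```

The simulated average should sit close to 3.2 rather than the true variance of 4, in line with \(E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2\).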

Independence

\(\hat{\mu}\) and \(\hat{\sigma}^2\) are independent (by Cochran's theorem). This is a special property of the normal distribution and is crucial for deriving the \(t\)-distribution.

Fisher Information Matrix

The Fisher information matrix for \((\mu, \sigma^2)\) is:

\[I(\mu, \sigma^2) = \begin{pmatrix} n/\sigma^2 & 0 \\ 0 & n/(2\sigma^4) \end{pmatrix}\]

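The zero off-diagonal entries mean that \(\mu\) and \(\sigma^2\) are orthogonal parameters: the two MLEs are asymptotically uncorrelated, and not knowing one parameter does not inflate the asymptotic variance of the estimator of the other.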

Cramér-Rao Lower Bounds

\[\text{Var}(\hat{\mu}) \geq \frac{\sigma^2}{n}, \quad \text{Var}(\hat{\sigma}^2) \geq \frac{2\sigma^4}{n}\]

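The MLE of \(\mu\) achieves the CRLB exactly. The bound for \(\sigma^2\) applies only to unbiased estimators, so it does not constrain the biased MLE, whose variance \(2(n-1)\sigma^4/n^2\) is in fact smaller than \(2\sigma^4/n\). The unbiased estimator \(S^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2\) has variance \(2\sigma^4/(n-1)\), which exceeds the bound in finite samples. For both estimators the ratio of the variance to the bound tends to 1 as \(n \to \infty\), so both are asymptotically efficient.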
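
To make the comparison concrete, the sketch below tabulates the relevant quantities in units of \(\sigma^4\) using exact fractions. It also includes the Cramér-Rao bound for an estimator with expectation \(\psi(\sigma^2) = \frac{n-1}{n}\sigma^2\), namely \(\psi'(\sigma^2)^2 / I_n(\sigma^2) = \frac{2(n-1)^2}{n^3}\sigma^4\), which is the bound that applies to an estimator with the MLE's bias. The function and key names are illustrative assumptions.

```python
from fractions import Fraction

def variance_bounds(n):
    """Variances and Cramér-Rao bounds for sigma^2 estimators, in units of sigma^4."""
    n = Fraction(n)
    return {
        "crlb_unbiased": 2 / n,                    # bound for unbiased estimators
        "var_mle": 2 * (n - 1) / n ** 2,           # variance of the divisor-n estimator
        "var_bessel": 2 / (n - 1),                 # variance of the divisor-(n-1) estimator
        "crlb_for_mle_bias": 2 * (n - 1) ** 2 / n ** 3,  # bound for expectation (n-1)/n * sigma^2
    }

b = variance_bounds(10)
for name, value in b.items():
    print(name, value)
```

For \(n = 10\) the ordering is \(81/500 < 9/50 < 1/5 < 2/9\): the MLE's variance lies below the unbiased bound but above the bound appropriate to its own bias, and the unbiased estimator lies above the unbiased bound.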

Alternative Parametrization: (mu, sigma)

If we parametrize by \((\mu, \sigma)\) instead of \((\mu, \sigma^2)\), the MLE of \(\sigma\) is:

\[\hat{\sigma}_{\text{MLE}} = \sqrt{\hat{\sigma}^2_{\text{MLE}}} = \sqrt{\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2}\]

This follows from the invariance property of MLEs: if \(\hat{\theta}\) is the MLE of \(\theta\), then \(g(\hat{\theta})\) is the MLE of \(g(\theta)\).

Note that \(\hat{\sigma}_{\text{MLE}}\) is biased for \(\sigma\) (by Jensen's inequality, \(E[\sqrt{X}] < \sqrt{E[X]}\)).
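
The size of this bias can be computed exactly. Since \(n\hat{\sigma}^2/\sigma^2 \sim \chi^2_{n-1}\) and \(E[\sqrt{Y}] = \sqrt{2}\,\Gamma(\tfrac{k+1}{2})/\Gamma(\tfrac{k}{2})\) for \(Y \sim \chi^2_k\), it follows that \(E[\hat{\sigma}_{\text{MLE}}]/\sigma = \sqrt{2/n}\;\Gamma(n/2)/\Gamma((n-1)/2)\). A minimal sketch, with a hypothetical function name:

```python
import math

def expected_sigma_mle_ratio(n):
    """E[sigma_hat_MLE] / sigma for an iid normal sample of size n >= 2."""
    # Gamma(n/2) / Gamma((n-1)/2), evaluated on the log scale for numerical stability
    log_gamma_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2 / n) * math.exp(log_gamma_ratio)

for n in (2, 5, 30, 1000):
    print(n, expected_sigma_mle_ratio(n))
```

For \(n = 2\) the ratio equals \(1/\sqrt{\pi} \approx 0.564\); it stays below 1 for every \(n\) and approaches 1 as \(n\) grows.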

Bias-Corrected Estimator

Rescaling the MLE by \(n/(n-1)\) (Bessel's correction) gives the usual unbiased estimator of \(\sigma^2\):

\[S^2 = \frac{n}{n-1}\hat{\sigma}^2_{\text{MLE}} = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2\]

Comparison:

| Estimator | Formula | \(E[\cdot]\) | MSE |
|---|---|---|---|
| MLE | \(\frac{1}{n}\sum(X_i - \bar{X})^2\) | \(\frac{n-1}{n}\sigma^2\) | \(\frac{2n-1}{n^2}\sigma^4\) |
| Bessel's | \(\frac{1}{n-1}\sum(X_i - \bar{X})^2\) | \(\sigma^2\) | \(\frac{2}{n-1}\sigma^4\) |
| MSE-optimal | \(\frac{1}{n+1}\sum(X_i - \bar{X})^2\) | \(\frac{n-1}{n+1}\sigma^2\) | \(\frac{2}{n+1}\sigma^4\) (minimum among estimators of the form \(c\sum(X_i - \bar{X})^2\)) |
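
All three MSE values come from one formula. For an estimator \(\frac{1}{d}\sum(X_i - \bar{X})^2\), the fact that \(\sum(X_i - \bar{X})^2/\sigma^2 \sim \chi^2_{n-1}\) gives variance \(2(n-1)\sigma^4/d^2\) and bias \(\left(\frac{n-1}{d} - 1\right)\sigma^2\). The sketch below evaluates this with exact fractions; the function name and the choice \(n = 10\) are assumptions for illustration.

```python
from fractions import Fraction

def mse_in_sigma4_units(n, d):
    """MSE of (1/d) * sum (X_i - Xbar)^2, divided by sigma^4."""
    n, d = Fraction(n), Fraction(d)
    variance = 2 * (n - 1) / d ** 2  # from Var(chi^2_{n-1}) = 2(n-1)
    bias = (n - 1) / d - 1           # from E[chi^2_{n-1}] = n-1
    return variance + bias ** 2

n = 10
mse = {d: mse_in_sigma4_units(n, d) for d in (n - 1, n, n + 1)}
print(mse)

# Search over integer divisors to confirm where the minimum falls
best_divisor = min(range(1, 40), key=lambda d: mse_in_sigma4_units(n, d))
print(best_divisor)
```

With \(n = 10\) the values are \(2/9\), \(19/100\), and \(2/11\) for divisors 9, 10, and 11, and the search over integer divisors selects \(n + 1 = 11\).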

Log-Likelihood Surface

The log-likelihood function \(\ell(\mu, \sigma^2)\) forms a surface over the \((\mu, \sigma^2)\) plane:

  • For fixed \(\sigma^2\): \(\ell\) is a downward-opening parabola in \(\mu\), maximized at \(\bar{X}\)
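  • For fixed \(\mu\): \(\ell\) is unimodal in \(\sigma^2\), increasing up to \(\frac{1}{n}\sum(X_i - \mu)^2\) and decreasing afterwards, but not globally concave: its second derivative in \(\sigma^2\) turns positive once \(\sigma^2\) exceeds twice that maximizer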
  • The global maximum is at \((\bar{X}, \hat{\sigma}^2)\)
  • Contours of constant log-likelihood are ellipses centered at the MLE (approximately, for large \(n\))
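
The shape of the surface can be explored numerically by slicing it along each axis. The sketch below evaluates the log-likelihood on coarse grids through the MLE; the data, grid spacing, and names are illustrative assumptions.

```python
import math

def normal_loglik(xs, mu, sigma2):
    """Log-likelihood of an iid N(mu, sigma2) sample."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # sample mean 5, divisor-n variance 4

# Slice in mu at sigma^2 = 4: grid 3.0, 3.5, ..., 7.0
mu_grid = [3.0 + 0.5 * k for k in range(9)]
best_mu = max(mu_grid, key=lambda m: normal_loglik(sample, m, 4.0))

# Slice in sigma^2 at mu = 5: grid 0.5, 0.75, ..., 15.25
sigma2_grid = [0.5 + 0.25 * k for k in range(60)]
best_sigma2 = max(sigma2_grid, key=lambda s2: normal_loglik(sample, 5.0, s2))

print(best_mu, best_sigma2)
```

Both grids contain the MLE exactly, so the grid maximizers coincide with \((\bar{X}, \hat{\sigma}^2) = (5, 4)\) for this sample.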

Confidence Regions from the Likelihood

For mu (σ² known)

\[\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\]
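
A minimal sketch of this interval using only the standard library follows. The function name, the sample, and the assumed known \(\sigma = 2\) are illustrative.

```python
from statistics import NormalDist, fmean

def z_interval(xs, sigma, alpha=0.05):
    """Two-sided (1 - alpha) interval for mu when sigma is known."""
    n = len(xs)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical value
    half_width = z * sigma / math_sqrt(n)
    xbar = fmean(xs)
    return xbar - half_width, xbar + half_width

def math_sqrt(v):
    """Square root via exponentiation, kept local so the block needs no extra imports."""
    return v ** 0.5

sample = [4.0, 5.0, 6.0, 5.0]  # illustrative data with mean 5
lo, hi = z_interval(sample, sigma=2.0)
print(lo, hi)
```

With \(n = 4\) and \(\sigma = 2\), the half-width equals \(z_{0.025} \approx 1.96\), so the interval is roughly \((3.04, 6.96)\).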

For mu (σ² unknown)

\[\bar{X} \pm t_{n-1, \alpha/2}\frac{S}{\sqrt{n}}\]

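where \(S = \sqrt{S^2}\) is the sample standard deviation computed with divisor \(n-1\), and \(t_{n-1, \alpha/2}\) is the upper \(\alpha/2\) critical value of Student's \(t\)-distribution with \(n-1\) degrees of freedom.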

For sigma-squared

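\[\left(\frac{(n-1)S^2}{\chi^2_{n-1, \alpha/2}}, \quad \frac{(n-1)S^2}{\chi^2_{n-1, 1-\alpha/2}}\right)\]

Here \(\chi^2_{n-1, p}\) denotes the upper-\(p\) critical value (the point with probability \(p\) to its right), matching the convention used for \(z_{\alpha/2}\), so the larger critical value produces the lower endpoint.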
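
The Python standard library has no chi-square quantile function, so the sketch below takes the two critical values as arguments. The values used, 19.023 and 2.700, are the usual table entries for 9 degrees of freedom at upper-tail probabilities 0.025 and 0.975; the function name and data are assumptions.

```python
def variance_interval(xs, chi2_upper, chi2_lower):
    """Interval for sigma^2 given the upper and lower chi-square critical values.

    chi2_upper: point with probability alpha/2 to its right (the larger value)
    chi2_lower: point with probability 1 - alpha/2 to its right (the smaller value)
    """
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)  # equals (n - 1) * S^2
    return ss / chi2_upper, ss / chi2_lower

sample = [float(k) for k in range(1, 11)]  # 1, 2, ..., 10: n = 10, sum of squares 82.5
lo, hi = variance_interval(sample, chi2_upper=19.023, chi2_lower=2.700)
s2 = 82.5 / 9  # unbiased sample variance for this data
print(lo, s2, hi)
```

The interval is not symmetric around \(S^2\), reflecting the skewness of the chi-square distribution.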

MLE Under Constraints

Known Mean

If \(\mu = \mu_0\) is known, the constrained MLE of \(\sigma^2\) is:

\[\hat{\sigma}^2_{\mu_0} = \frac{1}{n}\sum_{i=1}^n (X_i - \mu_0)^2\]

This is unbiased (unlike the case when \(\mu\) is estimated).

Equal Variances (Pooled Variance)

For two groups \(X_1, \ldots, X_{n_1} \sim N(\mu_1, \sigma^2)\) and \(Y_1, \ldots, Y_{n_2} \sim N(\mu_2, \sigma^2)\) with common variance, the MLE of \(\sigma^2\) is:

\[\hat{\sigma}^2_{\text{pooled}} = \frac{\sum(X_i - \bar{X})^2 + \sum(Y_j - \bar{Y})^2}{n_1 + n_2}\]

The unbiased version divides by \(n_1 + n_2 - 2\).
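
Both versions differ only in the divisor, as the following sketch shows. The function name, the `unbiased` flag, and the two small groups are illustrative assumptions.

```python
def pooled_variance(xs, ys, unbiased=False):
    """Common-variance estimate for two normal groups with separate means."""
    n1, n2 = len(xs), len(ys)
    xbar, ybar = sum(xs) / n1, sum(ys) / n2
    # Each group is centered at its own mean before pooling the squared deviations
    ss = sum((x - xbar) ** 2 for x in xs) + sum((y - ybar) ** 2 for y in ys)
    divisor = n1 + n2 - 2 if unbiased else n1 + n2
    return ss / divisor

xs = [1.0, 2.0, 3.0]            # mean 2, squared deviations sum to 2
ys = [10.0, 12.0, 14.0, 16.0]   # mean 13, squared deviations sum to 20
mle = pooled_variance(xs, ys)                       # 22 / 7
corrected = pooled_variance(xs, ys, unbiased=True)  # 22 / 5
print(mle, corrected)
```

Two degrees of freedom are lost because two means are estimated, which is why the unbiased divisor is \(n_1 + n_2 - 2\).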

Connections to Finance

  • Return modeling: Assuming log-returns \(r_t \sim N(\mu, \sigma^2)\) is the foundation of many financial models. The MLEs \(\hat{\mu} = \bar{r}\) and \(\hat{\sigma}^2 = \frac{1}{n}\sum(r_t - \bar{r})^2\) are the standard estimates.

  • Black-Scholes: The model assumes \(\log(S_T/S_t) \sim N((\mu - \sigma^2/2)(T-t), \sigma^2(T-t))\). MLE of volatility from historical returns is a key input.

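  • VaR estimation: Under normality, \(\text{VaR}_\alpha = -(\hat{\mu} + z_\alpha \hat{\sigma})\), where \(z_\alpha\) here denotes the lower \(\alpha\)-quantile of the standard normal (negative for \(\alpha < 0.5\)). The formula uses the Gaussian MLEs directly.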

  • Portfolio theory: Markowitz optimization uses \(\hat{\mu}\) and \(\hat{\Sigma}\), the sample mean vector and covariance matrix. These are the multivariate Gaussian MLEs when \(\hat{\Sigma}\) is normalized by \(n\) rather than \(n-1\).

  • Normality testing: Before using Gaussian MLE, one should test whether the normal distribution is appropriate. Financial returns often exhibit fat tails, making the Gaussian MLE suboptimal.
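
As an illustration of the VaR item above, the sketch below plugs the Gaussian MLEs into the formula \(-(\hat{\mu} + z_\alpha \hat{\sigma})\), taking \(z_\alpha\) to be the lower \(\alpha\)-quantile of the standard normal. The function name and the return series are made up for the example, and real return data may not be well described by a normal model.

```python
from statistics import NormalDist

def gaussian_var(returns, alpha=0.05):
    """Parametric VaR from Gaussian MLEs, reported as a positive loss."""
    n = len(returns)
    mu_hat = sum(returns) / n
    sigma_hat = (sum((r - mu_hat) ** 2 for r in returns) / n) ** 0.5  # divisor n (MLE)
    z_alpha = NormalDist().inv_cdf(alpha)  # lower alpha-quantile, negative for alpha < 0.5
    return -(mu_hat + z_alpha * sigma_hat)

returns = [0.01, -0.02, 0.015, -0.005, 0.0]  # hypothetical daily log-returns
var_95 = gaussian_var(returns, alpha=0.05)
print(var_95)
```

For this series \(\hat{\mu}\) is essentially zero and \(\hat{\sigma} \approx 0.0122\), so the 5% VaR is about 2% of portfolio value.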

Summary

The Gaussian MLEs, \(\hat{\mu} = \bar{X}\) and \(\hat{\sigma}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2\), are available in closed form and are trivial to compute. The mean estimator is unbiased and efficient; the variance estimator is biased but consistent and has lower MSE than the unbiased alternative \(S^2\). Their independence (unique to the normal distribution) enables exact inference via \(t\) and \(\chi^2\) distributions. These estimators form the foundation of classical statistical inference and are the starting point for financial parameter estimation.

Key Formulas

| Quantity | Formula |
|---|---|
| \(\hat{\mu}_{\text{MLE}}\) | \(\bar{X}\) |
| \(\hat{\sigma}^2_{\text{MLE}}\) | \(\frac{1}{n}\sum(X_i - \bar{X})^2\) |
| Fisher info for \(\mu\) | \(I_n(\mu) = n/\sigma^2\) |
| Fisher info for \(\sigma^2\) | \(I_n(\sigma^2) = n/(2\sigma^4)\) |
| \(\hat{\mu}\) distribution | \(N(\mu, \sigma^2/n)\) |
| \(n\hat{\sigma}^2/\sigma^2\) distribution | \(\chi^2_{n-1}\) |
| \(t\)-statistic | \((\bar{X}-\mu)/(S/\sqrt{n}) \sim t_{n-1}\) |