MLE for Normal Distribution¶
Overview¶
Let \(x^{(1)}, \ldots, x^{(m)}\) be \(m\) i.i.d. samples from \(N(\mu, \sigma^2)\). Then \(\mu\) and \(\sigma^2\) can be estimated by the maximum likelihood estimators \(\hat{\mu}\) and \(\hat{\sigma}^2\), where:
\[
\begin{array}{lll}
\hat{\mu} &=& \displaystyle\frac{\sum_{i=1}^m x^{(i)}}{m} \\[12pt]
\hat{\sigma}^2 &=& \displaystyle\frac{\sum_{i=1}^m (x^{(i)} - \hat{\mu})^2}{m}
\end{array}
\]
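These closed-form estimators are easy to verify numerically. Below is a minimal check, assuming NumPy and SciPy are available; `scipy.stats.norm.fit` returns the maximum likelihood pair `(loc, scale)`, where `scale**2` is the variance estimate that divides by \(m\), not \(m - 1\).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)           # seed chosen arbitrarily
x = rng.normal(2.0, 1.5, size=1000)      # m = 1000 samples from N(2, 1.5^2)

mu_hat = x.mean()                        # sum(x) / m
sigma2_hat = ((x - mu_hat) ** 2).mean()  # divides by m (the MLE)

loc, scale = stats.norm.fit(x)           # SciPy's MLE for the normal
print(np.isclose(mu_hat, loc))           # True
print(np.isclose(sigma2_hat, scale**2))  # True
```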
Derivation¶
Data¶
\[
\{x^{(i)} : i = 1, \ldots, m\}
\]
Model¶
\[
x^{(i)} \sim N(\mu, \sigma^2)
\]
Likelihood Function¶
\[
L(\mu, \sigma^2) = \prod_{i=1}^m \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{1}{2\sigma^2}(x^{(i)} - \mu)^2\right)
\]
Log-Likelihood Function¶
\[
\ell(\mu, \sigma^2) = -\frac{1}{2\sigma^2}\sum_{i=1}^m (x^{(i)} - \mu)^2 - \frac{m}{2}\log\sigma^2 + \text{Constant}
\]
where the omitted constant is \(-\frac{m}{2}\log(2\pi)\), which depends on neither \(\mu\) nor \(\sigma^2\).
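As a sanity check, the closed form above (with the constant restored) can be compared against summing `scipy.stats.norm.logpdf` over the sample. A short sketch, assuming NumPy and SciPy:

```python
import numpy as np
from scipy import stats

def log_likelihood(x, mu, sigma2):
    """Gaussian log-likelihood, with the -(m/2) log(2*pi) constant included."""
    m = len(x)
    return (-np.sum((x - mu) ** 2) / (2 * sigma2)
            - m / 2 * np.log(sigma2)
            - m / 2 * np.log(2 * np.pi))

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=100)
ell = log_likelihood(x, mu=0.0, sigma2=1.0)
print(np.isclose(ell, stats.norm.logpdf(x, loc=0.0, scale=1.0).sum()))  # True
```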
Cost Function¶
Negating the log-likelihood (and dropping the constant) gives a cost function to minimize:
\[
J(\mu, \sigma^2) = \frac{1}{2\sigma^2}\sum_{i=1}^m (x^{(i)} - \mu)^2 + \frac{m}{2}\log\sigma^2
\]
Maximum Likelihood Principle¶
Since \(\log\) is strictly increasing and \(J = -\ell\) up to an additive constant, all three problems share the same optimizer:
\[
\text{argmax}_{\mu, \sigma^2}\; L
\quad\Leftrightarrow\quad
\text{argmax}_{\mu, \sigma^2}\; \ell
\quad\Leftrightarrow\quad
\text{argmin}_{\mu, \sigma^2}\; J
\]
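A quick numerical illustration of these equivalences, assuming NumPy: over a grid of \(\mu\) values with \(\sigma^2\) held fixed, the same \(\mu\) maximizes \(L\), maximizes \(\ell\), and minimizes \(J\). The sample is kept small (\(m = 20\)) so the raw product \(L\) does not underflow.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(3.0, 1.0, size=20)
sigma2 = 1.0
mus = np.linspace(2.0, 4.0, 401)         # candidate values of mu

# Per-sample normal densities for every candidate mu: shape (401, 20)
dens = (np.exp(-(x[None, :] - mus[:, None]) ** 2 / (2 * sigma2))
        / np.sqrt(2 * np.pi * sigma2))

L = dens.prod(axis=1)                    # likelihood
ell = np.log(dens).sum(axis=1)           # log-likelihood
J = (((x[None, :] - mus[:, None]) ** 2).sum(axis=1) / (2 * sigma2)
     + len(x) / 2 * np.log(sigma2))      # cost function

# All three pick the same mu on the grid
print(mus[L.argmax()], mus[ell.argmax()], mus[J.argmin()])
```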
MLE Solutions¶
\[
\begin{array}{llcll}
\displaystyle\frac{\partial J}{\partial \mu} = 0
&\Rightarrow&
\displaystyle\sum_{i=1}^m (x^{(i)} - \mu) = 0
&\Rightarrow&
\displaystyle\hat{\mu} = \frac{\sum_{i=1}^m x^{(i)}}{m} \\[16pt]
\displaystyle\frac{\partial J}{\partial \sigma^2} = 0
&\Rightarrow&
\displaystyle -\frac{1}{2\sigma^4}\sum_{i=1}^m (x^{(i)} - \hat{\mu})^2 + \frac{m}{2\sigma^2} = 0
&\Rightarrow&
\displaystyle\hat{\sigma}^2 = \frac{\sum_{i=1}^m (x^{(i)} - \hat{\mu})^2}{m}
\end{array}
\]
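The stationarity conditions can also be checked symbolically. A sketch, assuming SymPy, with \(\sigma^2\) treated as a single symbol `v` and a small sample (\(m = 3\)) for readability:

```python
import sympy as sp

m = 3
x = sp.symbols(f'x1:{m + 1}', real=True)   # x1, x2, x3
mu = sp.Symbol('mu', real=True)
v = sp.Symbol('v', positive=True)          # v stands for sigma^2

# The cost function J from above
J = sum((xi - mu) ** 2 for xi in x) / (2 * v) + sp.Rational(m, 2) * sp.log(v)

mu_hat = sp.solve(sp.diff(J, mu), mu)[0]               # the sample mean
v_hat = sp.solve(sp.diff(J, v).subs(mu, mu_hat), v)[0]

print(sp.simplify(mu_hat - sum(x) / m))                              # 0
print(sp.simplify(v_hat - sum((xi - mu_hat) ** 2 for xi in x) / m))  # 0
```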
Key Observations¶
| Estimator | MLE | Unbiased? |
|---|---|---|
| \(\hat{\mu}\) | \(\frac{1}{m}\sum x^{(i)}\) | ✅ Yes |
| \(\hat{\sigma}^2\) | \(\frac{1}{m}\sum (x^{(i)} - \hat{\mu})^2\) | ❌ No (divides by \(m\), not \(m-1\)) |
**MLE Bias for Variance**
The MLE \(\hat{\sigma}^2\) divides by \(m\), making it a biased estimator of \(\sigma^2\): its expectation is \(\frac{m-1}{m}\sigma^2\), which systematically underestimates the true variance. The unbiased sample variance \(S^2\) divides by \(m - 1\) instead (Bessel's correction):
\[
S^2 = \frac{\sum_{i=1}^m (x^{(i)} - \hat{\mu})^2}{m - 1}
\]
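The bias is easy to see in simulation. A Monte Carlo sketch, assuming NumPy, with a deliberately small sample size (\(m = 5\)) so the \(\frac{m-1}{m}\) shrinkage is visible:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2_true, m, trials = 4.0, 5, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2_true), size=(trials, m))
mle = samples.var(axis=1, ddof=0)      # divides by m      (the MLE)
bessel = samples.var(axis=1, ddof=1)   # divides by m - 1  (sample variance)

print(mle.mean())     # approx. sigma2_true * (m - 1) / m = 3.2
print(bessel.mean())  # approx. 4.0
```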
Connection to Least Squares¶
The cost function for \(\mu\) (with \(\sigma^2\) fixed) is:
\[
J(\mu) \propto \sum_{i=1}^m (x^{(i)} - \mu)^2
\]
This is exactly the least squares objective: the MLE for the mean of a normal distribution coincides with the least squares estimate. This is the core connection between MLE and regression, since least squares is maximum likelihood under Gaussian noise.
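A minimal illustration, assuming NumPy: fitting a constant by least squares (a design matrix of all ones) recovers exactly the sample mean, i.e., the MLE \(\hat{\mu}\).

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(10.0, 2.0, size=50)

A = np.ones((len(x), 1))                      # constant-only design matrix
beta, *_ = np.linalg.lstsq(A, x, rcond=None)  # least squares solution
print(np.isclose(beta[0], x.mean()))          # True
```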