MLE for Bernoulli Distribution¶
Overview¶
Let \(x^{(1)}, \ldots, x^{(m)}\) be i.i.d. samples from a Bernoulli distribution \(B(p)\). Then \(p\) can be estimated by the maximum likelihood estimator \(\hat{p}\), which is simply the sample mean:
\[
\hat{p} = \frac{\sum_{i=1}^m x^{(i)}}{m}
\]
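As a minimal sketch of this estimator in NumPy (the true value p = 0.7 and the sample size here are arbitrary choices for the demo):
import numpy as np

rng = np.random.default_rng(0)

# Draw m = 1000 i.i.d. Bernoulli(0.7) samples
samples = rng.binomial(n=1, p=0.7, size=1000)

# The MLE is the sample mean: sum of the samples divided by m
p_hat = samples.mean()
print(f"p_hat = {p_hat:.3f}")  # close to the true p = 0.7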
Derivation¶
Data¶
\[
\{x^{(i)} : i = 1, \ldots, m\}
\]
Model¶
\[
x^{(i)} \sim B(p)
\]
Likelihood Function¶
\[
L(p) = \prod_{i=1}^m p^{x^{(i)}} (1 - p)^{1 - x^{(i)}}
\]
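For a concrete worked example, with samples \(\{1, 1, 0\}\) the likelihood is
\[
L(p) = p \cdot p \cdot (1 - p) = p^2 (1 - p),
\]
which is maximized at \(p = 2/3\), in agreement with the estimator \(\hat{p} = \frac{\sum_i x^{(i)}}{m}\) above.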
Log-Likelihood Function¶
\[
\ell(p) = \sum_{i=1}^m \left[ x^{(i)} \log(p) + (1 - x^{(i)}) \log(1 - p) \right]
\]
Taking the log turns the product in \(L(p)\) into a sum, since \(\log(ab) = \log(a) + \log(b)\) and \(\log(p^x) = x \log(p)\).
Cost Function¶
\[
J(p) = -\sum_{i=1}^m \left[ x^{(i)} \log(p) + (1 - x^{(i)}) \log(1 - p) \right]
\]
Connection to Cross-Entropy
The cost function \(J(p)\) is precisely the binary cross-entropy loss used in logistic regression and neural networks (usually reported as the mean \(J(p)/m\) rather than the sum). MLE for the Bernoulli distribution is thus the theoretical foundation of cross-entropy-based training.
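A quick numerical check of this connection (a small sketch; y stands in for the observed samples and q for a candidate probability):
import numpy as np

y = np.array([1, 0, 1, 1, 0])  # observed Bernoulli outcomes
q = 0.6                        # candidate probability of heads

# Cost J(q) from above: the negative log-likelihood
J = -np.sum(y * np.log(q) + (1 - y) * np.log(1 - q))

# Binary cross-entropy as usually reported: the mean over samples
bce = J / len(y)
print(f"J(q) = {J:.4f}, BCE = {bce:.4f}")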
Maximum Likelihood Principle¶
Since \(\log\) is strictly increasing and \(J(p) = -\ell(p)\), maximizing the likelihood, maximizing the log-likelihood, and minimizing the cost all yield the same solution:
\[
\text{argmax}_{p}\; L
\quad\Leftrightarrow\quad
\text{argmax}_{p}\; \ell
\quad\Leftrightarrow\quad
\text{argmin}_{p}\; J
\]
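As a quick numerical illustration of these equivalences (a toy sketch over a small grid of candidate values):
import numpy as np

x = np.array([1, 0, 1])              # toy samples
grid = np.linspace(0.01, 0.99, 99)   # candidate values of p

L = np.array([np.prod(p**x * (1 - p)**(1 - x)) for p in grid])
ll = np.array([np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)) for p in grid])

# Same maximizer for L and the log-likelihood, same minimizer for J = -ll
assert np.argmax(L) == np.argmax(ll) == np.argmin(-ll)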
MLE Solution¶
\[
\begin{array}{lclclcl}
\displaystyle\frac{\partial J}{\partial p} = 0
&\Rightarrow&
\displaystyle\sum_{i=1}^m \left[ \frac{x^{(i)}}{p} - \frac{1 - x^{(i)}}{1 - p} \right] = 0
&\Rightarrow&
\displaystyle(1 - p) \sum_{i=1}^m x^{(i)} = p \left( m - \sum_{i=1}^m x^{(i)} \right)
&\Rightarrow&
\displaystyle\hat{p} = \frac{\sum_{i=1}^m x^{(i)}}{m}
\end{array}
\]
Python Implementation: Log-Likelihood and MLE¶
import numpy as np
import matplotlib.pyplot as plt

# Set random seed for reproducibility
seed = 1
np.random.seed(seed)

# Define probability of heads and sample size
p = 0.7
n_samples = 100

def load_data():
    """
    Simulate coin flips as Bernoulli draws (binomial with n=1).

    Returns:
    - numpy array: Array of coin flips (1 for heads, 0 for tails).
    """
    return np.random.binomial(n=1, p=p, size=(n_samples,))  # Shape (100,)

def compute_prob(coin, p):
    """
    Compute the probability of a single coin flip outcome.

    Parameters:
    - coin: Outcome of the coin flip (1 for heads, 0 for tails).
    - p: Probability of heads.

    Returns:
    - float: Probability of observing the outcome.
    """
    return p**coin * (1 - p)**(1 - coin)

def compute_log_prob(coin, p):
    """
    Compute the log-probability of a single coin flip outcome.

    Parameters:
    - coin: Outcome of the coin flip (1 for heads, 0 for tails).
    - p: Probability of heads.

    Returns:
    - float: Log-probability of observing the outcome.
    """
    return coin * np.log(p) + (1 - coin) * np.log(1 - p)

def compute_likelihood(coins, p):
    """
    Compute the joint probability of all coin flips for a given probability.

    Parameters:
    - coins: Array of coin flip outcomes.
    - p: Probability of heads.

    Returns:
    - float: Joint probability of observing all outcomes.
    """
    # Multiplying many per-sample probabilities underflows toward zero for
    # large sample sizes; this is why the log-likelihood below is preferred.
    joint_prob = 1.0
    for coin in coins:
        joint_prob *= compute_prob(coin, p)
    return joint_prob

def compute_log_likelihood(coins, p):
    """
    Compute the log-likelihood of all coin flips for a given probability.

    Parameters:
    - coins: Array of coin flip outcomes.
    - p: Probability of heads.

    Returns:
    - float: Log-likelihood of observing all outcomes.
    """
    log_joint_prob = 0.0
    for coin in coins:
        log_joint_prob += compute_log_prob(coin, p)
    return log_joint_prob

# Simulate coin flips
coins = load_data()
# Define range of probabilities to test for MLE
ps = np.linspace(0.01, 0.99, 100)
# Calculate log-likelihood for each probability
log_likelihood_list = [compute_log_likelihood(coins, p) for p in ps]
log_likelihood = np.array(log_likelihood_list)
# Find the probability with the highest log-likelihood (MLE)
idx = np.argmax(log_likelihood)
mle_p = ps[idx]
log_likelihood_max = log_likelihood[idx]
print(f"MLE index: {idx}")
print(f"MLE probability (p): {mle_p:.4f}")
print(f"Max log-likelihood: {log_likelihood_max:.4f}\n")
# Plot log-likelihood function and mark the MLE
fig, ax = plt.subplots(figsize=(12, 3))
ax.plot(ps, log_likelihood, label="Log-likelihood")
ax.plot([mle_p, mle_p], [0, log_likelihood_max], '--or', label="MLE")
ax.legend(loc="lower right")
# Customize plot appearance
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_position("zero")
ax.spines['left'].set_position("zero")
ax.set_xlabel("Probability (p)")
ax.set_ylabel("Log-likelihood")
plt.show()
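As a sanity check, the grid-search MLE should agree with the closed-form solution \(\hat{p} = \sum_i x^{(i)} / m\) up to the grid resolution (this snippet reuses coins and mle_p from the listing above):
# Closed-form MLE: the sample mean of the observed flips
closed_form_p = coins.mean()
print(f"Closed-form MLE: {closed_form_p:.4f}")
print(f"Grid-search MLE: {mle_p:.4f}")  # agrees up to the grid spacing (~0.01)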