Bayesian Inference Preview¶
Throughout this chapter, hypothesis tests and confidence intervals follow the frequentist approach: parameters are fixed unknowns, and probability statements apply to the data. Bayesian inference takes a different perspective by treating parameters as random variables with probability distributions. This section introduces the core Bayesian framework using scipy.stats distribution objects, setting the stage for more advanced Bayesian methods covered in later chapters.
Bayes' Theorem¶
The foundation of Bayesian inference is Bayes' theorem. Given observed data \(x\) and a parameter \(\theta\), the posterior distribution is
where:
- \(p(\theta)\) is the prior distribution, encoding beliefs about \(\theta\) before observing data
- \(p(x \mid \theta)\) is the likelihood, the probability of the data given the parameter
- \(p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta\) is the marginal likelihood (a normalizing constant)
- \(p(\theta \mid x)\) is the posterior distribution, the updated belief after observing data
The posterior combines prior knowledge with the evidence from data. As more data accumulates, the likelihood dominates and the influence of the prior diminishes.
Prior Distributions¶
A prior distribution represents knowledge about a parameter before observing data. Priors range from highly informative (concentrating probability on a narrow range) to weakly informative (spreading probability broadly).
Common choices available in scipy.stats:
| Prior type | Distribution | Use case |
|---|---|---|
| Proportion | stats.beta(a, b) |
Success probability in Bernoulli trials |
| Rate | stats.gamma(a, scale=1/b) |
Poisson rate parameter |
| Location | stats.norm(mu, sigma) |
Mean of a normal distribution |
| Uninformative | stats.uniform(0, 1) |
No prior preference (bounded parameter) |
from scipy import stats
import numpy as np
# Weakly informative prior for a proportion: Beta(1, 1) = Uniform(0, 1)
prior = stats.beta(1, 1)
print(f"Prior mean: {prior.mean():.2f}")
print(f"Prior 95% interval: {prior.ppf([0.025, 0.975])}")
Conjugate Analysis¶
When the prior and likelihood belong to specific distributional families, the posterior has a closed-form solution from the same family. This is called conjugacy.
Beta-Binomial Model¶
For Bernoulli data with a Beta prior, the posterior is also Beta. If the prior is \(\theta \sim \text{Beta}(\alpha, \beta)\) and we observe \(k\) successes in \(n\) trials, the posterior is
This update rule has an intuitive interpretation: \(\alpha\) and \(\beta\) act as "pseudo-counts" of prior successes and failures.
from scipy import stats
import numpy as np
# Prior: Beta(2, 2) — mild belief that theta is near 0.5
alpha_prior, beta_prior = 2, 2
# Data: 7 successes in 20 trials
k, n = 7, 20
# Posterior
alpha_post = alpha_prior + k
beta_post = beta_prior + n - k
posterior = stats.beta(alpha_post, beta_post)
print(f"Prior mean: {stats.beta(alpha_prior, beta_prior).mean():.4f}")
print(f"MLE: {k/n:.4f}")
print(f"Posterior mean: {posterior.mean():.4f}")
print(f"Posterior 95% CI: {posterior.ppf([0.025, 0.975])}")
The posterior mean lies between the prior mean and the maximum likelihood estimate, weighted by the relative amount of information each contributes.
Normal-Normal Model¶
For normally distributed data \(x_1, \ldots, x_n\) with known variance \(\sigma^2\) and a normal prior \(\mu \sim N(\mu_0, \sigma_0^2)\), the posterior is
where \(\bar{x}\) is the sample mean.
from scipy import stats
import numpy as np
# Prior: mu ~ N(0, 10^2)
mu_0, sigma_0 = 0.0, 10.0
# Data: 30 observations with known sigma = 2
sigma = 2.0
rng = np.random.default_rng(42)
data = rng.normal(loc=3.0, scale=sigma, size=30)
x_bar = np.mean(data)
n = len(data)
# Posterior parameters
precision_prior = 1 / sigma_0**2
precision_data = n / sigma**2
mu_post = (mu_0 * precision_prior + x_bar * precision_data) / (precision_prior + precision_data)
sigma_post = np.sqrt(1 / (precision_prior + precision_data))
posterior = stats.norm(mu_post, sigma_post)
print(f"Posterior mean: {mu_post:.4f}")
print(f"Posterior std: {sigma_post:.4f}")
print(f"Posterior 95% CI: {posterior.ppf([0.025, 0.975])}")
Grid Approximation¶
When conjugate priors are not available, a simple numerical approach evaluates the posterior on a grid of parameter values. For each grid point \(\theta_i\), the unnormalized posterior is
Normalizing these values by their sum (times the grid spacing) produces an approximate posterior distribution.
from scipy import stats
import numpy as np
# Grid approximation for Beta-Binomial (verifying conjugate result)
theta_grid = np.linspace(0, 1, 1000)
prior = stats.beta(2, 2)
k, n = 7, 20
log_prior = prior.logpdf(theta_grid)
log_likelihood = stats.binom.logpmf(k, n, theta_grid)
log_posterior = log_prior + log_likelihood
# Normalize
posterior_unnorm = np.exp(log_posterior - log_posterior.max())
posterior_approx = posterior_unnorm / np.trapz(posterior_unnorm, theta_grid)
# Compare with exact conjugate posterior
exact_posterior = stats.beta(2 + k, 2 + n - k)
print(f"Grid mean: {np.trapz(theta_grid * posterior_approx, theta_grid):.4f}")
print(f"Exact mean: {exact_posterior.mean():.4f}")
Bayesian vs Frequentist Comparison¶
| Aspect | Frequentist | Bayesian |
|---|---|---|
| Parameters | Fixed unknowns | Random variables with distributions |
| Probability | Long-run frequency | Degree of belief |
| Result | p-value, confidence interval | Posterior distribution, credible interval |
| Prior information | Not formally incorporated | Encoded in the prior |
| Small samples | May lack power | Prior stabilizes estimates |
When to Consider Bayesian Methods
Bayesian approaches are particularly useful when prior information is available from domain expertise, when sample sizes are small, or when the full posterior distribution (rather than a point estimate) is needed for downstream decisions.
Summary¶
Bayesian inference updates a prior distribution with observed data via Bayes' theorem to produce a posterior distribution. Conjugate analysis provides closed-form posteriors for specific prior-likelihood pairs, while grid approximation handles arbitrary models numerically. The scipy.stats distribution objects serve as building blocks for specifying priors and evaluating posteriors in both approaches.