Skip to content

Bayesian Inference Preview

Throughout this chapter, hypothesis tests and confidence intervals follow the frequentist approach: parameters are fixed unknowns, and probability statements apply to the data. Bayesian inference takes a different perspective by treating parameters as random variables with probability distributions. This section introduces the core Bayesian framework using scipy.stats distribution objects, setting the stage for more advanced Bayesian methods covered in later chapters.

Bayes' Theorem

The foundation of Bayesian inference is Bayes' theorem. Given observed data \(x\) and a parameter \(\theta\), the posterior distribution is

\[ p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)} \]

where:

  • \(p(\theta)\) is the prior distribution, encoding beliefs about \(\theta\) before observing data
  • \(p(x \mid \theta)\) is the likelihood, the probability of the data given the parameter
  • \(p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta\) is the marginal likelihood (a normalizing constant)
  • \(p(\theta \mid x)\) is the posterior distribution, the updated belief after observing data

The posterior combines prior knowledge with the evidence from data. As more data accumulates, the likelihood dominates and the influence of the prior diminishes.

Prior Distributions

A prior distribution represents knowledge about a parameter before observing data. Priors range from highly informative (concentrating probability on a narrow range) to weakly informative (spreading probability broadly).

Common choices available in scipy.stats:

Prior type Distribution Use case
Proportion stats.beta(a, b) Success probability in Bernoulli trials
Rate stats.gamma(a, scale=1/b) Poisson rate parameter
Location stats.norm(mu, sigma) Mean of a normal distribution
Uninformative stats.uniform(0, 1) No prior preference (bounded parameter)
from scipy import stats
import numpy as np

# Weakly informative prior for a proportion: Beta(1, 1) = Uniform(0, 1)
prior = stats.beta(1, 1)
print(f"Prior mean: {prior.mean():.2f}")
print(f"Prior 95% interval: {prior.ppf([0.025, 0.975])}")

Conjugate Analysis

When the prior and likelihood belong to specific distributional families, the posterior has a closed-form solution from the same family. This is called conjugacy.

Beta-Binomial Model

For Bernoulli data with a Beta prior, the posterior is also Beta. If the prior is \(\theta \sim \text{Beta}(\alpha, \beta)\) and we observe \(k\) successes in \(n\) trials, the posterior is

\[ \theta \mid k, n \sim \text{Beta}(\alpha + k,\; \beta + n - k) \]

This update rule has an intuitive interpretation: \(\alpha\) and \(\beta\) act as "pseudo-counts" of prior successes and failures.

from scipy import stats
import numpy as np

# Prior: Beta(2, 2) — mild belief that theta is near 0.5
alpha_prior, beta_prior = 2, 2

# Data: 7 successes in 20 trials
k, n = 7, 20

# Posterior
alpha_post = alpha_prior + k
beta_post = beta_prior + n - k
posterior = stats.beta(alpha_post, beta_post)

print(f"Prior mean:     {stats.beta(alpha_prior, beta_prior).mean():.4f}")
print(f"MLE:            {k/n:.4f}")
print(f"Posterior mean:  {posterior.mean():.4f}")
print(f"Posterior 95% CI: {posterior.ppf([0.025, 0.975])}")

The posterior mean lies between the prior mean and the maximum likelihood estimate, weighted by the relative amount of information each contributes.

Normal-Normal Model

For normally distributed data \(x_1, \ldots, x_n\) with known variance \(\sigma^2\) and a normal prior \(\mu \sim N(\mu_0, \sigma_0^2)\), the posterior is

\[ \mu \mid x_1, \ldots, x_n \sim N\!\left(\frac{\frac{\mu_0}{\sigma_0^2} + \frac{n\bar{x}}{\sigma^2}}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}},\; \frac{1}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}\right) \]

where \(\bar{x}\) is the sample mean.

from scipy import stats
import numpy as np

# Prior: mu ~ N(0, 10^2)
mu_0, sigma_0 = 0.0, 10.0

# Data: 30 observations with known sigma = 2
sigma = 2.0
rng = np.random.default_rng(42)
data = rng.normal(loc=3.0, scale=sigma, size=30)
x_bar = np.mean(data)
n = len(data)

# Posterior parameters
precision_prior = 1 / sigma_0**2
precision_data = n / sigma**2
mu_post = (mu_0 * precision_prior + x_bar * precision_data) / (precision_prior + precision_data)
sigma_post = np.sqrt(1 / (precision_prior + precision_data))

posterior = stats.norm(mu_post, sigma_post)
print(f"Posterior mean: {mu_post:.4f}")
print(f"Posterior std:  {sigma_post:.4f}")
print(f"Posterior 95% CI: {posterior.ppf([0.025, 0.975])}")

Grid Approximation

When conjugate priors are not available, a simple numerical approach evaluates the posterior on a grid of parameter values. For each grid point \(\theta_i\), the unnormalized posterior is

\[ p(\theta_i \mid x) \propto p(x \mid \theta_i)\, p(\theta_i) \]

Normalizing these values by their sum (times the grid spacing) produces an approximate posterior distribution.

from scipy import stats
import numpy as np

# Grid approximation for Beta-Binomial (verifying conjugate result)
theta_grid = np.linspace(0, 1, 1000)

prior = stats.beta(2, 2)
k, n = 7, 20

log_prior = prior.logpdf(theta_grid)
log_likelihood = stats.binom.logpmf(k, n, theta_grid)
log_posterior = log_prior + log_likelihood

# Normalize
posterior_unnorm = np.exp(log_posterior - log_posterior.max())
posterior_approx = posterior_unnorm / np.trapz(posterior_unnorm, theta_grid)

# Compare with exact conjugate posterior
exact_posterior = stats.beta(2 + k, 2 + n - k)
print(f"Grid mean:  {np.trapz(theta_grid * posterior_approx, theta_grid):.4f}")
print(f"Exact mean: {exact_posterior.mean():.4f}")

Bayesian vs Frequentist Comparison

Aspect Frequentist Bayesian
Parameters Fixed unknowns Random variables with distributions
Probability Long-run frequency Degree of belief
Result p-value, confidence interval Posterior distribution, credible interval
Prior information Not formally incorporated Encoded in the prior
Small samples May lack power Prior stabilizes estimates

When to Consider Bayesian Methods

Bayesian approaches are particularly useful when prior information is available from domain expertise, when sample sizes are small, or when the full posterior distribution (rather than a point estimate) is needed for downstream decisions.

Summary

Bayesian inference updates a prior distribution with observed data via Bayes' theorem to produce a posterior distribution. Conjugate analysis provides closed-form posteriors for specific prior-likelihood pairs, while grid approximation handles arbitrary models numerically. The scipy.stats distribution objects serve as building blocks for specifying priors and evaluating posteriors in both approaches.