Bayesian Inference Preview¶
Throughout this chapter, hypothesis tests and confidence intervals follow the frequentist approach: parameters are fixed unknowns, and probability statements apply to the data. Bayesian inference takes a different perspective by treating parameters as random variables with probability distributions. This section introduces the core Bayesian framework using scipy.stats distribution objects, setting the stage for more advanced Bayesian methods covered in later chapters.
Mental Model
Imagine a dial you can turn to set a parameter like a coin's bias. Bayesian inference starts with a prior belief about where the dial is set, then updates that belief each time new data arrives. The posterior distribution is your refined belief -- it gets sharper and more concentrated as evidence accumulates.
Bayes' Theorem¶
The foundation of Bayesian inference is Bayes' theorem. Given observed data \(x\) and a parameter \(\theta\), the posterior distribution is
where:
- \(p(\theta)\) is the prior distribution, encoding beliefs about \(\theta\) before observing data
- \(p(x \mid \theta)\) is the likelihood, the probability of the data given the parameter
- \(p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta\) is the marginal likelihood (a normalizing constant)
- \(p(\theta \mid x)\) is the posterior distribution, the updated belief after observing data
The posterior combines prior knowledge with the evidence from data. As more data accumulates, the likelihood dominates and the influence of the prior diminishes.
Prior Distributions¶
A prior distribution represents knowledge about a parameter before observing data. Priors range from highly informative (concentrating probability on a narrow range) to weakly informative (spreading probability broadly).
Common choices available in scipy.stats:
| Prior type | Distribution | Use case |
|---|---|---|
| Proportion | stats.beta(a, b) |
Success probability in Bernoulli trials |
| Rate | stats.gamma(a, scale=1/b) |
Poisson rate parameter |
| Location | stats.norm(mu, sigma) |
Mean of a normal distribution |
| Uninformative | stats.uniform(0, 1) |
No prior preference (bounded parameter) |
```python from scipy import stats import numpy as np
Weakly informative prior for a proportion: Beta(1, 1) = Uniform(0, 1)¶
prior = stats.beta(1, 1) print(f"Prior mean: {prior.mean():.2f}") print(f"Prior 95% interval: {prior.ppf([0.025, 0.975])}") ```
Conjugate Analysis¶
When the prior and likelihood belong to specific distributional families, the posterior has a closed-form solution from the same family. This is called conjugacy.
Beta-Binomial Model¶
For Bernoulli data with a Beta prior, the posterior is also Beta. If the prior is \(\theta \sim \text{Beta}(\alpha, \beta)\) and we observe \(k\) successes in \(n\) trials, the posterior is
This update rule has an intuitive interpretation: \(\alpha\) and \(\beta\) act as "pseudo-counts" of prior successes and failures.
```python from scipy import stats import numpy as np
Prior: Beta(2, 2) — mild belief that theta is near 0.5¶
alpha_prior, beta_prior = 2, 2
Data: 7 successes in 20 trials¶
k, n = 7, 20
Posterior¶
alpha_post = alpha_prior + k beta_post = beta_prior + n - k posterior = stats.beta(alpha_post, beta_post)
print(f"Prior mean: {stats.beta(alpha_prior, beta_prior).mean():.4f}") print(f"MLE: {k/n:.4f}") print(f"Posterior mean: {posterior.mean():.4f}") print(f"Posterior 95% CI: {posterior.ppf([0.025, 0.975])}") ```
The posterior mean lies between the prior mean and the maximum likelihood estimate, weighted by the relative amount of information each contributes.
Normal-Normal Model¶
For normally distributed data \(x_1, \ldots, x_n\) with known variance \(\sigma^2\) and a normal prior \(\mu \sim N(\mu_0, \sigma_0^2)\), the posterior is
where \(\bar{x}\) is the sample mean.
```python from scipy import stats import numpy as np
Prior: mu ~ N(0, 10^2)¶
mu_0, sigma_0 = 0.0, 10.0
Data: 30 observations with known sigma = 2¶
sigma = 2.0 rng = np.random.default_rng(42) data = rng.normal(loc=3.0, scale=sigma, size=30) x_bar = np.mean(data) n = len(data)
Posterior parameters¶
precision_prior = 1 / sigma_02 precision_data = n / sigma2 mu_post = (mu_0 * precision_prior + x_bar * precision_data) / (precision_prior + precision_data) sigma_post = np.sqrt(1 / (precision_prior + precision_data))
posterior = stats.norm(mu_post, sigma_post) print(f"Posterior mean: {mu_post:.4f}") print(f"Posterior std: {sigma_post:.4f}") print(f"Posterior 95% CI: {posterior.ppf([0.025, 0.975])}") ```
Grid Approximation¶
When conjugate priors are not available, a simple numerical approach evaluates the posterior on a grid of parameter values. For each grid point \(\theta_i\), the unnormalized posterior is
Normalizing these values by their sum (times the grid spacing) produces an approximate posterior distribution.
```python from scipy import stats import numpy as np
Grid approximation for Beta-Binomial (verifying conjugate result)¶
theta_grid = np.linspace(0, 1, 1000)
prior = stats.beta(2, 2) k, n = 7, 20
log_prior = prior.logpdf(theta_grid) log_likelihood = stats.binom.logpmf(k, n, theta_grid) log_posterior = log_prior + log_likelihood
Normalize¶
posterior_unnorm = np.exp(log_posterior - log_posterior.max()) posterior_approx = posterior_unnorm / np.trapz(posterior_unnorm, theta_grid)
Compare with exact conjugate posterior¶
exact_posterior = stats.beta(2 + k, 2 + n - k) print(f"Grid mean: {np.trapz(theta_grid * posterior_approx, theta_grid):.4f}") print(f"Exact mean: {exact_posterior.mean():.4f}") ```
Bayesian vs Frequentist Comparison¶
| Aspect | Frequentist | Bayesian |
|---|---|---|
| Parameters | Fixed unknowns | Random variables with distributions |
| Probability | Long-run frequency | Degree of belief |
| Result | p-value, confidence interval | Posterior distribution, credible interval |
| Prior information | Not formally incorporated | Encoded in the prior |
| Small samples | May lack power | Prior stabilizes estimates |
When to Consider Bayesian Methods
Bayesian approaches are particularly useful when prior information is available from domain expertise, when sample sizes are small, or when the full posterior distribution (rather than a point estimate) is needed for downstream decisions.
Summary¶
Bayesian inference updates a prior distribution with observed data via Bayes' theorem to produce a posterior distribution. Conjugate analysis provides closed-form posteriors for specific prior-likelihood pairs, while grid approximation handles arbitrary models numerically. The scipy.stats distribution objects serve as building blocks for specifying priors and evaluating posteriors in both approaches.
Exercises¶
Exercise 1. Explain the difference between frequentist and Bayesian approaches to statistical inference. Give one advantage of each.
Solution to Exercise 1
```python import numpy as np from scipy import stats
np.random.seed(42) data = np.random.randn(100) print(f'Mean: {data.mean():.4f}') print(f'Std: {data.std():.4f}') ```
Exercise 2. Write code that computes Bayes' theorem for a simple example: if a disease has 1% prevalence and a test has 95% sensitivity and 90% specificity, what is the probability of having the disease given a positive test?
Solution to Exercise 2
See the main content for the detailed explanation. The key concept involves understanding the statistical method and its assumptions.
Exercise 3. Write code that performs a Bayesian update: start with a uniform prior on a coin's bias, observe 7 heads out of 10 flips, and compute the posterior distribution.
Solution to Exercise 3
```python import numpy as np from scipy import stats import matplotlib.pyplot as plt
np.random.seed(42) data = np.random.randn(1000) fig, ax = plt.subplots() ax.hist(data, bins=30, density=True, alpha=0.7) ax.set_title('Distribution') plt.show() ```
Exercise 4. Explain what a prior, likelihood, and posterior distribution represent in Bayesian statistics.
Solution to Exercise 4
```python import numpy as np from scipy import stats
np.random.seed(42) data = np.random.randn(500) result = stats.describe(data) print(result) ```