Confidence Intervals¶
A point estimate like a sample mean gives a single best guess for a population parameter, but it says nothing about how precise that guess is. Confidence intervals address this gap by providing a range of plausible values for the parameter, together with a stated level of confidence. They are the natural complement to hypothesis tests: a 95% confidence interval contains exactly those parameter values that a two-sided test at the 5% level would fail to reject.
Mental Model
A confidence interval is a net, not a bullseye. The 95% in "95% confidence" means that if you cast this net repeatedly across many samples, 95% of the nets would catch the true parameter. Any single net either caught it or missed -- you just do not know which. Wider intervals reflect more uncertainty; narrower intervals reflect more data or less variability.
Definition and Interpretation¶
A confidence interval at level \(1 - \alpha\) for a parameter \(\theta\) is a random interval \([L, U]\) computed from sample data such that
The confidence level \(1 - \alpha\) refers to the long-run coverage rate of the procedure, not to the probability that a specific realized interval contains \(\theta\). Once the data are observed and the interval is computed, \(\theta\) is either inside or outside — the interval is no longer random.
Common Misinterpretation
A 95% confidence interval does not mean "there is a 95% probability that \(\theta\) lies in this interval." It means that if we repeated the sampling procedure many times, 95% of the resulting intervals would contain \(\theta\).
Confidence Interval for a Mean¶
Known Variance (z-interval)¶
When the population variance \(\sigma^2\) is known and the data are normally distributed (or \(n\) is large enough for the CLT to apply), the confidence interval for the population mean \(\mu\) is
where \(z_{\alpha/2}\) is the upper \(\alpha/2\) quantile of the standard normal distribution.
```python import numpy as np from scipy import stats
Simulated data with known population std¶
np.random.seed(42) data = np.random.normal(loc=50, scale=10, size=30) sigma = 10 # Known population std
95% z-interval¶
alpha = 0.05 z_crit = stats.norm.ppf(1 - alpha / 2) margin = z_crit * sigma / np.sqrt(len(data)) ci = (np.mean(data) - margin, np.mean(data) + margin) print(f"Sample mean: {np.mean(data):.2f}") print(f"95% CI (z): ({ci[0]:.2f}, {ci[1]:.2f})") ```
Unknown Variance (t-interval)¶
When \(\sigma\) is unknown and estimated by the sample standard deviation \(s\), the interval uses the \(t\)-distribution with \(n - 1\) degrees of freedom:
```python
t-interval (unknown variance) using scipy¶
mean = np.mean(data) se = stats.sem(data) # Standard error of the mean ci_t = stats.t.interval(0.95, df=len(data) - 1, loc=mean, scale=se) print(f"95% CI (t): ({ci_t[0]:.2f}, {ci_t[1]:.2f})") ```
The \(t\)-interval is wider than the \(z\)-interval because the extra uncertainty from estimating \(\sigma\) inflates the critical value. As \(n \to \infty\), \(t_{\alpha/2,\, n-1} \to z_{\alpha/2}\).
Confidence Interval for a Proportion¶
For a population proportion \(p\) estimated by the sample proportion \(\hat{p} = X / n\), the Wald interval is
This approximation relies on the normal approximation to the binomial and works well when \(n\hat{p}\) and \(n(1 - \hat{p})\) are both at least 10.
```python
Proportion confidence interval¶
n_trials = 200 n_successes = 130 p_hat = n_successes / n_trials
z_crit = stats.norm.ppf(0.975) margin = z_crit * np.sqrt(p_hat * (1 - p_hat) / n_trials) ci_prop = (p_hat - margin, p_hat + margin) print(f"Sample proportion: {p_hat:.3f}") print(f"95% CI: ({ci_prop[0]:.3f}, {ci_prop[1]:.3f})") ```
Wilson Interval for Small Samples
The Wald interval can have poor coverage when \(n\) is small or \(p\) is near 0 or 1. The Wilson score interval provides better coverage in these cases:
python
from statsmodels.stats.proportion import proportion_confint
ci_wilson = proportion_confint(n_successes, n_trials, method='wilson')
print(f"Wilson CI: ({ci_wilson[0]:.3f}, {ci_wilson[1]:.3f})")
Confidence Interval Width¶
The half-width (margin of error) of a confidence interval depends on three factors:
- Confidence level \(1 - \alpha\): higher confidence requires a larger critical value, producing a wider interval
- Sample size \(n\): larger samples reduce the standard error \(\sigma / \sqrt{n}\), producing a narrower interval
- Variability \(\sigma\): more variable populations produce wider intervals
The required sample size for a desired margin of error \(E\) in a z-interval is
```python
Required sample size for margin of error = 2 with sigma = 10¶
desired_margin = 2.0 sigma = 10.0 z_crit = stats.norm.ppf(0.975) n_required = np.ceil((z_crit * sigma / desired_margin) ** 2) print(f"Required n for margin = {desired_margin}: {n_required:.0f}") ```
Relationship to Hypothesis Testing¶
A two-sided hypothesis test at significance level \(\alpha\) and a \((1 - \alpha)\) confidence interval are equivalent:
- Reject \(H_0: \mu = \mu_0\) if and only if \(\mu_0\) falls outside the \((1 - \alpha)\) confidence interval
- Fail to reject \(H_0\) if and only if \(\mu_0\) falls inside the interval
```python
Equivalence demonstration¶
np.random.seed(42) data = np.random.normal(loc=50, scale=10, size=30)
Hypothesis test: H0: mu = 48¶
mu_0 = 48 t_stat, p_value = stats.ttest_1samp(data, mu_0) print(f"t-test p-value: {p_value:.4f}")
Confidence interval¶
mean = np.mean(data) se = stats.sem(data) ci = stats.t.interval(0.95, df=len(data) - 1, loc=mean, scale=se) print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})") print(f"mu_0 = {mu_0} in CI: {ci[0] <= mu_0 <= ci[1]}") print(f"Reject H0: {p_value < 0.05}") ```
Bootstrap Confidence Intervals¶
When the sampling distribution of the estimator is unknown or difficult to derive analytically, bootstrap methods construct confidence intervals by resampling from the observed data.
```python from scipy.stats import bootstrap
Bootstrap CI for the mean¶
np.random.seed(42) data = np.random.normal(loc=50, scale=10, size=30)
scipy.stats.bootstrap requires a tuple of arrays¶
result = bootstrap( (data,), statistic=np.mean, n_resamples=9999, confidence_level=0.95, method='percentile' ) print(f"Bootstrap 95% CI: ({result.confidence_interval.low:.2f}, " f"{result.confidence_interval.high:.2f})") ```
The method parameter controls which bootstrap interval is computed:
'percentile': uses quantiles of the bootstrap distribution directly'basic': reflects the bootstrap distribution around the sample estimate'bca': bias-corrected and accelerated — adjusts for bias and skewness
Summary¶
Confidence intervals quantify the precision of point estimates by providing a range of plausible values for the population parameter. The key formulas are the z-interval (known variance) and t-interval (unknown variance) for means, and the Wald or Wilson interval for proportions. The interval width shrinks with larger sample sizes and lower confidence levels. A \((1 - \alpha)\) confidence interval is equivalent to the set of parameter values not rejected by a two-sided test at level \(\alpha\). When analytical formulas are unavailable, bootstrap methods provide a flexible nonparametric alternative.
Exercises¶
Exercise 1. Generate 40 samples from \(N(75, 8^2)\). Compute both the z-interval (assuming \(\sigma = 8\) is known) and the t-interval for a 95% confidence interval. Compare the widths and explain why the t-interval is wider.
Solution to Exercise 1
import numpy as np
from scipy import stats
np.random.seed(42)
data = np.random.normal(75, 8, 40)
xbar = np.mean(data)
n = len(data)
z_margin = stats.norm.ppf(0.975) * 8 / np.sqrt(n)
ci_z = (xbar - z_margin, xbar + z_margin)
se = stats.sem(data)
ci_t = stats.t.interval(0.95, df=n-1, loc=xbar, scale=se)
print(f"z-interval: ({ci_z[0]:.2f}, {ci_z[1]:.2f}), width={2*z_margin:.2f}")
print(f"t-interval: ({ci_t[0]:.2f}, {ci_t[1]:.2f}), width={ci_t[1]-ci_t[0]:.2f}")
Exercise 2.
In a survey, 180 out of 300 respondents favor a policy. Compute the 95% Wald confidence interval and the Wilson confidence interval (using statsmodels.stats.proportion.proportion_confint) for the population proportion.
Solution to Exercise 2
import numpy as np
from scipy import stats
from statsmodels.stats.proportion import proportion_confint
n, x = 300, 180
p_hat = x / n
z = stats.norm.ppf(0.975)
margin = z * np.sqrt(p_hat * (1 - p_hat) / n)
ci_wald = (p_hat - margin, p_hat + margin)
ci_wilson = proportion_confint(x, n, method='wilson')
print(f"Wald CI: ({ci_wald[0]:.4f}, {ci_wald[1]:.4f})")
print(f"Wilson CI: ({ci_wilson[0]:.4f}, {ci_wilson[1]:.4f})")
Exercise 3.
Use scipy.stats.bootstrap to compute a 95% bootstrap confidence interval for the median of 50 samples drawn from an exponential distribution with \(\lambda = 1\). Compare the bootstrap CI with the theoretical median \(\ln 2 \approx 0.693\).
Solution to Exercise 3
import numpy as np
from scipy.stats import bootstrap
np.random.seed(42)
data = np.random.exponential(1, size=50)
result = bootstrap((data,), statistic=np.median,
n_resamples=9999, confidence_level=0.95,
random_state=42)
print(f"Bootstrap 95% CI for median: "
f"({result.confidence_interval.low:.3f}, "
f"{result.confidence_interval.high:.3f})")
print(f"Theoretical median: {np.log(2):.3f}")