Interpretation and Common Misconceptions

The Repeated-Sampling Interpretation

A 95% confidence interval does not mean "there is a 95% probability that $\mu$ is in this interval." The parameter $\mu$ is a fixed (but unknown) number — it either is or is not in the interval.

The correct interpretation: if we were to repeat the sampling process many times, each time constructing a 95% CI, then approximately 95% of those intervals would contain the true parameter.

Formal Statement

Let $X_1, \ldots, X_n \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$ with $\sigma$ known. The interval

\[\bar{X} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\]

satisfies:

\[P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha\]

The probability statement is about the random endpoints $\bar{X} \pm z_{\alpha/2}\sigma/\sqrt{n}$, not about $\mu$.
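The following minimal Python sketch makes the point about random endpoints concrete. The helper `z_interval` is a hypothetical name, not a library function. It assumes $\sigma$ is known and takes $z_{\alpha/2}$ from the standard library's `statistics.NormalDist`. Each fresh sample moves the endpoints, while $\mu$ stays fixed.

```python
import math
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, alpha=0.05):
    """Two-sided (1 - alpha) z-interval for the mean, assuming sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 when alpha = 0.05
    me = z * sigma / math.sqrt(len(sample))  # margin of error
    xbar = fmean(sample)
    return xbar - me, xbar + me


random.seed(1)
mu, sigma, n = 50, 10, 30  # illustrative population values (assumed)

for _ in range(3):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    lo, hi = z_interval(sample, sigma)
    # mu never changes; only the endpoints move, through the sample mean
    print(f"[{lo:.2f}, {hi:.2f}]  contains mu: {lo <= mu <= hi}")
```

Because $\sigma$ is known, every interval has the same width. Only its centre $\bar{X}$ is random.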

Simulation Demonstration

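```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(42)  # seeded generator for reproducibility

mu, sigma, n = 50, 10, 30                # true mean, known sigma, sample size
alpha = 0.05                             # 1 - confidence level
z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96
n_simulations = 1000

# With sigma known, the margin of error is the same for every sample
me = z * sigma / np.sqrt(n)

covers = 0
for _ in range(n_simulations):
    sample = rng.normal(mu, sigma, n)
    xbar = sample.mean()
    lower, upper = xbar - me, xbar + me
    if lower <= mu <= upper:  # did this interval capture the true mean?
        covers += 1

print(f"Coverage: {covers}/{n_simulations} = {covers/n_simulations:.3f}")
# Close to 0.95; the exact value depends on the seed
```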

Common Misconceptions

Misconception 1: "95% probability that μ is in this interval"

After computing $[48.2, 51.8]$, the statement "there is a 95% probability that $\mu$ is between 48.2 and 51.8" is wrong. Either $\mu$ is in that interval or it is not — there is no randomness left.

The 95% refers to the procedure, not any single interval.

Misconception 2: "95% of the data falls in the interval"

A CI estimates a parameter (such as the population mean), not the range of individual observations. Its half-width $z_{\alpha/2}\sigma/\sqrt{n}$ shrinks as $n$ grows. By contrast, the central 95% of a normal population, $\mu \pm 1.96\sigma$, does not depend on $n$ at all.
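A short calculation shows how little of the population a 95% CI actually covers. This sketch assumes a normal population with illustrative values $\mu = 50$ and $\sigma = 10$. As a best case, it centres the interval exactly on $\mu$. The name `share_inside_ci` is a hypothetical helper.

```python
import math
from statistics import NormalDist

mu, sigma = 50, 10               # illustrative population values (assumed)
z = NormalDist().inv_cdf(0.975)  # z_{0.025}, about 1.96
population = NormalDist(mu, sigma)


def share_inside_ci(n):
    """Fraction of individual observations inside mu +/- z*sigma/sqrt(n).

    Centring the CI exactly on mu is the best case; a real CI is centred on xbar.
    """
    me = z * sigma / math.sqrt(n)
    return population.cdf(mu + me) - population.cdf(mu - me)


# The central 95% of observations does not depend on n
print(f"95% of observations lie in [{mu - z * sigma:.1f}, {mu + z * sigma:.1f}]")
for n in (10, 30, 100, 1000):
    half_width = z * sigma / math.sqrt(n)
    print(f"n={n:5d}  CI half-width={half_width:5.2f}  "
          f"share of observations inside={share_inside_ci(n):.1%}")
```

With $n = 30$, even this best-case interval contains only about 28% of individual observations. The share keeps falling as $n$ grows.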

Misconception 3: "If two CIs overlap, the difference is not significant"

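Two 95% CIs can overlap even when the difference between the parameters is statistically significant at the 5% level. For independent estimates, the standard error of the difference is $\sqrt{SE_1^2 + SE_2^2}$, which is smaller than $SE_1 + SE_2$. Checking for overlap is therefore too conservative. The proper comparison uses a CI for the difference $\mu_1 - \mu_2$.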
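Here is a numeric illustration that uses hypothetical summary statistics: two independent sample means, each with a known standard error of 1.5. The two 95% intervals overlap, yet the 95% interval for the difference excludes 0.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # about 1.96

# Hypothetical summary statistics for two independent samples (sigma known)
xbar1, se1 = 50.0, 1.5
xbar2, se2 = 54.5, 1.5

ci1 = (xbar1 - z * se1, xbar1 + z * se1)
ci2 = (xbar2 - z * se2, xbar2 + z * se2)
overlap = ci1[1] >= ci2[0] and ci2[1] >= ci1[0]

# CI for mu2 - mu1: the SE of a difference of independent means is sqrt(se1^2 + se2^2)
diff = xbar2 - xbar1
se_diff = math.sqrt(se1**2 + se2**2)
ci_diff = (diff - z * se_diff, diff + z * se_diff)

print(f"CI 1: [{ci1[0]:.2f}, {ci1[1]:.2f}]")
print(f"CI 2: [{ci2[0]:.2f}, {ci2[1]:.2f}]  overlap: {overlap}")
print(f"CI for difference: [{ci_diff[0]:.2f}, {ci_diff[1]:.2f}]  "
      f"excludes 0: {ci_diff[0] > 0 or ci_diff[1] < 0}")
```

With these numbers, the intervals are about [47.06, 52.94] and [51.56, 57.44], so they overlap. The interval for the difference is about [0.34, 8.66].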

Width, Confidence Level, and Sample Size

The margin of error for a z-interval is:

\[E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\]

Three relationships follow:

Higher confidence → wider interval. Increasing from 95% to 99% increases $z_{\alpha/2}$ from 1.96 to 2.576, widening the interval by 31%.
Larger sample → narrower interval. The margin of error decreases as $1/\sqrt{n}$. To halve the width, you need 4 times the sample size.
Larger variance → wider interval. The margin of error is proportional to $\sigma$, so doubling the population standard deviation doubles the width.
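The three relationships can be checked numerically. Below, `margin_of_error` is a hypothetical helper that implements $E = z_{\alpha/2}\sigma/\sqrt{n}$ with the exact normal quantile. The baseline values ($\sigma = 10$, $n = 30$) are illustrative.

```python
import math
from statistics import NormalDist


def margin_of_error(conf, sigma, n):
    """Two-sided z margin of error E = z_{alpha/2} * sigma / sqrt(n), where alpha = 1 - conf."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / math.sqrt(n)


base = margin_of_error(0.95, sigma=10, n=30)   # illustrative baseline
print(margin_of_error(0.99, 10, 30) / base)    # higher confidence: about 1.31
print(margin_of_error(0.95, 10, 120) / base)   # 4x the sample size: about 0.5
print(margin_of_error(0.95, 20, 30) / base)    # doubled sigma: about 2.0
```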

Sample Size Determination

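To achieve a desired margin of error $E$ at confidence level $1 - \alpha$, solve $E = z_{\alpha/2}\sigma/\sqrt{n}$ for $n$ and round up to the next integer:

\[n = \left\lceil \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2 \right\rceil\]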

Example: To estimate a population mean within $\pm 2$ units with 95% confidence, given $\sigma = 10$:

\[n = \left(\frac{1.96 \times 10}{2}\right)^2 = 96.04 \implies n = 97\]
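The same calculation can be written as a small function. `required_n` is a hypothetical name. It uses the exact quantile (about 1.95996) rather than the rounded 1.96, which gives the same answer here. It calls `math.ceil` to round up, because rounding down would leave the margin slightly above $E$.

```python
import math
from statistics import NormalDist


def required_n(E, sigma, conf=0.95):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= E (sigma assumed known)."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / E) ** 2)  # always round up


print(required_n(E=2, sigma=10))  # 97
```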

Common Confidence Levels

| Confidence Level | $\alpha$ | $z_{\alpha/2}$ |
|---|---|---|
| 90% | 0.10 | 1.645 |
| 95% | 0.05 | 1.960 |
| 99% | 0.01 | 2.576 |
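The table values come from the standard normal quantile function. This sketch reproduces them with Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

critical = {}
for conf in (0.90, 0.95, 0.99):
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value z_{alpha/2}
    critical[conf] = z
    print(f"{conf:.0%}  alpha={alpha:.2f}  z={z:.3f}")
```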

One-Sided Confidence Intervals (Confidence Bounds)

Sometimes we only need a bound in one direction:

Upper bound: $\mu \leq \bar{X} + z_\alpha \cdot \sigma/\sqrt{n}$ (with confidence $1 - \alpha$)
Lower bound: $\mu \geq \bar{X} - z_\alpha \cdot \sigma/\sqrt{n}$ (with confidence $1 - \alpha$)

Note that one-sided bounds use $z_\alpha$ (not $z_{\alpha/2}$). A 95% one-sided bound uses $z_{0.05} = 1.645$.
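Here is a sketch of both one-sided bounds, assuming $\sigma$ is known. `one_sided_bounds` is a hypothetical helper, and the inputs in the usage line are illustrative. Each bound leaves all of $\alpha = 0.05$ in its own tail. Reported together, the two bounds therefore form a 90% two-sided interval, not a 95% one.

```python
import math
from statistics import NormalDist


def one_sided_bounds(xbar, sigma, n, alpha=0.05):
    """Return the (1 - alpha) lower bound and the (1 - alpha) upper bound for mu.

    Each bound is a separate one-sided statement; sigma is assumed known.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # z_alpha: all of alpha in one tail
    se = sigma / math.sqrt(n)
    return xbar - z_alpha * se, xbar + z_alpha * se


lower, upper = one_sided_bounds(xbar=50.0, sigma=10, n=30)  # illustrative inputs
print(f"95% lower bound: mu >= {lower:.2f}")
print(f"95% upper bound: mu <= {upper:.2f}")
print(f"z_alpha = {NormalDist().inv_cdf(0.95):.3f} vs "
      f"z_alpha/2 = {NormalDist().inv_cdf(0.975):.3f}")
```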

Financial example: A risk manager may want an upper bound on the expected portfolio loss: "We are 95% confident that the expected loss does not exceed $\bar{X} + 1.645\,\sigma/\sqrt{n}$."
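The one-sided upper bound should have the stated coverage. The sketch below checks this by repeatedly drawing samples from a hypothetical loss distribution: normal, with an assumed true mean loss and known $\sigma$. These are illustrative numbers, not real portfolio data. It then counts how often the true mean stays at or below the 95% upper bound.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(7)
true_mean_loss, sigma, n = 1.0, 4.0, 50    # hypothetical values (assumed)
alpha = 0.05
z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.645
n_simulations = 5000

covered = 0
for _ in range(n_simulations):
    losses = [random.gauss(true_mean_loss, sigma) for _ in range(n)]
    # 95% upper confidence bound on the mean loss
    upper = fmean(losses) + z_alpha * sigma / math.sqrt(n)
    if true_mean_loss <= upper:
        covered += 1

coverage = covered / n_simulations
print(f"Upper-bound coverage: {coverage:.3f}")  # should be near 0.95
```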