CI for μ_D (Mean of Differences)¶
Paired-Sample Confidence Interval¶
When two related measurements are taken on the same subjects — such as pre-test and post-test scores, or before-and-after measurements — we use paired data to analyze the difference between the two measurements. The paired difference confidence interval estimates the mean difference \(\mu_d\) between these two related measurements.
Formula¶
Let \(d_i = X_{i,1} - X_{i,2}\) represent the difference between the two measurements for the \(i\)-th subject. The confidence interval for the mean of the paired differences is

\[
\bar{d} \pm t_{\alpha/2,\, n-1} \cdot \frac{s_d}{\sqrt{n}},
\]

where
- \(\bar{d}\) is the mean of the paired differences,
- \(s_d\) is the standard deviation of the paired differences,
- \(n\) is the number of paired observations,
- \(\alpha\) is the significance level, \(\alpha = 1 - \text{confidence level}\),
- \(t_{\alpha/2, \, n-1}\) is the critical value from the \(t\)-distribution with \(n-1\) degrees of freedom.
The formula is essentially the same as for a confidence interval for a single mean, but applied to the differences between pairs of measurements.
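To see this equivalence concretely, the manual one-sample \(t\)-interval computed on the differences can be compared with the interval SciPy produces directly from the paired samples via `ttest_rel(...).confidence_interval()` (available in SciPy ≥ 1.10). This is a sketch that reuses the differences from the code example below; the `before`/`after` arrays are hypothetical values constructed to yield those differences.

```python
import numpy as np
from scipy import stats

# Differences from the section's code example
d = np.array([5, 3, 4, -2, 0, 6, -1, 2, 3, 4])
after = np.full(10, 100.0)   # hypothetical baseline (illustrative only)
before = after + d           # paired samples whose differences equal d

# Manual one-sample t-interval on the differences
n = len(d)
se = d.std(ddof=1) / np.sqrt(n)
t_star = stats.t.ppf(0.975, n - 1)
manual = (d.mean() - t_star * se, d.mean() + t_star * se)

# Same interval from the paired t-test machinery (SciPy >= 1.10)
ci = stats.ttest_rel(before, after).confidence_interval(confidence_level=0.95)
print(manual, (ci.low, ci.high))
```

Both approaches return the same interval, since the paired \(t\)-procedure is precisely a one-sample \(t\)-procedure applied to the differences.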
Conditions for Validity¶
The usual one-sample \(t\)-interval conditions apply, stated in terms of the differences:

- **Random sample:** the pairs are a simple random sample from the population of interest.
- **Independence:** the pairs are independent of one another (when sampling without replacement, the sample should be less than 10% of the population).
- **Normality:** the differences \(d_i\) are approximately normally distributed, or \(n\) is large.

For large \(n\) (\(\ge 30\)), the normality condition is less strict due to the Central Limit Theorem. If the population standard deviation of the differences \(\sigma_d\) is known, a z-interval applies analogously, with \(z_{\alpha/2}\) in place of \(t_{\alpha/2,\, n-1}\).
Python Code¶
```python
import numpy as np
import scipy.stats as stats

differences = np.array([5, 3, 4, -2, 0, 6, -1, 2, 3, 4])
n = len(differences)
confidence_level = 0.95

mean_diff = np.mean(differences)
std_diff = np.std(differences, ddof=1)  # sample standard deviation
standard_error = std_diff / np.sqrt(n)
t_critical = stats.t.ppf(1 - (1 - confidence_level) / 2, n - 1)
margin_of_error = t_critical * standard_error
confidence_interval = (mean_diff - margin_of_error, mean_diff + margin_of_error)
print(f"{confidence_interval = }")
```
Examples¶
Example 1: Blood Pressure Before and After Treatment¶
A researcher measures the blood pressure of 10 patients before and after a treatment. The differences (before minus after) are:
Construct a 95% confidence interval for the mean difference in blood pressure.
Solution.
We are 95% confident that the true mean difference in blood pressure (before minus after) lies between 0.848 and 3.952.
Example 2: Finger-Snapping Speed¶
Each of 5 participants snapped their fingers with both the dominant and the non-dominant hand for 10 seconds each; the order of hands was randomized by coin toss.
| Participant | Dominant | Non-Dominant | Difference |
|---|---|---|---|
| Jeff | 44 | 35 | 9 |
| David | 42 | 37 | 5 |
| Kim | 40 | 32 | 8 |
| Charlotte | 37 | 31 | 6 |
| Jake | 42 | 36 | 6 |
Construct and interpret a 95% confidence interval for the mean difference.
Solution.
We are 95% confident that the true mean difference in snaps between the dominant and non-dominant hands lies within \((4.76, 8.84)\).
```python
import numpy as np
from scipy import stats

differences = np.array([9, 5, 8, 6, 6])
n = len(differences)
mean_diff = np.mean(differences)
std_diff = differences.std(ddof=1)
standard_error = std_diff / np.sqrt(n)
df = n - 1
confidence_level = 0.95
alpha = 1 - confidence_level
t_critical = stats.t(df).ppf(1 - alpha / 2)
margin_of_error = t_critical * standard_error
print(f"95% CI: ({mean_diff - margin_of_error:.2f}, {mean_diff + margin_of_error:.2f})")
print(f"95% CI: {mean_diff:.2f} ± {margin_of_error:.2f}")
```
Example 3: Two Watches (Four Steps)¶
A running magazine reviewed watches A and B that use GPS to measure distance. Five runners each wore both watches simultaneously on a 10-km route.
| Runner | Watch A | Watch B | Difference (A−B) |
|---|---|---|---|
| 1 | 9.8 | 10.1 | −0.3 |
| 2 | 9.8 | 10.0 | −0.2 |
| 3 | 10.1 | 10.2 | −0.1 |
| 4 | 10.1 | 9.9 | 0.2 |
| 5 | 10.2 | 10.1 | 0.1 |
Construct and interpret a 95% confidence interval for the mean difference.
Step 1: Calculate Differences. \(d = [-0.3, -0.2, -0.1, 0.2, 0.1]\).
Step 2: Check Conditions.
- Simple Random Sample: Satisfied (the magazine selected the five runners at random from its subscribers).
- Independence: Satisfied (the subscriber population exceeds \(10 \times 5 = 50\), so the 10% condition holds).
- Normal Population Distribution: Since \(n = 5\) is small, we inspect the differences: they are roughly symmetric with no outliers, so it is reasonable to proceed.
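As a quick numeric supplement to that visual check (not part of the original write-up), one can run a Shapiro–Wilk normality test on the five differences; with such a small sample the test has little power, so it only flags gross non-normality.

```python
import numpy as np
from scipy import stats

d = np.array([-0.3, -0.2, -0.1, 0.2, 0.1])  # watch differences (A - B)
stat, p = stats.shapiro(d)  # Shapiro-Wilk test of normality
print(f"W = {stat:.3f}, p = {p:.3f}")
# A large p-value gives no evidence against normality; with n = 5 this
# check is weak, so inspecting a dotplot or boxplot is still worthwhile.
```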
Step 3: Construct Interval.
```python
import numpy as np
from scipy import stats

watch_A = np.array([9.8, 9.8, 10.1, 10.1, 10.2])
watch_B = np.array([10.1, 10.0, 10.2, 9.9, 10.1])
d = watch_A - watch_B
d_bar = d.mean()
s = d.std(ddof=1)
n = d.shape[0]
df = n - 1
confidence_level = 0.95
alpha = 1 - confidence_level
t_star = stats.t(df).ppf(1 - alpha / 2)
margin_of_error = t_star * s / np.sqrt(n)
print(f"{confidence_level:.0%} CI: {d_bar:.4f} ± {margin_of_error:.4f}")
```
Step 4: Interpret Interval. With 95% confidence, the mean difference between the distances reported by the watches lies within \((-0.32, 0.20)\) km. Since the interval includes zero, the data do not provide convincing evidence of a systematic difference between the distances Watch A and Watch B reported.
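The "interval contains zero" reading corresponds to the standard duality between confidence intervals and tests: a 95% CI for \(\mu_D\) containing 0 is equivalent to a two-sided paired \(t\)-test of \(H_0\!: \mu_D = 0\) failing to reject at \(\alpha = 0.05\). A small sketch of that connection using the watch data:

```python
import numpy as np
from scipy import stats

watch_A = np.array([9.8, 9.8, 10.1, 10.1, 10.2])
watch_B = np.array([10.1, 10.0, 10.2, 9.9, 10.1])

# Two-sided paired t-test of H0: mu_D = 0
result = stats.ttest_rel(watch_A, watch_B)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
# p > 0.05 here, consistent with the 95% CI containing 0
```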
Simulation: Paired Mean CI Coverage¶
```python
#!/usr/bin/env python3
"""Paired-sample mean CI simulation: t, z_known, z_plugin."""
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t, norm

rng_seed = None
n_simulations = 100
n = 12
mu_x, mu_y = 0.5, 0.0
sigma_x, sigma_y = 1.0, 1.2
rho = 0.6
alpha = 0.05
method = "t"  # 't' | 'z_known' | 'z_plugin'


def main():
    rng = np.random.default_rng(rng_seed)
    delta_true = mu_x - mu_y
    var_d_true = sigma_x**2 + sigma_y**2 - 2 * rho * sigma_x * sigma_y
    sigma_d_true = np.sqrt(max(var_d_true, 0.0))

    # Cholesky factor for sampling correlated (X, Y) pairs
    cov = rho * sigma_x * sigma_y
    Sigma = np.array([[sigma_x**2, cov], [cov, sigma_y**2]])
    L = np.linalg.cholesky(Sigma)

    df = n - 1
    t_star = t.ppf(1 - alpha / 2.0, df=df)
    z_star = norm.ppf(1 - alpha / 2.0)

    lowers = np.empty(n_simulations)
    uppers = np.empty(n_simulations)
    centers = np.empty(n_simulations)
    for i in range(n_simulations):
        z = rng.standard_normal(size=(2, n))
        xy = (L @ z).T
        x = xy[:, 0] + mu_x
        y = xy[:, 1] + mu_y
        d = x - y
        dbar = d.mean()
        s_d = d.std(ddof=1)
        if method == "t":
            se, crit = s_d / np.sqrt(n), t_star
        elif method == "z_known":
            se, crit = sigma_d_true / np.sqrt(n), z_star
        else:  # 'z_plugin'
            se, crit = s_d / np.sqrt(n), z_star
        lowers[i] = dbar - crit * se
        uppers[i] = dbar + crit * se
        centers[i] = dbar

    covered = (lowers <= delta_true) & (delta_true <= uppers)
    n_fail = int((~covered).sum())
    coverage_pct = 100.0 * covered.mean()

    fig, ax = plt.subplots(figsize=(12, 12))
    for i in range(n_simulations):
        color = "k" if covered[i] else "r"
        ax.plot([lowers[i], uppers[i]], [i, i], lw=2, color=color)
        ax.plot(centers[i], i, marker="o", ms=3, color=color)
    ax.axvline(delta_true, linestyle="--", linewidth=1.5, color="r")
    ax.set_title(
        f"{n_simulations} Paired {method} CIs for μ_D | n={n}, ρ={rho:.2f}, "
        f"CL={int((1 - alpha) * 100)}% | Fail={n_fail} (Coverage ≈ {coverage_pct:.1f}%)")
    ax.set_yticks([])
    for sp in ["left", "right", "top"]:
        ax.spines[sp].set_visible(False)
    ax.set_xlabel("μ_D = μ_X − μ_Y")
    plt.tight_layout()
    plt.show()


if __name__ == "__main__":
    main()
```
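As a quick non-graphical companion to the simulation above, the following sketch estimates coverage for the `t` and `z_plugin` methods over many replications. For simplicity it simulates the differences directly (equivalent to the bivariate setup, since only the distribution of \(D = X - Y\) matters); with \(n = 12\), the plug-in z interval should visibly undercover (roughly 92–93% rather than 95%).

```python
import numpy as np
from scipy.stats import t, norm

rng = np.random.default_rng(0)
n, reps, alpha = 12, 20_000, 0.05
mu_d, sigma_d = 0.5, 1.0  # simulate the differences directly
t_star = t.ppf(1 - alpha / 2, n - 1)
z_star = norm.ppf(1 - alpha / 2)

# reps samples of n differences each; vectorized over rows
d = rng.normal(mu_d, sigma_d, size=(reps, n))
dbar = d.mean(axis=1)
se = d.std(ddof=1, axis=1) / np.sqrt(n)

# An interval covers when |dbar - mu_d| <= crit * se
cov_t = np.mean(np.abs(dbar - mu_d) <= t_star * se)
cov_z = np.mean(np.abs(dbar - mu_d) <= z_star * se)
print(f"t coverage ~ {cov_t:.3f}, z_plugin coverage ~ {cov_z:.3f}")
```

The t interval attains its nominal 95% coverage, while pairing the estimated standard error with a z critical value costs a few percentage points at this sample size.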
Exercise¶
Exercise: 99% CI for Mean Difference (Paired Data)¶
A sample of 10 pairs yields \(\bar{d} = 4.5\) and \(s_d = 2.0\). Construct a 99% CI for the mean difference. Assume the difference distribution is normal.
Solution. With \(df = 9\) and 99% confidence, \(t_{0.005, 9} \approx 3.2498\), so the interval is \(4.5 \pm 3.2498 \cdot 2/\sqrt{10} \approx (2.445, 6.555)\).
```python
import numpy as np
from scipy import stats

confidence_level = 0.99
alpha = 1 - confidence_level
n = 10
df = n - 1
t_star = stats.t(df).ppf(1 - alpha / 2)
d_bar = 4.5
s = 2
left = d_bar - t_star * s / np.sqrt(n)
right = d_bar + t_star * s / np.sqrt(n)
print(f"({left:.3f}, {right:.3f})")
```
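The same interval can also be obtained in a single call with `scipy.stats.t.interval`, which takes the confidence level, the degrees of freedom, and the interval's center (`loc`) and scale; a sketch using the exercise's numbers:

```python
import numpy as np
from scipy import stats

n, d_bar, s = 10, 4.5, 2.0
# loc = center of the interval, scale = standard error of the mean difference
ci = stats.t.interval(0.99, n - 1, loc=d_bar, scale=s / np.sqrt(n))
print(f"({ci[0]:.3f}, {ci[1]:.3f})")
```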