Two-Sample Tests¶
1. Two Sample z Test¶
The two-sample z-test determines whether the means of two independent samples differ significantly. It assumes independent, normally distributed samples with known population variances; with large samples, the sample variances are commonly substituted for the unknown population values.
A. Hypothesis¶
- Null: \(H_0: \mu_1 = \mu_2\)
- Alternative:
    - Two-tailed: \(H_a: \mu_1 \neq \mu_2\)
    - One-tailed (greater): \(H_a: \mu_1 > \mu_2\)
    - One-tailed (less): \(H_a: \mu_1 < \mu_2\)
B. Test Statistic¶
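Under \(H_0\) (where \(\mu_1 - \mu_2 = 0\)), with the sample standard deviations \(s_1, s_2\) substituted for the population values in large samples:
\[z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}\]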
C. Decision Rule¶
- Two-tailed: Reject \(H_0\) if \(|z| > z_{\alpha/2}\).
- One-tailed (greater): Reject \(H_0\) if \(z > z_{\alpha}\).
- One-tailed (less): Reject \(H_0\) if \(z < -z_{\alpha}\).
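The critical values in these rules come from the standard normal quantile function. A minimal sketch, assuming a significance level of 0.05 (chosen here only for illustration) and using `scipy.stats.norm.ppf`:

```python
from scipy import stats

alpha = 0.05  # assumed significance level, for illustration only

# Two-tailed: reject H_0 when |z| exceeds this value.
z_crit_two_tailed = stats.norm.ppf(1 - alpha / 2)

# One-tailed: reject when z > z_crit (greater) or z < -z_crit (less).
z_crit_one_tailed = stats.norm.ppf(1 - alpha)

print(f"{z_crit_two_tailed = :.4f}")
print(f"{z_crit_one_tailed = :.4f}")
```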
D. P-value¶
- Two-tailed: \(p\text{-value} = 2P(Z \geq |z|)\)
- One-tailed (greater): \(p\text{-value} = P(Z \geq z)\)
- One-tailed (less): \(p\text{-value} = P(Z \leq z)\)
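The three p-value formulas map directly onto the normal survival function (`sf`, the upper tail) and the CDF (the lower tail). The helper below is a sketch; the function name `z_p_value` and the `alternative` labels are illustrative choices, not part of any library API.

```python
from scipy import stats


def z_p_value(z, alternative="two-sided"):
    """Return the p-value of a z statistic under the standard normal."""
    if alternative == "two-sided":
        return 2 * stats.norm.sf(abs(z))  # 2 * P(Z >= |z|)
    if alternative == "greater":
        return stats.norm.sf(z)           # P(Z >= z)
    if alternative == "less":
        return stats.norm.cdf(z)          # P(Z <= z)
    raise ValueError(f"unknown alternative: {alternative!r}")


z = 1.5  # hypothetical observed statistic
for alt in ("two-sided", "greater", "less"):
    print(f"{alt:>9}: {z_p_value(z, alt):.4f}")
```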
E. Example¶
import numpy as np
from scipy import stats
n_f, n_s = 100, 100
x_bar_f, x_bar_s = 1.85, 1.65
s_f, s_s = 1.3, 1.2
statistic = (x_bar_f - x_bar_s) / np.sqrt(s_f**2/n_f + s_s**2/n_s)
p_value = stats.norm().sf(abs(statistic)) * 2
print(f"statistic : {statistic:.4f}")
print(f"p value : {p_value:.4f}")
2. Two Sample t Test¶
The two sample t-test (independent samples t-test) determines whether the means of two independent groups differ significantly. It is used when the population variances are unknown and assumed to be equal.
A. Hypothesis¶
- Null: \(H_0: \mu_1 = \mu_2\)
- Alternative:
    - Two-tailed: \(H_a: \mu_1 \neq \mu_2\)
    - One-tailed (greater): \(H_a: \mu_1 > \mu_2\)
    - One-tailed (less): \(H_a: \mu_1 < \mu_2\)
B. Test Statistic (Pooled Variance)¶
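\[t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}\]
where the pooled standard deviation is:
\[s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\]
This statistic follows a t-distribution with \(n_1 + n_2 - 2\) degrees of freedom.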
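As a cross-check, the pooled statistic can be computed by hand from summary statistics and compared with `scipy.stats.ttest_ind_from_stats`. The sketch below reuses the Field A and Field B numbers from the tomato example later in this section and assumes equal population variances, which that example does not itself claim.

```python
import numpy as np
from scipy import stats

# Summary statistics (tomato example: Field A vs Field B)
mean_1, mean_2 = 1.3, 1.6
sd_1, sd_2 = 0.5, 0.3
n_1, n_2 = 22, 24

# Pooled standard deviation and t statistic, computed by hand
s_p = np.sqrt(((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / (n_1 + n_2 - 2))
t_manual = (mean_1 - mean_2) / (s_p * np.sqrt(1 / n_1 + 1 / n_2))
df = n_1 + n_2 - 2
p_manual = 2 * stats.t.sf(abs(t_manual), df)

# Same test via SciPy (equal_var=True selects the pooled version)
res = stats.ttest_ind_from_stats(mean_1, sd_1, n_1, mean_2, sd_2, n_2, equal_var=True)

print(f"{t_manual = :.4f}, {p_manual = :.4f}")
print(f"scipy: statistic = {res.statistic:.4f}, pvalue = {res.pvalue:.4f}")
```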
C. Decision Rule¶
- Two-tailed: Reject \(H_0\) if \(|t| > t_{\alpha/2, n_1+n_2-2}\).
- One-tailed (greater): Reject \(H_0\) if \(t > t_{\alpha, n_1+n_2-2}\).
- One-tailed (less): Reject \(H_0\) if \(t < -t_{\alpha, n_1+n_2-2}\).
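The critical values \(t_{\alpha/2,\,n_1+n_2-2}\) and \(t_{\alpha,\,n_1+n_2-2}\) can be looked up with `scipy.stats.t.ppf`. The sketch below assumes \(\alpha = 0.05\) and borrows the sample sizes 22 and 24 from the tomato example purely for illustration.

```python
from scipy import stats

alpha = 0.05          # assumed significance level
n_1, n_2 = 22, 24     # sample sizes borrowed from the tomato example
df = n_1 + n_2 - 2    # pooled degrees of freedom

t_crit_two_tailed = stats.t.ppf(1 - alpha / 2, df)
t_crit_one_tailed = stats.t.ppf(1 - alpha, df)

print(f"{df = }")
print(f"{t_crit_two_tailed = :.4f}")
print(f"{t_crit_one_tailed = :.4f}")
```

Because the t-distribution has heavier tails than the normal, these thresholds are slightly larger than the corresponding z critical values and approach them as the degrees of freedom grow.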
D. Examples¶
Example: Gender Bias in Salary¶
Market researchers compare average salaries for male vs female managers.
Example: Tomatoes from Two Different Fields¶
| | Field A | Field B |
|---|---|---|
| Mean | 1.3m | 1.6m |
| Std Dev | 0.5m | 0.3m |
| n | 22 | 24 |
import numpy as np
from scipy import stats
X_1_bar, X_2_bar = 1.3, 1.6
s_1, s_2 = 0.5, 0.3
n_1, n_2 = 22, 24
# Welch's approach (unequal variances)
statistic = (X_1_bar - X_2_bar) / np.sqrt(s_1**2 / n_1 + s_2**2 / n_2)
# Welch-Satterthwaite degrees of freedom
top = (s_1**2 / n_1 + s_2**2 / n_2)**2
bottom = (s_1**2 / n_1)**2 / (n_1 - 1) + (s_2**2 / n_2)**2 / (n_2 - 1)
df = top / bottom
p_value = 2 * stats.t(df).cdf(-abs(statistic))
print(f"{df = :.4f}")
print(f"{statistic = :.4f}")
print(f"{p_value = :.4f}")
alpha = 0.05
if p_value <= alpha:
    print("Reject H_0")
else:
    print("Fail to reject H_0")
Example: Number of Babies (France vs Switzerland)¶
| | France | Switzerland |
|---|---|---|
| Mean | 1.85 | 1.65 |
| Std Dev | 1.3 | 1.2 |
| n | 100 | 100 |
Using pooled variance:
import numpy as np
from scipy import stats
# Summary statistics from the France vs Switzerland table
X_1_bar, X_2_bar = 1.85, 1.65
s_1, s_2 = 1.3, 1.2
n_1, n_2 = 100, 100
# Pooled variance estimate
s_p_square = ((n_1 - 1) * s_1**2 + (n_2 - 1) * s_2**2) / (n_1 + n_2 - 2)
statistic = (X_1_bar - X_2_bar) / np.sqrt(s_p_square / n_1 + s_p_square / n_2)
df = n_1 + n_2 - 2
# Two-tailed p-value from the t-distribution
p_value = 2 * stats.t(df).cdf(-abs(statistic))
print(f"{df = :.4f}")
print(f"{statistic = :.4f}")
print(f"{p_value = :.4f}")
Example: Two Varieties of Pears (Bosc and Anjou)¶
| | Bosc | Anjou |
|---|---|---|
| Mean | 120 | 116 |
| Std Dev | 15 | 13 |
| n | 65 | 65 |
The 99% confidence interval for \(\mu_{\text{Bosc}} - \mu_{\text{Anjou}}\) is \(4 \pm 6.44\), i.e., \((-2.44, 10.44)\). Since the confidence interval contains 0, we fail to reject \(H_0\) at \(\alpha = 0.01\).
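The margin of error above can be reproduced from the table. The sketch below assumes the interval was built from a t critical value with the pooled degrees of freedom \(n_1 + n_2 - 2 = 128\); the source does not state which degrees of freedom were used, so treat that choice as an assumption. With equal sample sizes the pooled and unpooled standard errors coincide.

```python
import numpy as np
from scipy import stats

# Summary statistics from the pear table
mean_bosc, mean_anjou = 120, 116
sd_bosc, sd_anjou = 15, 13
n_bosc, n_anjou = 65, 65
confidence = 0.99

diff = mean_bosc - mean_anjou
se = np.sqrt(sd_bosc**2 / n_bosc + sd_anjou**2 / n_anjou)

df = n_bosc + n_anjou - 2  # assumption: pooled degrees of freedom
t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df)

margin = t_crit * se
lower, upper = diff - margin, diff + margin

print(f"{margin = :.2f}")
print(f"CI = ({lower:.2f}, {upper:.2f})")
```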
3. Welch's t Test¶
Welch's t-test is a robust adaptation of the standard two-sample t-test that accounts for unequal variances and potentially unequal sample sizes.
Formula¶
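The test statistic uses separate variance estimates for each group:
\[t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}\]
The degrees of freedom are approximated using the Welch-Satterthwaite equation:
\[\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}\]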
When to Use¶
- When the two groups have noticeably different variances.
- When the sample sizes between the two groups differ significantly.
- When population variances are unknown.
Python Implementation¶
from scipy.stats import ttest_ind
team_a = [120, 118, 125, 130, 115, 122, 121, 119, 117, 123, 124, 126, 127, 118, 116]
team_b = [135, 132, 137, 140, 136, 130, 134, 138, 139, 133, 131, 142, 141,
129, 128, 135, 137, 136, 134, 132]
stat, p_value = ttest_ind(team_a, team_b, equal_var=False)
print(f"Test Statistic: {stat:.4f}")
print(f"P-value: {p_value:.4f}")
alpha = 0.05
if p_value < alpha:
    print("Reject H0: The means are significantly different.")
else:
    print("Fail to reject H0.")
Comparison to Standard Two-Sample t-Test¶
| Feature | Standard t-Test | Welch's t-Test |
|---|---|---|
| Variance assumption | Equal variances | No equal variance assumption |
| Sample sizes | Any sizes, but sensitive to unequal variances when sizes differ | Handles unequal sizes |
| Degrees of freedom | Fixed: \(n_1 + n_2 - 2\) | Approximated via Welch-Satterthwaite |
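To see the two rows of the table side by side, the same data can be run through `scipy.stats.ttest_ind` with `equal_var=True` (pooled) and `equal_var=False` (Welch). This sketch reuses the `team_a` and `team_b` samples from the Python implementation above.

```python
from scipy.stats import ttest_ind

team_a = [120, 118, 125, 130, 115, 122, 121, 119, 117, 123, 124, 126, 127, 118, 116]
team_b = [135, 132, 137, 140, 136, 130, 134, 138, 139, 133, 131, 142, 141,
          129, 128, 135, 137, 136, 134, 132]

pooled = ttest_ind(team_a, team_b, equal_var=True)   # standard two-sample t-test
welch = ttest_ind(team_a, team_b, equal_var=False)   # Welch's t-test

print(f"Pooled: statistic = {pooled.statistic:.4f}, p = {pooled.pvalue:.3g}")
print(f"Welch:  statistic = {welch.statistic:.4f}, p = {welch.pvalue:.3g}")
```

When the sample variances are similar, as they are here, the two tests give close results; the difference matters most when both variances and sample sizes differ.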
4. Two Sample Proportion Test¶
The two sample proportion test determines if there is a significant difference between the proportions of two independent groups based on a binary outcome.
A. Hypothesis¶
- Null: \(H_0: p_1 = p_2\)
- Alternative:
    - Two-tailed: \(H_a: p_1 \neq p_2\)
    - One-tailed (greater): \(H_a: p_1 > p_2\)
    - One-tailed (less): \(H_a: p_1 < p_2\)
B. Test Statistic¶
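The pooled proportion, where \(x_1\) and \(x_2\) are the numbers of successes in samples of size \(n_1\) and \(n_2\):
\[\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}\]
The test statistic, with sample proportions \(\hat{p}_1 = x_1/n_1\) and \(\hat{p}_2 = x_2/n_2\):
\[z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}\]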
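The pooled two-proportion z statistic can be wrapped in a small reusable helper. This is a sketch only: the function name `two_proportion_z_test` and its `alternative` labels are illustrative, not a SciPy API. The usage line applies it to the District A and District B counts from the first example below.

```python
import numpy as np
from scipy import stats


def two_proportion_z_test(x_1, n_1, x_2, n_2, alternative="two-sided"):
    """Pooled z-test for H_0: p_1 = p_2, given success counts and sample sizes."""
    p_hat_1, p_hat_2 = x_1 / n_1, x_2 / n_2
    p_pooled = (x_1 + x_2) / (n_1 + n_2)
    se = np.sqrt(p_pooled * (1 - p_pooled) * (1 / n_1 + 1 / n_2))
    z = (p_hat_1 - p_hat_2) / se
    if alternative == "two-sided":
        p_value = 2 * stats.norm.sf(abs(z))
    elif alternative == "greater":   # H_a: p_1 > p_2
        p_value = stats.norm.sf(z)
    elif alternative == "less":      # H_a: p_1 < p_2
        p_value = stats.norm.cdf(z)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p_value


z, p = two_proportion_z_test(58, 100, 52, 100)
print(f"{z = :.4f}, {p = :.4f}")
```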
C. Examples¶
Example: Support for New Law¶
| | District A | District B | Total |
|---|---|---|---|
| Yes | 58 | 52 | 110 |
| No | 42 | 48 | 90 |
import numpy as np
from scipy import stats
positive_A, positive_B = 58, 52
n_A, n_B = 100, 100
p_hat_A, p_hat_B = positive_A / n_A, positive_B / n_B
p_pooled = (positive_A + positive_B) / (n_A + n_B)
statistic = (p_hat_A - p_hat_B) / (np.sqrt(p_pooled * (1 - p_pooled)) * np.sqrt(1/n_A + 1/n_B))
p_value = stats.norm().sf(abs(statistic)) * 2
print(f"{statistic = :.4f}")
print(f"{p_value = :.4f}")
alpha = 0.05
if p_value <= alpha:
    print("Reject H_0")
else:
    print("Fail to reject H_0")
Example: Derrick's Approval Rate¶
Derrick tests whether the prime minister's approval is lower in December vs November.
Example: Dime and Nickel¶
Kiley tests if a dime and nickel have the same likelihood of showing heads.
Example: Myopia¶
Researchers test whether myopia prevalence increased from 2000 to 2015. In 2000: 132/400 positive. In 2015: 228/600 positive.
n_2000, n_2015 = 400, 600
positive_2000, positive_2015 = 132, 228
p_hat_2000, p_hat_2015 = positive_2000 / n_2000, positive_2015 / n_2015
p_pooled = (positive_2000 + positive_2015) / (n_2000 + n_2015)
statistic = (p_hat_2000 - p_hat_2015) / (np.sqrt(p_pooled * (1 - p_pooled)) * np.sqrt(1/n_2000 + 1/n_2015))
# H_a: p_2000 < p_2015 (an increase), so the p-value is the left-tail area
p_value = stats.norm().cdf(statistic)
print(f"{statistic = :.4f}")
print(f"{p_value = :.4f}")
alpha = 0.05
if p_value <= alpha:
    print("Reject H_0: significant increase in myopia")
else:
    print("Fail to reject H_0")
Example: Cat Disease¶
Veterinarians test \(H_0: p_{\text{male}} = p_{\text{female}}\) vs \(H_1: p_{\text{male}} > p_{\text{female}}\) with 24/259 male cats and 14/241 female cats affected.
positive_male, positive_female = 24, 14
n_male, n_female = 259, 241
p_hat_male, p_hat_female = positive_male / n_male, positive_female / n_female
p_pooled = (positive_male + positive_female) / (n_male + n_female)
statistic = (p_hat_male - p_hat_female) / (np.sqrt(p_pooled * (1 - p_pooled)) * np.sqrt(1/n_male + 1/n_female))
# H_1: p_male > p_female, so the p-value is the right-tail area
p_value = stats.norm().sf(statistic)
print(f"{statistic = :.4f}")
print(f"{p_value = :.4f}")
Example: In-person vs Online Classes¶
A 95% confidence interval for \(p_{\text{in\_person}} - p_{\text{online}}\) is \((-0.04, 0.14)\). Since the interval contains 0, we fail to reject \(H_0: p_{\text{in\_person}} = p_{\text{online}}\).
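The counts behind the in-person vs online interval are not given here, so the sketch below illustrates the same kind of interval with the District A and District B counts from the earlier example instead. It assumes the usual Wald interval, which uses the unpooled standard error rather than the pooled one from the test statistic.

```python
import numpy as np
from scipy import stats

# Counts reused from the "Support for New Law" example (not the class data)
x_A, n_A = 58, 100
x_B, n_B = 52, 100
confidence = 0.95

p_hat_A, p_hat_B = x_A / n_A, x_B / n_B
diff = p_hat_A - p_hat_B

# Unpooled standard error: each group keeps its own variance estimate
se = np.sqrt(p_hat_A * (1 - p_hat_A) / n_A + p_hat_B * (1 - p_hat_B) / n_B)
z_crit = stats.norm.ppf(1 - (1 - confidence) / 2)

margin = z_crit * se
lower, upper = diff - margin, diff + margin

print(f"CI = ({lower:.4f}, {upper:.4f})")
print("Contains 0:", lower < 0 < upper)
```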
5. Mann-Whitney U Test (Wilcoxon Rank-Sum Test)¶
The Mann-Whitney U test is a non-parametric test used to compare the distributions of two independent groups. It is useful when the assumptions of a parametric test are not met (e.g., non-normality or ordinal data).
Key Features¶
- Tests whether the distributions of two independent groups are the same.
- Assumptions: independent groups, ordinal/interval/ratio data, random samples.
- Null: The two groups have the same distribution.
- Alternative: The distributions differ, or one group tends to have higher values.
How the Test Works¶
- Combine all data and assign ranks (average tied ranks).
- Calculate rank sums \(R_1\) and \(R_2\).
- Compute U-statistics:
    - \(U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1\)
    - \(U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2\)
- Test statistic: \(U = \min(U_1, U_2)\).
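- For large samples (\(n_1, n_2 > 20\)), use the normal approximation \(z = (U - \mu_U)/\sigma_U\), with \(\mu_U = \frac{n_1 n_2}{2}\) and \(\sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}\) when there are no ties.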
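The ranking steps above can be sketched with `scipy.stats.rankdata` and checked against `scipy.stats.mannwhitneyu`. The two small samples are made up for illustration. SciPy reports a U value for the first sample that may be the complement of \(\min(U_1, U_2)\), so the comparison below takes the minimum of SciPy's statistic and \(n_1 n_2\) minus that statistic.

```python
import numpy as np
from scipy import stats

# Hypothetical samples, for illustration only
x = np.array([1.1, 2.3, 3.8, 4.0])
y = np.array([2.9, 5.2, 6.1, 7.4, 8.0])
n_1, n_2 = len(x), len(y)

# Rank the combined data; rankdata assigns average ranks to ties
ranks = stats.rankdata(np.concatenate([x, y]))
R_1, R_2 = ranks[:n_1].sum(), ranks[n_1:].sum()

# U statistics as defined in the steps above
U_1 = n_1 * n_2 + n_1 * (n_1 + 1) / 2 - R_1
U_2 = n_1 * n_2 + n_2 * (n_2 + 1) / 2 - R_2
U = min(U_1, U_2)

res = stats.mannwhitneyu(x, y, alternative="two-sided")
U_scipy = min(res.statistic, n_1 * n_2 - res.statistic)

print(f"{R_1 = }, {R_2 = }")
print(f"{U_1 = }, {U_2 = }, {U = }")
print(f"{U_scipy = }, p-value = {res.pvalue:.4f}")
```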
Interpretation¶
- If \(p < \alpha\) (commonly 0.05): Reject \(H_0\); the two groups have significantly different distributions.
- If \(p\) is large: Fail to reject \(H_0\).
Which Group is Larger?¶
If \(H_0\) is rejected, compare mean ranks of the two groups. A higher mean rank indicates that group tends to have larger values.
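Mean ranks are the average of each group's ranks in the combined sample. A minimal sketch with made-up data, using `scipy.stats.rankdata`:

```python
import numpy as np
from scipy import stats

# Hypothetical samples, for illustration only
group_x = np.array([1.1, 2.3, 3.8, 4.0])
group_y = np.array([2.9, 5.2, 6.1, 7.4, 8.0])

# Rank all observations together, then average the ranks within each group
ranks = stats.rankdata(np.concatenate([group_x, group_y]))
mean_rank_x = ranks[:len(group_x)].mean()
mean_rank_y = ranks[len(group_x):].mean()

print(f"{mean_rank_x = :.2f}")
print(f"{mean_rank_y = :.2f}")
larger = "group_y" if mean_rank_y > mean_rank_x else "group_x"
print(f"{larger} tends to have larger values")
```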
Note¶
The Mann-Whitney U test and Wilcoxon rank-sum test are statistically equivalent. The terminology varies by software (e.g., "Mann-Whitney U" in SPSS, "Wilcoxon rank-sum" in R).