Test Selection Guide
Choosing the right statistical test depends on the research question, the type of data, and whether the underlying assumptions are met. A mismatched test can produce misleading p-values and invalid conclusions. This guide provides a structured approach to selecting among the tests covered in this chapter.
Decision Criteria
Before selecting a test, answer these four questions:
- What is the research question? Are you comparing group means, testing distributional fit, or assessing association between categorical variables?
- How many groups or samples? One sample, two samples (paired or independent), or three or more?
- What type of data? Continuous (interval/ratio) or categorical (nominal/ordinal)?
- Are parametric assumptions met? Specifically, are the data approximately normal and do the groups have equal variances?
Comparing Means
| Scenario | Parametric Test | Non-Parametric Alternative |
|---|---|---|
| One sample vs known mean | One-sample t-test (ttest_1samp) | Wilcoxon signed-rank (wilcoxon) |
| Two independent samples | Independent t-test (ttest_ind) | Mann-Whitney U (mannwhitneyu) |
| Two paired samples | Paired t-test (ttest_rel) | Wilcoxon signed-rank (wilcoxon) |
| Three or more independent groups | One-way ANOVA (f_oneway) | Kruskal-Wallis (kruskal) |
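Use the parametric test when the data (or, for paired designs, the differences) are approximately normal (check with Shapiro-Wilk or Q-Q plots) and, for independent groups, the variances are approximately equal (check with Levene's test). Switch to the non-parametric alternative when normality is clearly violated, especially with small samples, where the central limit theorem offers little protection. If only the equal-variance assumption fails, a Welch-type test (see Variance Assumptions) is usually the more direct fix.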
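The rule above can be sketched for the two-independent-samples row. This is a minimal illustration, not a prescribed workflow: the helper name `compare_two_independent` and the 0.05 threshold are assumptions made for the example, and the scores are reused from the runnable example at the end of this page.

```python
import numpy as np
from scipy import stats


def compare_two_independent(a, b, alpha=0.05):
    """Pick a two-sample test from simple assumption checks (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Shapiro-Wilk on each group: a small p-value suggests non-normality.
    normal = all(stats.shapiro(x).pvalue > alpha for x in (a, b))
    # Levene's test: a small p-value suggests unequal variances.
    equal_var = stats.levene(a, b).pvalue > alpha
    if normal and equal_var:
        name = "Independent t-test"
        result = stats.ttest_ind(a, b)
    else:
        name = "Mann-Whitney U"
        result = stats.mannwhitneyu(a, b, alternative="two-sided")
    return name, result.statistic, result.pvalue


method_a = [78, 82, 75, 88, 72, 90, 85, 77, 83, 79]
method_b = [85, 88, 91, 84, 89, 92, 87, 90, 86, 93]
name, stat, p = compare_two_independent(method_a, method_b)
print(f"{name}: statistic = {stat:.4f}, p-value = {p:.4f}")
```

Choosing a test from preliminary tests on the same data is a simplification; with small samples the assumption checks themselves have little power, so plots and subject knowledge should inform the choice too.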
Testing Distributional Fit
| Question | Test | SciPy Function |
|---|---|---|
| Does data follow a specific distribution? | Kolmogorov-Smirnov | kstest |
| Does data follow a specific distribution (tail-sensitive)? | Anderson-Darling | anderson |
| Do categorical counts match expected frequencies? | Chi-square goodness-of-fit | chisquare |
| Is data normally distributed? | Shapiro-Wilk | shapiro |
| Is data normally distributed (skew/kurtosis)? | D'Agostino-Pearson | normaltest |
Goodness-of-Fit vs Normality Tests
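Normality tests (Shapiro-Wilk, D'Agostino-Pearson) are specialized goodness-of-fit tests designed exclusively for the normal distribution. To test fit to other distributions (exponential, uniform, etc.), use the KS or Anderson-Darling test with the appropriate reference distribution. SciPy's anderson supports only a fixed set of distributions (normal and exponential among them, but not uniform), so kstest is the more general option.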
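As a rough sketch of the distinction, the snippet below runs a KS test against a fully specified exponential distribution and then both normality tests on the same sample. The data are synthetic, and the seed, scale, and sample size are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=200)  # synthetic, right-skewed data

# KS test against an exponential with loc=0 and scale=2.
# The reference parameters are fixed in advance, not estimated from the sample.
ks_expon = stats.kstest(sample, "expon", args=(0, 2.0))

# Normality tests on the same data.
sw = stats.shapiro(sample)
dp = stats.normaltest(sample)

print(f"KS vs expon:        p = {ks_expon.pvalue:.4f}")
print(f"Shapiro-Wilk:       p = {sw.pvalue:.3g}")
print(f"D'Agostino-Pearson: p = {dp.pvalue:.3g}")
```

The standard KS p-value assumes the reference distribution is fully specified in advance; if its parameters are estimated from the same sample, the p-value tends to be too large.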
Categorical Data
| Question | Test | SciPy Function |
|---|---|---|
| Do observed counts match expected proportions? | Chi-square goodness-of-fit | chisquare |
| Are two categorical variables independent? | Chi-square test of independence | chi2_contingency |
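Both rows of the table can be illustrated briefly. The counts below are hypothetical (120 die rolls and a 2x2 cross-tabulation) and exist only to show the call patterns.

```python
import numpy as np
from scipy import stats

# Goodness of fit: hypothetical counts from 120 rolls of a die.
# With no f_exp argument, chisquare assumes equal expected frequencies.
observed = np.array([18, 22, 16, 25, 19, 20])
gof = stats.chisquare(observed)
print(f"Goodness of fit: chi2 = {gof.statistic:.3f}, p = {gof.pvalue:.4f}")

# Independence: hypothetical 2x2 table (rows = group, columns = outcome).
table = np.array([[30, 10],
                  [20, 40]])
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"Independence: chi2 = {chi2:.3f}, p = {p:.4g}, dof = {dof}")
print("Expected counts under independence:")
print(expected)
```

For 2x2 tables, chi2_contingency applies Yates' continuity correction by default; pass correction=False to obtain the uncorrected statistic.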
Variance Assumptions
Before running parametric tests that assume equal variances, verify this assumption:
| Test | SciPy Function | Assumption |
|---|---|---|
| Levene's test | levene | Robust to non-normality |
| Bartlett's test | bartlett | Requires normality |
If equal variances are rejected, use Welch's t-test (ttest_ind with equal_var=False) for two groups, or Welch's ANOVA for multiple groups.
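For the two-group case, one way to wire the variance check into the test call is sketched below. The groups are synthetic, with deliberately different spreads, and the 0.05 cutoff is an assumption for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=50, scale=2, size=40)   # small spread
group_b = rng.normal(loc=50, scale=10, size=40)  # large spread

# Levene's test: a small p-value is evidence against equal variances.
lev = stats.levene(group_a, group_b)
equal_var = bool(lev.pvalue > 0.05)

# equal_var=False selects Welch's t-test; equal_var=True selects the pooled t-test.
welch = stats.ttest_ind(group_a, group_b, equal_var=equal_var)

print(f"Levene p = {lev.pvalue:.3g} -> equal_var = {equal_var}")
print(f"t = {welch.statistic:.4f}, p = {welch.pvalue:.4f}")
```

A common alternative is to skip the preliminary check and always pass equal_var=False, since Welch's test loses little power when the variances happen to be equal.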
Decision Flowchart
graph TD
A["What type of data?"] -->|Continuous| B["How many groups?"]
A -->|Categorical| C["Chi-square tests"]
B -->|1 group| D["One-sample t-test"]
B -->|2 groups| E["Paired or independent?"]
B -->|3+ groups| F["Normal + equal variance?"]
E -->|Paired| G["Normal differences?"]
E -->|Independent| H["Normal + equal variance?"]
G -->|Yes| I["Paired t-test"]
G -->|No| J["Wilcoxon signed-rank"]
H -->|Yes| K["Independent t-test"]
H -->|No| L["Mann-Whitney U"]
F -->|Yes| M["One-way ANOVA"]
F -->|No| N["Kruskal-Wallis"]
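The flowchart can also be read as a small lookup function. The sketch below mirrors its branches one for one; the function name `select_test` and its argument names are hypothetical, and the returned strings are the node labels from the chart.

```python
def select_test(data_type, n_groups=1, paired=False, assumptions_met=True):
    """Return the flowchart's recommended test as a string (illustrative only).

    assumptions_met stands for "normal differences" in the paired branch and
    "normal + equal variance" in the independent and 3+ group branches.
    """
    if data_type == "categorical":
        return "Chi-square tests"
    if data_type != "continuous":
        raise ValueError(f"unknown data type: {data_type!r}")
    if n_groups == 1:
        return "One-sample t-test"
    if n_groups == 2:
        if paired:
            return "Paired t-test" if assumptions_met else "Wilcoxon signed-rank"
        return "Independent t-test" if assumptions_met else "Mann-Whitney U"
    if n_groups >= 3:
        return "One-way ANOVA" if assumptions_met else "Kruskal-Wallis"
    raise ValueError("n_groups must be a positive integer")


print(select_test("continuous", n_groups=2, paired=True, assumptions_met=False))
print(select_test("continuous", n_groups=3))
print(select_test("categorical"))
```

Like the chart, this sketch routes every one-group case to the one-sample t-test and does not cover the Welch variants discussed under Variance Assumptions.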
Summary
Test selection follows a systematic path from research question to appropriate method. The primary branching points are data type (continuous vs categorical), number of groups, sample pairing, and whether parametric assumptions hold. When in doubt about normality or equal variances, non-parametric tests provide a safer alternative at the cost of some statistical power.
Runnable Example: solutions_tests.py
"""
Solutions 03: Statistical Hypothesis Tests
==========================================
Detailed solutions with interpretations.
"""
import numpy as np
from scipy import stats
# =============================================================================
# Main
# =============================================================================
if __name__ == "__main__":
    print("=" * 80)
    print("SOLUTIONS: STATISTICAL TESTS")
    print("=" * 80)
    print()

    # Solution 1
    print("Solution 1: Comparing Two Teaching Methods")
    print("-" * 40)
    method_A = np.array([78, 82, 75, 88, 72, 90, 85, 77, 83, 79])
    method_B = np.array([85, 88, 91, 84, 89, 92, 87, 90, 86, 93])
    print("Step 1: State hypotheses")
    print(" H₀: μ_A = μ_B (no difference)")
    print(" H₁: μ_A ≠ μ_B (significant difference)\n")

    # Part a: t-test (pooled variance, two-sided)
    t_stat, p_value = stats.ttest_ind(method_A, method_B)
    print("a) Two-sample t-test:")
    print(f" t-statistic = {t_stat:.4f}")
    print(f" p-value = {p_value:.4f}")
    alpha = 0.05
    if p_value < alpha:
        print(f" Decision: Reject H₀ (p < {alpha})")
        print(" Conclusion: Method B produces significantly higher scores\n")
    else:
        print(" Decision: Fail to reject H₀\n")

    # Part b: Effect size (Cohen's d with pooled standard deviation)
    mean_diff = np.mean(method_B) - np.mean(method_A)
    pooled_std = np.sqrt(((len(method_A) - 1) * np.var(method_A, ddof=1) +
                          (len(method_B) - 1) * np.var(method_B, ddof=1)) /
                         (len(method_A) + len(method_B) - 2))
    cohens_d = mean_diff / pooled_std
    print(f"b) Cohen's d = {cohens_d:.4f}")
    if abs(cohens_d) > 0.8:
        print(" Interpretation: Large effect size\n")

    # Part c: Confidence interval for the difference in means
    se_diff = pooled_std * np.sqrt(1 / len(method_A) + 1 / len(method_B))
    df = len(method_A) + len(method_B) - 2
    t_critical = stats.t.ppf(0.975, df)
    margin = t_critical * se_diff
    print(f"c) 95% CI for difference: [{mean_diff - margin:.2f}, {mean_diff + margin:.2f}]")
    print(" We are 95% confident the true difference is in this range\n")

    # Detailed solutions continue...
    print("=" * 80)