t-Tests¶
The t-test is the fundamental tool for comparing means when the population standard deviation is unknown. Unlike the z-test, which requires known variance, the t-test estimates variance from the sample and uses the heavier-tailed t-distribution to account for this additional uncertainty. SciPy provides three variants: one-sample, independent two-sample, and paired.
One-Sample t-Test¶
The one-sample t-test determines whether a sample mean differs significantly from a hypothesized population mean \(\mu_0\).
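The hypotheses are
\[ H_0: \mu = \mu_0 \quad \text{versus} \quad H_1: \mu \neq \mu_0 . \]
The test statistic is
\[ t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}, \]
where \(\bar{X}\) is the sample mean, \(s\) is the sample standard deviation, and \(n\) is the sample size. Under \(H_0\), the statistic follows a t-distribution with \(n - 1\) degrees of freedom:
\[ t \sim t_{n-1}. \]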
Assumptions: the observations are independent and approximately normally distributed. For larger samples (roughly \(n \geq 30\)), the central limit theorem makes the test robust to moderate non-normality.
from scipy import stats
# Test whether the mean differs from 50
data = [52.1, 48.3, 51.7, 49.8, 53.2, 50.5, 47.9, 51.0]
t_stat, p_value = stats.ttest_1samp(data, popmean=50)
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
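The statistic can also be computed directly from its definition, which makes the link between the formula and `ttest_1samp` concrete. The following is a minimal sketch, assuming the same eight values as the example above; the names `mu0`, `t_manual`, and `p_manual` are illustrative and not part of SciPy.

```python
import numpy as np
from scipy import stats

data = np.array([52.1, 48.3, 51.7, 49.8, 53.2, 50.5, 47.9, 51.0])
mu0 = 50.0                      # hypothesized population mean

n = data.size
x_bar = data.mean()
s = data.std(ddof=1)            # sample standard deviation (n - 1 denominator)

# t = (x_bar - mu0) / (s / sqrt(n))
t_manual = (x_bar - mu0) / (s / np.sqrt(n))

# Two-sided p-value from the t-distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(data, popmean=mu0)
print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```

Both lines should print the same values, since `ttest_1samp` performs a two-sided test by default.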
Independent Two-Sample t-Test¶
The independent two-sample t-test compares the means of two unrelated groups. This is the most commonly used t-test variant.
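The hypotheses are
\[ H_0: \mu_1 = \mu_2 \quad \text{versus} \quad H_1: \mu_1 \neq \mu_2 . \]
When the two groups have equal variances (the pooled t-test), the test statistic is
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \]
where \(\bar{X}_1, \bar{X}_2\) are the sample means, \(n_1, n_2\) are the sample sizes, and \(s_p\) is the pooled standard deviation, computed from the sample variances \(s_1^2\) and \(s_2^2\):
\[ s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}. \]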
Under \(H_0\), the statistic follows \(t_{n_1 + n_2 - 2}\).
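The pooled statistic can be reproduced by hand as a sanity check. This is a minimal sketch, assuming the same two groups as the SciPy example later in this section; the names `sp`, `t_manual`, and `p_manual` are illustrative.

```python
import numpy as np
from scipy import stats

group_a = np.array([23.1, 25.3, 24.8, 22.9, 26.1, 24.5])
group_b = np.array([28.4, 30.1, 27.6, 29.8, 31.2, 28.9])

n1, n2 = group_a.size, group_b.size
var1 = group_a.var(ddof=1)      # sample variances (n - 1 denominator)
var2 = group_b.var(ddof=1)

# Pooled standard deviation: variances weighted by their degrees of freedom
sp = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))

# t = (mean1 - mean2) / (sp * sqrt(1/n1 + 1/n2))
t_manual = (group_a.mean() - group_b.mean()) / (sp * np.sqrt(1 / n1 + 1 / n2))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)

t_scipy, p_scipy = stats.ttest_ind(group_a, group_b, equal_var=True)
print(f"manual: t = {t_manual:.4f}, p = {p_manual:.6f}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.6f}")
```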
Welch's t-Test¶
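When the equal-variance assumption is questionable, Welch's t-test provides a more reliable alternative. The test statistic uses separate variance estimates:
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}. \]
The degrees of freedom are approximated by the Welch-Satterthwaite equation:
\[ \nu \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2 / n_1)^2}{n_1 - 1} + \frac{(s_2^2 / n_2)^2}{n_2 - 1}}. \]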
Default to Welch's Test
Welch's t-test performs well even when variances are equal, with minimal power loss, so many statisticians recommend using it by default. Note that SciPy's ttest_ind defaults to equal_var=True (the pooled test); pass equal_var=False to get Welch's version.
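A small simulation can illustrate why this recommendation is common. The sketch below is illustrative only: the group sizes, standard deviations, seed, and number of simulations are assumptions chosen to make the effect visible. Both groups are drawn with the same mean, so every rejection is a false positive.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims = 2000
alpha = 0.05

# H0 is true (equal means), but the smaller group has the larger spread,
# which is the situation where the pooled test is known to misbehave.
a = rng.normal(loc=0.0, scale=5.0, size=(n_sims, 10))
b = rng.normal(loc=0.0, scale=1.0, size=(n_sims, 40))

# axis=1 runs one test per simulated row
p_pooled = stats.ttest_ind(a, b, axis=1, equal_var=True).pvalue
p_welch = stats.ttest_ind(a, b, axis=1, equal_var=False).pvalue

pooled_rate = np.mean(p_pooled < alpha)
welch_rate = np.mean(p_welch < alpha)
print(f"False-positive rate, pooled: {pooled_rate:.3f}")
print(f"False-positive rate, Welch:  {welch_rate:.3f}")
```

Under these assumed settings the pooled test should reject far more often than the nominal 5%, while Welch's test should stay close to it.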
Assumptions: independent samples, each group is approximately normally distributed.
from scipy import stats
group_a = [23.1, 25.3, 24.8, 22.9, 26.1, 24.5]
group_b = [28.4, 30.1, 27.6, 29.8, 31.2, 28.9]
# Pooled t-test (assumes equal variances)
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=True)
print(f"Pooled t-test: t = {t_stat:.4f}, p = {p_value:.6f}")
# Welch's t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"Welch's t-test: t = {t_stat:.4f}, p = {p_value:.6f}")
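To see where Welch's p-value comes from, the statistic and the Welch-Satterthwaite degrees of freedom can be computed by hand. This is a minimal sketch using the same two groups as above; `nu`, `t_manual`, and `p_manual` are illustrative names.

```python
import numpy as np
from scipy import stats

group_a = np.array([23.1, 25.3, 24.8, 22.9, 26.1, 24.5])
group_b = np.array([28.4, 30.1, 27.6, 29.8, 31.2, 28.9])

n1, n2 = group_a.size, group_b.size
# Squared standard errors of each group mean: s_i^2 / n_i
se1_sq = group_a.var(ddof=1) / n1
se2_sq = group_b.var(ddof=1) / n2

# Welch statistic: separate variance estimates, no pooling
t_manual = (group_a.mean() - group_b.mean()) / np.sqrt(se1_sq + se2_sq)

# Welch-Satterthwaite approximation to the degrees of freedom
nu = (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))

p_manual = 2 * stats.t.sf(abs(t_manual), df=nu)

t_scipy, p_scipy = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"Welch-Satterthwaite df: {nu:.3f}")
print(f"manual: t = {t_manual:.4f}, p = {p_manual:.6f}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.6f}")
```

The approximate degrees of freedom are generally not an integer and always lie between \(\min(n_1, n_2) - 1\) and \(n_1 + n_2 - 2\).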
Paired t-Test¶
The paired t-test compares means from the same subjects measured under two conditions (e.g., before and after a treatment). By analyzing within-subject differences, it controls for individual variability.
Given paired observations \((X_{1i}, X_{2i})\), define the differences \(D_i = X_{1i} - X_{2i}\). The test reduces to a one-sample t-test of \(H_0: \mu_D = 0\) on the differences:
\[ t = \frac{\bar{D}}{s_D / \sqrt{n}}, \]
where \(\bar{D}\) and \(s_D\) are the mean and standard deviation of the differences, and \(n\) is the number of pairs. Under \(H_0\), the statistic follows \(t_{n-1}\).
Assumptions: the differences \(D_i\) are independent and approximately normally distributed. The original measurements need not be normal — only the differences must be.
from scipy import stats
before = [120, 135, 128, 142, 130, 125, 138]
after = [115, 128, 122, 138, 125, 118, 132]
t_stat, p_value = stats.ttest_rel(before, after)
print(f"Paired t-test: t = {t_stat:.4f}, p = {p_value:.4f}")
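The equivalence with a one-sample test on the differences can be checked directly. The sketch below reuses the before/after values from the example above and, for contrast, also runs an independent-samples test on the same numbers, which ignores the pairing.

```python
import numpy as np
from scipy import stats

before = np.array([120, 135, 128, 142, 130, 125, 138])
after = np.array([115, 128, 122, 138, 125, 118, 132])

# Paired test, and the same test expressed as a one-sample test on D_i
t_rel, p_rel = stats.ttest_rel(before, after)
diff = before - after
t_one, p_one = stats.ttest_1samp(diff, popmean=0)

# Treating the two columns as unrelated groups discards the pairing
t_ind, p_ind = stats.ttest_ind(before, after, equal_var=False)

print(f"ttest_rel:            t = {t_rel:.4f}, p = {p_rel:.6f}")
print(f"ttest_1samp on diffs: t = {t_one:.4f}, p = {p_one:.6f}")
print(f"ttest_ind (unpaired): t = {t_ind:.4f}, p = {p_ind:.6f}")
```

The first two lines should agree. The unpaired test should give a much larger p-value here, because the between-subject spread is large relative to the consistent within-subject change.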
Summary of t-Test Variants¶
| Variant | SciPy Function | Use Case | Degrees of Freedom |
|---|---|---|---|
| One-sample | ttest_1samp | Sample mean vs known value | \(n - 1\) |
| Independent (pooled) | ttest_ind(equal_var=True) | Two independent groups, equal variance | \(n_1 + n_2 - 2\) |
| Independent (Welch) | ttest_ind(equal_var=False) | Two independent groups, unequal variance | Welch-Satterthwaite \(\nu\) |
| Paired | ttest_rel | Same subjects, two conditions | \(n - 1\) (number of pairs) |
Summary¶
The three t-test variants address distinct experimental designs: one-sample tests compare a single group to a reference value, independent tests compare two separate groups, and paired tests compare two measurements on the same subjects. All three share the same core structure of forming a t-ratio (signal divided by estimated noise) and comparing it to the t-distribution, differing only in how the signal, noise, and degrees of freedom are computed.
Runnable Example: statistical_tests.py¶
"""
Tutorial 04: Statistical Tests in scipy.stats
=============================================
Level: Intermediate
Topics: t-tests, chi-square tests, ANOVA, normality tests, Mann-Whitney U,
Wilcoxon, Kruskal-Wallis, and other hypothesis tests
This module covers the most common statistical hypothesis tests available
in scipy.stats for comparing means, distributions, and relationships.
"""
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import warnings
if __name__ == "__main__":
    warnings.filterwarnings('ignore')
# Set random seed for reproducibility
np.random.seed(42)
# =============================================================================
# SECTION 1: Introduction to Hypothesis Testing
# =============================================================================
"""
Hypothesis Testing Framework:
------------------------------
1. State null hypothesis (H₀) and alternative hypothesis (H₁)
2. Choose significance level (α), typically 0.05
3. Calculate test statistic from sample data
4. Determine p-value: probability of observing data at least as extreme as ours, assuming H₀ is true
5. Decision:
- If p-value ≤ α: Reject H₀ (result is "statistically significant")
- If p-value > α: Fail to reject H₀ (insufficient evidence)
Types of Errors:
- Type I Error (α): Rejecting H₀ when it's true (false positive)
- Type II Error (β): Failing to reject H₀ when it's false (false negative)
- Power = 1 - β: Probability of correctly rejecting false H₀
"""
print("="*80)
print("STATISTICAL HYPOTHESIS TESTS")
print("="*80)
print()
# =============================================================================
# SECTION 2: One-Sample t-Test
# =============================================================================
"""
Tests if the mean of a single sample differs from a hypothesized value.
H₀: μ = μ₀ (population mean equals hypothesized value)
H₁: μ ≠ μ₀ (two-tailed) or μ > μ₀ or μ < μ₀ (one-tailed)
Test statistic: t = (x̄ - μ₀) / (s / √n)
where x̄ is sample mean, s is sample std dev, n is sample size
Assumptions:
- Data is approximately normally distributed (or n ≥ 30 by CLT)
- Random sample
"""
print("ONE-SAMPLE t-TEST")
print("-" * 40)
# Example: Testing if average student test score differs from 75
scores = np.array([78, 82, 75, 88, 72, 90, 85, 77, 83, 79,
81, 76, 84, 80, 86, 74, 89, 73, 87, 91])
hypothesized_mean = 75
# Perform one-sample t-test
t_statistic, p_value = stats.ttest_1samp(scores, hypothesized_mean)
print(f"Sample data: n={len(scores)}, mean={np.mean(scores):.2f}, std={np.std(scores, ddof=1):.2f}")
print(f"Hypothesized mean: μ₀ = {hypothesized_mean}")
print(f"Test statistic: t = {t_statistic:.4f}")
print(f"P-value (two-tailed): {p_value:.4f}")
print()
alpha = 0.05
if p_value < alpha:
    print(f"Decision: Reject H₀ (p={p_value:.4f} < α={alpha})")
    print(f"Conclusion: Sample mean ({np.mean(scores):.2f}) significantly differs from {hypothesized_mean}")
else:
    print(f"Decision: Fail to reject H₀ (p={p_value:.4f} ≥ α={alpha})")
    print(f"Conclusion: Insufficient evidence that mean differs from {hypothesized_mean}")
print()
# Confidence interval for the mean
confidence_level = 0.95
df = len(scores) - 1
t_critical = stats.t.ppf((1 + confidence_level) / 2, df)
margin_of_error = t_critical * (np.std(scores, ddof=1) / np.sqrt(len(scores)))
ci_lower = np.mean(scores) - margin_of_error
ci_upper = np.mean(scores) + margin_of_error
print(f"{confidence_level*100:.0f}% Confidence Interval for mean: [{ci_lower:.2f}, {ci_upper:.2f}]")
print()
# =============================================================================
# SECTION 3: Two-Sample t-Test (Independent Samples)
# =============================================================================
"""
Tests if the means of two independent samples differ.
H₀: μ₁ = μ₂ (population means are equal)
H₁: μ₁ ≠ μ₂ (two-tailed) or μ₁ > μ₂ or μ₁ < μ₂ (one-tailed)
Two versions:
1. Student's t-test: assumes equal variances (SciPy default, equal_var=True)
2. Welch's t-test: does not assume equal variances (equal_var=False)
Assumptions:
- Both samples are approximately normally distributed
- Random and independent samples
"""
print("TWO-SAMPLE t-TEST (Independent)")
print("-" * 40)
# Example: Comparing test scores between two teaching methods
method_A = np.array([78, 82, 75, 88, 72, 90, 85, 77, 83, 79])
method_B = np.array([85, 88, 91, 84, 89, 92, 87, 90, 86, 93, 88, 85])
print(f"Method A: n={len(method_A)}, mean={np.mean(method_A):.2f}, std={np.std(method_A, ddof=1):.2f}")
print(f"Method B: n={len(method_B)}, mean={np.mean(method_B):.2f}, std={np.std(method_B, ddof=1):.2f}")
print()
# Test for equal variances (Levene's test)
levene_stat, levene_p = stats.levene(method_A, method_B)
print(f"Levene's test for equal variances: p={levene_p:.4f}")
if levene_p > 0.05:
    print(" → Variances are approximately equal")
    equal_var = True
else:
    print(" → Variances are significantly different")
    equal_var = False
print()
# Perform two-sample t-test
t_stat, p_val = stats.ttest_ind(method_A, method_B, equal_var=equal_var)
print(f"Test statistic: t = {t_stat:.4f}")
print(f"P-value (two-tailed): {p_val:.4f}")
print()
if p_val < alpha:
    print(f"Decision: Reject H₀ (p={p_val:.4f} < α={alpha})")
    print("Conclusion: Teaching methods produce significantly different results")
else:
    print(f"Decision: Fail to reject H₀ (p={p_val:.4f} ≥ α={alpha})")
    print("Conclusion: No significant difference between teaching methods")
print()
# Effect size: Cohen's d
pooled_std = np.sqrt(((len(method_A)-1)*np.var(method_A, ddof=1) +
(len(method_B)-1)*np.var(method_B, ddof=1)) /
(len(method_A) + len(method_B) - 2))
cohens_d = (np.mean(method_B) - np.mean(method_A)) / pooled_std
print(f"Effect size (Cohen's d): {cohens_d:.4f}")
print(" Interpretation (Cohen's conventions): |d| ≈ 0.2 (small), 0.5 (medium), 0.8 (large)")
print()
# =============================================================================
# SECTION 4: Paired t-Test (Dependent Samples)
# =============================================================================
"""
Tests if the mean difference between paired observations is zero.
H₀: μ_d = 0 (mean difference is zero)
H₁: μ_d ≠ 0 (mean difference is non-zero)
Used for:
- Before-after measurements
- Matched pairs
- Repeated measures
Test statistic: t = d̄ / (s_d / √n)
where d̄ is mean of differences, s_d is std dev of differences
"""
print("PAIRED t-TEST (Dependent Samples)")
print("-" * 40)
# Example: Weight before and after diet program
before = np.array([85, 92, 78, 88, 95, 82, 90, 87, 93, 80])
after = np.array([82, 89, 76, 85, 91, 80, 87, 84, 90, 78])
differences = after - before
mean_diff = np.mean(differences)
print(f"Before: mean={np.mean(before):.2f}, std={np.std(before, ddof=1):.2f}")
print(f"After: mean={np.mean(after):.2f}, std={np.std(after, ddof=1):.2f}")
print(f"Mean difference: {mean_diff:.2f} kg")
print()
# Perform paired t-test
t_paired, p_paired = stats.ttest_rel(before, after)
print(f"Test statistic: t = {t_paired:.4f}")
print(f"P-value (two-tailed): {p_paired:.4f}")
print()
if p_paired < alpha:
    print(f"Decision: Reject H₀ (p={p_paired:.4f} < α={alpha})")
    print("Conclusion: Diet program has significant effect on weight")
else:
    print(f"Decision: Fail to reject H₀ (p={p_paired:.4f} ≥ α={alpha})")
    print("Conclusion: No significant effect of diet program")
print()
# Visualize paired data
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
x = np.arange(len(before))
plt.plot(x, before, 'o-', label='Before', markersize=8)
plt.plot(x, after, 's-', label='After', markersize=8)
for i in range(len(before)):
    plt.plot([i, i], [before[i], after[i]], 'k-', alpha=0.3)
plt.xlabel('Subject')
plt.ylabel('Weight (kg)')
plt.title('Paired Measurements: Before vs After')
plt.legend()
plt.grid(True, alpha=0.3)
plt.subplot(1, 2, 2)
plt.hist(differences, bins=7, edgecolor='black', alpha=0.7)
plt.axvline(0, color='red', linestyle='--', linewidth=2, label='No change')
plt.axvline(mean_diff, color='blue', linestyle='-', linewidth=2, label=f'Mean diff = {mean_diff:.2f}')
plt.xlabel('Weight Change (kg)')
plt.ylabel('Frequency')
plt.title('Distribution of Weight Changes')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('04_paired_test.png', dpi=300, bbox_inches='tight')  # relative path: saves to the current working directory
print("Saved: 04_paired_test.png\n")
plt.close()
# =============================================================================
# SECTION 5: Chi-Square Test for Independence
# =============================================================================
"""
Tests if two categorical variables are independent.
H₀: Variables are independent
H₁: Variables are associated (dependent)
Test statistic: χ² = Σ [(O - E)² / E]
where O is observed frequency, E is expected frequency
Assumptions:
- Expected frequency ≥ 5 in at least 80% of cells
- Expected frequency ≥ 1 in all cells
"""
print("CHI-SQUARE TEST FOR INDEPENDENCE")
print("-" * 40)
# Example: Relationship between smoking and lung disease
# Rows: Smoker/Non-smoker, Columns: Disease/No Disease
observed = np.array([[45, 65], # Smokers: Disease, No Disease
[15, 95]]) # Non-smokers: Disease, No Disease
print("Contingency Table:")
print(" Disease No Disease")
print(f"Smokers {observed[0,0]:3d} {observed[0,1]:3d}")
print(f"Non-smokers {observed[1,0]:3d} {observed[1,1]:3d}")
print()
# Perform chi-square test (for 2x2 tables, chi2_contingency applies Yates' continuity correction by default)
chi2_stat, p_chi, dof, expected = stats.chi2_contingency(observed)
print(f"Chi-square statistic: χ² = {chi2_stat:.4f}")
print(f"Degrees of freedom: {dof}")
print(f"P-value: {p_chi:.4f}")
print()
print("Expected frequencies (under independence):")
print(" Disease No Disease")
print(f"Smokers {expected[0,0]:6.2f} {expected[0,1]:6.2f}")
print(f"Non-smokers {expected[1,0]:6.2f} {expected[1,1]:6.2f}")
print()
if p_chi < alpha:
    print(f"Decision: Reject H₀ (p={p_chi:.4f} < α={alpha})")
    print("Conclusion: Smoking and lung disease are associated")
else:
    print(f"Decision: Fail to reject H₀ (p={p_chi:.4f} ≥ α={alpha})")
    print("Conclusion: No significant association found")
print()
# Cramér's V (effect size for chi-square)
n = np.sum(observed)
min_dim = min(observed.shape[0], observed.shape[1]) - 1
cramers_v = np.sqrt(chi2_stat / (n * min_dim))
print(f"Cramér's V (effect size): {cramers_v:.4f}")
print(" Interpretation for a 2x2 table (Cohen's conventions): V ≈ 0.1 (small), 0.3 (medium), 0.5 (large)")
print()
# =============================================================================
# SECTION 6: Chi-Square Goodness-of-Fit Test
# =============================================================================
"""
Tests if observed frequencies match expected frequencies.
H₀: Data follows the specified distribution
H₁: Data does not follow the specified distribution
"""
print("CHI-SQUARE GOODNESS-OF-FIT TEST")
print("-" * 40)
# Example: Testing if a die is fair
rolls = np.array([12, 18, 15, 14, 16, 13]) # Observed frequencies
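# Expected counts if fair: 88 total rolls spread evenly over 6 faces.
# chisquare requires the observed and expected totals to match.
expected_freq = np.full(len(rolls), rolls.sum() / len(rolls))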
print(f"Die rolls observed: {rolls}")
print(f"Expected (fair die): {expected_freq}")
print()
# Perform goodness-of-fit test
chi2_gof, p_gof = stats.chisquare(rolls, expected_freq)
print(f"Chi-square statistic: χ² = {chi2_gof:.4f}")
print(f"P-value: {p_gof:.4f}")
print()
if p_gof < alpha:
    print(f"Decision: Reject H₀ (p={p_gof:.4f} < α={alpha})")
    print("Conclusion: Die is likely not fair")
else:
    print(f"Decision: Fail to reject H₀ (p={p_gof:.4f} ≥ α={alpha})")
    print("Conclusion: Data consistent with fair die")
print()
# =============================================================================
# SECTION 7: One-Way ANOVA (Analysis of Variance)
# =============================================================================
"""
Tests if means of three or more groups are equal.
H₀: μ₁ = μ₂ = μ₃ = ... (all group means are equal)
H₁: At least one mean differs
F-statistic: F = (Between-group variance) / (Within-group variance)
Assumptions:
- Normal distribution within each group
- Equal variances across groups (homoscedasticity)
- Independent observations
"""
print("ONE-WAY ANOVA")
print("-" * 40)
# Example: Comparing productivity across three work schedules
schedule_A = np.array([82, 85, 88, 84, 86, 83, 87, 85, 89, 84])
schedule_B = np.array([90, 92, 88, 91, 93, 89, 92, 90, 94, 91])
schedule_C = np.array([85, 87, 86, 88, 84, 86, 87, 85, 89, 86])
print(f"Schedule A: mean={np.mean(schedule_A):.2f}, std={np.std(schedule_A, ddof=1):.2f}")
print(f"Schedule B: mean={np.mean(schedule_B):.2f}, std={np.std(schedule_B, ddof=1):.2f}")
print(f"Schedule C: mean={np.mean(schedule_C):.2f}, std={np.std(schedule_C, ddof=1):.2f}")
print()
# Test for equal variances across groups
bartlett_stat, bartlett_p = stats.bartlett(schedule_A, schedule_B, schedule_C)
print(f"Bartlett's test for equal variances: p={bartlett_p:.4f}")
print()
# Perform one-way ANOVA
f_stat, p_anova = stats.f_oneway(schedule_A, schedule_B, schedule_C)
print(f"F-statistic: F = {f_stat:.4f}")
print(f"P-value: {p_anova:.4f}")
print()
if p_anova < alpha:
    print(f"Decision: Reject H₀ (p={p_anova:.4f} < α={alpha})")
    print("Conclusion: At least one schedule produces different results")
    print(" → Follow up with post-hoc tests (e.g., Tukey HSD) to identify which groups differ")
else:
    print(f"Decision: Fail to reject H₀ (p={p_anova:.4f} ≥ α={alpha})")
    print("Conclusion: No significant difference among schedules")
print("Conclusion: No significant difference among schedules")
print()
# Visualize ANOVA
plt.figure(figsize=(10, 6))
data_anova = [schedule_A, schedule_B, schedule_C]
positions = [1, 2, 3]
bp = plt.boxplot(data_anova, positions=positions, widths=0.6, patch_artist=True)
for patch in bp['boxes']:
    patch.set_facecolor('lightblue')
plt.xticks(positions, ['Schedule A', 'Schedule B', 'Schedule C'])
plt.ylabel('Productivity Score')
plt.title('One-Way ANOVA: Productivity by Work Schedule')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('04_anova.png', dpi=300, bbox_inches='tight')  # relative path: saves to the current working directory
print("Saved: 04_anova.png\n")
plt.close()
# =============================================================================
# SECTION 8: Mann-Whitney U Test (Non-parametric Alternative to t-test)
# =============================================================================
"""
Tests if two independent samples come from the same distribution.
Non-parametric alternative to independent t-test.
H₀: Distributions are equal
H₁: Distributions differ
Advantages:
- Does not assume normality
- Robust to outliers
- Works with ordinal data
Use when:
- Data is not normally distributed
- Small sample sizes
- Presence of outliers
"""
print("MANN-WHITNEY U TEST (Non-parametric)")
print("-" * 40)
# Example: Comparing reaction times (may not be normal due to outliers)
group1 = np.array([12, 15, 13, 18, 14, 16, 15, 40]) # Note outlier: 40
group2 = np.array([20, 22, 21, 24, 19, 23, 22, 21])
print(f"Group 1: median={np.median(group1):.1f}, IQR={np.percentile(group1, 75)-np.percentile(group1, 25):.1f}")
print(f"Group 2: median={np.median(group2):.1f}, IQR={np.percentile(group2, 75)-np.percentile(group2, 25):.1f}")
print()
# Perform Mann-Whitney U test
u_stat, p_mann = stats.mannwhitneyu(group1, group2, alternative='two-sided')
print(f"U-statistic: U = {u_stat:.4f}")
print(f"P-value: {p_mann:.4f}")
print()
if p_mann < alpha:
    print(f"Decision: Reject H₀ (p={p_mann:.4f} < α={alpha})")
    print("Conclusion: Groups have significantly different distributions")
else:
    print(f"Decision: Fail to reject H₀ (p={p_mann:.4f} ≥ α={alpha})")
    print("Conclusion: No significant difference in distributions")
print()
# Compare with t-test (for illustration)
t_with_outlier, p_t = stats.ttest_ind(group1, group2)
print(f"For comparison, t-test p-value: {p_t:.4f}")
print(" (t-test is affected by the outlier in group1)")
print()
# =============================================================================
# SECTION 9: Wilcoxon Signed-Rank Test (Non-parametric Paired Test)
# =============================================================================
"""
Non-parametric alternative to paired t-test.
Tests if median difference between paired samples is zero.
H₀: Median difference is zero
H₁: Median difference is non-zero
Use when:
- Paired data is not normally distributed
- Small sample sizes
- Ordinal data
"""
print("WILCOXON SIGNED-RANK TEST (Non-parametric Paired)")
print("-" * 40)
# Example: Pain scores before and after treatment (ordinal scale 1-10)
pain_before = np.array([8, 7, 9, 6, 8, 7, 9, 8, 7, 6])
pain_after = np.array([5, 6, 7, 5, 6, 4, 6, 5, 5, 4])
print(f"Before: median={np.median(pain_before):.1f}")
print(f"After: median={np.median(pain_after):.1f}")
print()
# Perform Wilcoxon signed-rank test
w_stat, p_wilcoxon = stats.wilcoxon(pain_before, pain_after)
print(f"W-statistic: W = {w_stat:.4f}")
print(f"P-value: {p_wilcoxon:.4f}")
print()
if p_wilcoxon < alpha:
    print(f"Decision: Reject H₀ (p={p_wilcoxon:.4f} < α={alpha})")
    print("Conclusion: Treatment significantly reduces pain")
else:
    print(f"Decision: Fail to reject H₀ (p={p_wilcoxon:.4f} ≥ α={alpha})")
    print("Conclusion: No significant effect of treatment")
print()
# =============================================================================
# SECTION 10: Kruskal-Wallis H Test (Non-parametric Alternative to ANOVA)
# =============================================================================
"""
Tests if three or more groups come from the same distribution.
Non-parametric alternative to one-way ANOVA.
H₀: All groups have the same distribution
H₁: At least one group differs
Use when:
- Data is not normally distributed
- Ordinal data
- Presence of outliers
"""
print("KRUSKAL-WALLIS H TEST (Non-parametric ANOVA)")
print("-" * 40)
# Example: Customer satisfaction scores (ordinal scale) across three stores
store1 = np.array([3, 4, 3, 5, 4, 3, 4, 5, 3, 4])
store2 = np.array([4, 5, 5, 5, 4, 5, 5, 4, 5, 5])
store3 = np.array([3, 3, 4, 3, 2, 3, 4, 3, 3, 4])
print(f"Store 1: median={np.median(store1):.1f}")
print(f"Store 2: median={np.median(store2):.1f}")
print(f"Store 3: median={np.median(store3):.1f}")
print()
# Perform Kruskal-Wallis H test
h_stat, p_kruskal = stats.kruskal(store1, store2, store3)
print(f"H-statistic: H = {h_stat:.4f}")
print(f"P-value: {p_kruskal:.4f}")
print()
if p_kruskal < alpha:
    print(f"Decision: Reject H₀ (p={p_kruskal:.4f} < α={alpha})")
    print("Conclusion: At least one store has different satisfaction distribution")
else:
    print(f"Decision: Fail to reject H₀ (p={p_kruskal:.4f} ≥ α={alpha})")
    print("Conclusion: No significant difference among stores")
print()
# =============================================================================
# SECTION 11: Tests for Normality
# =============================================================================
"""
Several tests to check if data is normally distributed.
Common tests:
1. Shapiro-Wilk: Good for small to medium samples (n < 5000)
2. Kolmogorov-Smirnov: Compares with reference distribution
3. Anderson-Darling: More weight to tail values
4. D'Agostino-Pearson: Based on skewness and kurtosis
"""
print("NORMALITY TESTS")
print("-" * 40)
# Generate test data
normal_data = np.random.normal(0, 1, 100)
uniform_data = np.random.uniform(-2, 2, 100)
exponential_data = np.random.exponential(2, 100)
datasets = [
("Normal", normal_data),
("Uniform", uniform_data),
("Exponential", exponential_data)
]
for name, data in datasets:
    print(f"\n{name} data:")
    # Shapiro-Wilk test
    shapiro_stat, shapiro_p = stats.shapiro(data)
    print(f" Shapiro-Wilk: W={shapiro_stat:.4f}, p={shapiro_p:.4f}", end="")
    print(f" → {'Normal' if shapiro_p > alpha else 'Not normal'}")
    # Kolmogorov-Smirnov test
    # Note: the mean and std are estimated from the same data, so this p-value
    # is only approximate and tends to be too large (the test is conservative).
    ks_stat, ks_p = stats.kstest(data, 'norm', args=(np.mean(data), np.std(data, ddof=1)))
    print(f" Kolmogorov-Smirnov: D={ks_stat:.4f}, p={ks_p:.4f}", end="")
    print(f" → {'Normal' if ks_p > alpha else 'Not normal'}")
    # Anderson-Darling test
    anderson_result = stats.anderson(data, dist='norm')
    print(f" Anderson-Darling: A²={anderson_result.statistic:.4f}")
    print(f" Critical values: {anderson_result.critical_values}")
    print(f" Significance levels: {anderson_result.significance_level}")
print()
# Visualize normality
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
for i, (name, data) in enumerate(datasets):
    # Histogram with normal curve overlay
    axes[0, i].hist(data, bins=20, density=True, alpha=0.7, edgecolor='black')
    x_range = np.linspace(data.min(), data.max(), 100)
    axes[0, i].plot(x_range, stats.norm.pdf(x_range, np.mean(data), np.std(data, ddof=1)),
                    'r-', linewidth=2, label='Normal fit')
    axes[0, i].set_title(f'{name} Data - Histogram')
    axes[0, i].legend()
    axes[0, i].grid(True, alpha=0.3)
    # Q-Q plot
    stats.probplot(data, dist="norm", plot=axes[1, i])
    axes[1, i].set_title(f'{name} Data - Q-Q Plot')
    axes[1, i].grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('04_normality_tests.png', dpi=300, bbox_inches='tight')  # relative path: saves to the current working directory
print("Saved: 04_normality_tests.png\n")
plt.close()
# =============================================================================
# SECTION 12: Summary Table
# =============================================================================
print("="*80)
print("SUMMARY: When to Use Each Test")
print("="*80)
test_guide = """
Test Type              Use When                             Assumptions
--------------------------------------------------------------------------------
One-sample t-test      Compare sample mean to value         Normal or n≥30
Two-sample t-test      Compare two independent means        Normal, independent
Paired t-test          Compare paired measurements          Normal differences
Chi-square test        Test categorical associations        Expected freq ≥ 5
Goodness-of-fit        Test distribution fit                Expected freq ≥ 5
One-way ANOVA          Compare 3+ independent means         Normal, equal var
Mann-Whitney U         Compare 2 independent (non-param)    None (ordinal OK)
Wilcoxon signed-rank   Compare paired (non-parametric)      None (ordinal OK)
Kruskal-Wallis         Compare 3+ groups (non-parametric)   None (ordinal OK)
Shapiro-Wilk           Test normality                       n < 5000
"""
print(test_guide)
print("\n" + "="*80)
print("Tutorial 04 Complete!")
print("="*80)
print("\nKey Takeaways:")
print("1. Choose parametric tests (t-tests, ANOVA) when assumptions are met")
print("2. Use non-parametric tests when data is not normal or has outliers")
print("3. Always check test assumptions before proceeding")
print("4. P-value indicates strength of evidence against null hypothesis")
print("5. Statistical significance (p<0.05) doesn't always mean practical importance")
print("6. Report effect sizes along with p-values for complete picture")
print("7. Multiple comparisons require adjustment (e.g., Bonferroni correction)")
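Takeaway 7 mentions adjusting for multiple comparisons. As a rough illustration, the sketch below applies a Bonferroni adjustment to pairwise Welch t-tests on the three work-schedule groups from the ANOVA section. Using pairwise Welch tests as the follow-up, and the names `groups` and `results`, are assumptions made for this example; the tutorial itself suggests Tukey HSD as a post-hoc option.

```python
from itertools import combinations

import numpy as np
from scipy import stats

groups = {
    "A": np.array([82, 85, 88, 84, 86, 83, 87, 85, 89, 84]),
    "B": np.array([90, 92, 88, 91, 93, 89, 92, 90, 94, 91]),
    "C": np.array([85, 87, 86, 88, 84, 86, 87, 85, 89, 86]),
}
alpha = 0.05

pairs = list(combinations(groups, 2))   # (A, B), (A, C), (B, C)
m = len(pairs)                          # number of comparisons

results = {}
for g1, g2 in pairs:
    _, p_raw = stats.ttest_ind(groups[g1], groups[g2], equal_var=False)
    # Bonferroni: multiply each p-value by the number of tests, capped at 1
    p_adj = min(1.0, p_raw * m)
    results[(g1, g2)] = (p_raw, p_adj)
    verdict = "significant" if p_adj < alpha else "not significant"
    print(f"{g1} vs {g2}: raw p = {p_raw:.4f}, adjusted p = {p_adj:.4f} ({verdict})")
```

Bonferroni is simple but conservative; comparing each raw p-value to `alpha / m` is an equivalent way to apply it.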