Power Analysis¶
Before collecting data, a researcher needs to know how many observations are required to detect an effect of practical interest. An underpowered study wastes resources because it is unlikely to find a real effect even if one exists, while an overpowered study collects more data than necessary. Power analysis answers this question by quantifying the relationship between four interrelated quantities: significance level, effect size, sample size, and statistical power.
Statistical Power¶
The power of a test is the probability of correctly rejecting \(H_0\) when \(H_1\) is true:

\[
\text{Power} = 1 - \beta = P(\text{reject } H_0 \mid H_1 \text{ true}),
\]

where \(\beta\) is the Type II error rate (the probability of failing to reject a false \(H_0\)). A conventional target is power \(\ge 0.80\), meaning the study has at least an 80% chance of detecting the effect if it exists.
The Four-Way Relationship¶
Power analysis connects four quantities, any one of which can be solved from the other three:
| Quantity | Symbol | Typical Value |
|---|---|---|
| Significance level | \(\alpha\) | 0.05 |
| Power | \(1 - \beta\) | 0.80 |
| Effect size | \(d\), \(f\), \(r\), etc. | Varies by context |
| Sample size | \(n\) | Solved for |
```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Solve for per-group sample size given the other three
n = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                         alternative='two-sided')
print(f"Required n per group (d=0.5): {n:.0f}")

# Solve for power given the per-group sample size
power = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=30,
                             alternative='two-sided')
print(f"Power with n=30 per group (d=0.5): {power:.3f}")

# Solve for the detectable effect size
d = analysis.solve_power(nobs1=30, alpha=0.05, power=0.80,
                         alternative='two-sided')
print(f"Detectable effect size (n=30 per group): {d:.3f}")
```
Effect Size Conventions¶
Cohen's \(d\) provides standardized benchmarks for the magnitude of a mean difference:
| Size | \(d\) | Example |
|---|---|---|
| Small | 0.2 | Subtle difference, requires large \(n\) to detect |
| Medium | 0.5 | Visible to careful observation |
| Large | 0.8 | Obvious difference, detectable with small \(n\) |
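To make the "requires large \(n\)" column concrete, the benchmarks can be run through statsmodels (a sketch: per-group sizes for a two-sample \(t\)-test at \(\alpha = 0.05\) and power 0.80):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Required per-group sample size for each benchmark effect size
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80)
    print(f"{label:>6} (d={d}): n per group = {n:.0f}")
```

Halving the effect size roughly quadruples the required sample size, since \(n\) scales with \(1/d^2\).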
Cohen's \(d\) is defined as

\[
d = \frac{\mu_1 - \mu_2}{\sigma},
\]

where \(\sigma\) is the common population standard deviation.
```python
import numpy as np

def cohens_d(group1, group2):
    """Compute Cohen's d for independent samples."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = np.std(group1, ddof=1), np.std(group2, ddof=1)
    # Pooled standard deviation estimates the common sigma
    s_pooled = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (np.mean(group1) - np.mean(group2)) / s_pooled

np.random.seed(42)
g1 = np.random.normal(100, 15, 30)
g2 = np.random.normal(108, 15, 30)
d = cohens_d(g1, g2)
print(f"Cohen's d: {d:.3f}")
```
Sample Size Determination¶
The required sample size for a two-sample \(t\)-test grows as the effect size shrinks. For a two-sided test at level \(\alpha\) with power \(1 - \beta\), the approximate sample size per group is

\[
n \approx \frac{2\,(z_{\alpha/2} + z_\beta)^2}{d^2},
\]

where \(z_{\alpha/2}\) and \(z_\beta\) are the standard normal quantiles that cut off upper-tail probabilities \(\alpha/2\) and \(\beta\).
```python
from scipy import stats
from statsmodels.stats.power import TTestIndPower

# Approximate formula
alpha = 0.05
power_target = 0.80
d = 0.5
z_alpha = stats.norm.ppf(1 - alpha / 2)
z_beta = stats.norm.ppf(power_target)
n_approx = 2 * (z_alpha + z_beta)**2 / d**2
print(f"Approximate n per group: {n_approx:.0f}")

# Exact calculation via statsmodels (uses the noncentral t distribution)
n_exact = TTestIndPower().solve_power(effect_size=d, alpha=alpha,
                                      power=power_target,
                                      alternative='two-sided')
print(f"Exact n per group: {n_exact:.0f}")
```
Power Curves¶
A power curve shows how power changes as a function of one parameter while holding the others fixed. This visualization helps researchers understand the trade-offs in study design.
```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power as a function of per-group sample size for different effect sizes
sample_sizes = np.arange(10, 201, 5)
effect_sizes = [0.2, 0.5, 0.8]
for d in effect_sizes:
    powers = [analysis.power(d, n, 0.05) for n in sample_sizes]  # one curve
    print(f"d={d}: n=20 → power={analysis.power(d, 20, 0.05):.2f}, "
          f"n=100 → power={analysis.power(d, 100, 0.05):.2f}")
```
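To draw the curves rather than spot-check a few values, statsmodels provides a plotting helper (a sketch; assumes matplotlib is installed, and the output filename is an arbitrary choice):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for script use
import matplotlib.pyplot as plt
from statsmodels.stats.power import TTestIndPower

fig, ax = plt.subplots(figsize=(7, 4))
# One curve per effect size, power vs. per-group sample size
TTestIndPower().plot_power(dep_var='nobs',
                           nobs=np.arange(10, 201, 5),
                           effect_size=np.array([0.2, 0.5, 0.8]),
                           alpha=0.05, ax=ax)
ax.axhline(0.80, linestyle='--', color='gray')  # conventional power target
fig.savefig('power_curves.png', dpi=120)
```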
Power for Other Tests¶
Paired t-test¶
The paired design uses the effect size \(d_z = \mu_D / \sigma_D\), where \(\mu_D\) and \(\sigma_D\) are the mean and standard deviation of the paired differences.
```python
from statsmodels.stats.power import tt_solve_power

# Paired t-test power (one-sample power applied to the differences, with d_z)
n_paired = tt_solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                          alternative='two-sided')
print(f"Required n (paired, d_z=0.5): {n_paired:.0f}")
```
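Because \(\sigma_D\) shrinks when the paired measurements are positively correlated, a paired design can need far fewer subjects than an independent-groups design. If the two conditions share variance \(\sigma^2\) and have correlation \(\rho\), then \(d_z = d / \sqrt{2(1 - \rho)}\). The sketch below (with illustrative values) shows the required number of pairs at several correlations:

```python
import numpy as np
from statsmodels.stats.power import tt_solve_power

d = 0.5  # standardized difference between the two conditions (assumed)
for rho in [0.0, 0.5, 0.8]:
    d_z = d / np.sqrt(2 * (1 - rho))        # effect size on the differences
    n = tt_solve_power(effect_size=d_z, alpha=0.05, power=0.80)
    print(f"rho={rho}: d_z={d_z:.3f}, required pairs={np.ceil(n):.0f}")
```

At \(\rho = 0.5\) the paired \(d_z\) equals the raw \(d\); higher correlations reduce the required number of pairs further.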
ANOVA (F-test)¶
For one-way ANOVA comparing \(k\) groups, the effect size is Cohen's \(f\):

\[
f = \frac{\sqrt{\tfrac{1}{k}\sum_{i=1}^{k} (\mu_i - \bar{\mu})^2}}{\sigma},
\]

where \(\mu_i\) are the group means, \(\bar{\mu}\) is their grand mean, and \(\sigma\) is the common within-group standard deviation. Conventional benchmarks are \(f = 0.10\) (small), \(0.25\) (medium), and \(0.40\) (large).
```python
from statsmodels.stats.power import FTestAnovaPower

anova_power = FTestAnovaPower()
# Note: for FTestAnovaPower, nobs is the *total* sample size across all groups
n_total = anova_power.solve_power(effect_size=0.25, alpha=0.05, power=0.80,
                                  k_groups=3)
print(f"Required total n (ANOVA, f=0.25, k=3): {n_total:.0f}, "
      f"about {n_total / 3:.0f} per group")
```
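Cohen's \(f\) can also be computed directly from hypothesized group means and a common within-group standard deviation (the means and \(\sigma\) below are illustrative assumptions):

```python
import numpy as np
from statsmodels.stats.power import FTestAnovaPower

# Hypothesized group means and common within-group SD (assumed values)
means = np.array([100.0, 105.0, 110.0])
sigma = 15.0

# Cohen's f: SD of the group means divided by the within-group SD
f = np.sqrt(np.mean((means - means.mean())**2)) / sigma

n_total = FTestAnovaPower().solve_power(effect_size=f, alpha=0.05,
                                        power=0.80, k_groups=3)
print(f"Cohen's f = {f:.3f}, required total n = {np.ceil(n_total):.0f}")
```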
Chi-Square Test¶
For chi-square tests, the effect size is Cohen's \(w\):

\[
w = \sqrt{\sum_{i=1}^{m} \frac{(p_{1i} - p_{0i})^2}{p_{0i}}},
\]

where \(p_{0i}\) are the cell proportions under \(H_0\), \(p_{1i}\) are the proportions under the alternative, and \(m\) is the number of cells. Conventional benchmarks are \(w = 0.1\) (small), \(0.3\) (medium), and \(0.5\) (large).
```python
from statsmodels.stats.power import GofChisquarePower

chi2_power = GofChisquarePower()
n_chi2 = chi2_power.solve_power(effect_size=0.3, alpha=0.05, power=0.80,
                                n_bins=4)
print(f"Required n (chi-square, w=0.3): {n_chi2:.0f}")
```
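Cohen's \(w\) can likewise be computed from hypothesized cell proportions (the null and alternative proportions below are illustrative assumptions):

```python
import numpy as np
from statsmodels.stats.power import GofChisquarePower

# Cell proportions under H0 (uniform) and under the alternative (assumed)
p0 = np.array([0.25, 0.25, 0.25, 0.25])
p1 = np.array([0.40, 0.30, 0.20, 0.10])

# Cohen's w from the two sets of proportions
w = np.sqrt(np.sum((p1 - p0)**2 / p0))

n = GofChisquarePower().solve_power(effect_size=w, alpha=0.05,
                                    power=0.80, n_bins=4)
print(f"Cohen's w = {w:.3f}, required n = {np.ceil(n):.0f}")
```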
Post-Hoc Power Analysis¶
Observed Power is Uninformative
Computing power using the observed effect size after data collection (post-hoc or retrospective power) is widely discouraged. Observed power is a monotone function of the p-value and adds no information beyond what the p-value already provides. Power analysis is meaningful only when conducted before data collection, using a clinically or practically meaningful effect size.
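The monotone relationship can be made concrete for a two-sided \(z\)-test: treating the observed statistic as the true effect yields an "observed power" that depends on the p-value alone, and at \(p = \alpha\) it always comes out near 50% (a sketch using the normal approximation):

```python
from scipy import stats

alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)

# "Observed power": plug the observed statistic back in as the assumed effect
for p in [0.20, 0.05, 0.01]:
    z_obs = stats.norm.ppf(1 - p / 2)
    obs_power = (1 - stats.norm.cdf(z_crit - z_obs)
                 + stats.norm.cdf(-z_crit - z_obs))
    print(f"p = {p:.2f} → observed power = {obs_power:.3f}")
```

Since observed power is determined entirely by the p-value, reporting it adds nothing to the analysis.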
Summary¶
Power analysis determines the sample size needed to detect an effect of a specified magnitude with a given probability. The four key quantities — significance level \(\alpha\), power \(1 - \beta\), effect size, and sample size — are interrelated so that fixing any three determines the fourth. Conventional targets are \(\alpha = 0.05\) and power \(= 0.80\). Power curves visualize these trade-offs and help researchers choose an appropriate design. Power analysis should always be conducted before data collection, using a scientifically meaningful effect size rather than the observed effect.