Power Analysis¶
Before collecting data, a researcher needs to know how many observations are required to detect an effect of practical interest. An underpowered study wastes resources because it is unlikely to find a real effect even if one exists, while an overpowered study collects more data than necessary. Power analysis answers this question by quantifying the relationship between four interrelated quantities: significance level, effect size, sample size, and statistical power.
Mental Model
Power analysis is the study design equation: significance level, effect size, sample size, and power are four knobs connected by one constraint. Fix any three and the fourth is determined. In practice, you fix \(\alpha = 0.05\), target power \(= 0.80\), estimate the effect size from prior work, and solve for the sample size you need.
Statistical Power¶
The power of a test is the probability of correctly rejecting \(H_0\) when \(H_1\) is true:
where \(\beta\) is the Type II error rate (probability of failing to reject a false \(H_0\)). A conventional target is power \(\ge 0.80\), meaning the study has at least an 80% chance of detecting the effect if it exists.
The Four-Way Relationship¶
Power analysis connects four quantities, any one of which can be solved from the other three:
| Quantity | Symbol | Typical Value |
|---|---|---|
| Significance level | \(\alpha\) | 0.05 |
| Power | \(1 - \beta\) | 0.80 |
| Effect size | \(d\), \(f\), \(r\), etc. | Varies by context |
| Sample size | \(n\) | Solved for |
```python from statsmodels.stats.power import tt_solve_power, TTestIndPower
Solve for sample size given the other three¶
n = tt_solve_power(effect_size=0.5, alpha=0.05, power=0.80, alternative='two-sided') print(f"Required n per group (d=0.5): {n:.0f}")
Solve for power given sample size¶
power = tt_solve_power(effect_size=0.5, alpha=0.05, nobs=30, alternative='two-sided') print(f"Power with n=30 (d=0.5): {power:.3f}")
Solve for detectable effect size¶
d = tt_solve_power(nobs=30, alpha=0.05, power=0.80, alternative='two-sided') print(f"Detectable effect size (n=30): {d:.3f}") ```
Effect Size Conventions¶
Cohen's \(d\) provides standardized benchmarks for the magnitude of a mean difference:
| Size | \(d\) | Example |
|---|---|---|
| Small | 0.2 | Subtle difference, requires large \(n\) to detect |
| Medium | 0.5 | Visible to careful observation |
| Large | 0.8 | Obvious difference, detectable with small \(n\) |
Cohen's \(d\) is defined as
where \(\sigma\) is the common population standard deviation.
```python import numpy as np
Cohen's d from raw data¶
def cohens_d(group1, group2): """Compute Cohen's d for independent samples.""" n1, n2 = len(group1), len(group2) s1, s2 = np.std(group1, ddof=1), np.std(group2, ddof=1) s_pooled = np.sqrt(((n1 - 1) * s12 + (n2 - 1) * s22) / (n1 + n2 - 2)) return (np.mean(group1) - np.mean(group2)) / s_pooled
np.random.seed(42) g1 = np.random.normal(100, 15, 30) g2 = np.random.normal(108, 15, 30) d = cohens_d(g1, g2) print(f"Cohen's d: {d:.3f}") ```
Sample Size Determination¶
The required sample size for a two-sample \(t\)-test grows as the effect size shrinks. For a two-sided test at level \(\alpha\) with power \(1 - \beta\), the approximate sample size per group is
where \(z_{\alpha/2}\) and \(z_\beta\) are the standard normal quantiles corresponding to \(\alpha/2\) and \(\beta\).
```python from scipy import stats
Approximate formula¶
alpha = 0.05 power_target = 0.80 d = 0.5
z_alpha = stats.norm.ppf(1 - alpha / 2) z_beta = stats.norm.ppf(power_target) n_approx = 2 * (z_alpha + z_beta)2 / d2 print(f"Approximate n per group: {n_approx:.0f}")
Exact calculation via statsmodels¶
n_exact = tt_solve_power(effect_size=d, alpha=alpha, power=power_target, alternative='two-sided') print(f"Exact n per group: {n_exact:.0f}") ```
Power Curves¶
A power curve shows how power changes as a function of one parameter while holding the others fixed. This visualization helps researchers understand the trade-offs in study design.
```python import numpy as np from statsmodels.stats.power import TTestIndPower
analysis = TTestIndPower()
Power as a function of sample size for different effect sizes¶
sample_sizes = np.arange(10, 201, 5) effect_sizes = [0.2, 0.5, 0.8]
for d in effect_sizes: powers = [analysis.power(d, n, 0.05) for n in sample_sizes] print(f"d={d}: n=20 → power={analysis.power(d, 20, 0.05):.2f}, " f"n=100 → power={analysis.power(d, 100, 0.05):.2f}") ```
Power for Other Tests¶
Paired t-test¶
The paired design uses the effect size \(d_z = \mu_D / \sigma_D\), where \(\mu_D\) and \(\sigma_D\) are the mean and standard deviation of the paired differences.
```python from statsmodels.stats.power import tt_solve_power
Paired t-test power (uses one-sample power with d_z)¶
n_paired = tt_solve_power(effect_size=0.5, alpha=0.05, power=0.80, alternative='two-sided') print(f"Required n (paired, d_z=0.5): {n_paired:.0f}") ```
ANOVA (F-test)¶
For one-way ANOVA comparing \(k\) groups, the effect size is Cohen's \(f\):
```python from statsmodels.stats.power import FTestAnovaPower
anova_power = FTestAnovaPower() n_anova = anova_power.solve_power(effect_size=0.25, alpha=0.05, power=0.80, k_groups=3) print(f"Required n per group (ANOVA, f=0.25, k=3): {n_anova:.0f}") ```
Chi-Square Test¶
For chi-square tests, the effect size is Cohen's \(w\):
```python from statsmodels.stats.power import GofChisquarePower
chi2_power = GofChisquarePower() n_chi2 = chi2_power.solve_power(effect_size=0.3, alpha=0.05, power=0.80, n_bins=4) print(f"Required n (chi-square, w=0.3): {n_chi2:.0f}") ```
Post-Hoc Power Analysis¶
Observed Power is Uninformative
Computing power using the observed effect size after data collection (post-hoc or retrospective power) is widely discouraged. Observed power is a monotone function of the p-value and adds no information beyond what the p-value already provides. Power analysis is meaningful only when conducted before data collection, using a clinically or practically meaningful effect size.
Summary¶
Power analysis determines the sample size needed to detect an effect of a specified magnitude with a given probability. The four key quantities — significance level \(\alpha\), power \(1 - \beta\), effect size, and sample size — are interrelated so that fixing any three determines the fourth. Conventional targets are \(\alpha = 0.05\) and power \(= 0.80\). Power curves visualize these trade-offs and help researchers choose an appropriate design. Power analysis should always be conducted before data collection, using a scientifically meaningful effect size rather than the observed effect.
Exercises¶
Exercise 1.
Using statsmodels.stats.power.tt_solve_power, compute the required sample size to detect an effect size of \(d = 0.3\) with 80% power at \(\alpha = 0.05\) for a two-sided one-sample t-test.
Solution to Exercise 1
from statsmodels.stats.power import tt_solve_power
n = tt_solve_power(effect_size=0.3, alpha=0.05, power=0.8,
alternative='two-sided')
print(f"Required sample size: {n:.0f}")
Exercise 2. Plot a power curve for a one-sample t-test with \(n = 30\) at \(\alpha = 0.05\): compute power for effect sizes \(d = 0, 0.1, 0.2, \ldots, 1.0\) and plot power vs effect size.
Solution to Exercise 2
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.stats.power import tt_solve_power
effect_sizes = np.arange(0, 1.05, 0.1)
powers = [tt_solve_power(effect_size=d, nobs=30, alpha=0.05,
alternative='two-sided') if d > 0 else 0.05
for d in effect_sizes]
plt.plot(effect_sizes, powers, 'o-')
plt.axhline(0.8, ls='--', color='r', label='80% power')
plt.xlabel('Effect Size (d)')
plt.ylabel('Power')
plt.title('Power Curve (n=30)')
plt.legend()
plt.show()
Exercise 3. Compare the required sample sizes for 80% power at effect sizes \(d = 0.2, 0.5, 0.8\) for both one-sided and two-sided tests. Show that the one-sided test always requires fewer subjects.
Solution to Exercise 3
from statsmodels.stats.power import tt_solve_power
for d in [0.2, 0.5, 0.8]:
n_two = tt_solve_power(effect_size=d, alpha=0.05, power=0.8,
alternative='two-sided')
n_one = tt_solve_power(effect_size=d, alpha=0.05, power=0.8,
alternative='larger')
print(f"d={d}: two-sided n={n_two:.0f}, one-sided n={n_one:.0f}")