p-values and Statistical Significance¶
In hypothesis testing, after computing a test statistic, we need a way to calibrate how surprising the observed result is under the null hypothesis. The p-value provides this calibration by converting the test statistic into a probability scale. This section defines p-values formally, explains their correct interpretation, and presents the standard decision rule for statistical significance.
Definition¶
A p-value is the probability, under the null hypothesis \(H_0\), of obtaining a test statistic at least as extreme as the value actually observed from the sample data.
For an observed test statistic \(t_{\text{obs}}\), the p-value depends on whether the test is one-sided or two-sided. In a one-sided (upper-tail) test, the p-value is
where \(T\) denotes the random test statistic under \(H_0\). For a two-sided test, the p-value accounts for extreme values in both tails:
A small p-value indicates that the observed data would be unlikely if \(H_0\) were true, providing evidence against the null hypothesis.
Common Misconception
The p-value is not the probability that \(H_0\) is true. It is the probability of observing a test statistic as extreme as or more extreme than the one obtained, assuming \(H_0\) holds. Confusing these two interpretations is one of the most widespread errors in applied statistics.
Decision Rule¶
Given a pre-specified significance level \(\alpha\) (commonly \(\alpha = 0.05\)), the decision rule is:
- Reject \(H_0\) if \(p < \alpha\)
- Fail to reject \(H_0\) if \(p \geq \alpha\)
The significance level \(\alpha\) represents the maximum tolerable probability of a Type I error — rejecting \(H_0\) when it is in fact true.
The following example illustrates how to compute a p-value using a one-sample \(t\)-test. Suppose we have measurements and wish to test \(H_0: \mu = 22\) against \(H_1: \mu \neq 22\).
import numpy as np
from scipy import stats
# Sample data: 10 measurements
data = np.array([23.1, 22.5, 24.0, 21.8, 23.7, 22.9, 24.3, 23.0, 22.6, 23.5])
# One-sample t-test: H₀: μ = 22 vs H₁: μ ≠ 22
t_stat, p_value = stats.ttest_1samp(data, 22)
# p_value = P(|T| >= |t_obs| | H₀ true) for a two-sided test
alpha = 0.05
if p_value < alpha:
print(f"Reject H₀ (p = {p_value:.4f} < {alpha})")
else:
print(f"Fail to reject H₀ (p = {p_value:.4f} >= {alpha})")
Summary¶
The p-value converts a test statistic into a probability that measures the strength of evidence against \(H_0\). By comparing the p-value to a pre-specified significance level \(\alpha\), we obtain a binary decision rule for hypothesis testing.