Degrees of Freedom and Asymptotic Theory¶
Degrees of Freedom in Chi-Square Tests¶
The degrees of freedom determine the shape of the chi-square distribution used as the reference distribution under \(H_0\). The calculation depends on which chi-square test is being performed.
Goodness-of-Fit Test¶
For the goodness-of-fit test with \(k\) categories:
\[ \text{df} = k - 1 \]
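The single constraint arises because the observed counts must sum to the total sample size \(n\):
\[ \sum_{i=1}^{k} O_i = n \]
The expected counts also sum to \(n\), so the deviations satisfy \(\sum_{i=1}^{k} (O_i - E_i) = 0\). This means only \(k - 1\) of the deviations \(O_i - E_i\) are free to vary; the last is determined by the others.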
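As a small illustration of the constraint, the sketch below uses made-up counts for 120 rolls of a die and a fair-die null hypothesis (both are assumptions for the example, not data from this page). It computes the Pearson statistic and shows that the last deviation can be recovered from the first \(k - 1\).

```python
# Hypothetical example: 120 rolls of a die, tested against a fair-die null.
observed = [18, 22, 16, 25, 20, 19]       # made-up counts, sum = 120
n = sum(observed)
k = len(observed)
null_probs = [1 / k] * k                  # H0: all faces equally likely

expected = [n * p for p in null_probs]    # E_i = n * p_i
deviations = [o - e for o, e in zip(observed, expected)]

# Pearson statistic: sum of (O_i - E_i)^2 / E_i
chi2_stat = sum(d * d / e for d, e in zip(deviations, expected))
df = k - 1                                # one constraint: counts sum to n

# The last deviation is fixed by the first k - 1 deviations.
last_from_others = -sum(deviations[:-1])

print(f"df = {df}, chi-square statistic = {chi2_stat:.3f}")
print(f"last deviation = {deviations[-1]:.3f}, implied by others = {last_from_others:.3f}")
```

Because the deviations sum to zero, knowing any \(k - 1\) of them pins down the remaining one, which is why the reference distribution has \(k - 1\) rather than \(k\) degrees of freedom.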
Test of Independence and Homogeneity¶
For an \(r \times c\) contingency table:
\[ \text{df} = (r - 1)(c - 1) \]
where \(r\) is the number of rows and \(c\) is the number of columns.
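The count follows from the constraints imposed by the fixed margins:
- The grand total is fixed: 1 constraint.
- Row totals must match: given the grand total, \(r - 1\) independent constraints from rows.
- Column totals must match: given the grand total, \(c - 1\) independent constraints from columns.
So the total number of free cells in the table is:
\[ rc - 1 - (r - 1) - (c - 1) = (r - 1)(c - 1) \]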
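The counting argument can be checked numerically. The following is a minimal sketch using a made-up \(3 \times 4\) table (the counts are hypothetical). It computes the expected counts under independence, the Pearson statistic, and the degrees of freedom in both forms.

```python
# Hypothetical 3 x 4 table of counts (values are made up for illustration).
table = [
    [12, 18, 10, 20],
    [15, 14, 16, 15],
    [8, 13, 19, 20],
]
r = len(table)
c = len(table[0])

row_totals = [sum(row) for row in table]
col_totals = [sum(table[i][j] for i in range(r)) for j in range(c)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / n
expected = [[row_totals[i] * col_totals[j] / n for j in range(c)] for i in range(r)]

# Pearson statistic summed over all r * c cells
chi2_stat = sum(
    (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(r)
    for j in range(c)
)

# Free cells: rc cells minus 1 (grand total), r - 1 (rows), c - 1 (columns).
free_cells = r * c - 1 - (r - 1) - (c - 1)
df = (r - 1) * (c - 1)

print(f"free cells = {free_cells}, df = {df}, chi-square statistic = {chi2_stat:.3f}")
```

The expected table reproduces the observed row and column totals, which is exactly where the constraints come from.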
Asymptotic Theory¶
The Core Result¶
Under \(H_0\), for large sample sizes, the chi-square test statistic
\[ \chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i} \]
converges in distribution to a chi-square random variable with the appropriate degrees of freedom. This is an asymptotic result: for finite samples it holds only approximately, and the approximation improves as the sample size grows.
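One way to see the asymptotic result is by simulation. The sketch below is an illustration under assumed settings: the null probabilities, sample size, and number of replications are all hypothetical choices. It draws multinomial samples under \(H_0\) with the standard-library `random.choices` and compares the simulated statistic with three properties of a chi-square distribution with \(k - 1 = 3\) degrees of freedom: mean 3, variance 6, and upper 5% point of about 7.815 (the usual table value).

```python
import random

random.seed(0)                        # fixed seed so the run is reproducible

null_probs = [0.1, 0.2, 0.3, 0.4]     # hypothetical H0 probabilities
k = len(null_probs)
n = 200                               # sample size of each simulated data set
reps = 2000                           # number of simulated data sets


def pearson_statistic(sample_size):
    """Draw one multinomial sample under H0 and return its Pearson statistic."""
    draws = random.choices(range(k), weights=null_probs, k=sample_size)
    observed = [0] * k
    for category in draws:
        observed[category] += 1
    expected = [sample_size * p for p in null_probs]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


stats = [pearson_statistic(n) for _ in range(reps)]
sim_mean = sum(stats) / reps
sim_var = sum((s - sim_mean) ** 2 for s in stats) / (reps - 1)

# Share of simulated statistics beyond the tabulated 5% critical value for df = 3
frac_reject = sum(s > 7.815 for s in stats) / reps

print(f"simulated mean = {sim_mean:.2f} (chi-square mean: {k - 1})")
print(f"simulated variance = {sim_var:.2f} (chi-square variance: {2 * (k - 1)})")
print(f"share above 7.815 = {frac_reject:.3f} (nominal: 0.05)")
```

Rerunning with a much smaller `n` should show the simulated rejection share drifting away from the nominal 5%, which is the finite-sample error the asymptotic statement leaves room for.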
Derivation Sketch for Goodness-of-Fit¶
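Let \(p_i\) denote the probability of category \(i\) under \(H_0\), so that \(E_i = n p_i\). Each count \(O_i\) is then binomial with mean \(n p_i\) and variance \(n p_i (1 - p_i)\), and for large \(n\) the standardized count
\[ Z_i = \frac{O_i - n p_i}{\sqrt{n p_i (1 - p_i)}} \]
is approximately standard normal.
Approximating the denominator heuristically by dropping the factor \(1 - p_i\) (reasonable when each \(p_i\) is small):
\[ Z_i^2 \approx \frac{(O_i - E_i)^2}{E_i}, \qquad \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \approx \sum_{i=1}^{k} Z_i^2, \]
which is approximately \(\chi^2_{k-1}\) distributed.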
The final step uses the fact that the \(Z_i\) are not fully independent (they satisfy a linear constraint), reducing the effective degrees of freedom from \(k\) to \(k-1\).
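The simplest case, \(k = 2\), makes the loss of one degree of freedom concrete: because \(O_2 = n - O_1\), the two terms of the statistic collapse algebraically into a single squared standardized count. The sketch below checks this numerically with made-up values of \(n\), \(p\), and \(O_1\) (all hypothetical).

```python
import math

n = 100        # hypothetical sample size
p = 0.3        # hypothetical H0 probability of category 1
o1 = 38        # made-up observed count in category 1
o2 = n - o1    # category 2 is determined by the constraint O_1 + O_2 = n

e1, e2 = n * p, n * (1 - p)

# Pearson statistic with two terms
chi2_stat = (o1 - e1) ** 2 / e1 + (o2 - e2) ** 2 / e2

# Standardized count for category 1 (one-sample z statistic for a proportion)
z = (o1 - n * p) / math.sqrt(n * p * (1 - p))

print(f"chi-square statistic = {chi2_stat:.6f}")
print(f"z squared            = {z ** 2:.6f}")
```

The two printed values agree up to floating-point rounding: two categories yield one squared approximately normal quantity, matching \(k - 1 = 1\) degree of freedom.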
Rate of Convergence¶
The chi-square approximation improves with:
- Larger total sample size \(n\).
- More uniform expected cell counts.
- Fewer categories with very small expected frequencies.
As a practical guideline, the approximation is generally reliable when all expected frequencies are at least 5.
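The guideline is easy to check before running a test. The following is a minimal sketch in which the helper name `small_expected_cells`, the null probabilities, and the sample sizes are all hypothetical. It lists the expected counts for a goodness-of-fit test and flags the categories that fall below the threshold of 5.

```python
def small_expected_cells(n, null_probs, threshold=5.0):
    """Return the expected counts and the indices of categories below the threshold."""
    expected = [n * p for p in null_probs]
    flagged = [i for i, e in enumerate(expected) if e < threshold]
    return expected, flagged


null_probs = [0.5, 0.3, 0.15, 0.05]   # hypothetical H0 probabilities

for n in (40, 200):
    expected, flagged = small_expected_cells(n, null_probs)
    print(f"n = {n}: expected = {[round(e, 2) for e in expected]}, below 5: {flagged}")
```

With these assumed probabilities, the rarest category has an expected count of about 2 at \(n = 40\) and about 10 at \(n = 200\), so only the larger sample satisfies the guideline.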