Chi-Square Distribution¶
The chi-square (\(\chi^2\)) distribution arises as the sum of squared standard normal random variables. It plays a central role in hypothesis testing, confidence interval construction, and goodness-of-fit tests.
Mathematical Definition¶
If \(Z_1, Z_2, \ldots, Z_k\) are independent standard normal random variables, then:
The PDF is:
where \(k\) is the degrees of freedom (df).
Usage in scipy.stats¶
import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt
df = 10
a = stats.chi2(df)
print(f"Mean: {a.mean():.2f}") # = df = 10
print(f"Variance: {a.var():.2f}") # = 2*df = 20
# Visualize: samples vs theoretical PDF
x_samples = a.rvs(size=10000, random_state=337)
x_theory = np.linspace(0, 40, 100)
y_theory = a.pdf(x_theory)
plt.plot(x_theory, y_theory, color='r', label='Theoretical PDF')
plt.hist(x_samples, density=True, bins=30, alpha=0.7, label='Sampled histogram')
plt.legend()
plt.title(f'Chi-Square Distribution (df={df})')
plt.xlabel('x')
plt.ylabel('Density')
plt.show()
Key Properties¶
The chi-square distribution has mean \(E[X] = k\), variance \(\text{Var}(X) = 2k\), and is always right-skewed (though it becomes more symmetric as \(k\) increases). It is a special case of the gamma distribution: \(\chi^2(k) = \text{Gamma}(k/2, 2)\).
Effect of Degrees of Freedom¶
As the degrees of freedom increase, the chi-square distribution shifts to the right and becomes more symmetric, approaching a normal distribution by the Central Limit Theorem:
import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 40, 200)
for df in [1, 2, 5, 10, 20]:
plt.plot(x, stats.chi2(df).pdf(x), label=f'df={df}')
plt.legend()
plt.title('Chi-Square PDFs for Various Degrees of Freedom')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.show()
Applications in Hypothesis Testing¶
The chi-square distribution is used for chi-square tests of independence (testing association between categorical variables), goodness-of-fit tests (testing if data follows a hypothesized distribution), and confidence intervals for population variance (\(\sigma^2\)).
Critical values are obtained using the PPF:
alpha = 0.05
df = 10
critical_value = stats.chi2(df).ppf(1 - alpha)
# Reject Hâ‚€ if test statistic > critical_value
Financial Applications¶
In finance, the chi-square distribution appears in variance ratio tests for market efficiency, likelihood ratio tests in maximum likelihood estimation, and portfolio variance testing under normality assumptions.
Summary¶
The chi-square distribution is fundamental to statistical inference. In scipy.stats, use stats.chi2(df) to create a frozen distribution for computing PDFs, CDFs, critical values, and generating random samples.