Beta Distribution
The beta distribution is a natural choice for modeling random variables that represent probabilities, proportions, or rates confined to the interval \([0, 1]\). It is the conjugate prior for the Bernoulli and binomial likelihoods in Bayesian statistics, and it appears in the theory of order statistics as the distribution of the order statistics of a uniform sample. Its two shape parameters give a flexible family of density shapes on \([0, 1]\).
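The order-statistics connection can be checked by simulation. The sketch below relies on the standard result that the \(k\)-th smallest of \(n\) independent Uniform(0, 1) draws follows \(\text{Beta}(k, n - k + 1)\); the values of `n`, `k`, the seed, and the number of replications are arbitrary choices made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
n, k = 5, 2                     # illustrative sample size and rank

# Draw many samples of n uniforms and keep the k-th smallest of each.
u = rng.uniform(size=(100_000, n))
kth_smallest = np.sort(u, axis=1)[:, k - 1]

# Theory: the k-th order statistic follows Beta(k, n - k + 1).
theory = stats.beta(k, n - k + 1)

print(f"Simulated mean:         {kth_smallest.mean():.4f}")
print(f"Beta({k}, {n - k + 1}) mean:        {theory.mean():.4f}")
```

The two means should agree up to Monte Carlo error, which shrinks as the number of replications grows.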
Mental Model
The beta distribution is the Swiss army knife for modeling proportions on \([0, 1]\). Its two shape parameters \(\alpha\) and \(\beta\) act like vote counts: \(\alpha\) votes for values near 1 and \(\beta\) votes for values near 0. More votes overall means a more concentrated (confident) distribution; equal votes produce symmetry.
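The vote-count picture matches how a beta prior is updated with Bernoulli data: observed successes are added to \(\alpha\) and observed failures to \(\beta\). The following is a minimal sketch of that conjugate update; the prior parameters and the data counts are hypothetical values picked only to show the effect.

```python
from scipy import stats

# Hypothetical prior "votes" and hypothetical observed data.
prior_a, prior_b = 2.0, 2.0
successes, failures = 7, 3

# Conjugate update: add the observed counts to the prior parameters.
post_a = prior_a + successes
post_b = prior_b + failures

prior = stats.beta(prior_a, prior_b)
posterior = stats.beta(post_a, post_b)

print(f"Prior:     mean={prior.mean():.4f}, sd={prior.std():.4f}")
print(f"Posterior: mean={posterior.mean():.4f}, sd={posterior.std():.4f}")
```

With more total votes the posterior has a smaller standard deviation than the prior, and its mean moves toward the observed success rate.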
Mathematical Definition
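A random variable \(X\) follows a beta distribution with shape parameters \(\alpha > 0\) and \(\beta > 0\), written \(X \sim \text{Beta}(\alpha, \beta)\), if its probability density function is:
\[ f(x; \alpha, \beta) = \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)}, \qquad 0 \le x \le 1, \]
where \(B(\alpha, \beta)\) is the beta function, defined as:
\[ B(\alpha, \beta) = \int_0^1 t^{\alpha - 1}(1 - t)^{\beta - 1}\,dt = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)}. \]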
The beta function serves as the normalization constant that ensures the density integrates to 1 over \([0, 1]\).
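As a quick numerical check of the definition, the sketch below codes the density by hand using `scipy.special.beta` for the normalization constant, compares it with `scipy.stats.beta.pdf`, and integrates it over \([0, 1]\). The helper name `beta_pdf` and the parameter values are illustrative, not part of any library.

```python
import numpy as np
from scipy import integrate, special, stats

alpha, beta_param = 2.0, 5.0  # illustrative values


def beta_pdf(x, a, b):
    """Hand-written density: x^(a-1) * (1-x)^(b-1) / B(a, b)."""
    return x ** (a - 1) * (1 - x) ** (b - 1) / special.beta(a, b)


# Compare with scipy's implementation on a grid inside (0, 1).
x = np.linspace(0.05, 0.95, 10)
matches = np.allclose(beta_pdf(x, alpha, beta_param),
                      stats.beta.pdf(x, alpha, beta_param))

# The density should integrate to 1 over [0, 1].
total, _ = integrate.quad(beta_pdf, 0, 1, args=(alpha, beta_param))

# The beta function can also be written with gamma functions.
gamma_form = (special.gamma(alpha) * special.gamma(beta_param)
              / special.gamma(alpha + beta_param))

print(f"Matches scipy.stats.beta.pdf: {matches}")
print(f"Integral over [0, 1]: {total:.6f}")
print(f"B(a, b): {special.beta(alpha, beta_param):.6f}, gamma form: {gamma_form:.6f}")
```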
Usage in scipy.stats
The scipy.stats.beta distribution object takes parameters a (\(\alpha\)) and b (\(\beta\)):
```python
import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt

# Frozen Beta(2, 5) distribution
alpha, beta_param = 2.0, 5.0
rv = stats.beta(alpha, beta_param)

print(f"Mean: {rv.mean():.4f}")      # α/(α+β) = 2/7
print(f"Variance: {rv.var():.4f}")   # αβ/((α+β)²(α+β+1))

# Evaluate the PDF and CDF on a grid over [0, 1]
x = np.linspace(0, 1, 200)
y_pdf = rv.pdf(x)
y_cdf = rv.cdf(x)

plt.plot(x, y_pdf, label='PDF')
plt.plot(x, y_cdf, label='CDF')
plt.legend()
plt.title(f'Beta Distribution (α={alpha}, β={beta_param})')
plt.xlabel('x')
plt.ylabel('Density / Probability')
plt.show()
```
The PDF shows the shape of the distribution on \([0, 1]\), while the CDF gives \(P(X \le x)\), rising from 0 to 1.
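Effect of Parameters
The two shape parameters control the shape of the density. Varying \(\alpha\) and \(\beta\) produces uniform, U-shaped, J-shaped, and bell-shaped densities, all within the same family. The plot below covers a U-shaped case (\(\alpha = \beta = 0.5\)), the uniform case (\(\alpha = \beta = 1\)), a symmetric bell shape (\(\alpha = \beta = 2\)), and two skewed unimodal cases; J-shaped densities arise when one parameter is below 1 and the other is above 1.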
```python
import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 1, 200)
params = [(0.5, 0.5), (1, 1), (2, 2), (2, 5), (5, 2)]
for a, b in params:
    plt.plot(x, stats.beta(a, b).pdf(x), label=f'α={a}, β={b}')

plt.legend()
plt.title('Beta PDFs for Various Parameter Combinations')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.ylim(0, 4)  # clip the unbounded Beta(0.5, 0.5) density near 0 and 1
plt.show()
```
Key Properties
The beta distribution has the following properties:
- Mean: \(E[X] = \dfrac{\alpha}{\alpha + \beta}\)
- Variance: \(\text{Var}(X) = \dfrac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}\)
- Mode (for \(\alpha > 1\) and \(\beta > 1\)): \(\dfrac{\alpha - 1}{\alpha + \beta - 2}\)
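The mode formula can be checked numerically by maximizing the density. This is a minimal sketch using `scipy.optimize.minimize_scalar` on the negative PDF; the parameter values are illustrative, and the formula only applies when both parameters exceed 1.

```python
from scipy import optimize, stats

a, b = 2.0, 5.0  # illustrative values with a > 1 and b > 1
rv = stats.beta(a, b)

# Maximize the PDF by minimizing its negative over [0, 1].
result = optimize.minimize_scalar(lambda x: -rv.pdf(x),
                                  bounds=(0.0, 1.0), method="bounded")
mode_numeric = result.x
mode_formula = (a - 1) / (a + b - 2)

print(f"Mode (numeric): {mode_numeric:.4f}")
print(f"Mode (formula): {mode_formula:.4f}")
print(f"Mean:           {rv.mean():.4f}")
```

For this right-skewed case the mean lies above the mode, since the long right tail pulls the mean upward.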
Special Cases
- \(\text{Beta}(1, 1) = \text{Uniform}(0, 1)\): when both parameters equal 1, the density is flat
- \(\text{Beta}(\alpha, \alpha)\): symmetric about \(x = 0.5\) for any \(\alpha > 0\)
- \(\text{Beta}(1/2, 1/2)\): the arcsine distribution, with density concentrated near \(x = 0\) and \(x = 1\)
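These special cases can be confirmed against other `scipy.stats` distributions. The sketch below compares densities on a grid inside \((0, 1)\); the grid and the choice of \(\alpha = 3\) for the symmetry check are arbitrary.

```python
import numpy as np
from scipy import stats

x = np.linspace(0.05, 0.95, 19)

# Beta(1, 1) has the same density as Uniform(0, 1).
is_uniform = np.allclose(stats.beta.pdf(x, 1, 1), stats.uniform.pdf(x))

# Beta(a, a) is symmetric about 0.5: f(x) equals f(1 - x).
is_symmetric = np.allclose(stats.beta.pdf(x, 3, 3), stats.beta.pdf(1 - x, 3, 3))

# Beta(1/2, 1/2) matches scipy's arcsine distribution.
is_arcsine = np.allclose(stats.beta.pdf(x, 0.5, 0.5), stats.arcsine.pdf(x))

print(f"Beta(1, 1) is uniform:       {is_uniform}")
print(f"Beta(3, 3) is symmetric:     {is_symmetric}")
print(f"Beta(0.5, 0.5) is arcsine:   {is_arcsine}")
```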
Parameters in scipy.stats
| Parameter | Symbol | scipy.stats keyword | Default |
|---|---|---|---|
| Shape 1 | \(\alpha\) | a | (required) |
| Shape 2 | \(\beta\) | b | (required) |
| Location | (none) | loc | 0 |
| Scale | (none) | scale | 1 |
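The `loc` and `scale` keywords move the distribution from \([0, 1]\) to the interval from `loc` to `loc + scale`. The sketch below illustrates this with hypothetical values that place the support on \([10, 50]\); the mean shifts and scales linearly, while the standard deviation only scales.

```python
from scipy import stats

a, b = 2.0, 5.0
loc, scale = 10.0, 40.0  # hypothetical values: support becomes [10, 50]

base = stats.beta(a, b)
shifted = stats.beta(a, b, loc=loc, scale=scale)

# The 0% and 100% quantiles give the endpoints of the support.
lower, upper = shifted.ppf([0.0, 1.0])

print(f"Support: [{lower:.1f}, {upper:.1f}]")
print(f"Mean: {shifted.mean():.4f}  (loc + scale * base mean = {loc + scale * base.mean():.4f})")
print(f"Std:  {shifted.std():.4f}  (scale * base std = {scale * base.std():.4f})")
```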
Financial Applications
In quantitative finance, the beta distribution models recovery rates in credit risk (the fraction of face value recovered after default), portfolio weight distributions, and loss-given-default rates. In Bayesian portfolio analysis, it serves as a prior for the probability of outperformance. The PERT distribution, widely used in project risk analysis, is a beta distribution shifted and rescaled to an interval between a minimum and a maximum value.
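One common way to calibrate a beta distribution to a quantity such as a recovery rate is the method of moments: choose \(\alpha\) and \(\beta\) so that the mean and variance match target values. The sketch below assumes a target mean of 0.40 and standard deviation of 0.25; both numbers are hypothetical and are not taken from any data set.

```python
from scipy import stats

# Hypothetical recovery-rate statistics, for illustration only.
target_mean, target_std = 0.40, 0.25
target_var = target_std ** 2

# Method of moments: alpha + beta = m(1 - m) / v - 1.
# This requires v < m(1 - m); otherwise no beta distribution matches.
total = target_mean * (1 - target_mean) / target_var - 1
if total <= 0:
    raise ValueError("variance too large for a beta distribution")

alpha = target_mean * total
beta_param = (1 - target_mean) * total

rv = stats.beta(alpha, beta_param)
print(f"alpha={alpha:.4f}, beta={beta_param:.4f}")
print(f"Fitted mean={rv.mean():.4f}, std={rv.std():.4f}")
print(f"P(recovery < 0.2) = {rv.cdf(0.2):.4f}")
```

When raw observations are available rather than summary statistics, a maximum likelihood fit is an alternative to this moment matching.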
Summary
The beta distribution provides a flexible family of densities on \([0, 1]\), controlled by two shape parameters. In scipy.stats, use stats.beta(a, b) to create a frozen distribution for computing PDFs, CDFs, quantiles, and generating random samples.
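For the quantile and sampling side of a frozen distribution, a brief sketch is shown below: `ppf` inverts the CDF, `interval` returns a central interval, and `rvs` draws samples. The parameter values, the 90% level, and the seed are arbitrary.

```python
from scipy import stats

rv = stats.beta(2, 5)

# Quantile function (inverse CDF): the median is the 50% quantile.
median = rv.ppf(0.5)

# Central interval containing 90% of the probability mass.
lower, upper = rv.interval(0.90)

# Reproducible random samples.
draws = rv.rvs(size=5, random_state=0)

print(f"Median: {median:.4f}")
print(f"Central 90% interval: ({lower:.4f}, {upper:.4f})")
print(f"Samples: {draws.round(4)}")
```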
Exercises
Exercise 1. Plot the Beta PDF for \((a, b) = (0.5, 0.5)\), \((1, 1)\), \((2, 5)\), and \((5, 2)\) on the same axes. Describe how the shape changes with different parameter values.
Solution to Exercise 1
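```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Stay slightly inside (0, 1): the Beta(0.5, 0.5) density is unbounded at the endpoints
x = np.linspace(0.01, 0.99, 200)
params = [(0.5, 0.5), (1, 1), (2, 5), (5, 2)]
for a, b in params:
    plt.plot(x, stats.beta.pdf(x, a, b), label=f'Beta({a},{b})')

plt.legend()
plt.title('Beta Distribution PDFs')
plt.ylim(0, 4)
plt.show()
```
Beta(0.5, 0.5) is U-shaped, with density piling up near 0 and 1. Beta(1, 1) is flat. Beta(2, 5) is unimodal and right-skewed with its peak at \(x = 0.2\), and Beta(5, 2) is its mirror image with its peak at \(x = 0.8\).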
Exercise 2. For a \(\text{Beta}(3, 7)\) distribution, compute the mean, mode, and variance analytically and verify using scipy.stats.beta. The mode is \((a-1)/(a+b-2)\) when \(a, b > 1\).
Solution to Exercise 2
```python
from scipy import stats

a, b = 3, 7
rv = stats.beta(a, b)

# Closed-form values from the formulas in Key Properties
mode = (a - 1) / (a + b - 2)                       # 2/8 = 0.25
mean_exact = a / (a + b)                           # 3/10 = 0.3
var_exact = (a * b) / ((a + b)**2 * (a + b + 1))   # 21/1100

print(f"Mean: scipy={rv.mean():.4f}, exact={mean_exact:.4f}")
print(f"Variance: scipy={rv.var():.6f}, exact={var_exact:.6f}")
print(f"Mode: {mode:.4f}")
```
Exercise 3. Generate 10,000 samples from \(\text{Beta}(2, 5)\) and estimate \(P(X < 0.3)\) from the samples. Compare with the exact value from .cdf(0.3).
Solution to Exercise 3
```python
import numpy as np
from scipy import stats

rv = stats.beta(2, 5)
samples = rv.rvs(size=10000, random_state=42)

# Fraction of samples below 0.3 versus the exact CDF value
p_sample = np.mean(samples < 0.3)
p_exact = rv.cdf(0.3)
print(f"P(X < 0.3): sample={p_sample:.4f}, exact={p_exact:.4f}")
```