Custom Distributions¶
The built-in distribution families in scipy.stats cover a wide range of common models, but real-world data sometimes requires a distribution with a specific functional form not available in the library. The rv_continuous base class provides a framework for defining custom continuous distributions that automatically gain access to all standard methods (PDF, CDF, PPF, random sampling, moment computation) once you specify the density.
The rv_continuous Subclass Pattern¶
To create a custom continuous distribution, define a class that inherits from scipy.stats.rv_continuous and override the _pdf method with your desired density function. The density \(f(x)\) must satisfy:
At minimum, providing _pdf is sufficient. SciPy will numerically compute the CDF, PPF (quantile function), and moments from the PDF via numerical integration. For better performance, you can also override _cdf, _ppf, _sf (survival function), and _stats.
Basic Example: Power Distribution¶
Consider the power distribution on \([0, 1]\) with parameter \(\alpha > 0\):
This density integrates to 1 since \(\int_0^1 \alpha\, x^{\alpha - 1}\, dx = \alpha \cdot \frac{x^\alpha}{\alpha}\Big|_0^1 = 1\).
import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt
class PowerDistribution(stats.rv_continuous):
"""Custom power distribution on [0, 1] with shape parameter alpha."""
def _pdf(self, x, alpha):
return alpha * x ** (alpha - 1)
def _cdf(self, x, alpha):
return x ** alpha
def _ppf(self, q, alpha):
return q ** (1.0 / alpha)
def _stats(self, alpha):
mean = alpha / (alpha + 1)
var = alpha / ((alpha + 1) ** 2 * (alpha + 2))
return mean, var, None, None
power_dist = PowerDistribution(a=0, b=1, name='power')
The constructor arguments a=0 and b=1 set the support of the distribution to \([0, 1]\).
Using the Custom Distribution¶
Once defined, the custom distribution supports the same interface as any built-in scipy.stats distribution:
# Create a frozen distribution with alpha=3
frozen = power_dist(alpha=3)
print(f"Mean: {frozen.mean():.4f}") # 3/4 = 0.75
print(f"Variance: {frozen.var():.4f}") # 3/48 = 0.0375
print(f"Median: {frozen.median():.4f}") # 0.5^(1/3)
# PDF, CDF, and random samples
x = np.linspace(0, 1, 200)
y_pdf = frozen.pdf(x)
y_cdf = frozen.cdf(x)
samples = frozen.rvs(size=5000, random_state=42)
plt.plot(x, y_pdf, 'r-', label='PDF')
plt.hist(samples, density=True, bins=40, alpha=0.5, label='Samples')
plt.legend()
plt.title('Custom Power Distribution (α=3)')
plt.xlabel('x')
plt.ylabel('Density')
plt.show()
Overriding Additional Methods¶
Providing only _pdf works but can be slow for repeated CDF or quantile evaluations because SciPy falls back to numerical integration. The methods you can override, in order of impact, are:
| Method | Mathematical meaning | Benefit of overriding |
|---|---|---|
_pdf(x, ...) |
\(f(x)\) | Required |
_cdf(x, ...) |
\(F(x) = \int_{-\infty}^{x} f(t)\, dt\) | Avoids numerical integration |
_ppf(q, ...) |
\(F^{-1}(q)\) | Avoids numerical root-finding |
_sf(x, ...) |
\(1 - F(x)\) | Better numerical accuracy in the tail |
_stats(...) |
\((E[X],\; \text{Var}(X),\; \text{skew},\; \text{kurt})\) | Avoids numerical moment computation |
Validating a Custom Distribution¶
After defining a custom distribution, verify that the density integrates to 1 and that the CDF is consistent with the PDF:
from scipy.integrate import quad
frozen = power_dist(alpha=3)
# Check normalization
total, _ = quad(frozen.pdf, 0, 1)
print(f"Integral of PDF: {total:.6f}") # Should be 1.000000
# Check CDF consistency at a specific point
x0 = 0.7
cdf_from_integral, _ = quad(frozen.pdf, 0, x0)
cdf_from_method = frozen.cdf(x0)
print(f"CDF from integral: {cdf_from_integral:.6f}")
print(f"CDF from method: {cdf_from_method:.6f}")
Key Considerations¶
- Support specification: Set
aandbin the constructor to define the support \([a, b]\). Values outside this range receive zero density automatically. - Shape parameters: Pass shape parameter names as string arguments to the constructor via the
shapeskeyword, or let SciPy infer them from the_pdfsignature. - Performance: Override
_cdfand_ppfwhenever closed-form expressions exist. Numerical fallbacks can be orders of magnitude slower. - Normalization: SciPy does not automatically verify that your PDF integrates to 1. Always check this manually.
Summary¶
The rv_continuous subclass pattern allows you to define any continuous distribution by specifying its PDF. Once defined, the distribution inherits the full scipy.stats interface including CDF, PPF, moment computation, and random sampling. For production use, override _cdf, _ppf, and _stats with closed-form expressions to avoid numerical overhead.
Runnable Example: solutions_distributions.py¶
"""
Solutions 02: Continuous and Discrete Distributions
===================================================
Detailed solutions with full explanations.
"""
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# =============================================================================
# Main
# =============================================================================
if __name__ == "__main__":
print("="*80)
print("SOLUTIONS: PROBABILITY DISTRIBUTIONS")
print("="*80)
print()
# Solution 1
print("Solution 1: Exponential Distribution")
print("-" * 40)
mean_time = 5 # minutes
expo_dist = stats.expon(scale=mean_time)
# Part a
prob_within_3 = expo_dist.cdf(3)
print(f"a) P(X ≤ 3) = {prob_within_3:.4f}")
print()
# Part b - Memoryless property
prob_within_2 = expo_dist.cdf(2)
print(f"b) P(X ≤ 6 | X > 4) = P(X ≤ 2) = {prob_within_2:.4f}")
print(f" This equals P(X ≤ 2) due to memoryless property\n")
# Detailed solutions continue...
print("="*80)