Sampling (rvs)
The .rvs() method generates random variates (random samples) from a probability distribution. It is the primary tool for Monte Carlo simulation, bootstrapping, and any workflow that requires synthetic data from a known distribution.
Mental Model
Calling .rvs(size=n) is like rolling a custom die \(n\) times, where the die is shaped by the distribution's parameters. The random_state parameter pins the sequence of rolls for reproducibility. For an integer or tuple size, the output is a NumPy array, ready for vectorized computation.
Basic Usage
Every frozen distribution in scipy.stats provides the .rvs() method:
```python
import scipy.stats as stats

a = stats.norm(loc=3.0)  # frozen normal distribution, mean=3, std=1
samples = a.rvs(size=(2, 3), random_state=1)
print(samples)
# [[4.62434536 2.38824359 2.47182825]
#  [1.92703138 3.86540763 0.6984613 ]]

print(type(samples))  # <class 'numpy.ndarray'>
```
The shape of the returned NumPy array is determined by the size parameter. With size=(2, 3), you get a 2×3 array of independent samples.
Parameters
| Parameter | Type | Description |
|---|---|---|
| size | int or tuple | Shape of the output array. size=1000 gives a 1D array; size=(100, 5) gives a 2D array. |
| random_state | int, Generator, or RandomState | Seed for reproducibility. Pass an integer for deterministic results. |
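As a minimal sketch of how size maps to the output shape, the snippet below draws from a frozen standard normal with an integer size and with a tuple size. The variable names (dist, flat, grid) and the seed value are illustrative choices, not part of the original text.

```python
import scipy.stats as stats

dist = stats.norm(loc=0.0, scale=1.0)  # frozen standard normal

flat = dist.rvs(size=1000, random_state=0)      # int size -> 1D array
grid = dist.rvs(size=(100, 5), random_state=0)  # tuple size -> 2D array

print(flat.shape)  # (1000,)
print(grid.shape)  # (100, 5)
```

A tuple size is convenient when each row or column represents a separate replication, since statistics can then be computed along one axis in a single vectorized call.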
Reproducibility
Passing the same integer as random_state makes each call generate the same sequence of random numbers:
```python
import scipy.stats as stats

# These two calls produce identical samples
s1 = stats.norm(0, 1).rvs(size=5, random_state=42)
s2 = stats.norm(0, 1).rvs(size=5, random_state=42)
assert (s1 == s2).all()
```
For more control, pass a numpy.random.Generator object:
```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(seed=42)
samples = stats.norm(0, 1).rvs(size=1000, random_state=rng)
```
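One practical difference between the two styles is worth illustrating: an integer seed restarts the stream on every call, whereas a shared Generator advances, so successive calls give fresh draws. The following is a small sketch of that behavior; the variable names and the seed are arbitrary.

```python
import numpy as np
import scipy.stats as stats

dist = stats.norm(0, 1)

# Integer seed: every call starts from the same state, so the draws repeat.
a = dist.rvs(size=3, random_state=42)
b = dist.rvs(size=3, random_state=42)
print(np.array_equal(a, b))  # True

# Shared Generator: each call advances the stream, so the draws differ.
rng = np.random.default_rng(42)
c = dist.rvs(size=3, random_state=rng)
d = dist.rvs(size=3, random_state=rng)
print(np.array_equal(c, d))  # False

# Re-creating the Generator with the same seed replays the stream.
rng2 = np.random.default_rng(42)
e = dist.rvs(size=3, random_state=rng2)
print(np.array_equal(c, e))  # True
```

In a simulation with many sampling steps, creating one Generator at the top and passing it everywhere keeps the whole run reproducible without reusing the same numbers at each step.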
Verifying Samples Against Theory
A standard validation technique is to overlay a histogram of samples with the theoretical PDF (continuous) or PMF (discrete):
```python
import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt

# Generate samples and overlay with theoretical PDF
a = stats.norm(loc=0, scale=1)
samples = a.rvs(size=10000, random_state=337)
x = np.linspace(-4, 4, 100)

plt.hist(samples, density=True, bins=50, alpha=0.7, label='Histogram')
plt.plot(x, a.pdf(x), 'r-', linewidth=2, label='Theoretical PDF')
plt.legend()
plt.title('Sampling Verification: Histogram vs PDF')
plt.show()
```
As the sample size increases, the normalized histogram approaches the theoretical density, a visual demonstration of the law of large numbers.
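The same check can be made numerically rather than visually. The sketch below compares sample moments with the theoretical ones at increasing sample sizes and then runs a Kolmogorov-Smirnov test against the theoretical CDF. The sample sizes, the seed, and the choice of the KS test are assumptions made for illustration; other goodness-of-fit checks would serve equally well.

```python
import numpy as np
import scipy.stats as stats

a = stats.norm(loc=0, scale=1)
rng = np.random.default_rng(337)

# Errors in the sample mean and std tend to shrink roughly like 1/sqrt(n),
# though not necessarily monotonically for any single run.
for n in (100, 10_000, 1_000_000):
    samples = a.rvs(size=n, random_state=rng)
    mean_err = abs(samples.mean() - a.mean())
    std_err = abs(samples.std(ddof=1) - a.std())
    print(f"n={n:>9,}: |mean error|={mean_err:.4f}, |std error|={std_err:.4f}")

# KS test on the largest sample: a small statistic means the empirical CDF
# stays close to the theoretical CDF everywhere.
result = stats.kstest(samples, a.cdf)
print(f"KS statistic={result.statistic:.4f}, p-value={result.pvalue:.3f}")
```

A numeric check like this is easier to automate in a test suite than a plot, at the cost of summarizing the fit in a few numbers.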
Sampling from Different Distribution Types
The .rvs() method works identically for continuous and discrete distributions:
```python
# Continuous distributions
stats.norm(0, 1).rvs(size=5)           # normal
stats.expon(scale=1/3).rvs(size=5)     # exponential with rate λ=3

# Discrete distributions
stats.poisson(mu=3.0).rvs(size=5)      # Poisson
stats.binom(n=100, p=0.6).rvs(size=5)  # binomial
```
For discrete distributions, .rvs() returns integer-valued arrays.
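For a discrete distribution, the analogue of the histogram-versus-PDF check is to compare empirical frequencies with the PMF. The sketch below does this for a Poisson distribution and also confirms the integer dtype. The sample size, the seed, and the cutoff of 8 values are arbitrary illustrative choices.

```python
import numpy as np
import scipy.stats as stats

dist = stats.poisson(mu=3.0)
counts = dist.rvs(size=100_000, random_state=0)

# Discrete samples come back with an integer dtype.
print(np.issubdtype(counts.dtype, np.integer))  # True

# Empirical frequency of each value k = 0..7 versus the theoretical PMF.
k = np.arange(8)
empirical = np.bincount(counts, minlength=8)[:8] / counts.size
theoretical = dist.pmf(k)

for ki, emp, theo in zip(k, empirical, theoretical):
    print(f"k={ki}: empirical={emp:.4f}, pmf={theo:.4f}")
```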
Financial Applications
Random sampling is fundamental to Monte Carlo methods in finance: simulating asset price paths under geometric Brownian motion, generating scenarios for Value at Risk (VaR) estimation, pricing path-dependent options, and stress testing portfolio performance under various distributional assumptions.
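To make one of these applications concrete, here is a minimal sketch that simulates terminal prices under geometric Brownian motion, using the closed-form solution \(S_T = S_0 \exp\left((\mu - \sigma^2/2)T + \sigma\sqrt{T}\,Z\right)\) with \(Z \sim N(0, 1)\), and reads a 95% Value at Risk off the simulated loss distribution. All parameter values (s0, mu, sigma, the 10-day horizon, the number of paths) are hypothetical and chosen only for illustration.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical parameters, for illustration only.
s0 = 100.0          # initial price
mu = 0.05           # annual drift
sigma = 0.20        # annual volatility
horizon = 10 / 252  # 10 trading days, expressed in years
n_paths = 100_000

rng = np.random.default_rng(42)
z = stats.norm(0, 1).rvs(size=n_paths, random_state=rng)

# Terminal prices from the closed-form GBM solution (one draw per path).
s_t = s0 * np.exp((mu - 0.5 * sigma**2) * horizon + sigma * np.sqrt(horizon) * z)

# Loss per share over the horizon; the 95% VaR is its 95th percentile.
losses = s0 - s_t
var_95 = np.percentile(losses, 95)
print(f"10-day 95% VaR per share: {var_95:.2f}")
```

Because only the terminal price matters here, a single normal draw per path suffices; path-dependent payoffs would instead need a 2D array of draws, for example size=(n_paths, n_steps).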
Summary
The .rvs() method is the gateway to simulation-based analysis. Combined with the random_state parameter for reproducibility and NumPy's array infrastructure, it provides an efficient and consistent interface for generating random samples from any scipy.stats distribution.
Exercises
Exercise 1.
Generate 500 samples from a normal distribution with \(\mu = 100\) and \(\sigma = 15\) using random_state=42. Compute the sample mean and standard deviation. Repeat with the same random_state=42 and verify that the results are identical.
Solution to Exercise 1
```python
import numpy as np
from scipy import stats

samples1 = stats.norm.rvs(loc=100, scale=15, size=500, random_state=42)
samples2 = stats.norm.rvs(loc=100, scale=15, size=500, random_state=42)
print(f"Mean: {np.mean(samples1):.4f}, Std: {np.std(samples1, ddof=1):.4f}")
print(f"Identical results: {np.array_equal(samples1, samples2)}")
```
Exercise 2.
Generate a \(5 \times 4\) array of random samples from a uniform distribution on \([2, 7]\) using a single .rvs() call with the size parameter. Print the array shape and verify all values fall within \([2, 7]\).
Solution to Exercise 2
```python
import numpy as np
from scipy import stats

# uniform(loc, scale) covers [loc, loc + scale], so loc=2, scale=5 gives [2, 7]
samples = stats.uniform.rvs(loc=2, scale=5, size=(5, 4), random_state=42)
print(f"Shape: {samples.shape}")
print(f"Min: {samples.min():.4f}, Max: {samples.max():.4f}")
print(f"All in [2, 7]: {(samples >= 2).all() and (samples <= 7).all()}")
```
Exercise 3.
Draw 10,000 samples from both a \(t\)-distribution with 3 degrees of freedom and a standard normal. Compare the fraction of samples with \(|x| > 3\) for each distribution to demonstrate heavier tails in the \(t\)-distribution.
Solution to Exercise 3
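```python
import numpy as np
from scipy import stats

# One shared Generator gives reproducible, independent draws for both samples
rng = np.random.default_rng(42)
t_samples = stats.t.rvs(df=3, size=10000, random_state=rng)
norm_samples = stats.norm.rvs(size=10000, random_state=rng)

# Fraction of draws falling in the tails |x| > 3
t_frac = np.mean(np.abs(t_samples) > 3)
n_frac = np.mean(np.abs(norm_samples) > 3)
print(f"Fraction |x|>3: t(3)={t_frac:.4f}, Normal={n_frac:.4f}")
```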