Seed and Reproduce¶
Setting a random seed ensures reproducible results in simulations and scientific computing.
Mental Model
A seed is the starting state of a pseudorandom number generator -- same seed, same sequence, every time. Set it once at the start of a script for reproducible experiments. Prefer the modern np.random.default_rng(seed) over the legacy np.random.seed() because the new API provides independent generators that do not share global state.
Why Seeds Matter¶
Random number generators produce deterministic sequences from an initial state.
1. Pseudorandomness¶
NumPy uses pseudorandom number generators that are deterministic given the same seed.
2. Reproducibility¶
Scientific research requires reproducible results for validation and peer review.
np.random.seed¶
Sets the global random state for the legacy random API.
1. Basic Usage¶
```python import numpy as np
def main(): np.random.seed(0)
data = np.random.randn(5)
print(data)
if name == "main": main() ```
2. Consistent Results¶
Running the same seed produces identical sequences every time.
Seed with Histogram¶
Visualize that seeded random samples follow expected distributions.
1. Normal Distribution¶
```python import numpy as np import matplotlib.pyplot as plt from scipy import stats
def main(): np.random.seed(0)
n_samples = 10_000
data = np.random.randn(n_samples)
fig, ax = plt.subplots()
_, bins_, _ = ax.hist(data, bins=100, density=True)
mu = data.mean()
sigma = data.std()
pdf_at_bins_ = stats.norm(loc=mu, scale=sigma).pdf(bins_)
ax.plot(bins_, pdf_at_bins_, '--r', linewidth=5)
plt.show()
if name == "main": main() ```
2. Histogram Fit¶
The red dashed line shows the theoretical PDF matching the sample histogram.
Modern Generator API¶
The recommended approach uses explicit generator objects.
1. default_rng¶
```python import numpy as np
rng = np.random.default_rng(seed=42) data = rng.standard_normal(1000) ```
2. Advantages¶
Explicit generators avoid global state and enable parallel random streams.
Best Practices¶
Follow these guidelines for reproducible random experiments.
1. Set Early¶
Set the seed at the beginning of your script or notebook.
2. Document Seeds¶
Record seed values in logs or comments for future reproduction.
3. Avoid Resetting¶
Resetting seeds mid-experiment can cause subtle statistical biases.
Exercises¶
Exercise 1. Create two random number generators with the same seed. Verify they produce identical sequences.
Solution to Exercise 1
```python import numpy as np rng1 = np.random.default_rng(42) rng2 = np.random.default_rng(42) print(rng1.random(5)) print(rng2.random(5))
Both produce identical arrays¶
```
Exercise 2. Explain why np.random.seed(42) is considered legacy. What is the modern alternative?
Solution to Exercise 2
np.random.seed() sets global state, which can cause subtle bugs in multithreaded code or when different parts of a program need independent random streams. The modern alternative is np.random.default_rng(seed), which creates an independent generator instance.
Exercise 3. Generate 5 random floats using a seeded default_rng, save the seed, then reproduce the exact same sequence.
Solution to Exercise 3
python
import numpy as np
seed = 12345
rng = np.random.default_rng(seed)
first_run = rng.random(5)
rng = np.random.default_rng(seed)
second_run = rng.random(5)
print(np.array_equal(first_run, second_run)) # True
Exercise 4. Write a function reproducible_sample(seed, n) that always returns the same n random numbers for a given seed.
Solution to Exercise 4
```python import numpy as np def reproducible_sample(seed, n): rng = np.random.default_rng(seed) return rng.random(n)
print(reproducible_sample(42, 5)) print(reproducible_sample(42, 5)) # same output ```