Random Permutations¶
NumPy provides functions for randomly reordering array elements. A permutation is a random ordering --- it is the basis for shuffling datasets before training, bootstrap sampling, and randomized algorithms. While distributions generate new random values, permutations rearrange existing data — a distinct but equally important form of randomization in the random computation pipeline.
Mental Model
shuffle rearranges elements in place and returns None; permutation returns a shuffled copy and leaves the original untouched. For multi-dimensional arrays, both operate only along the first axis (shuffling rows, not individual elements). Use permutation when you need to keep the original data intact.
np.random.shuffle¶
Shuffles an array in-place, modifying the original.
1. Returns None¶
```python import numpy as np
def main(): x = [1, 4, 9, 12, 15] result = np.random.shuffle(x) print(result) # None
if name == "main": main() ```
The function modifies the array in-place and returns None.
2. List Shuffle¶
```python import numpy as np
def main(): x = [1, 4, 9, 12, 15] np.random.shuffle(x) print(x) # Shuffled list
if name == "main": main() ```
3. Array Shuffle¶
```python import numpy as np
def main(): x = np.array([1, 4, 9, 12, 15]) np.random.shuffle(x) print(x)
if name == "main": main() ```
2D Array Shuffle¶
For multi-dimensional arrays, shuffle operates along the first axis.
1. Row Shuffling¶
```python import numpy as np
def main(): x = np.arange(9).reshape((3, 3)) print("Before shuffle:") print(x)
np.random.shuffle(x)
print("After shuffle:")
print(x)
if name == "main": main() ```
2. Only First Axis¶
Rows are permuted, but elements within each row maintain their order.
np.random.permutation¶
Returns a permuted copy, leaving the original unchanged.
1. From Integer¶
```python import numpy as np
def main(): x = np.random.permutation(10) print(x) # Random permutation of 0-9
if name == "main": main() ```
2. From List¶
```python import numpy as np
def main(): x = [1, 4, 9, 12, 15] y = np.random.permutation(x) print(y) print(x) # Original unchanged
if name == "main": main() ```
3. From Array¶
```python import numpy as np
def main(): x = np.array([1, 4, 9, 12, 15]) y = np.random.permutation(x) print(y)
if name == "main": main() ```
2D Permutation¶
Like shuffle, permutation operates along the first axis for 2D arrays.
1. Row Permutation¶
```python import numpy as np
def main(): x = np.arange(9).reshape((3, 3)) y = np.random.permutation(x) print(y)
if name == "main": main() ```
2. Non-Destructive¶
The original array x remains unchanged.
shuffle vs permutation¶
Choose based on whether you need the original array.
1. Use shuffle¶
When you want to modify the array in-place to save memory.
2. Use permutation¶
When you need to preserve the original array or create an index array.
3. Memory Trade-off¶
shuffle is memory-efficient; permutation creates a copy.
Common Applications¶
Random permutations are essential in many algorithms.
1. Data Shuffling¶
Randomize training data order before each epoch in machine learning.
2. Random Sampling¶
Create random indices for selecting subsets of data.
3. A/B Testing¶
Randomly assign subjects to control and treatment groups.
Exercises¶
Exercise 1. Generate 1,000 samples from this distribution using NumPy. Compute the sample mean and variance and compare with the theoretical values.
Solution to Exercise 1
```python import numpy as np rng = np.random.default_rng(42)
Adjust parameters based on the specific distribution¶
samples = rng.standard_normal(1000) # example print(f"Sample mean: {samples.mean():.4f}") print(f"Sample var: {samples.var():.4f}") ```
Exercise 2. Create a histogram of 10,000 samples from this distribution using np.histogram. Print the bin edges and counts for the first 5 bins.
Solution to Exercise 2
python
import numpy as np
rng = np.random.default_rng(42)
samples = rng.standard_normal(10000)
counts, edges = np.histogram(samples, bins=20)
for i in range(5):
print(f"[{edges[i]:.2f}, {edges[i+1]:.2f}): {counts[i]}")
Exercise 3. Write a function that generates n samples from this distribution and returns the proportion that fall below the mean. Verify it approaches the expected proportion as n grows.
Solution to Exercise 3
python
import numpy as np
rng = np.random.default_rng(42)
for n in [100, 1000, 10000, 100000]:
samples = rng.standard_normal(n)
below_mean = np.mean(samples < samples.mean())
print(f"n={n:>7d}: {below_mean:.4f}")
Exercise 4. Simulate a real-world scenario that uses this distribution. Generate data, compute summary statistics, and explain why this distribution is appropriate for the scenario.
Solution to Exercise 4
The specific scenario depends on the distribution. For example, a Poisson distribution models the number of events per time interval (e.g., customers arriving at a store).
```python import numpy as np rng = np.random.default_rng(42)
Example: Poisson arrivals¶
arrivals = rng.poisson(lam=5, size=365) print(f"Mean daily arrivals: {arrivals.mean():.2f}") print(f"Max in a day: {arrivals.max()}") ```