Boolean Masking
Boolean masking uses logical conditions to selectively operate on array subsets.
Mental Model
A boolean mask is an array of True/False values the same shape as your data. Using it as an index selects only the True elements, like a stencil laid over the array. This is the NumPy equivalent of a SQL WHERE clause -- filter, replace, or count elements that satisfy any condition, all without writing a loop.
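The three uses named above (filter, replace, count) can be sketched on one small array. The `temps` data and the threshold of 24 are made-up values for illustration only.

```python
import numpy as np

# Hypothetical sample data, for illustration only
temps = np.array([12, 25, 31, 18, 40, 22])

hot = temps > 24            # boolean mask with the same shape as temps

filtered = temps[hot]       # filter: keep only elements where the mask is True
count = int(hot.sum())      # count: each True counts as 1

capped = temps.copy()
capped[hot] = 24            # replace: overwrite the selected elements on a copy

print(hot)
print(filtered)
print(count)
print(capped)
```

Working on a copy for the replacement step keeps `temps` intact, which makes the three results easy to compare side by side.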
Filtering Basics
Boolean arrays select elements satisfying a condition.
1. 1D with List
```python
import numpy as np


def main():
    a = np.array([1, 2, 3])
    b = [True, False, True]  # a plain Python list works as a 1D mask
    c = a[b]
    print(f"{c = }")


if __name__ == "__main__":
    main()
```
Output:
c = array([1, 3])
2. 1D with Array
```python
import numpy as np


def main():
    a = np.array([1, 2, 3])
    b = np.array([True, False, True])  # boolean ndarray mask
    c = a[b]
    print(f"{c = }")


if __name__ == "__main__":
    main()
```
Output:
c = array([1, 3])
3. Condition Filter
```python
import numpy as np


def main():
    a = np.array([1, 2, 3])
    b = (a % 2 == 1)  # the comparison itself produces a boolean ndarray
    print(f"{b = }")
    print(f"{type(b) = }")

    c = a[b]
    print(f"{c = }")


if __name__ == "__main__":
    main()
```
Output:
b = array([ True, False, True])
type(b) = <class 'numpy.ndarray'>
c = array([1, 3])
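2D Filtering
For multi-dimensional arrays, pass the mask as a NumPy boolean array. Older NumPy releases (before 1.23) interpreted a nested list as a tuple of per-axis indices and raised an IndexError; newer releases convert the list to an array, but an explicit np.array mask behaves the same in every version.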
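1. List Causes Error
```python
import numpy as np


def main():
    a = np.array([[1, 2, 3], [4, 5, 6]])
    b = [[True, False, True], [False, True, False]]  # nested list, not an ndarray
    try:
        # Older NumPy (before 1.23) raises IndexError here;
        # newer versions treat the nested list like an array mask.
        c = a[b]
        print(f"{c = }")
    except IndexError as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    main()
```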
2. Array Works
```python
import numpy as np


def main():
    a = np.array([[1, 2, 3], [4, 5, 6]])
    b = np.array([[True, False, True], [False, True, False]])
    c = a[b]  # selected elements come back as a 1D array
    print(f"{c = }")


if __name__ == "__main__":
    main()
```
Output:
c = array([1, 3, 5])
3. 2D Condition
```python
import numpy as np


def main():
    a = np.array([[1, 2, 3], [4, 5, 6]])
    b = (a % 2 == 1)  # 2D boolean mask with the same shape as a
    print(f"{b = }")

    c = a[b]
    print(f"{c = }")


if __name__ == "__main__":
    main()
```
Output:
b = array([[ True, False, True],
[False, True, False]])
c = array([1, 3, 5])
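The output above is 1D even though the input is 2D. The sketch below contrasts that element-wise case with masks that hold one boolean per row or per column, which keep the result 2D. The names `row_mask` and `col_mask` and the threshold of 10 are illustrative assumptions, not part of the original examples.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# Element-wise mask (same shape as a): the result is 1D, in row-major order
elementwise = a[a % 2 == 1]

# Row mask (one boolean per row): keeps whole rows, so the result stays 2D
row_mask = a.sum(axis=1) > 10
rows = a[row_mask]

# Column mask (one boolean per column), applied on the second axis
col_mask = np.array([True, False, True])
cols = a[:, col_mask]

print(elementwise)
print(rows)
print(cols)
```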
Mask Assignment
Boolean masks can modify array elements in-place.
1. 1D Assignment
```python
import numpy as np


def main():
    a = np.array([1, 2, 3])
    a[a % 2 == 1] = 0  # set every odd element to 0, in place
    print(a)


if __name__ == "__main__":
    main()
```
Output:
[0 2 0]
2. 2D Assignment
```python
import numpy as np


def main():
    a = np.array([[1, 2, 3], [4, 5, 6]])
    b = np.array([[True, False, True], [False, True, False]])
    a[b] = 0  # overwrite only the positions where the mask is True
    print(a)


if __name__ == "__main__":
    main()
```
Output:
[[0 2 0]
[4 0 6]]
3. Condition Assignment
```python
import numpy as np


def main():
    a = np.array([[1, 2, 3], [4, 5, 6]])
    a[a % 2 == 1] = 0  # build the mask and assign in one step
    print(a)


if __name__ == "__main__":
    main()
```
Output:
[[0 2 0]
[4 0 6]]
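Mask assignment is not limited to a single scalar. As a small sketch using the same kind of toy arrays, the selected elements can also be updated with in-place arithmetic, or given one value each, provided the number of values equals the number of True entries in the mask.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
odd = a % 2 == 1

# In-place arithmetic touches only the selected elements
a[odd] *= 10
print(a)

# One value per selected element: the length must equal the number of True entries
b = np.array([1, 2, 3, 4, 5])
b[b > 3] = [-4, -5]
print(b)
```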
Image Masking
Apply masks to image regions.
1. Rectangle Mask
```python
import urllib.request

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image


def main():
    url = "https://upload.wikimedia.org/wikipedia/en/4/43/Pok%C3%A9mon_Mewtwo_art.png"
    img = np.array(Image.open(urllib.request.urlopen(url)))
    print(f"{img.shape = }")
    print(f"{img.dtype = }")

    # 2D mask over height and width; True marks the rectangle to overwrite
    mask = np.zeros(shape=img.shape[:2], dtype=bool)
    mask[50:100, 50:100] = True

    img_copy = img.copy()
    # Assumes an RGBA image (4 channels): paint the masked pixels opaque white
    img_copy[mask] = [255, 255, 255, 255]

    fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(6, 4))
    ax0.set_title("Original", fontsize=15)
    ax0.imshow(img)
    ax1.set_title("Masked", fontsize=15)
    ax1.imshow(img_copy)
    for ax in (ax0, ax1):
        ax.axis('off')
    plt.tight_layout()
    plt.show()


if __name__ == "__main__":
    main()
```
2. Mask Shape
Create the mask with the image's height and width only, img.shape[:2], so that each mask entry covers a whole pixel (all of its channels).
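To see the shapes involved without downloading anything, here is a minimal sketch on a synthetic image. The 4x6 RGB array and the red fill color are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical 4x6 RGB image filled with mid-gray
img = np.full((4, 6, 3), 128, dtype=np.uint8)

mask = np.zeros(img.shape[:2], dtype=bool)   # shape (4, 6): one flag per pixel
mask[1:3, 2:5] = True                        # 2 rows x 3 columns = 6 pixels

selected = img[mask]       # shape (6, 3): one row of channels per masked pixel
img[mask] = [255, 0, 0]    # the 3-element color broadcasts to every masked pixel

print(mask.shape, selected.shape)
print(img[1, 2], img[0, 0])
```

Because the mask covers only the first two axes, the remaining channel axis is carried along unchanged, which is why a single color triple can be assigned to all masked pixels.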
Combining Masks
Logical operators combine multiple conditions.
1. AND Operator
```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
mask1 = arr > 2
mask2 = arr < 5
combined = mask1 & mask2  # True only where both masks are True
print(arr[combined])  # [3 4]
```
2. OR Operator
```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
combined = (arr < 2) | (arr > 4)  # True where either condition holds
print(arr[combined])  # [1 5]
```
3. NOT Operator
```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
mask = arr > 2
print(arr[~mask])  # [1 2]
```
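Two details are easy to trip over when combining conditions inline: each comparison needs its own parentheses, because `&` and `|` bind more tightly than `>` and `<`, and Python's `and`/`or` keywords do not work element-wise. The sketch below shows both, along with `np.logical_and` as a function-style alternative. The variable `used_and_ok` is just an illustrative flag.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Parentheses are required around each comparison
combined = (arr > 2) & (arr < 5)

# Function form of the same operation
same = np.logical_and(arr > 2, arr < 5)

# Python's `and` asks for the truth value of a whole array and raises ValueError
try:
    (arr > 2) and (arr < 5)
    used_and_ok = True
except ValueError:
    used_and_ok = False

print(arr[combined])
print(np.array_equal(combined, same))
print(used_and_ok)
```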
Coin Flip Simulation
Boolean masking enables vectorized probability simulation.
1. Inverse Transform
```python
import numpy as np


def main():
    n = 10
    p = 0.5  # probability of heads (1)

    c = np.zeros(n)
    u = np.random.rand(n)  # uniform samples in [0, 1)
    c[u > 1 - p] = 1  # heads where the sample exceeds 1 - p

    for ui, ci in zip(u, c):
        print(f"{ui = :.3f}, {ui > 1-p = }, {ci = }")


if __name__ == "__main__":
    main()
```
2. Vectorized Flip
```python
import numpy as np


def main():
    n = 30
    p = 0.5

    uniform = np.random.uniform(size=(n,))
    coin = np.zeros_like(uniform)
    coin[uniform > 1 - p] = 1.  # all n flips decided in one masked assignment
    print(coin)


if __name__ == "__main__":
    main()
```
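3. Performance Benefit
Vectorized masking is typically much faster than a Python loop over individual flips, because the comparison and the assignment run in compiled NumPy code rather than in the interpreter.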
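A rough way to check this on your own machine is to time both versions on the same uniform draws. This is only a sketch: the sample size, the seeded `np.random.default_rng(0)` generator, and the use of `time.perf_counter` are choices made here for illustration, and the measured times will vary by system.

```python
import time

import numpy as np

rng = np.random.default_rng(0)  # seeded generator, chosen here for reproducibility
n = 200_000
p = 0.5
u = rng.random(n)

# Loop version: one Python-level comparison per flip
start = time.perf_counter()
coin_loop = np.zeros(n)
for i in range(n):
    if u[i] > 1 - p:
        coin_loop[i] = 1.0
loop_seconds = time.perf_counter() - start

# Vectorized version: one masked assignment for all flips
start = time.perf_counter()
coin_vec = np.zeros(n)
coin_vec[u > 1 - p] = 1.0
vec_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f} s, vectorized: {vec_seconds:.4f} s")
print(f"same result: {np.array_equal(coin_loop, coin_vec)}")
```

Both versions use the same draws `u`, so the outputs are identical and only the elapsed time differs.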
np.where Indices
np.where(condition) returns a tuple of index arrays, one per dimension, giving the positions where condition is True.
1. 1D Indices
```python
import numpy as np


def main():
    a = np.array([1, 2, 2, 3, 2, 4, 4, 2])
    b = np.where(a == 2)  # 1-tuple holding the matching positions
    print(f"{b = }")


if __name__ == "__main__":
    main()
```
Output:
b = (array([1, 2, 4, 7]),)
2. 2D Indices
```python
import numpy as np


def main():
    a = np.array([[1, 2], [2, 3], [2, 4], [4, 2]])
    b = np.where(a == 2)  # (row indices, column indices)
    print(f"{b = }")


if __name__ == "__main__":
    main()
```
Output:
b = (array([0, 1, 2, 3]), array([1, 0, 0, 1]))
3. 3D Indices
```python
import numpy as np


def main():
    a = np.array([[1, 2], [2, 3], [2, 4], [4, 2]])
    a = np.array([a, a])  # stack two copies: shape (2, 4, 2)
    b = np.where(a == 2)  # three index arrays, one per axis
    print(f"{b = }")


if __name__ == "__main__":
    main()
```
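As a follow-up sketch for the 3D case, the tuple returned by `np.where` can be passed straight back as an index to recover the matching values, and `np.argwhere` gives the same positions as one coordinate row per match. The name `a3` is introduced here only to keep the 2D and 3D arrays apart.

```python
import numpy as np

a = np.array([[1, 2], [2, 3], [2, 4], [4, 2]])
a3 = np.array([a, a])            # shape (2, 4, 2)

idx = np.where(a3 == 2)          # tuple of 3 index arrays, one per axis
coords = np.argwhere(a3 == 2)    # same positions as an (N, 3) coordinate array

print(len(idx), coords.shape)
print(a3[idx])                   # indexing with the tuple recovers the matches
```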
np.where Conditional
np.where(condition, x, y) returns an array that takes elements from x where condition is True and from y elsewhere.
1. Syntax Pattern
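A minimal sketch of the call pattern, using small illustrative arrays named `condition`, `x`, and `y` (these values are assumptions, not taken from the examples that follow):

```python
import numpy as np

condition = np.array([True, False, True, False])
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

# result[i] is x[i] where condition[i] is True, otherwise y[i]
result = np.where(condition, x, y)

# x and y may also be scalars; they broadcast against the condition
signs = np.where(x > 2, 1, -1)

print(result)
print(signs)
```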
2. 1D Example
```python
import numpy as np


def main():
    a = np.array([1, 2, 2, 3, 2, 4, 4, 2])
    b = a * 10
    c = np.where(a == 2, a, b)  # keep a where it equals 2, otherwise take b
    print(f"{c = }")


if __name__ == "__main__":
    main()
```
Output:
c = array([10, 2, 2, 30, 2, 40, 40, 2])
3. 2D Example
```python
import numpy as np


def main():
    a = np.array([[1, 2], [2, 3], [2, 4], [4, 2]])
    b = a * 10
    c = np.where(a == 2, a, b)  # same rule, applied element-wise in 2D
    print(f"{c = }")


if __name__ == "__main__":
    main()
```
Output:
c = array([[10, 2],
[ 2, 30],
[ 2, 40],
[40, 2]])
Image Clamping
Use np.where to clamp pixel values to the valid range [0, 255].
1. Noise and Clamp
```python
import urllib.request

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image


def main():
    url = "https://upload.wikimedia.org/wikipedia/en/4/43/Pok%C3%A9mon_Mewtwo_art.png"
    img = np.array(Image.open(urllib.request.urlopen(url)))

    # Adding integer noise promotes the uint8 image to a wider integer type
    img_noisy = img + np.random.randint(-100, 101, size=img.shape)
    img_noisy = np.where(img_noisy >= 0, img_noisy, 0)      # lower bound
    img_noisy = np.where(img_noisy <= 255, img_noisy, 255)  # upper bound

    fig, ax = plt.subplots()
    ax.imshow(img_noisy.astype(np.uint8))
    ax.axis('off')
    plt.show()


if __name__ == "__main__":
    main()
```
2. Chained np.where
Apply np.where once per bound: the first call enforces the lower bound (0) and the second enforces the upper bound (255).
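The two-step clamp can be checked on a handful of numbers. The sketch below uses made-up `noisy` values and also compares the result with `np.clip`, a NumPy function that applies both bounds in one call; it is shown as an alternative, not as the approach used in the example above.

```python
import numpy as np

# Hypothetical noisy values, some outside the valid 0..255 range
noisy = np.array([-40, 0, 128, 255, 300])

# Chained np.where: lower bound first, then upper bound
step1 = np.where(noisy >= 0, noisy, 0)
chained = np.where(step1 <= 255, step1, 255)

# np.clip enforces both bounds at once
clipped = np.clip(noisy, 0, 255)

# Safe to convert once every value fits in 0..255
pixels = chained.astype(np.uint8)

print(chained)
print(np.array_equal(chained, clipped))
```

Clamping before the cast to `np.uint8` matters, because out-of-range values would otherwise not be preserved by the conversion.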
Boolean Masking Always Returns a Copy
Unlike slicing (which returns a view), boolean masking always returns a copy. Modifying the result does not modify the original array:
```python
import numpy as np

a = np.arange(10)
result = a[a > 5]  # copy, not a view
result[0] = 999
print(a)  # unchanged
```
To modify the original, use mask assignment directly: a[a > 5] = 0.
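One way to observe the difference directly is `np.shares_memory`, which reports whether two arrays overlap in memory. A small sketch, with `view` and `masked` as illustrative names:

```python
import numpy as np

a = np.arange(10)

view = a[2:5]        # basic slice: a view onto a's memory
masked = a[a > 5]    # boolean mask: a new, independent array

print(np.shares_memory(a, view))
print(np.shares_memory(a, masked))

view[0] = -1         # writes through to a[2]
masked[0] = 999      # leaves a untouched

print(a)
```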
Exercises
Exercise 1.
Create an array a = np.arange(1, 21). Use boolean masking to extract all elements divisible by 3. Then use mask assignment to replace all elements greater than 15 with -1. Print the modified array.
Solution to Exercise 1
```python
import numpy as np

a = np.arange(1, 21)
divisible_by_3 = a[a % 3 == 0]
print(f"Divisible by 3: {divisible_by_3}")
a[a > 15] = -1
print(f"After replacement: {a}")
```
Exercise 2.
Given a 2D array M = np.random.randint(0, 100, size=(5, 5)), create a combined mask that selects elements that are both greater than 25 AND less than 75. Count how many elements satisfy this condition using np.sum on the mask.
Solution to Exercise 2
```python
import numpy as np

np.random.seed(42)
M = np.random.randint(0, 100, size=(5, 5))
mask = (M > 25) & (M < 75)
count = np.sum(mask)
print(f"Elements in (25, 75): {count}")
print(f"Values: {M[mask]}")
```
Exercise 3.
Simulate 1000 coin flips with probability p = 0.7 using np.random.rand and boolean masking (set values where rand < p to 1, else 0). Print the empirical probability (mean of the result) and verify it is close to 0.7.
Solution to Exercise 3
```python
import numpy as np

np.random.seed(42)
p = 0.7
flips = np.zeros(1000)
flips[np.random.rand(1000) < p] = 1
empirical_p = flips.mean()
print(f"Empirical probability: {empirical_p:.3f}")
print(f"Close to 0.7: {abs(empirical_p - 0.7) < 0.05}")
```