Array Utilities: clip, unique, diff, gradient¶
Common NumPy utility functions for data manipulation and analysis.
import numpy as np
np.clip() — Limit Values to Range¶
Constrain array values to a minimum and maximum:
arr = np.array([1, 5, 10, 15, 20])
# Clip to range [5, 15]
clipped = np.clip(arr, 5, 15)
print(clipped) # [ 5 5 10 15 15]
# Clip with only min
np.clip(arr, 5, None) # [ 5 5 10 15 20]
# Clip with only max
np.clip(arr, None, 15) # [ 1 5 10 15 15]
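The bounds do not have to be scalars. As a small sketch (the names `lower` and `upper` are illustrative and not part of the original text), array bounds are broadcast against the input, so each element can get its own limits:

```python
import numpy as np

arr = np.array([1, 5, 10, 15, 20])

# Hypothetical per-element bounds with the same shape as arr
lower = np.array([0, 6, 0, 0, 0])
upper = np.array([10, 10, 5, 20, 18])

per_element = np.clip(arr, lower, upper)
print(per_element)  # [ 1  6  5 15 18]
```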
Method Syntax¶
arr = np.array([1, 5, 10, 15, 20])
# Function syntax
np.clip(arr, 5, 15)
# Method syntax
arr.clip(5, 15)
# In-place (modifies original)
arr.clip(5, 15, out=arr)
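To make the difference between the copying and in-place forms concrete, the following sketch checks the original array before and after each call (the variable name `result` is only for illustration):

```python
import numpy as np

arr = np.array([1, 5, 10, 15, 20])

# Without out=, clip returns a new array and leaves arr alone
result = arr.clip(5, 15)
print(result)  # [ 5  5 10 15 15]
print(arr)     # [ 1  5 10 15 20]

# With out=arr, the clipped values are written back into arr
arr.clip(5, 15, out=arr)
print(arr)     # [ 5  5 10 15 15]
```

The in-place form avoids allocating a second array, which may matter for large inputs.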
Practical Examples¶
# Clamp pixel values to the valid range [0, 255]
pixels = np.array([-10, 50, 200, 300])
pixels = np.clip(pixels, 0, 255)
print(pixels) # [ 0 50 200 255]
# Prevent division by very small numbers (assumes non-negative denominators)
denominators = np.array([0.001, 0.5, 1.0, 0.0001])
safe_denom = np.clip(denominators, 1e-3, None)
print(safe_denom) # [0.001 0.5 1. 0.001]
# Clip neural network gradients
gradients = np.array([-100, 0.5, 50, -0.1])
clipped_grads = np.clip(gradients, -1, 1)
print(clipped_grads) # [-1. 0.5 1. -0.1]
# Probability bounds
probs = np.array([-0.1, 0.5, 1.2])
probs = np.clip(probs, 0, 1)
print(probs) # [0. 0.5 1. ]
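A related pattern is clipping probabilities away from zero before taking a logarithm. This is a minimal sketch; the tolerance `eps` is an assumed value chosen for illustration and should be adapted to the data at hand:

```python
import numpy as np

probs = np.array([0.0, 0.25, 1.0])
eps = 1e-12  # assumed floor; not a NumPy default

# log(0) would give -inf, so clip to [eps, 1] first
log_probs = np.log(np.clip(probs, eps, 1.0))
print(np.isfinite(log_probs).all())  # True
```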
np.unique() — Find Unique Values¶
Return sorted unique elements of an array:
arr = np.array([3, 1, 2, 2, 3, 1, 1, 4])
unique = np.unique(arr)
print(unique) # [1 2 3 4]
Return Indices¶
arr = np.array([3, 1, 2, 2, 3, 1, 1, 4])
# Index of first occurrence of each unique value
unique, indices = np.unique(arr, return_index=True)
print(unique) # [1 2 3 4]
print(indices) # [1 2 0 7]
# Indices to reconstruct original from unique
unique, inverse = np.unique(arr, return_inverse=True)
print(unique) # [1 2 3 4]
print(inverse) # [2 0 1 1 2 0 0 3]
print(unique[inverse]) # [3 1 2 2 3 1 1 4] (original!)
# Count of each unique value
unique, counts = np.unique(arr, return_counts=True)
print(unique) # [1 2 3 4]
print(counts) # [3 2 2 1]
All Return Values¶
arr = np.array([3, 1, 2, 2, 3])
unique, indices, inverse, counts = np.unique(
    arr,
    return_index=True,
    return_inverse=True,
    return_counts=True
)
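For reference, here is what the four outputs look like for that array, together with a few consistency checks that show how they relate to each other. This is an illustrative sketch rather than anything beyond what the flags above already do:

```python
import numpy as np

arr = np.array([3, 1, 2, 2, 3])
unique, indices, inverse, counts = np.unique(
    arr,
    return_index=True,
    return_inverse=True,
    return_counts=True,
)

print(unique)   # [1 2 3]
print(indices)  # [1 2 0]      first position of each unique value in arr
print(inverse)  # [2 0 1 1 2]  position of each arr element within unique
print(counts)   # [1 2 2]

# The outputs are tied together by these identities
print(np.array_equal(arr[indices], unique))           # True
print(np.array_equal(unique[inverse], arr))           # True
print(np.array_equal(np.bincount(inverse), counts))   # True
```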
Unique Rows (2D)¶
arr = np.array([[1, 2],
                [3, 4],
                [1, 2],
                [5, 6]])
# Unique rows
unique_rows = np.unique(arr, axis=0)
print(unique_rows)
# [[1 2]
# [3 4]
# [5 6]]
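The `axis=0` form can be combined with `return_counts=True` to see which rows are repeated. A short sketch, with `duplicated` as an illustrative name:

```python
import numpy as np

arr = np.array([[1, 2],
                [3, 4],
                [1, 2],
                [5, 6]])

rows, counts = np.unique(arr, axis=0, return_counts=True)
for row, count in zip(rows, counts):
    print(row, count)
# [1 2] 2
# [3 4] 1
# [5 6] 1

# Rows that appear more than once
duplicated = rows[counts > 1]
print(duplicated)  # [[1 2]]
```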
Practical Examples¶
# Find unique categories
labels = np.array(['cat', 'dog', 'cat', 'bird', 'dog'])
categories = np.unique(labels)
print(categories) # ['bird' 'cat' 'dog']
# Value counts (like pandas)
values, counts = np.unique(labels, return_counts=True)
for v, c in zip(values, counts):
    print(f"{v}: {c}")
# bird: 1
# cat: 2
# dog: 2
# Label encoding
labels = np.array(['cat', 'dog', 'cat', 'bird'])
unique, encoded = np.unique(labels, return_inverse=True)
print(encoded) # [1 2 1 0] (numeric encoding)
# Check if array has duplicates
arr = np.array([1, 2, 3, 2])
has_duplicates = len(np.unique(arr)) < len(arr)
print(has_duplicates) # True
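The counts can also be used to find the most frequent value, or to order the categories by frequency. The sketch below uses a slightly different label array than the examples above so that there is a unique winner; `most_common` and `order` are illustrative names:

```python
import numpy as np

labels = np.array(['cat', 'dog', 'cat', 'bird', 'cat'])
values, counts = np.unique(labels, return_counts=True)

# Most frequent value (the mode)
most_common = values[np.argmax(counts)]
print(most_common)  # cat

# Categories ordered by descending count; ties keep alphabetical order
order = np.argsort(-counts, kind="stable")
for v, c in zip(values[order], counts[order]):
    print(f"{v}: {c}")
# cat: 3
# bird: 1
# dog: 1
```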
np.diff() — Discrete Differences¶
Calculate the n-th discrete difference along an axis:
arr = np.array([1, 3, 6, 10, 15])
# First difference: arr[i+1] - arr[i]
diff1 = np.diff(arr)
print(diff1) # [2 3 4 5]
# Second difference (difference of differences)
diff2 = np.diff(arr, n=2)
print(diff2) # [1 1 1]
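Two details are worth checking in code: `n=2` is the same as applying `diff` twice, and differences of unsigned integers wrap around instead of going negative. The sketch below illustrates both; casting to a signed type is one possible workaround, not the only one:

```python
import numpy as np

arr = np.array([1, 3, 6, 10, 15])

# n=2 is equivalent to diff applied twice
print(np.array_equal(np.diff(arr, n=2), np.diff(np.diff(arr))))  # True

# A simple monotonicity check built on diff
is_increasing = bool(np.all(np.diff(arr) > 0))
print(is_increasing)  # True

# Unsigned dtypes wrap around: 3 - 5 becomes 254 for uint8
u8 = np.array([5, 3], dtype=np.uint8)
wrapped = np.diff(u8)
print(wrapped)  # [254]

# Cast to a signed type first to get the expected result
signed = np.diff(u8.astype(np.int16))
print(signed)  # [-2]
```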
Along Different Axes¶
matrix = np.array([[1, 2, 4],
                   [3, 5, 9]])
# Diff along the last axis (axis=-1 is the default; for a 2D array this is axis=1)
np.diff(matrix)
# [[1 2]
# [2 4]]
# Diff along rows (axis=0)
np.diff(matrix, axis=0)
# [[2 3 5]]
Prepend/Append Values¶
arr = np.array([1, 3, 6, 10])
# Prepend to maintain length
np.diff(arr, prepend=0)
# [1 2 3 4]
# Append to maintain length
np.diff(arr, append=arr[-1])
# [2 3 4 0]
Practical Examples¶
# Calculate velocity from position
time = np.array([0, 1, 2, 3, 4])
position = np.array([0, 2, 8, 18, 32])
velocity = np.diff(position) / np.diff(time)
print(velocity) # [ 2. 6. 10. 14.]
# Calculate acceleration
acceleration = np.diff(velocity)
print(acceleration) # [4. 4. 4.]
# Detect changes in signal
signal = np.array([1, 1, 1, 5, 5, 5, 2, 2])
changes = np.diff(signal)
change_points = np.where(changes != 0)[0]
print(change_points) # [2 5] (the value changes between index i and i+1)
# Cumulative sum check (diff is inverse of cumsum)
arr = np.array([1, 2, 3, 4, 5])
cumsum = np.cumsum(arr)
print(cumsum) # [ 1 3 6 10 15]
print(np.diff(cumsum, prepend=0)) # [1 2 3 4 5] (original!)
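In the change-detection example, the indices returned by `np.where` point at the last element before each change. Adding 1 gives the index where each new segment starts, which can then be passed to `np.split`. A sketch, with `starts` and `segments` as illustrative names:

```python
import numpy as np

signal = np.array([1, 1, 1, 5, 5, 5, 2, 2])
change_points = np.where(np.diff(signal) != 0)[0]
print(change_points)  # [2 5]

# First index of each new segment
starts = change_points + 1
print(starts)  # [3 6]

# Split the signal into runs of constant value
segments = np.split(signal, starts)
for seg in segments:
    print(seg)
# [1 1 1]
# [5 5 5]
# [2 2]
```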
np.gradient() — Numerical Gradient¶
Unlike diff(), gradient() computes central differences and preserves array length:
arr = np.array([1, 3, 6, 10, 15])
# gradient uses central differences (except at edges)
grad = np.gradient(arr)
print(grad) # [2. 2.5 3.5 4.5 5. ]
# Compare with diff (one element shorter)
print(np.diff(arr)) # [2 3 4 5]
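To see where those numbers come from, the default result (unit spacing, first-order edges) can be rebuilt by hand: central differences in the interior and one-sided differences at the two ends. The names `interior`, `left`, `right`, and `manual` are illustrative:

```python
import numpy as np

arr = np.array([1, 3, 6, 10, 15], dtype=float)

# Interior: (arr[i+1] - arr[i-1]) / 2
interior = (arr[2:] - arr[:-2]) / 2
print(interior)  # [2.5 3.5 4.5]

# Edges: one-sided differences
left = arr[1] - arr[0]      # 2.0
right = arr[-1] - arr[-2]   # 5.0

manual = np.concatenate(([left], interior, [right]))
print(np.allclose(manual, np.gradient(arr)))  # True
```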
With Spacing¶
# Position at uneven time intervals
t = np.array([0, 1, 3, 6])
x = np.array([0, 2, 10, 28])
# Velocity with time spacing
velocity = np.gradient(x, t)
print(velocity) # [2. 2.667 4.8 6.] (rounded)
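With uneven spacing, the interior estimates account for the different step sizes on each side. One way to check this is a quadratic, whose derivative the interior formula should reproduce exactly. The sketch below uses a separate, made-up data set (`x = t**2`) and also shows the optional `edge_order=2` argument, which uses second-order one-sided formulas at the ends:

```python
import numpy as np

# Illustrative data: x = t^2 sampled at uneven times, true derivative is 2t
t = np.array([0.0, 1.0, 3.0, 6.0])
x = t**2

# Default: second-order in the interior, first-order at the edges
v_default = np.gradient(x, t)
print(np.allclose(v_default[1:-1], 2 * t[1:-1]))  # True (interior matches 2t)
print(np.allclose(v_default, 2 * t))              # False (edges are off)

# edge_order=2 also uses second-order estimates at the edges
v_edge2 = np.gradient(x, t, edge_order=2)
print(np.allclose(v_edge2, 2 * t))                # True
```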
2D Gradient¶
img = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
# Returns gradient along each axis
gy, gx = np.gradient(img)
print(gx) # Gradient along x (columns)
print(gy) # Gradient along y (rows)
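For this particular array the values increase by 1 along each row and by 3 down each column, so both gradients are constant. The sketch below prints them and also shows the `axis` argument, which requests a single direction (`gx_only` is an illustrative name):

```python
import numpy as np

img = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# One array per axis, in axis order: axis 0 (rows) first, then axis 1 (columns)
gy, gx = np.gradient(img)
print(gx)
# [[1. 1. 1.]
#  [1. 1. 1.]
#  [1. 1. 1.]]
print(gy)
# [[3. 3. 3.]
#  [3. 3. 3.]
#  [3. 3. 3.]]

# Gradient along a single axis returns one array
gx_only = np.gradient(img, axis=1)
print(np.array_equal(gx_only, gx))  # True
```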
Practical Examples¶
# Edge detection (simplified)
image = np.random.rand(100, 100)
gy, gx = np.gradient(image)
edges = np.sqrt(gx**2 + gy**2)
# Numerical derivative of function
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)
dydx = np.gradient(y, x)
# dydx ≈ cos(x)
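The claim that `dydx` approximates `cos(x)` can be checked directly by measuring the largest absolute error. The `1e-3` threshold below is an assumed tolerance for this grid of 100 points, not a general guarantee:

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)
dydx = np.gradient(y, x)

# Largest deviation from the analytic derivative
max_err = np.max(np.abs(dydx - np.cos(x)))
print(max_err < 1e-3)  # True
```

Using more sample points shrinks the step size and should reduce this error further.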
diff() vs gradient()¶
| Feature | np.diff() | np.gradient() |
|---|---|---|
| Method | Forward difference | Central difference |
| Output length | n - 1 | n (same) |
| Edge handling | None | One-sided at edges |
| Accuracy (as a derivative estimate) | First-order | Second-order in the interior |
| Use case | Discrete changes | Smooth derivatives |
arr = np.array([1, 4, 9, 16, 25]) # x^2 at x=1,2,3,4,5
# diff: forward difference
print(np.diff(arr)) # [3 5 7 9] (length 4)
# gradient: central difference
print(np.gradient(arr)) # [3. 4. 6. 8. 9.] (length 5)
# True derivative is 2x: [2, 4, 6, 8, 10]
# gradient matches it at the interior points; only the one-sided edge values (3 and 9) are off
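Another way to read the comparison: the forward differences from `diff()` are best interpreted as slopes at the midpoints between samples, while `gradient()` estimates slopes at the samples themselves. A sketch for the same x^2 data, with `slopes` and `midpoints` as illustrative names:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = x**2  # true derivative is 2x

# diff gives slopes between neighbouring samples
slopes = np.diff(y) / np.diff(x)
print(slopes)  # [3. 5. 7. 9.]

# For a quadratic these are the exact derivative at the midpoints
midpoints = (x[:-1] + x[1:]) / 2
print(midpoints)                          # [1.5 2.5 3.5 4.5]
print(np.allclose(slopes, 2 * midpoints))  # True

# gradient is exact at the interior sample points
grad = np.gradient(y, x)
print(np.allclose(grad[1:-1], 2 * x[1:-1]))  # True
```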
Summary¶
| Function | Purpose | Key Feature |
|---|---|---|
| np.clip(a, min, max) | Limit values to range | In-place option |
| np.unique(a) | Find unique values | Return counts/indices |
| np.diff(a) | Forward differences | Length n-1 |
| np.gradient(a) | Central differences | Length n (preserved) |
Key Takeaways:
- clip() enforces bounds, which is useful when cleaning or normalizing data
- unique() with return_counts=True gives a NumPy equivalent of pandas value_counts (sorted by value rather than by count)
- unique() with return_inverse=True enables label encoding
- diff() for discrete changes (signal processing, change detection)
- gradient() for smooth numerical derivatives (preserves length)
- Use gradient() over diff() when you need same-length output