Array Utilities: clip, unique, diff, gradient
Common NumPy utility functions for data manipulation and analysis.
Mental Model
`clip` caps values to a range (useful for clamping pixel values or enforcing bounds), `unique` finds distinct elements (like SQL DISTINCT), `diff` computes consecutive differences (the discrete derivative), and `gradient` estimates a derivative at every point while keeping the array length. These utilities handle many common data-cleaning and analysis tasks without explicit loops.
These Are Operators, Not Just Utilities
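These functions transform arrays in three fundamentally different ways (constraining values, extracting structure, and differentiating, which `diff` and `gradient` do in two variants):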
| Function | Operator type | What it does |
|---|---|---|
| `clip` | Constraint (projection) | Enforce bounds: project values onto \([a, b]\) |
| `unique` | Structural (set extraction) | Extract distinct values: factorize into categories |
| `diff` | Derivative (finite difference) | Measure change: discrete first/second derivative |
| `gradient` | Differential (central difference) | Approximate continuous derivative: same-length output |
Together they form the core toolkit for the pipeline: data → constrain (clip) → structure (unique) → differentiate (diff/gradient) → analyze.
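The unifying insight: these are discrete analogs of mathematical operators (projection, factorization, and differentiation) applied to whole arrays. Only `clip` acts on each element independently; `unique`, `diff`, and `gradient` depend on how elements relate to one another. Together they bridge raw data and higher-level analysis.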
```python
import numpy as np
```
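As a rough illustration of the constrain, structure, differentiate pipeline described above, the sketch below runs a small made-up series through each stage. The readings and the bounds `[0, 10]` are hypothetical, chosen only to make each step visible.

```python
import numpy as np

# Hypothetical readings; the two out-of-range values stand in for sensor glitches
readings = np.array([2.0, 3.0, 50.0, 3.0, 4.0, -9.0, 6.0])

# Constrain: project every value onto [0, 10]
bounded = np.clip(readings, 0.0, 10.0)

# Structure: distinct values and how often each occurs
levels, counts = np.unique(bounded, return_counts=True)

# Differentiate: step-to-step change and a same-length slope estimate
steps = np.diff(bounded)
slope = np.gradient(bounded)

print(bounded)  # [ 2.  3. 10.  3.  4.  0.  6.]
print(levels)   # [ 0.  2.  3.  4.  6. 10.]
print(counts)   # [1 1 2 1 1 1]
print(steps)    # [ 1.  7. -7.  1. -4.  6.]
```

Note that `steps` has one element fewer than `bounded`, while `slope` keeps the original length.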
np.clip(): Limit Values to Range
Mathematically, clip(x, a, b) is the projection of \(x\) onto the interval
\([a, b]\) — it returns the closest point in the interval to \(x\). This framing
connects it to constrained optimization, where clipping is a standard step after
gradient updates.
Constrain array values to a minimum and maximum:
```python
arr = np.array([1, 5, 10, 15, 20])

# Clip to range [5, 15]
clipped = np.clip(arr, 5, 15)
print(clipped)  # [ 5  5 10 15 15]

# Clip with only min
np.clip(arr, 5, None)   # [ 5  5 10 15 20]

# Clip with only max
np.clip(arr, None, 15)  # [ 1  5 10 15 15]
```
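The projection framing can be checked numerically. The sketch below, using a hypothetical interval `[0, 5]`, compares `np.clip` with an explicit max-then-min construction and confirms that no sampled point of the interval is closer to `x` than the clipped value.

```python
import numpy as np

x = np.array([-3.0, 0.5, 2.0, 7.5])
a, b = 0.0, 5.0  # hypothetical interval bounds

projected = np.clip(x, a, b)

# Same result built from element-wise max/min
by_hand = np.minimum(np.maximum(x, a), b)
print(np.array_equal(projected, by_hand))  # True

# No candidate point in [a, b] is closer to x than the clipped value
candidates = np.linspace(a, b, 501)
best_distance = np.abs(candidates[:, None] - x).min(axis=0)
print(np.allclose(best_distance, np.abs(projected - x)))  # True
```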
Method Syntax
```python
arr = np.array([1, 5, 10, 15, 20])

# Function syntax
np.clip(arr, 5, 15)

# Method syntax
arr.clip(5, 15)

# In-place (modifies original)
arr.clip(5, 15, out=arr)
```
Practical Examples
```python
# Clamp pixel values to [0, 255]
pixels = np.array([-10, 50, 200, 300])
pixels = np.clip(pixels, 0, 255)
print(pixels)  # [  0  50 200 255]

# Prevent division by small numbers (a floor of 1e-6; assumes non-negative denominators)
denominators = np.array([0.001, 0.5, 1.0, 0.0001])
safe_denom = np.clip(denominators, 1e-6, None)

# Clip neural network gradients
gradients = np.array([-100, 0.5, 50, -0.1])
clipped_grads = np.clip(gradients, -1, 1)
print(clipped_grads)  # [-1.   0.5  1.  -0.1]

# Probability bounds
probs = np.array([-0.1, 0.5, 1.2])
probs = np.clip(probs, 0, 1)
print(probs)  # [0.  0.5 1. ]
```
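Clipping after a gradient update, mentioned earlier as a standard step in constrained optimization, can be sketched as projected gradient descent. The objective, the box `[0, 2]`, and the step size below are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical problem: minimize sum((x - target)**2) subject to 0 <= x <= 2
target = np.array([3.0, 1.0, -1.0])
x = np.zeros(3)
learning_rate = 0.25  # assumed step size

for _ in range(50):
    grad = 2.0 * (x - target)      # gradient of the objective
    x = x - learning_rate * grad   # unconstrained gradient step
    x = np.clip(x, 0.0, 2.0)       # project back onto the feasible box

print(x)  # [2. 1. 0.]
```

Components whose unconstrained optimum lies outside the box end up on the nearest bound, while the middle component converges to its target.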
np.unique(): Find Unique Values
unique with return_inverse=True is a factorization: it decomposes an array
into a compact set of distinct values and an index mapping that can reconstruct the
original. This is exactly categorical encoding — the basis of label encoding in
ML, value-count analysis, and data compression.
Return sorted unique elements of an array:
```python
arr = np.array([3, 1, 2, 2, 3, 1, 1, 4])

unique = np.unique(arr)
print(unique)  # [1 2 3 4]
```
Return Indices
```python
arr = np.array([3, 1, 2, 2, 3, 1, 1, 4])

# Index of first occurrence of each unique value
unique, indices = np.unique(arr, return_index=True)
print(unique)   # [1 2 3 4]
print(indices)  # [1 2 0 7]

# Indices to reconstruct original from unique
unique, inverse = np.unique(arr, return_inverse=True)
print(unique)           # [1 2 3 4]
print(inverse)          # [2 0 1 1 2 0 0 3]
print(unique[inverse])  # [3 1 2 2 3 1 1 4] (original!)

# Count of each unique value
unique, counts = np.unique(arr, return_counts=True)
print(unique)  # [1 2 3 4]
print(counts)  # [3 2 2 1]
```
All Return Values
```python
arr = np.array([3, 1, 2, 2, 3])

unique, indices, inverse, counts = np.unique(
    arr, return_index=True, return_inverse=True, return_counts=True
)
```
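To make the four return values concrete, the sketch below prints each one for the same small array and checks how they relate. The variable names are illustrative, and `np.bincount` is used only as a cross-check on the counts.

```python
import numpy as np

arr = np.array([3, 1, 2, 2, 3])
values, first_idx, inverse, counts = np.unique(
    arr, return_index=True, return_inverse=True, return_counts=True
)

print(values)     # [1 2 3]
print(first_idx)  # [1 2 0]
print(inverse)    # [2 0 1 1 2]
print(counts)     # [1 2 2]

# The four outputs are consistent with one another
print(np.array_equal(arr[first_idx], values))        # True
print(np.array_equal(values[inverse], arr))          # True
print(np.array_equal(np.bincount(inverse), counts))  # True
```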
Unique Rows (2D)
```python
arr = np.array([[1, 2],
                [3, 4],
                [1, 2],
                [5, 6]])

# Unique rows
unique_rows = np.unique(arr, axis=0)
print(unique_rows)
# [[1 2]
#  [3 4]
#  [5 6]]
```
Practical Examples
```python
# Find unique categories
labels = np.array(['cat', 'dog', 'cat', 'bird', 'dog'])
categories = np.unique(labels)
print(categories)  # ['bird' 'cat' 'dog']

# Value counts (like pandas)
values, counts = np.unique(labels, return_counts=True)
for v, c in zip(values, counts):
    print(f"{v}: {c}")
# bird: 1
# cat: 2
# dog: 2

# Label encoding
labels = np.array(['cat', 'dog', 'cat', 'bird'])
unique, encoded = np.unique(labels, return_inverse=True)
print(encoded)  # [1 2 1 0] (numeric encoding)

# Check if array has duplicates
arr = np.array([1, 2, 3, 2])
has_duplicates = len(np.unique(arr)) < len(arr)
print(has_duplicates)  # True
```
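The label encoding above is reversible, and the integer codes can also index into an identity matrix to build a one-hot matrix. A minimal sketch follows; the one-hot step is an added illustration rather than something this page prescribes.

```python
import numpy as np

labels = np.array(['cat', 'dog', 'cat', 'bird'])
categories, codes = np.unique(labels, return_inverse=True)

# Decode: index the category table with the integer codes
decoded = categories[codes]
print(np.array_equal(decoded, labels))  # True

# One-hot: row i of the identity matrix represents category i
one_hot = np.eye(len(categories), dtype=int)[codes]
print(one_hot)
# [[0 1 0]
#  [0 0 1]
#  [0 1 0]
#  [1 0 0]]
```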
np.diff(): Discrete Differences
diff is a finite difference operator — the discrete analog of differentiation.
np.diff(x) computes forward differences (\(x_{i+1} - x_i\)), and np.diff(x, n=2)
computes second-order differences (the discrete second derivative, analogous to
acceleration). This connects directly to calculus, signal processing (change
detection), and time-series analysis (returns, velocity).
Calculate the n-th discrete difference along an axis:
```python
arr = np.array([1, 3, 6, 10, 15])

# First difference: arr[i+1] - arr[i]
diff1 = np.diff(arr)
print(diff1)  # [2 3 4 5]

# Second difference (difference of differences)
diff2 = np.diff(arr, n=2)
print(diff2)  # [1 1 1]
```
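The second difference can be read two ways: as `np.diff` applied twice, or as the three-point stencil \(x_{i+2} - 2x_{i+1} + x_i\). A short check that all three forms agree on the same array:

```python
import numpy as np

x = np.array([1, 3, 6, 10, 15])

# n=2 is the first difference applied twice
twice = np.diff(np.diff(x))

# Equivalent closed form: x[i+2] - 2*x[i+1] + x[i]
stencil = x[2:] - 2 * x[1:-1] + x[:-2]

print(np.diff(x, n=2))  # [1 1 1]
print(twice)            # [1 1 1]
print(stencil)          # [1 1 1]
```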
Along Different Axes
```python
matrix = np.array([[1, 2, 4],
                   [3, 5, 9]])

# Diff along the last axis (default axis=-1, which is axis=1 here): within each row
np.diff(matrix)
# [[1 2]
#  [2 4]]

# Diff along axis=0: between consecutive rows
np.diff(matrix, axis=0)
# [[2 3 5]]
```
Prepend/Append Values
```python
arr = np.array([1, 3, 6, 10])

# Prepend to maintain length
np.diff(arr, prepend=0)
# [1 2 3 4]

# Append to maintain length
np.diff(arr, append=arr[-1])
# [2 3 4 0]
```
Practical Examples
```python
# Calculate velocity from position
time = np.array([0, 1, 2, 3, 4])
position = np.array([0, 2, 8, 18, 32])
velocity = np.diff(position) / np.diff(time)
print(velocity)  # [ 2.  6. 10. 14.]

# Calculate acceleration (time steps are all 1 here, so no division is needed)
acceleration = np.diff(velocity)
print(acceleration)  # [4. 4. 4.]

# Detect changes in signal
signal = np.array([1, 1, 1, 5, 5, 5, 2, 2])
changes = np.diff(signal)
change_points = np.where(changes != 0)[0]
print(change_points)  # [2 5] (each change happens between index i and i + 1)

# Cumulative sum check (diff is inverse of cumsum)
arr = np.array([1, 2, 3, 4, 5])
cumsum = np.cumsum(arr)
print(cumsum)                      # [ 1  3  6 10 15]
print(np.diff(cumsum, prepend=0))  # [1 2 3 4 5] (original!)
```
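For the time-series returns mentioned earlier, `diff` gives both the simple and the logarithmic form. The prices below are hypothetical, picked so that the returns come out as round numbers.

```python
import numpy as np

prices = np.array([100.0, 102.0, 99.96, 104.958])  # hypothetical closing prices

# Simple return: change relative to the previous value
simple_returns = np.diff(prices) / prices[:-1]

# Log return: difference of logarithms
log_returns = np.diff(np.log(prices))

print(np.round(simple_returns, 4))  # [ 0.02 -0.02  0.05]

# Log returns add up to the log of the overall price ratio
print(np.isclose(log_returns.sum(), np.log(prices[-1] / prices[0])))  # True
```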
np.gradient(): Numerical Gradient
gradient approximates the continuous derivative using central differences,
which are more accurate than the forward differences of diff and — critically —
preserve the array length. For multi-dimensional arrays, np.gradient(img) returns
partial derivatives along each axis, connecting directly to edge detection in image
processing and gradient computation in optimization.
In one dimension with unit spacing, interior points use \((x_{i+1} - x_{i-1}) / 2\), while the two endpoints fall back to one-sided differences:
```python
arr = np.array([1, 3, 6, 10, 15])

# gradient uses central differences (except at edges)
grad = np.gradient(arr)
print(grad)  # [2.  2.5 3.5 4.5 5. ]

# Compare with diff (one element shorter)
print(np.diff(arr))  # [2 3 4 5]
```
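The same values can be reproduced by hand, which makes the edge handling explicit. This is a sketch of the default behavior for unit spacing, not a copy of NumPy's implementation.

```python
import numpy as np

arr = np.array([1, 3, 6, 10, 15], dtype=float)

manual = np.empty_like(arr)
manual[1:-1] = (arr[2:] - arr[:-2]) / 2.0  # central difference at interior points
manual[0] = arr[1] - arr[0]                # forward difference at the left edge
manual[-1] = arr[-1] - arr[-2]             # backward difference at the right edge

print(manual)                                 # [2.  2.5 3.5 4.5 5. ]
print(np.allclose(manual, np.gradient(arr)))  # True
```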
With Spacing
```python
# Position at uneven time intervals
t = np.array([0, 1, 3, 6])
x = np.array([0, 2, 10, 28])

# Velocity with time spacing
velocity = np.gradient(x, t)
print(velocity)  # approximately [2. 2.667 4.8 6.]
```
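With uneven spacing, passing the coordinate array lets `gradient` weight each neighbor by its distance. One way to see the effect is to sample a function whose derivative is known; the quadratic below is an illustrative choice, not data from this page. Interior estimates match the true derivative, while the endpoints use one-sided differences and are less accurate.

```python
import numpy as np

t = np.array([0.0, 1.0, 3.0, 6.0])  # uneven sample times
x = t**2                            # known function, true derivative is 2*t

estimate = np.gradient(x, t)
print(estimate)  # [1. 2. 6. 9.]
print(2 * t)     # [ 0.  2.  6. 12.]
```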
2D Gradient
```python
img = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Returns one gradient array per axis, in axis order (rows first, then columns)
gy, gx = np.gradient(img)
print(gx)  # Gradient along x (columns): every entry is 1.0
print(gy)  # Gradient along y (rows): every entry is 3.0
```
Practical Examples
```python
# Edge detection (simplified)
image = np.random.rand(100, 100)
gy, gx = np.gradient(image)
edges = np.sqrt(gx**2 + gy**2)  # gradient magnitude

# Numerical derivative of function
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)
dydx = np.gradient(y, x)
# dydx ≈ cos(x)
```
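The claim that the numerical derivative of `sin` approximates `cos` can be quantified. The sketch below measures the largest deviation on the same 100-point grid; the `1e-3` threshold is an assumed tolerance for this grid, not a general guarantee.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
dydx = np.gradient(np.sin(x), x)

# Largest deviation from the exact derivative cos(x)
max_error = np.max(np.abs(dydx - np.cos(x)))
print(max_error < 1e-3)  # True
```

A finer grid shrinks the error further, since the interior error of a central difference scales with the square of the spacing.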
diff() vs gradient()
| Feature | `np.diff()` | `np.gradient()` |
|---|---|---|
| Method | Forward difference | Central difference |
| Output length | n - 1 | n (same) |
| Edge handling | None | One-sided at edges |
| Accuracy | First-order | Second-order at interior points |
| Use case | Discrete changes | Smooth derivatives |
```python
arr = np.array([1, 4, 9, 16, 25])  # x^2 at x=1,2,3,4,5

# diff: forward difference
print(np.diff(arr))      # [3 5 7 9] (length 4)

# gradient: central difference
print(np.gradient(arr))  # [3. 4. 6. 8. 9.] (length 5)

# True derivative is 2x: [2, 4, 6, 8, 10]
# gradient is more accurate in the middle
```
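The endpoint estimates can be tightened with the `edge_order` parameter of `np.gradient`, which switches the one-sided edge formulas from first-order to second-order accuracy. For this quadratic the result then matches the true derivative at every point:

```python
import numpy as np

arr = np.array([1, 4, 9, 16, 25], dtype=float)  # x**2 at x = 1..5

print(np.gradient(arr))                # [3. 4. 6. 8. 9.]
print(np.gradient(arr, edge_order=2))  # [ 2.  4.  6.  8. 10.]
```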
Summary
| Function | Purpose | Key Feature |
|---|---|---|
| `np.clip(a, min, max)` | Limit values to range | In-place option |
| `np.unique(a)` | Find unique values | Return counts/indices |
| `np.diff(a)` | Forward differences | Length n-1 |
| `np.gradient(a)` | Central differences | Length n (preserved) |
Key Takeaways:

- `clip()` is essential for data normalization and bound enforcement
- `unique()` with `return_counts=True` replaces pandas value_counts
- `unique()` with `return_inverse=True` enables label encoding
- `diff()` for discrete changes (signal processing, change detection)
- `gradient()` for smooth numerical derivatives (preserves length)
- Use `gradient()` over `diff()` when you need same-length output
Exercises
Exercise 1. Write a short code example that demonstrates the main concept covered on this page. Include comments explaining each step.
Solution to Exercise 1
Refer to the code examples in the page content above. A complete solution would recreate the key pattern with clear comments explaining the NumPy operations involved.
Exercise 2. Predict the output of a code snippet that uses the features described on this page. Explain why the output is what it is.
Solution to Exercise 2
The output depends on how NumPy handles the specific operation. Key factors include array shapes, dtypes, and broadcasting rules. Trace through the computation step by step.
Exercise 3. Write a practical function that applies the concepts from this page to solve a real data processing task. Test it with sample data.
Solution to Exercise 3
```python
import numpy as np

# Example: apply one of the page's utilities to sample data
data = np.random.default_rng(42).random((5, 3))

# One possible choice: clamp every value into [0.2, 0.8] with np.clip
result = np.clip(data, 0.2, 0.8)
print(result)
```
Exercise 4. Identify a common mistake when using the features described on this page. Write code that demonstrates the mistake and then show the corrected version.
Solution to Exercise 4
A common mistake is misunderstanding array shapes or dtypes. Always check .shape and .dtype when debugging unexpected results.
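One concrete dtype mistake, offered here as an illustrative example rather than the only valid answer, is calling `np.diff` on unsigned integers: a decrease wraps around instead of becoming negative.

```python
import numpy as np

levels = np.array([5, 3, 4], dtype=np.uint8)

# Mistake: unsigned subtraction wraps around instead of going negative
wrapped = np.diff(levels)
print(wrapped)  # [254   1]

# Fix: convert to a signed type before differencing
signed = np.diff(levels.astype(np.int16))
print(signed)   # [-2  1]
```

Checking `levels.dtype` before differencing image or sensor data catches this early.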