
Array Utilities: clip, unique, diff, gradient

Common NumPy utility functions for data manipulation and analysis.

Mental Model

clip caps values to a range (useful for clamping pixel values or enforcing bounds), unique finds distinct elements (like SQL DISTINCT), diff computes consecutive differences (the discrete derivative), and gradient estimates a derivative with the same length as its input. These utilities handle the most common data-cleaning tasks without loops.

These Are Operators, Not Just Utilities

These functions transform arrays in four fundamentally different ways:

| Function | Operator type | What it does |
|---|---|---|
| clip | Constraint (projection) | Enforce bounds: project values onto \([a, b]\) |
| unique | Structural (set extraction) | Extract distinct values: factorize into categories |
| diff | Derivative (finite difference) | Measure change: discrete first/second derivative |
| gradient | Differential (central difference) | Approximate continuous derivative: same-length output |

Together they form the core toolkit for the pipeline: data → constrain (clip) → structure (unique) → differentiate (diff/gradient) → analyze.

The unifying insight: these are discrete analogs of mathematical operators (projection, factorization, and differentiation) applied to whole arrays in a single vectorized call. They bridge raw data and higher-level analysis.

```python
import numpy as np
```
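As a rough illustration of the pipeline described above, the sketch below runs a small series through each stage. The readings and the bounds 0 and 100 are made up for the example and do not come from any real dataset.

```python
import numpy as np

# Hypothetical sensor readings with two out-of-range values
readings = np.array([12, -5, 12, 40, 40, 130, 40])

# Constrain: project every value onto [0, 100]
bounded = np.clip(readings, 0, 100)
print(bounded)          # [ 12   0  12  40  40 100  40]

# Structure: distinct values and how often each occurs
levels, counts = np.unique(bounded, return_counts=True)
print(levels, counts)   # [  0  12  40 100] [1 2 3 1]

# Differentiate: change between consecutive readings (one element shorter)
steps = np.diff(bounded)
print(steps)            # [-12  12  28   0  60 -60]

# Same-length derivative estimate via central differences
slope = np.gradient(bounded)
print(slope.shape == bounded.shape)  # True
```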


np.clip() — Limit Values to Range

Mathematically, clip(x, a, b) is the projection of \(x\) onto the interval \([a, b]\) — it returns the closest point in the interval to \(x\). This framing connects it to constrained optimization, where clipping is a standard step after gradient updates.

Constrain array values to a minimum and maximum:

```python
arr = np.array([1, 5, 10, 15, 20])

# Clip to range [5, 15]
clipped = np.clip(arr, 5, 15)
print(clipped)  # [ 5  5 10 15 15]

# Clip with only min
np.clip(arr, 5, None)  # [ 5  5 10 15 20]

# Clip with only max
np.clip(arr, None, 15)  # [ 1  5 10 15 15]
```
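The projection view of clip can be checked numerically. The sketch below compares np.clip with an explicit max-then-min and with a brute-force nearest-point search over a grid on \([a, b]\); the sample values and the grid resolution are arbitrary choices for illustration.

```python
import numpy as np

x = np.array([-3.0, 0.5, 2.0, 7.5])
a, b = 0.0, 5.0

# clip is equivalent to a max with the lower bound followed by a min with the upper
projected = np.clip(x, a, b)
by_hand = np.minimum(np.maximum(x, a), b)

# Brute-force check of the "closest point" claim on a fine grid over [a, b]
grid = np.linspace(a, b, 501)
nearest = grid[np.abs(grid[:, None] - x[None, :]).argmin(axis=0)]

print(projected)                            # [0.  0.5 2.  5. ]
print(np.array_equal(projected, by_hand))   # True
print(np.allclose(projected, nearest))      # True
```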

Method Syntax

```python
arr = np.array([1, 5, 10, 15, 20])

# Function syntax
np.clip(arr, 5, 15)

# Method syntax
arr.clip(5, 15)

# In-place (modifies original)
arr.clip(5, 15, out=arr)
```
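The difference between the copying and in-place forms is easy to miss, so here is a small check. It only uses the calls shown above; the variable names `copy` and `result` are introduced for the example.

```python
import numpy as np

arr = np.array([1, 5, 10, 15, 20])

# Without out=, clip returns a new array and leaves arr untouched
copy = arr.clip(5, 15)
print(arr)            # [ 1  5 10 15 20]
print(copy is arr)    # False

# With out=arr, the result is written into arr and arr itself is returned
result = np.clip(arr, 5, 15, out=arr)
print(arr)            # [ 5  5 10 15 15]
print(result is arr)  # True
```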

Practical Examples

```python
# Clamp pixel values to [0, 255]
pixels = np.array([-10, 50, 200, 300])
pixels = np.clip(pixels, 0, 255)
print(pixels)  # [  0  50 200 255]

# Prevent division by very small numbers
denominators = np.array([0.001, 0.5, 1.0, 0.0001])
safe_denom = np.clip(denominators, 1e-6, None)  # anything below 1e-6 would be raised to 1e-6

# Clip neural network gradients
gradients = np.array([-100, 0.5, 50, -0.1])
clipped_grads = np.clip(gradients, -1, 1)
print(clipped_grads)  # [-1.   0.5  1.  -0.1]

# Probability bounds
probs = np.array([-0.1, 0.5, 1.2])
probs = np.clip(probs, 0, 1)
print(probs)  # [0.  0.5 1. ]
```


np.unique() — Find Unique Values

unique with return_inverse=True is a factorization: it decomposes an array into a compact set of distinct values and an index mapping that can reconstruct the original. This is exactly categorical encoding — the basis of label encoding in ML, value-count analysis, and data compression.

Return sorted unique elements of an array:

```python
arr = np.array([3, 1, 2, 2, 3, 1, 1, 4])

unique = np.unique(arr)
print(unique)  # [1 2 3 4]
```

Return Indices

```python
arr = np.array([3, 1, 2, 2, 3, 1, 1, 4])

# Index of first occurrence of each unique value
unique, indices = np.unique(arr, return_index=True)
print(unique)   # [1 2 3 4]
print(indices)  # [1 2 0 7]

# Indices to reconstruct original from unique
unique, inverse = np.unique(arr, return_inverse=True)
print(unique)           # [1 2 3 4]
print(inverse)          # [2 0 1 1 2 0 0 3]
print(unique[inverse])  # [3 1 2 2 3 1 1 4] (original!)

# Count of each unique value
unique, counts = np.unique(arr, return_counts=True)
print(unique)  # [1 2 3 4]
print(counts)  # [3 2 2 1]
```

All Return Values

```python
arr = np.array([3, 1, 2, 2, 3])

unique, indices, inverse, counts = np.unique(
    arr, return_index=True, return_inverse=True, return_counts=True
)
```
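The four return values are tied together by a few identities, which the following sketch checks on the same small array. Using np.bincount to recover the counts is one convenient way to confirm the relationship, not the only one.

```python
import numpy as np

arr = np.array([3, 1, 2, 2, 3])
unique, indices, inverse, counts = np.unique(
    arr, return_index=True, return_inverse=True, return_counts=True
)

print(unique)   # [1 2 3]
print(indices)  # [1 2 0]
print(inverse)  # [2 0 1 1 2]
print(counts)   # [1 2 2]

# indices picks the first occurrence of each unique value out of arr
print(np.array_equal(arr[indices], unique))           # True
# inverse rebuilds arr from the unique values
print(np.array_equal(unique[inverse], arr))           # True
# counting how often each code appears in inverse gives counts
print(np.array_equal(np.bincount(inverse), counts))   # True
```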

Unique Rows (2D)

```python
arr = np.array([[1, 2], [3, 4], [1, 2], [5, 6]])

# Unique rows
unique_rows = np.unique(arr, axis=0)
print(unique_rows)
# [[1 2]
#  [3 4]
#  [5 6]]
```
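The optional return values also work with axis=0, which gives a quick way to count duplicate rows. A short sketch on the same array, with `first_seen` as a name chosen for the example:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [1, 2], [5, 6]])

rows, first_seen, counts = np.unique(
    arr, axis=0, return_index=True, return_counts=True
)
print(first_seen)  # [0 1 3]  (row index where each unique row first appears)
print(counts)      # [2 1 1]  (the row [1 2] occurs twice)

# Without axis=0 the array is flattened before deduplication
print(np.unique(arr))  # [1 2 3 4 5 6]
```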

Practical Examples

```python
# Find unique categories
labels = np.array(['cat', 'dog', 'cat', 'bird', 'dog'])
categories = np.unique(labels)
print(categories)  # ['bird' 'cat' 'dog']

# Value counts (like pandas)
values, counts = np.unique(labels, return_counts=True)
for v, c in zip(values, counts):
    print(f"{v}: {c}")
# bird: 1
# cat: 2
# dog: 2

# Label encoding
labels = np.array(['cat', 'dog', 'cat', 'bird'])
unique, encoded = np.unique(labels, return_inverse=True)
print(encoded)  # [1 2 1 0] (numeric encoding)

# Check if array has duplicates
arr = np.array([1, 2, 3, 2])
has_duplicates = len(np.unique(arr)) < len(arr)
print(has_duplicates)  # True
```
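One difference from pandas value_counts is ordering: np.unique returns values in sorted order, while value_counts orders by frequency by default. A possible way to get the frequency ordering, using a slightly extended made-up label array:

```python
import numpy as np

labels = np.array(['cat', 'dog', 'cat', 'bird', 'dog', 'cat'])
values, counts = np.unique(labels, return_counts=True)
# values -> ['bird' 'cat' 'dog'], counts -> [1 3 2]

# Reorder by descending count; a stable sort keeps ties in sorted-value order
order = np.argsort(-counts, kind="stable")
for v, c in zip(values[order], counts[order]):
    print(f"{v}: {c}")
# cat: 3
# dog: 2
# bird: 1
```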


np.diff() — Discrete Differences

diff is a finite difference operator — the discrete analog of differentiation. np.diff(x) computes forward differences (\(x_{i+1} - x_i\)), and np.diff(x, n=2) computes second-order differences (the discrete second derivative, analogous to acceleration). This connects directly to calculus, signal processing (change detection), and time-series analysis (returns, velocity).

Calculate the n-th discrete difference along an axis:

```python
arr = np.array([1, 3, 6, 10, 15])

# First difference: arr[i+1] - arr[i]
diff1 = np.diff(arr)
print(diff1)  # [2 3 4 5]

# Second difference (difference of differences)
diff2 = np.diff(arr, n=2)
print(diff2)  # [1 1 1]
```
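Written out with slices, the first difference is \(x_{i+1} - x_i\) and the second difference is \(x_{i+2} - 2x_{i+1} + x_i\). The sketch below spells both out and compares them with np.diff; it is meant as a check of the formulas, not as a replacement for the function.

```python
import numpy as np

x = np.array([1, 3, 6, 10, 15])

# First difference: x[i+1] - x[i]
first = x[1:] - x[:-1]
# Second difference: x[i+2] - 2*x[i+1] + x[i]
second = x[2:] - 2 * x[1:-1] + x[:-2]

print(first)   # [2 3 4 5]
print(second)  # [1 1 1]
print(np.array_equal(first, np.diff(x)))        # True
print(np.array_equal(second, np.diff(x, n=2)))  # True
```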

Along Different Axes

```python
matrix = np.array([[1, 2, 4],
                   [3, 5, 9]])

# Diff along the last axis (axis=-1, the default): differences within each row
np.diff(matrix)
# [[1 2]
#  [2 4]]

# Diff along axis=0: differences between consecutive rows
np.diff(matrix, axis=0)
# [[2 3 5]]
```

Prepend/Append Values

```python
arr = np.array([1, 3, 6, 10])

# Prepend to maintain length
np.diff(arr, prepend=0)
# [1 2 3 4]

# Append to maintain length
np.diff(arr, append=arr[-1])
# [2 3 4 0]
```

Practical Examples

```python
# Calculate velocity from position
time = np.array([0, 1, 2, 3, 4])
position = np.array([0, 2, 8, 18, 32])
velocity = np.diff(position) / np.diff(time)
print(velocity)  # [ 2.  6. 10. 14.]

# Calculate acceleration
acceleration = np.diff(velocity)
print(acceleration)  # [4. 4. 4.]

# Detect changes in signal
signal = np.array([1, 1, 1, 5, 5, 5, 2, 2])
changes = np.diff(signal)
change_points = np.where(changes != 0)[0]
print(change_points)  # [2 5] (each change happens between index i and i + 1)

# Cumulative sum check (diff is inverse of cumsum)
arr = np.array([1, 2, 3, 4, 5])
cumsum = np.cumsum(arr)
print(cumsum)                      # [ 1  3  6 10 15]
print(np.diff(cumsum, prepend=0))  # [1 2 3 4 5] (original!)
```
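The time-series use mentioned earlier (returns) follows the same pattern. The sketch below uses made-up prices; the names `simple_returns` and `log_returns` are chosen for the example.

```python
import numpy as np

prices = np.array([100.0, 110.0, 99.0, 99.0])  # made-up closing prices

# Simple return: (p[t+1] - p[t]) / p[t]
simple_returns = np.diff(prices) / prices[:-1]
print(simple_returns)  # approximately [ 0.1 -0.1  0. ]

# Log return: log(p[t+1]) - log(p[t]); these add up across periods
log_returns = np.diff(np.log(prices))
print(np.isclose(log_returns.sum(), np.log(prices[-1] / prices[0])))  # True
```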


np.gradient() — Numerical Gradient

gradient approximates the continuous derivative using central differences, which are more accurate than the forward differences of diff and — critically — preserve the array length. For multi-dimensional arrays, np.gradient(img) returns partial derivatives along each axis, connecting directly to edge detection in image processing and gradient computation in optimization.

Unlike diff(), gradient() computes central differences and preserves array length:

```python
arr = np.array([1, 3, 6, 10, 15])

# gradient uses central differences (except at edges)
grad = np.gradient(arr)
print(grad)  # [2.  2.5 3.5 4.5 5. ]

# Compare with diff (one element shorter)
print(np.diff(arr))  # [2 3 4 5]
```
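For unit spacing and the default edge handling, the result can be reproduced by hand: central differences in the interior and one-sided differences at the two ends. A minimal sketch of that calculation:

```python
import numpy as np

arr = np.array([1, 3, 6, 10, 15], dtype=float)

by_hand = np.empty_like(arr)
by_hand[1:-1] = (arr[2:] - arr[:-2]) / 2   # central difference, interior points
by_hand[0] = arr[1] - arr[0]               # forward difference, left edge
by_hand[-1] = arr[-1] - arr[-2]            # backward difference, right edge

print(by_hand)                                # [2.  2.5 3.5 4.5 5. ]
print(np.allclose(by_hand, np.gradient(arr))) # True
```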

With Spacing

```python
# Position at uneven time intervals
t = np.array([0, 1, 3, 6])
x = np.array([0, 2, 10, 28])

# Velocity with time spacing
velocity = np.gradient(x, t)
print(velocity)  # approximately [2.  2.667  4.8  6.]
```
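With uneven spacing, the interior formula weights the two neighbours by their distances, and it reproduces the derivative of a quadratic exactly. The sketch below checks this on \(x = t^2\) and also shows the edge_order=2 option, which applies second-order one-sided formulas at the ends; the sample times are arbitrary.

```python
import numpy as np

t = np.array([0.0, 1.0, 3.0, 6.0])   # uneven sample times
x = t**2                              # exact derivative is 2*t = [0, 2, 6, 12]

# Default edge_order=1: interior points are exact for a quadratic,
# the two end points fall back to plain one-sided differences
v1 = np.gradient(x, t)
print(v1)   # approximately [1. 2. 6. 9.]

# edge_order=2: second-order one-sided formulas at the ends as well
v2 = np.gradient(x, t, edge_order=2)
print(v2)   # approximately [ 0.  2.  6. 12.]
```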

2D Gradient

```python
img = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Returns gradient along each axis
gy, gx = np.gradient(img)
print(gx)  # Gradient along x (columns)
print(gy)  # Gradient along y (rows)
```

Practical Examples

```python
# Edge detection (simplified)
image = np.random.rand(100, 100)
gy, gx = np.gradient(image)
edges = np.sqrt(gx**2 + gy**2)

# Numerical derivative of function
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)
dydx = np.gradient(y, x)
# dydx ≈ cos(x)
```
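A random image has no real edges to find, so a deterministic case may make the idea easier to see. The sketch below uses a small synthetic image with a single vertical step, which is an assumption made for illustration; the gradient magnitude is nonzero only in the two columns next to the step.

```python
import numpy as np

image = np.zeros((4, 6))
image[:, 3:] = 1.0                 # vertical step between columns 2 and 3

gy, gx = np.gradient(image)        # axis 0 (rows) first, then axis 1 (columns)
edges = np.hypot(gx, gy)           # gradient magnitude, sqrt(gx**2 + gy**2)

print(edges[0])                    # [0.  0.  0.5 0.5 0.  0. ]
print(np.flatnonzero(edges[0]))    # [2 3]
```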


diff() vs gradient()

| Feature | np.diff() | np.gradient() |
|---|---|---|
| Method | Forward difference | Central difference |
| Output length | n - 1 | n (same) |
| Edge handling | None | One-sided at edges |
| Accuracy | First-order | Second-order at interior points |
| Use case | Discrete changes | Smooth derivatives |

```python
arr = np.array([1, 4, 9, 16, 25])  # x^2 at x=1,2,3,4,5

# diff: forward difference
print(np.diff(arr))      # [3 5 7 9] (length 4)

# gradient: central difference
print(np.gradient(arr))  # [3. 4. 6. 8. 9.] (length 5)

# True derivative is 2x: [2, 4, 6, 8, 10]
# gradient is more accurate in the middle
```
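The first-order versus second-order distinction can be seen by halving the step size: the forward-difference error should roughly halve, while the central-difference error should roughly quarter. The sketch below measures this for sin(x) on [0, 1]; the helper `max_errors` and the grid sizes are hypothetical choices for the experiment.

```python
import numpy as np

def max_errors(n):
    """Max error of forward (diff) and central (gradient) estimates of d/dx sin(x)."""
    x = np.linspace(0.0, 1.0, n)
    h = x[1] - x[0]
    y = np.sin(x)
    forward = np.diff(y) / h             # estimates the slope at x[:-1]
    central = np.gradient(y, h)[1:-1]    # interior points only
    err_forward = np.abs(forward - np.cos(x[:-1])).max()
    err_central = np.abs(central - np.cos(x[1:-1])).max()
    return err_forward, err_central

coarse = max_errors(101)   # h = 0.01
fine = max_errors(201)     # h = 0.005

print(coarse[0] / fine[0])  # close to 2: first-order error halves with h
print(coarse[1] / fine[1])  # close to 4: second-order error quarters with h
```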


Summary

| Function | Purpose | Key Feature |
|---|---|---|
| np.clip(a, min, max) | Limit values to range | In-place option |
| np.unique(a) | Find unique values | Return counts/indices |
| np.diff(a) | Forward differences | Length n-1 |
| np.gradient(a) | Central differences | Length n (preserved) |

Key Takeaways:

  • clip() clamps values to a range, making it the standard tool for bound enforcement
  • unique() with return_counts=True is the NumPy counterpart of pandas value_counts (ordered by value, not by count)
  • unique() with return_inverse=True enables label encoding
  • diff() for discrete changes (signal processing, change detection)
  • gradient() for smooth numerical derivatives (preserves length)
  • Use gradient() over diff() when you need same-length output

Exercises

Exercise 1. Write a short code example that demonstrates the main concept covered on this page. Include comments explaining each step.

Solution to Exercise 1

Refer to the code examples in the page content above. A complete solution would recreate the key pattern with clear comments explaining the NumPy operations involved.


Exercise 2. Predict the output of a code snippet that uses the features described on this page. Explain why the output is what it is.

Solution to Exercise 2

The output depends on how NumPy handles the specific operation. Key factors include array shapes, dtypes, and broadcasting rules. Trace through the computation step by step.


Exercise 3. Write a practical function that applies the concepts from this page to solve a real data processing task. Test it with sample data.

Solution to Exercise 3

```python
import numpy as np

# Example: apply the page's concept to process sample data
data = np.random.default_rng(42).random((5, 3))

# Apply the relevant operation
result = data  # replace with actual operation
print(result)
```
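One possible concrete answer, offered as a sketch rather than the intended solution, is run-length encoding built on diff. The function name `run_length_encode` is hypothetical, and the approach assumes a 1D numeric input, since np.diff does not apply to string arrays.

```python
import numpy as np

def run_length_encode(values):
    """Return (run_values, run_lengths) for a 1D numeric sequence."""
    values = np.asarray(values)
    if values.size == 0:
        return values, np.array([], dtype=int)
    # A new run starts at position 0 and one past every change point
    starts = np.concatenate(([0], np.flatnonzero(np.diff(values) != 0) + 1))
    # Run lengths are the gaps between consecutive starts and the array end
    lengths = np.diff(np.append(starts, values.size))
    return values[starts], lengths

signal = np.array([1, 1, 1, 5, 5, 5, 2, 2])
run_values, run_lengths = run_length_encode(signal)
print(run_values)   # [1 5 2]
print(run_lengths)  # [3 3 2]

# Round trip: repeating each value by its run length restores the signal
print(np.array_equal(np.repeat(run_values, run_lengths), signal))  # True
```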


Exercise 4. Identify a common mistake when using the features described on this page. Write code that demonstrates the mistake and then show the corrected version.

Solution to Exercise 4

A common mistake is misunderstanding array shapes or dtypes. Always check .shape and .dtype when debugging unexpected results.
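One concrete dtype pitfall, given here as an example of the kind of mistake meant: np.diff keeps the input dtype, so differences of unsigned integers wrap around instead of going negative. The array `levels` is made up for the demonstration.

```python
import numpy as np

levels = np.array([5, 3, 9], dtype=np.uint8)

# Mistake: 3 - 5 wraps around in unsigned 8-bit arithmetic
print(np.diff(levels))                    # [254   6]

# Fix: convert to a signed type wide enough to hold the differences
print(np.diff(levels.astype(np.int16)))   # [-2  6]
```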