np.vectorize — Vectorizing Python Functions¶
np.vectorize converts a scalar function into a function that works element-wise on arrays. While convenient, it's not a performance optimization—it's primarily a convenience wrapper.
import numpy as np
Basic Usage¶
# Scalar function (only works on single values)
def add_tax(price):
if price > 100:
return price * 1.1
return price * 1.05
# This fails with arrays
prices = np.array([50, 100, 150])
# add_tax(prices) # ValueError: truth value ambiguous
# Vectorize it
add_tax_vec = np.vectorize(add_tax)
# Now works with arrays
result = add_tax_vec(prices)
print(result) # [52.5 105. 165.]
Why vectorize Exists¶
np.vectorize is useful when:
- You have an existing scalar function
- The function has complex logic (if/else, loops)
- You want array broadcasting behavior
- Quick prototyping before optimization
# Complex logic that's hard to vectorize manually
def categorize(value):
if value < 0:
return 'negative'
elif value == 0:
return 'zero'
elif value < 10:
return 'small'
elif value < 100:
return 'medium'
else:
return 'large'
categorize_vec = np.vectorize(categorize)
values = np.array([-5, 0, 5, 50, 500])
print(categorize_vec(values))
# ['negative' 'zero' 'small' 'medium' 'large']
Important: Not a Performance Tool!¶
np.vectorize does NOT make your code faster. It simply loops over elements internally—there's no actual vectorization at the C level.
import time
def slow_func(x):
return x ** 2 + 2 * x + 1
slow_func_vec = np.vectorize(slow_func)
arr = np.arange(1_000_000)
# Vectorized function (NOT faster)
start = time.time()
result1 = slow_func_vec(arr)
print(f"np.vectorize: {time.time() - start:.3f}s")
# True NumPy vectorization (MUCH faster)
start = time.time()
result2 = arr ** 2 + 2 * arr + 1
print(f"True NumPy: {time.time() - start:.3f}s")
# np.vectorize: ~0.5s
# True NumPy: ~0.01s (50x faster!)
Specifying Output Type¶
By default, vectorize infers the output type from the first element. Specify otypes to ensure correct type:
# Without otypes: may guess wrong type
def to_string(x):
return f"value: {x}"
vec_func = np.vectorize(to_string)
print(vec_func([1, 2, 3]).dtype) # <U8 (may truncate!)
# With otypes: correct type
vec_func = np.vectorize(to_string, otypes=[object])
# or
vec_func = np.vectorize(to_string, otypes=['U50'])
Common otypes¶
# Numeric outputs
np.vectorize(func, otypes=[float])
np.vectorize(func, otypes=[int])
np.vectorize(func, otypes=[np.float64])
# String outputs
np.vectorize(func, otypes=['U100']) # Unicode, max 100 chars
np.vectorize(func, otypes=[object]) # Python objects
# Multiple outputs
np.vectorize(func, otypes=[float, float])
Multiple Inputs and Outputs¶
Multiple Inputs¶
def power_diff(base, exp1, exp2):
return base ** exp1 - base ** exp2
power_diff_vec = np.vectorize(power_diff)
bases = np.array([2, 3, 4])
exp1 = np.array([2, 2, 2])
exp2 = np.array([1, 1, 1])
result = power_diff_vec(bases, exp1, exp2)
print(result) # [2 6 12]
Multiple Outputs¶
def div_mod(a, b):
return a // b, a % b
div_mod_vec = np.vectorize(div_mod)
a = np.array([10, 20, 30])
b = np.array([3, 7, 4])
quotients, remainders = div_mod_vec(a, b)
print(quotients) # [3 2 7]
print(remainders) # [1 6 2]
Excluded Arguments¶
Exclude arguments from vectorization (passed as-is):
def lookup(x, table):
"""Look up x in a dictionary."""
return table.get(x, 'unknown')
# Without excluded: tries to iterate over table
lookup_vec = np.vectorize(lookup, excluded=['table'])
table = {1: 'one', 2: 'two', 3: 'three'}
values = np.array([1, 2, 3, 4])
result = lookup_vec(values, table)
print(result) # ['one' 'two' 'three' 'unknown']
Signature for Generalized ufuncs¶
For functions that operate on subarrays rather than scalars:
# Function that takes a 1D array and returns a scalar
def array_sum(arr):
return arr.sum()
# signature: input is (n,), output is scalar ()
array_sum_vec = np.vectorize(array_sum, signature='(n)->()')
# Apply to 2D array (operates on each row)
matrix = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
result = array_sum_vec(matrix)
print(result) # [6 15 24]
Signature Examples¶
# Dot product of vectors
def dot(a, b):
return np.sum(a * b)
dot_vec = np.vectorize(dot, signature='(n),(n)->()')
# Matrix-vector multiplication
def matvec(M, v):
return M @ v
matvec_vec = np.vectorize(matvec, signature='(m,n),(n)->(m)')
Decorator Syntax¶
@np.vectorize
def celsius_to_fahrenheit(c):
return c * 9/5 + 32
temps_c = np.array([0, 20, 37, 100])
temps_f = celsius_to_fahrenheit(temps_c)
print(temps_f) # [32. 68. 98.6 212.]
With options:
@np.vectorize(otypes=[float], excluded=['unit'])
def convert_temp(value, unit):
if unit == 'C':
return value * 9/5 + 32
return value
Better Alternatives¶
Use np.where for Conditionals¶
# Instead of vectorize with if/else
def categorize(x):
if x > 0:
return 'positive'
return 'non-positive'
# Use np.where (much faster)
arr = np.array([-1, 0, 1, 2])
result = np.where(arr > 0, 'positive', 'non-positive')
Use np.select for Multiple Conditions¶
arr = np.array([-5, 0, 5, 50, 500])
conditions = [
arr < 0,
arr == 0,
arr < 10,
arr < 100,
]
choices = ['negative', 'zero', 'small', 'medium']
default = 'large'
result = np.select(conditions, choices, default)
# ['negative' 'zero' 'small' 'medium' 'large']
Use np.piecewise for Numeric Results¶
arr = np.array([-2, -1, 0, 1, 2], dtype=float)
result = np.piecewise(
arr,
[arr < 0, arr >= 0],
[lambda x: x ** 2, lambda x: x ** 3]
)
print(result) # [4. 1. 0. 1. 8.]
When to Use vectorize¶
| Scenario | Use vectorize? | Better Alternative |
|---|---|---|
| Quick prototype | ✅ Yes | - |
| Complex string logic | ✅ Yes | - |
| External library calls | ✅ Yes | - |
| Simple math | ❌ No | NumPy operations |
| Conditionals | ❌ No | np.where, np.select |
| Performance critical | ❌ No | Numba, Cython |
Summary¶
| Feature | Usage |
|---|---|
| Basic | np.vectorize(func) |
| Output type | np.vectorize(func, otypes=[float]) |
| Exclude args | np.vectorize(func, excluded=['param']) |
| Subarray ops | np.vectorize(func, signature='(n)->()') |
| Decorator | @np.vectorize |
Key Takeaways:
np.vectorizeis a convenience function, not performance optimization- It wraps a Python loop—not true vectorization
- Use
otypesto specify output dtypes explicitly - Use
excludedfor arguments that shouldn't be iterated - Use
signaturefor functions on subarrays - Prefer
np.where,np.select, or true NumPy operations for speed - Good for prototyping and complex logic, not production performance