Vectorization Basics¶
Vectorization replaces explicit loops with bulk array operations optimized in C.
Core Principle¶
Suppress Python-level loops for orders-of-magnitude speedup.
1. Loop Approach¶
import numpy as np
def square_with_loop(arr):
result = np.empty_like(arr)
for i in range(len(arr)):
result[i] = arr[i] ** 2
return result
2. Vectorized Approach¶
import numpy as np
def square_vectorized(arr):
return arr ** 2
3. Key Difference¶
Vectorized code executes in optimized C; loops execute in slow Python.
Performance Gain¶
Vectorization yields dramatic speedups.
1. Timing Comparison¶
import numpy as np
import time
def main():
arr = np.random.randn(1_000_000)
# Loop timing
start = time.perf_counter()
result_loop = np.empty_like(arr)
for i in range(len(arr)):
result_loop[i] = arr[i] ** 2
loop_time = time.perf_counter() - start
# Vectorized timing
start = time.perf_counter()
result_vec = arr ** 2
vec_time = time.perf_counter() - start
print(f"Loop time: {loop_time:.4f} sec")
print(f"Vectorized time: {vec_time:.6f} sec")
print(f"Speedup: {loop_time / vec_time:.0f}x")
if __name__ == "__main__":
main()
2. Typical Results¶
Loop time: 0.3200 sec
Vectorized time: 0.001500 sec
Speedup: 213x
3. Scale Matters¶
Speedup increases with array size.
Common Patterns¶
Recognize loop patterns that can be vectorized.
1. Element-wise Math¶
import numpy as np
# Loop
for i in range(len(arr)):
result[i] = np.sin(arr[i])
# Vectorized
result = np.sin(arr)
2. Conditional Logic¶
import numpy as np
# Loop
for i in range(len(arr)):
if arr[i] > 0:
result[i] = arr[i]
else:
result[i] = 0
# Vectorized
result = np.where(arr > 0, arr, 0)
3. Aggregation¶
import numpy as np
# Loop
total = 0
for i in range(len(arr)):
total += arr[i]
# Vectorized
total = np.sum(arr)
Universal Functions¶
NumPy ufuncs are inherently vectorized.
1. Math Functions¶
import numpy as np
arr = np.array([1, 2, 3, 4])
print(np.sqrt(arr)) # [1. 1.41 1.73 2.]
print(np.exp(arr)) # [2.71 7.38 20.08 54.59]
print(np.log(arr)) # [0. 0.69 1.09 1.38]
2. Comparison Functions¶
import numpy as np
a = np.array([1, 2, 3])
b = np.array([2, 2, 2])
print(np.maximum(a, b)) # [2 2 3]
print(np.minimum(a, b)) # [1 2 2]
3. Custom ufuncs¶
Use np.vectorize for custom functions (convenience, not performance).
When Loops Are OK¶
Some situations still require explicit loops.
1. Sequential Dependence¶
When iteration i depends on result of i-1.
2. Complex Logic¶
When vectorization would be unreadable or impossible.
3. Small Arrays¶
Overhead matters less for tiny arrays.
Runnable Example: norm_optimization_progression.py¶
"""
Vector Norm Computation: Pure Python vs NumPy Vectorization
This tutorial demonstrates how vectorizing mathematical operations can lead to
dramatic performance improvements. We'll compute the L2 norm (Euclidean norm) of
a vector using two approaches:
1. Pure Python with explicit loops
2. NumPy with vectorized array operations
The key insight: NumPy operations are implemented in C and operate on entire
arrays at once, while pure Python loops iterate element-by-element, incurring
function call overhead for each iteration.
Learning Goals:
- Understand what vectorization means in practice
- See how NumPy leverages compiled code for speed
- Recognize patterns in your code that could be vectorized
- Measure the real-world performance difference
"""
import time
import numpy as np
if __name__ == "__main__":
print("=" * 70)
print("VECTOR NORM COMPUTATION: PURE PYTHON vs NUMPY VECTORIZATION")
print("=" * 70)
# ============ EXAMPLE 1: Pure Python Implementation ============
print("\n" + "=" * 70)
print("EXAMPLE 1: Pure Python with Explicit Loops")
print("=" * 70)
def norm_square_python(vector):
"""
Compute the squared L2 norm using pure Python.
The L2 norm (Euclidean norm) of a vector is: sqrt(sum(v_i^2))
We compute the squared norm (without the sqrt) since sqrt is expensive
and we often only care about relative magnitudes.
Why this is slow:
- Each iteration calls Python operations (addition, multiplication)
- Python must interpret each operation
- No knowledge of what comes next, so can't optimize
"""
norm = 0
for v in vector:
norm += v * v
return norm
# Create a test vector
test_vector_python = list(range(10000))
# Show what the function returns
result_python = norm_square_python(test_vector_python)
print(f"\nSquared norm of vector [0, 1, 2, ..., 9999]: {result_python}")
print(f"This equals: sum(i^2 for i in 0..9999) = 0^2 + 1^2 + 2^2 + ... + 9999^2")
# Time the pure Python version
num_iterations = 5
times_python = []
test_size = 1000000
test_vector_python = list(range(test_size))
print(f"\nTiming pure Python with vector of {test_size:,} elements ({num_iterations} runs):")
for i in range(num_iterations):
start = time.time()
norm_square_python(test_vector_python)
elapsed = time.time() - start
times_python.append(elapsed)
print(f" Run {i+1}: {elapsed:.6f}s")
min_time_python = min(times_python)
print(f"\nBest pure Python time: {min_time_python:.6f}s")
# ============ EXAMPLE 2: NumPy Vectorized Implementation ============
print("\n" + "=" * 70)
print("EXAMPLE 2: NumPy Vectorized Operations")
print("=" * 70)
def norm_square_numpy(vector):
"""
Compute the squared L2 norm using NumPy.
Why this is fast:
- vector * vector: element-wise multiplication on entire array at once
- np.sum(): single function call to sum all elements
- Both operations are implemented in optimized C code
- NumPy can use SIMD (Single Instruction Multiple Data) on modern CPUs
- No Python loop overhead!
The key vectorization principle:
Instead of: for v in vector: norm += v * v
Do this: np.sum(vector * vector)
"""
return np.sum(vector * vector)
# Create a test vector using NumPy
test_vector_numpy = np.arange(10000)
# Show what the function returns
result_numpy = norm_square_numpy(test_vector_numpy)
print(f"\nSquared norm of vector [0, 1, 2, ..., 9999]: {result_numpy}")
print(f"Note: Same result as pure Python (both should be {int(result_python)})")
# Time the NumPy version
times_numpy = []
test_vector_numpy = np.arange(test_size)
print(f"\nTiming NumPy with vector of {test_size:,} elements ({num_iterations} runs):")
for i in range(num_iterations):
start = time.time()
norm_square_numpy(test_vector_numpy)
elapsed = time.time() - start
times_numpy.append(elapsed)
print(f" Run {i+1}: {elapsed:.6f}s")
min_time_numpy = min(times_numpy)
print(f"\nBest NumPy time: {min_time_numpy:.6f}s")
# ============ EXAMPLE 3: Performance Comparison ============
print("\n" + "=" * 70)
print("EXAMPLE 3: Performance Comparison & Speedup Analysis")
print("=" * 70)
speedup = min_time_python / min_time_numpy
print(f"\nResults for {test_size:,}-element vector:")
print(f" Pure Python: {min_time_python:.6f}s")
print(f" NumPy: {min_time_numpy:.6f}s")
print(f" Speedup: {speedup:.1f}x faster with NumPy")
print(f"\n{'*' * 70}")
print("WHY IS VECTORIZATION SO POWERFUL?")
print("{'*' * 70}")
print("""
1. ELIMINATION OF PYTHON LOOP OVERHEAD
- Pure Python: 1,000,000 iterations through Python's interpreter
- NumPy: One C function call that processes all data
2. COMPILED C IMPLEMENTATION
- NumPy operations (*, sum) are written in C
- C is orders of magnitude faster than interpreted Python
- Direct memory access without Python's dynamic type checking
3. SIMD VECTORIZATION
- Modern CPUs have instructions to process multiple values in one step
- NumPy can use these (AVX, SSE) for additional speedup
- Python loops cannot take advantage of this
4. MEMORY LAYOUT AWARENESS
- NumPy arrays store data in continuous memory blocks
- CPU caches work optimally with this layout
- Python lists are scattered pointers, poor cache behavior
5. ALGORITHMIC OPTIMIZATION
- NumPy's internals are battle-tested and optimized
- Authors have spent years perfecting these algorithms
- Your hand-written loops can't compete
""")
# ============ EXAMPLE 4: Correct NumPy Norm Using Built-in ============
print("\n" + "=" * 70)
print("EXAMPLE 4: Using NumPy's Built-in Norm Function")
print("=" * 70)
print("""
In real code, you'd use NumPy's norm function:
norm = np.linalg.norm(vector) # Computes sqrt(sum(v_i^2))
This is even more optimized and handles edge cases properly.
""")
# Demonstrate
vector_example = np.array([3.0, 4.0])
norm_result = np.linalg.norm(vector_example)
print(f"Example: norm([3, 4]) = {norm_result} (should be 5.0)")
print(f" Verification: sqrt(3^2 + 4^2) = sqrt(9 + 16) = sqrt(25) = 5.0")
# ============ EXAMPLE 5: The Vectorization Pattern ============
print("\n" + "=" * 70)
print("EXAMPLE 5: Recognizing Vectorization Opportunities")
print("=" * 70)
print("""
When you see this pattern in Python:
result = initial_value
for element in collection:
result = operation(result, element)
Consider if you can vectorize it with NumPy:
1. Convert to NumPy array
2. Use element-wise operations (*, +, etc.)
3. Use aggregation functions (sum, mean, max, etc.)
Example transformations:
BEFORE (Pure Python):
total = 0
for x in values:
total += x * x
AFTER (Vectorized):
total = np.sum(values * values)
BEFORE (Pure Python):
result = []
for i, val in enumerate(my_list):
result.append(val * 2)
AFTER (Vectorized):
result = np.array(my_list) * 2
The principle: Avoid Python loops over numeric data when possible.
""")
print("\n" + "=" * 70)
print("KEY TAKEAWAY")
print("=" * 70)
print(f"""
Vectorization with NumPy can provide {speedup:.0f}x+ speedup for numerical
operations. The main idea is to let compiled libraries (NumPy, using C code)
handle the iteration instead of Python's interpreter.
This is one of the most important optimization techniques for data-heavy
Python code. As a rule of thumb:
- Numerical operations on large arrays → Use NumPy
- File I/O or string processing → Optimize differently
- When speed matters, measure and vectorize
""")