Views vs Copies in NumPy¶
NumPy's handling of views and copies differs significantly from Python lists and other languages like MATLAB and R. Understanding this is crucial for memory efficiency and avoiding unexpected bugs.
Mental Model
A view is a different window into the same memory buffer; a copy is a completely separate buffer. Slicing returns views (cheap, but mutations propagate), while fancy indexing and most dtype conversions return copies (safe, but use extra memory). When in doubt, check b.base is a -- if it returns True, b is a view of a.
The NumPy Array Model
Every NumPy array is three things: a data buffer (contiguous block of typed memory), a shape (dimensions), and strides (byte jumps per axis). A view shares the same data buffer but may have different shape and strides. A copy allocates a new data buffer. Contiguity determines whether a view is possible for a given reshaping.
Every operation maps onto this model:
| Operation | What changes | Buffer copied? |
|---|---|---|
| View (slice, reshape, transpose) | shape and/or strides | No — same buffer |
Copy (.copy(), fancy index, arithmetic) |
everything | Yes — new buffer |
| Transpose | swaps strides | No |
| Reshape | reinterprets strides | No (if contiguous) |
| Slice | adjusts offset + strides | No |
| ravel | flattens via strides | No (if contiguous) |
This single model — buffer + shape + strides — explains everything in this section.
Core Concept¶
View Definition¶
A view shares the same underlying data buffer with the original array. Mutations propagate to both.
Copy Definition¶
A copy allocates new, independent memory. Mutations are isolated from the original.
Default Behavior¶
NumPy prefers views for fast computation and efficient memory usage.
The Key Difference¶
| Operation | Python List | NumPy Array |
|---|---|---|
| Slicing | Returns copy | Returns view |
| Assignment | Creates alias | Creates alias |
```python
Python list: slicing creates copy¶
lst = [1, 2, 3, 4, 5] lst_slice = lst[1:4] lst_slice[0] = 99 print(lst) # [1, 2, 3, 4, 5] (unchanged)
NumPy: slicing creates view¶
import numpy as np arr = np.array([1, 2, 3, 4, 5]) arr_slice = arr[1:4] arr_slice[0] = 99 print(arr) # [ 1 99 3 4 5] (changed!) ```
Visual Comparison¶
View Diagram¶
Original Array: [1, 2, 3, 4, 5]
↑
(shared memory)
↓
View: [2, 3, 4]
Copy Diagram¶
Original Array: [1, 2, 3, 4, 5]
(separate memory)
Copy: [2, 3, 4]
Views: Shared Memory¶
A view shares the same underlying data buffer:
```python arr = np.array([1, 2, 3, 4, 5]) view = arr[1:4]
print(np.shares_memory(arr, view)) # True print(view.base is arr) # True ```
Operations That Return Views¶
| Operation | Returns |
|---|---|
arr[1:4] |
View |
arr[::2] |
View |
arr.reshape(2, 3) |
View (usually) |
arr.T |
View |
arr.ravel() |
View (if contiguous) |
arr.view(dtype) |
View |
python
arr = np.arange(6)
reshaped = arr.reshape(2, 3)
reshaped[0, 0] = 99
print(arr) # [99 1 2 3 4 5] (affected!)
Copies: Independent Memory¶
A copy has its own data buffer:
```python arr = np.array([1, 2, 3, 4, 5]) copied = arr.copy()
print(np.shares_memory(arr, copied)) # False print(copied.base) # None
copied[0] = 99 print(arr) # [1 2 3 4 5] (unchanged) ```
Operations That Return Copies¶
| Operation | Returns |
|---|---|
arr.copy() |
Copy |
np.copy(arr) |
Copy |
arr.flatten() |
Copy (always) |
arr[[0, 2, 4]] |
Copy (fancy indexing) |
arr[arr > 2] |
Copy (boolean indexing) |
```python arr = np.array([1, 2, 3, 4, 5])
Fancy indexing: copy¶
fancy = arr[[0, 2, 4]] fancy[0] = 99 print(arr) # [1 2 3 4 5] (unchanged)
Boolean indexing: copy¶
mask = arr > 2 filtered = arr[mask] filtered[0] = 99 print(arr) # [1 2 3 4 5] (unchanged) ```
Checking View vs Copy¶
```python arr = np.arange(10)
Method 1: Check .base attribute¶
slice_view = arr[2:5] print(slice_view.base is arr) # True (view)
fancy_copy = arr[[2, 3, 4]] print(fancy_copy.base) # None (copy)
Method 2: np.shares_memory()¶
print(np.shares_memory(arr, slice_view)) # True print(np.shares_memory(arr, fancy_copy)) # False ```
Why Views Exist¶
Views provide significant performance benefits:
- Memory Efficiency: No data duplication means lower memory consumption
- Speed: Avoiding memory allocation and copying is faster
- Large Arrays: Critical when working with gigabyte-scale datasets
- In-place Operations: Modify subsets directly
```python
Process large array efficiently¶
data = np.random.randn(1_000_000)
View: no memory overhead¶
subset = data[::100] # Every 100th element subset *= 2 # Modify in-place (affects data!)
If you need independence:¶
subset = data[::100].copy() subset *= 2 # data unchanged ```
When to Copy¶
Explicit copies protect data integrity:
- Data Preservation: Copy when you need to preserve the original unchanged
- Multi-threaded Code: Copy to avoid race conditions in parallel processing
- Function Returns: Copy when returning array subsets from functions
Comparison: NumPy vs MATLAB vs R¶
Copy-on-Write Semantics¶
| Language | Default Behavior | Copy Trigger |
|---|---|---|
| NumPy | View (slicing) | Explicit .copy() |
| MATLAB | Lazy copy | On modification |
| R | Copy-on-modify | On modification |
MATLAB: Lazy Copy¶
MATLAB uses copy-on-write:
matlab
% MATLAB
A = [1 2 3 4 5];
B = A; % No copy yet (shares memory)
B(1) = 99; % Copy triggered here
% A is [1 2 3 4 5], B is [99 2 3 4 5]
R: Copy-on-Modify¶
R also uses copy-on-modify:
```r
R¶
a <- c(1, 2, 3, 4, 5) b <- a # No copy yet b[1] <- 99 # Copy triggered here
a is [1 2 3 4 5], b is [99 2 3 4 5]¶
```
NumPy: Explicit Views¶
NumPy is explicit — views are intentional:
```python
NumPy¶
a = np.array([1, 2, 3, 4, 5]) b = a # Alias (same object) b[0] = 99 # Modifies a too!
Both are [99 2 3 4 5]¶
To avoid:¶
b = a.copy() # Explicit copy ```
Summary Comparison Table¶
| Scenario | NumPy | MATLAB | R |
|---|---|---|---|
b = a |
Alias | Lazy copy | Lazy copy |
b = a[1:4] |
View | Copy | Copy |
b[0] = x after slice |
Modifies a |
Independent | Independent |
| Explicit copy | .copy() |
Not needed | Not needed |
| Memory efficiency | High (views) | Medium | Medium |
| Accidental mutation risk | High | Low | Low |
Common Pitfalls¶
Pitfall 1: Unexpected Modification¶
```python def process(arr): sub = arr[:5] sub[0] = 0 # Modifies original! return sub
data = np.arange(10) result = process(data) print(data) # [0 1 2 3 4 5 6 7 8 9] — modified! ```
Fix: Copy if you need independence:
python
def process(arr):
sub = arr[:5].copy()
sub[0] = 0
return sub
Pitfall 2: Stale Views¶
python
arr = np.array([1, 2, 3])
view = arr[:]
arr = np.array([4, 5, 6]) # arr now points to new array
print(view) # [1 2 3] — still points to old data
Best Practices¶
- Be explicit: Use
.copy()when you need independence - Check with
np.shares_memory()when uncertain - Document intent: Comment when views are intentional
- Defensive copying: Copy input arrays in functions if modifying
python
def safe_normalize(arr):
"""Normalize array without modifying original."""
arr = arr.copy() # Defensive copy
arr -= arr.mean()
arr /= arr.std()
return arr
Quick Reference¶
| Need | Action |
|---|---|
| Check if view | arr.base is not None or np.shares_memory(a, b) |
| Force copy | arr.copy() or np.copy(arr) |
| Flatten (always copy) | arr.flatten() |
| Flatten (view if possible) | arr.ravel() |
Key Takeaways¶
- NumPy slicing returns views (unlike Python lists)
- Views share memory — modifications propagate
- Use
.copy()for independent copies - Fancy/boolean indexing returns copies
- MATLAB and R use copy-on-write; NumPy uses explicit views
- Check with
np.shares_memory()or.baseattribute - Defensive copying in functions prevents accidental mutation
Exercises¶
Exercise 1.
Create a = np.arange(12).reshape(3, 4). Create a view v = a[1:3, 1:3] and a copy c = a[1:3, 1:3].copy(). Modify both and check which changes propagate back to a.
Solution to Exercise 1
import numpy as np
a = np.arange(12).reshape(3, 4)
v = a[1:3, 1:3]
c = a[1:3, 1:3].copy()
v[0, 0] = 99
print(f"a after view mod: a[1,1] = {a[1, 1]}") # 99
c[0, 0] = 88
print(f"a after copy mod: a[1,1] = {a[1, 1]}") # still 99
Exercise 2.
Write a function that takes an array and returns True if it is a view (has a base) and False if it owns its data. Test it on a slice, a reshape result, a .copy(), and a boolean-indexed result.
Solution to Exercise 2
import numpy as np
def is_view(arr):
return arr.base is not None
a = np.arange(12).reshape(3, 4)
print(f"Slice: {is_view(a[1:3])}") # True
print(f"Reshape: {is_view(a.reshape(4, 3))}") # True
print(f"Copy: {is_view(a.copy())}") # False
print(f"Boolean: {is_view(a[a > 5])}") # False
Exercise 3.
Create a large array a = np.random.randn(10000). Create a view v = a[::2] and a copy c = a[::2].copy(). Compare their nbytes and verify that the view shares memory with a by checking np.shares_memory(a, v).
Solution to Exercise 3
import numpy as np
a = np.random.randn(10000)
v = a[::2]
c = a[::2].copy()
print(f"View nbytes: {v.nbytes}")
print(f"Copy nbytes: {c.nbytes}")
print(f"Shares memory (view): {np.shares_memory(a, v)}")
print(f"Shares memory (copy): {np.shares_memory(a, c)}")