Memory Management¶
Understanding copy semantics, memory views, and __slots__ for efficient memory usage.
Mental Model
Shallow copy is like photocopying a page of links -- you get a new page, but the links still point to the same websites. Deep copy recreates the entire website tree. Choose shallow when shared sub-objects are fine; choose deep when each copy must be fully independent.
Copy vs Deepcopy¶
Shallow Copy¶
A shallow copy creates a new container but references the same nested objects:
```python import copy
original = [[1, 2], [3, 4]] shallow = copy.copy(original)
Top level is copied¶
print(shallow is original) # False
Nested objects are shared¶
print(shallow[0] is original[0]) # True ```
Mutation affects both:
```python shallow[0].append(3)
print(original) # [[1, 2, 3], [3, 4]] print(shallow) # [[1, 2, 3], [3, 4]]
Both changed!¶
```
Deep Copy¶
A deep copy recursively copies all nested objects:
```python import copy
original = [[1, 2], [3, 4]] deep = copy.deepcopy(original)
Everything is copied¶
print(deep is original) # False print(deep[0] is original[0]) # False ```
Changes are independent:
```python deep[0].append(3)
print(original) # [[1, 2], [3, 4]] print(deep) # [[1, 2, 3], [3, 4]]
Only deep changed¶
```
When to Use Each¶
| Situation | Use | Example |
|---|---|---|
| Flat data | Shallow | [1, 2, 3].copy() |
| Nested mutable | Deep | copy.deepcopy(nested) |
| Performance critical | Shallow | Large data, no nesting |
| Complete independence | Deep | Complex structures |
```python
Shallow: fast, less memory¶
data = [1, 2, 3] backup = data.copy()
Deep: slow, more memory, but safe¶
nested = [[1, 2], {'a': [3, 4]}] backup = copy.deepcopy(nested) ```
Memory Views¶
Memory views provide zero-copy access to buffer data.
Creating Memory Views¶
```python data = bytearray(b'Hello World') view = memoryview(data)
Access without copying¶
print(view[0]) # 72 (ASCII for 'H') print(bytes(view[0:5])) # b'Hello' ```
Zero-Copy Slicing¶
```python data = bytearray(range(100)) view = memoryview(data)
Slice creates a view, not a copy¶
slice_view = view[10:20] print(bytes(slice_view)) # b'\n\x0b\x0c...'
Verify no copy¶
slice_view[0] = 255 print(data[10]) # 255 (original modified) ```
Use Cases¶
Large Data Processing:
```python import array
arr = array.array('i', range(1000000)) view = memoryview(arr)
Process in chunks without copying¶
for i in range(0, len(view), 1000): chunk = view[i:i+1000] process(chunk) ```
Network Buffers:
```python buffer = bytearray(4096) view = memoryview(buffer)
Receive directly into buffer¶
bytes_received = socket.recv_into(view)
Process without copying¶
data = view[:bytes_received] ```
In-Place Modification:
```python data = bytearray(b'Hello') view = memoryview(data)
view[0] = ord('J') print(data) # bytearray(b'Jello') ```
Memory View Properties¶
```python view = memoryview(bytearray(100))
print(view.nbytes) # Total bytes print(view.readonly) # False for bytearray print(view.format) # 'B' (unsigned char) print(view.itemsize) # 1 byte per item ```
__slots__¶
__slots__ reduces memory usage by eliminating the instance __dict__.
Without Slots (Default)¶
```python class Point: def init(self, x, y): self.x = x self.y = y
p = Point(10, 20) print(p.dict) # {'x': 10, 'y': 20} ```
With Slots¶
```python class Point: slots = ['x', 'y']
def __init__(self, x, y):
self.x = x
self.y = y
p = Point(10, 20)
p.dict # AttributeError: no dict¶
print(p.x, p.y) # 10 20 ```
Memory Comparison¶
```python import sys
class WithDict: def init(self, x, y): self.x = x self.y = y
class WithSlots: slots = ['x', 'y']
def __init__(self, x, y):
self.x = x
self.y = y
d = WithDict(1, 2) s = WithSlots(1, 2)
print(sys.getsizeof(d)) # ~48 bytes (+ dict ~104) print(sys.getsizeof(s)) # ~48 bytes (no dict overhead) ```
Restrictions¶
No Dynamic Attributes:
```python class Point: slots = ['x', 'y']
def __init__(self, x, y):
self.x = x
self.y = y
p = Point(10, 20)
p.z = 30 # AttributeError: 'Point' has no attribute 'z'¶
```
Inheritance Considerations:
```python class Point: slots = ['x', 'y']
class Point3D(Point): slots = ['z'] # Only add new slots
def __init__(self, x, y, z):
self.x = x
self.y = y
self.z = z
```
Enabling Weak References:
python
class Point:
__slots__ = ['x', 'y', '__weakref__'] # Add __weakref__
When to Use __slots__¶
```python
Good: millions of instances¶
class Particle: slots = ['x', 'y', 'vx', 'vy', 'mass']
particles = [Particle() for _ in range(1000000)]
Saves significant memory¶
Not needed: few instances¶
class Config: def init(self): self.debug = False self.verbose = True
Only one instance, slots unnecessary¶
```
Summary¶
| Technique | Purpose | Use When |
|---|---|---|
| Shallow copy | Quick duplicate | Flat structures |
| Deep copy | Full independence | Nested mutables |
| Memory views | Zero-copy access | Large buffers |
__slots__ |
Reduce memory | Many instances |
Key points:
- Understand shallow vs deep copy to avoid aliasing bugs
- Use memory views for efficient buffer manipulation
- Use
__slots__when creating many instances of a class __slots__prevents dynamic attribute addition- Always profile before optimizing
Runnable Example: list_internals.py¶
```python """ 01_list_internals.py
TOPIC: Python List Implementation Internals LEVEL: Advanced DURATION: 60-75 minutes
PREREQUISITES: - Topic #24: Memory Deep Dive (mutable types, references, memory management)
LEARNING OBJECTIVES: 1. Understand list memory layout 2. Learn about over-allocation strategy 3. Explore capacity vs length 4. Understand why lists are fast for certain operations
This is ADVANCED implementation details - builds on memory concepts from #24 """
import sys
if name == "main":
print("=" * 70)
print("PYTHON LIST INTERNALS")
print("=" * 70)
# ============================================================================
# SECTION 1: List Memory Layout
# ============================================================================
print("\nSECTION 1: How Lists Are Structured")
print("-" * 70)
print("""
PYTHON LIST STRUCTURE (CPython):
┌─────────────────────────────────────────┐
│ PyListObject │
├─────────────────────────────────────────┤
│ ob_refcnt: reference count │
│ ob_type: pointer to list type │
│ ob_size: number of elements (LENGTH) │ ← What len() returns
│ allocated: capacity (slots) │ ← Total allocated space
│ ob_item: pointer to array of pointers │ ← Points to actual data
└─────────────────────────────────────────┘
│
↓
┌──────────────────────┐
│ Array of Pointers │
├──────────────────────┤
│ [0] → Object 1 │
│ [1] → Object 2 │
│ [2] → Object 3 │
│ [3] → Object 4 │
│ [4] → (unused) │ ← OVER-ALLOCATION
│ [5] → (unused) │ ← Extra capacity
└──────────────────────┘
KEY INSIGHT:
- Lists store POINTERS to objects, not objects themselves
- ob_size (length) ≤ allocated (capacity)
- Extra capacity allows fast appends!
""")
# ============================================================================
# SECTION 2: Length vs Capacity
# ============================================================================
print("\nSECTION 2: Length vs Capacity")
print("-" * 70)
# Create empty list
my_list = []
print(f"Empty list: {my_list}")
print(f" len() = {len(my_list)}") # ob_size
print(f" sys.getsizeof() = {sys.getsizeof(my_list)} bytes") # includes allocated capacity
# Add one element
my_list.append(1)
print(f"\nAfter append(1): {my_list}")
print(f" len() = {len(my_list)}")
print(f" sys.getsizeof() = {sys.getsizeof(my_list)} bytes")
# Add more elements
sizes = [sys.getsizeof([])]
for i in range(2, 20):
my_list.append(i)
size = sys.getsizeof(my_list)
if size > sizes[-1]:
print(f"\nAfter append({i}): len={len(my_list)}, size={size} bytes (RESIZED!)")
sizes.append(size)
else:
print(f"After append({i}): len={len(my_list)}, size={size} bytes")
print("""
OBSERVATIONS:
- Size doesn't increase with EVERY append
- Size increases in JUMPS (when capacity is exceeded)
- This is OVER-ALLOCATION strategy
- Trade-off: waste some memory for faster appends
""")
# ============================================================================
# SECTION 3: Over-Allocation Strategy
# ============================================================================
print("\n" + "=" * 70)
print("SECTION 3: Why Over-Allocation?")
print("-" * 70)
print("""
WITHOUT OVER-ALLOCATION (naive approach):
───────────────────────────────────────
lst = [] # Allocate 0 slots
lst.append(1) # Allocate 1 slot, copy 0 items
lst.append(2) # Allocate 2 slots, copy 1 item
lst.append(3) # Allocate 3 slots, copy 2 items
lst.append(4) # Allocate 4 slots, copy 3 items
...
lst.append(n) # Allocate n slots, copy n-1 items
Total copies: 0 + 1 + 2 + ... + (n-1) = O(n²) ← TERRIBLE!
WITH OVER-ALLOCATION (Python's approach):
───────────────────────────────────────
lst = [] # Allocate 0 slots
lst.append(1) # Allocate 4 slots (over-allocate!)
lst.append(2) # Use existing capacity
lst.append(3) # Use existing capacity
lst.append(4) # Use existing capacity
lst.append(5) # Allocate 8 slots, copy 4 items
lst.append(6) # Use existing capacity
...
Total copies: Much fewer! = O(n) ← MUCH BETTER!
GROWTH PATTERN (approximately):
0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
Formula (approximately): new_capacity = old_capacity + (old_capacity >> 3) + 6
Which is roughly: new_capacity ≈ 1.125 * old_capacity + 6
""")
# ============================================================================
# SECTION 4: Performance Implications
# ============================================================================
print("\nSECTION 4: Performance Analysis")
print("-" * 70)
print("""
OPERATION PERFORMANCE:
FAST (O(1) amortized):
✓ append() - usually fits in capacity
✓ Access by index: lst[i]
✓ Modify by index: lst[i] = x
✓ len(lst)
✓ pop() - remove last element
SLOW (O(n)):
✗ insert(0, x) - shift all elements right
✗ pop(0) - shift all elements left
✗ remove(x) - search then shift
✗ Concatenation: lst1 + lst2
WHY append() is O(1) amortized:
- Most appends use existing capacity: O(1)
- Occasional resize: O(n)
- But resizes become increasingly rare
- Average over many operations: O(1)
""")
# Demonstrate fast append
import time
n = 100000
start = time.time()
test_list = []
for i in range(n):
test_list.append(i)
append_time = time.time() - start
print(f"\nAppending {n} elements: {append_time:.4f} seconds")
print(f"Average per append: {append_time/n*1000000:.2f} microseconds")
# ============================================================================
# SECTION 5: Memory Trade-offs
# ============================================================================
print("\n" + "=" * 70)
print("SECTION 5: Memory Trade-offs")
print("-" * 70)
# Show wasted space
test_list = list(range(100))
actual_size = sys.getsizeof(test_list)
ptr_size = 8 # 64-bit system
base_size = sys.getsizeof([])
used_size = base_size + (100 * ptr_size)
print(f"List with 100 elements:")
print(f" Total memory: {actual_size} bytes")
print(f" Base overhead: {base_size} bytes")
print(f" Used for data: {100 * ptr_size} bytes (100 pointers × 8 bytes)")
print(f" Estimated waste: {actual_size - used_size} bytes (~{(actual_size-used_size)/actual_size*100:.1f}%)")
print("""
TRADE-OFF:
- Waste ~10-30% of memory (varies with size)
- Gain O(1) amortized append performance
- For most applications: speed > memory
WHEN TO CARE:
- Creating MANY small lists → use tuples if immutable
- Memory-constrained environments
- Lists that rarely grow after creation
""")
# ============================================================================
# SECTION 6: Practical Implications
# ============================================================================
print("\nSECTION 6: Practical Programming Implications")
print("-" * 70)
print("""
BEST PRACTICES:
1. Pre-allocate if you know size:
✓ lst = [None] * 1000 # Better than 1000 appends
2. Use list comprehensions:
✓ [x**2 for x in range(1000)] # Optimized
3. Avoid repeated insertions at beginning:
✗ for x in data: lst.insert(0, x) # O(n²)
✓ Use collections.deque for queue operations
4. Don't pre-allocate unnecessarily:
✗ lst = [None] * 1000000 # Wastes memory if not all used
5. Concatenation alternatives:
✗ result = []
for lst in many_lists:
result = result + lst # Creates new list each time!
✓ result = []
for lst in many_lists:
result.extend(lst) # Modifies in place
""")
# ============================================================================
# SECTION 7: Comparison to Other Data Structures
# ============================================================================
print("\nSECTION 7: When NOT to Use Lists")
print("-" * 70)
print("""
USE THESE INSTEAD:
collections.deque:
- Fast O(1) operations at both ends
- Use for queues, stacks
- Slower random access
array.array:
- Compact storage for numeric data
- No over-allocation waste
- Only homogeneous types
numpy.ndarray:
- Multi-dimensional arrays
- Vectorized operations
- Scientific computing
tuple:
- Immutable, no over-allocation
- Slightly faster, less memory
- Use when size fixed
""")
# ============================================================================
# SECTION 8: Key Takeaways
# ============================================================================
print("\nKEY TAKEAWAYS")
print("-" * 70)
print("""
1. Lists use OVER-ALLOCATION for performance
2. Length (ob_size) ≤ Capacity (allocated)
3. append() is O(1) amortized due to over-allocation
4. Occasional resize operations are O(n)
5. Growth pattern: ~1.125x + constant
6. Trades memory (10-30% waste) for speed
7. insert(0) and pop(0) are O(n) - avoid for queues
8. Pre-allocate if size known
9. Use deque for queue operations
10. Understanding internals helps choose right data structure
CONNECTS TO #24 MEMORY CONCEPTS:
- Mutable type behavior
- Reference storage
- Memory allocation patterns
- Performance vs memory trade-offs
""")
print("\nSee exercises.py for practice!")
```
Exercises¶
Exercise 1. Write a script that demonstrates the difference between shallow and deep copy for a nested structure: a list of dictionaries where each dictionary contains a list. Modify a deeply nested value through the shallow copy and show that the original is affected. Then do the same with a deep copy and show independence.
Solution to Exercise 1
```python
import copy
original = [
{"name": "Alice", "scores": [90, 85]},
{"name": "Bob", "scores": [78, 92]},
]
shallow = copy.copy(original)
shallow[0]["scores"].append(100)
print("After shallow copy modification:")
print(f" original[0]['scores'] = {original[0]['scores']}") # [90, 85, 100]
print(f" shallow[0]['scores'] = {shallow[0]['scores']}") # [90, 85, 100]
print(" Original was affected!\n")
original[0]["scores"].pop() # restore
deep = copy.deepcopy(original)
deep[0]["scores"].append(200)
print("After deep copy modification:")
print(f" original[0]['scores'] = {original[0]['scores']}") # [90, 85]
print(f" deep[0]['scores'] = {deep[0]['scores']}") # [90, 85, 200]
print(" Original is independent!")
```
Exercise 2.
Create a bytearray of 1000 bytes, then obtain a memoryview of it. Slice the memory view to get bytes 100 through 199 and modify the first byte of the slice to 0xFF. Print the original bytearray at index 100 to prove that memory views provide zero-copy access. Measure and print the size of the slice versus creating an actual bytes copy of the same range.
Solution to Exercise 2
```python
import sys
data = bytearray(1000)
view = memoryview(data)
slice_view = view[100:200]
slice_view[0] = 0xFF
print(f"data[100] = {data[100]:#04x}") # 0xff — zero-copy confirmed
copy_bytes = bytes(data[100:200])
print(f"memoryview slice size: {sys.getsizeof(slice_view)} bytes")
print(f"bytes copy size: {sys.getsizeof(copy_bytes)} bytes")
```
Exercise 3.
Define a class Sensor with __slots__ = ('id', 'value', 'timestamp') and a class SensorNoSlots with the same attributes but no __slots__. Create 50,000 instances of each, then use sys.getsizeof() to compare per-instance sizes. Also use tracemalloc to measure the total memory consumed by each batch.
Solution to Exercise 3
```python
import sys
import tracemalloc
class Sensor:
__slots__ = ('id', 'value', 'timestamp')
def __init__(self, id, value, timestamp):
self.id = id
self.value = value
self.timestamp = timestamp
class SensorNoSlots:
def __init__(self, id, value, timestamp):
self.id = id
self.value = value
self.timestamp = timestamp
s = Sensor(1, 2.0, 3)
ns = SensorNoSlots(1, 2.0, 3)
print(f"With slots: {sys.getsizeof(s)} bytes/instance")
print(f"Without slots: {sys.getsizeof(ns) + sys.getsizeof(ns.__dict__)} bytes/instance")
n = 50_000
tracemalloc.start()
slots_list = [Sensor(i, float(i), i) for i in range(n)]
_, slots_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
tracemalloc.start()
noslots_list = [SensorNoSlots(i, float(i), i) for i in range(n)]
_, noslots_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"\n{n:,} instances peak memory:")
print(f" With slots: {slots_peak / 1024 / 1024:.2f} MB")
print(f" Without slots: {noslots_peak / 1024 / 1024:.2f} MB")
```