Buffer Reallocation¶
List Growth¶
1. Dynamic Arrays¶
Lists grow dynamically:
lst = []
for i in range(100):
lst.append(i)
# May trigger reallocation
2. Capacity¶
import sys
lst = []
print(sys.getsizeof(lst)) # Initial
lst.append(1)
print(sys.getsizeof(lst)) # Grew
# Over-allocates for efficiency
Growth Strategy¶
1. Exponential¶
# CPython strategy:
# new_capacity = old_capacity + (old_capacity >> 3) + 6
# Example progression:
# 0 -> 4 -> 8 -> 16 -> 25 -> 35 -> ...
2. Amortized O(1)¶
# Average append is O(1)
# Occasional reallocation is O(n)
# Amortized: O(1)
lst = []
for i in range(1000):
lst.append(i) # Fast on average
Pre-allocation¶
1. If Known Size¶
# Inefficient
lst = []
for i in range(1000):
lst.append(i)
# Better: pre-allocate
lst = [0] * 1000
for i in range(1000):
lst[i] = i
2. List Comprehension¶
# Also pre-allocates
lst = [i for i in range(1000)]
Memory Impact¶
1. Over-allocation¶
import sys
lst = [1, 2, 3]
print(sys.getsizeof(lst)) # e.g., 80
# Has extra capacity
# Not just 3 items worth
2. Shrinking¶
# Doesn't automatically shrink
lst = list(range(1000))
size1 = sys.getsizeof(lst)
del lst[100:]
size2 = sys.getsizeof(lst)
# size2 may equal size1
Summary¶
1. Dynamic Growth¶
- Lists grow automatically
- Exponential strategy
- Over-allocates
- Amortized O(1) append
2. Optimization¶
- Pre-allocate when possible
- Use comprehensions
- Avoid many appends
Runnable Example: defensive_copying_pattern.py¶
"""
TUTORIAL: Defensive Copying Pattern in __init__
In this tutorial, you'll learn how to avoid one of Python's most insidious bugs:
ALIASING. When you accept a mutable object (like a list) as a parameter, you need
to be careful: if you assign it directly to an instance variable, you and the caller
share the SAME list. Any changes to it affects both!
The solution is DEFENSIVE COPYING: make a copy of mutable arguments in __init__.
This ensures your object owns its own data and can't be corrupted by external changes.
This is based on the Bus example from Fluent Python by Luciano Ramalho.
"""
import copy
if __name__ == "__main__":
print("=" * 70)
print("TUTORIAL: Defensive Copying Pattern")
print("=" * 70)
# ============ EXAMPLE 1: The Correct Bus Class (with defensive copying)
print("\n# ============ EXAMPLE 1: The Correct Bus Class")
print("Define a Bus class that makes defensive copies of the passenger list")
print("so external changes can't corrupt our data:\n")
class Bus:
"""A bus with its own copy of the passenger list"""
def __init__(self, passengers=None):
# WHY defensive copying? If passengers is provided, we DON'T assign it
# directly. Instead, we create a NEW list with list(passengers).
# This means our self.passengers is independent from the original list.
if passengers is None:
self.passengers = []
else:
# Make a defensive copy! This is the KEY difference.
self.passengers = list(passengers)
def pick(self, name):
"""Add a passenger to the bus"""
self.passengers.append(name)
def drop(self, name):
"""Remove a passenger from the bus"""
self.passengers.remove(name)
# Test 1: Create buses with initial passengers
print("Creating bus1 with ['Alice', 'Bill', 'Claire', 'David']")
bus1 = Bus(['Alice', 'Bill', 'Claire', 'David'])
print(f"bus1.passengers = {bus1.passengers}")
print("\nCreating bus2 as a shallow copy of bus1 (copy.copy)")
bus2 = copy.copy(bus1)
print(f"bus2.passengers = {bus2.passengers}")
print("\nCreating bus3 as a deep copy of bus1 (copy.deepcopy)")
bus3 = copy.deepcopy(bus1)
print(f"bus3.passengers = {bus3.passengers}")
# Now drop a passenger from bus1
print("\nDropping 'Bill' from bus1...")
bus1.drop('Bill')
print(f"bus1.passengers = {bus1.passengers}")
# Check bus2 - it was a SHALLOW copy, so it shares the same passenger list
print(f"bus2.passengers = {bus2.passengers}")
print("-> bus2 also changed because shallow copy shares the list object!")
# Check bus3 - it was a DEEP copy, so it has its own completely independent copy
print(f"bus3.passengers = {bus3.passengers}")
print("-> bus3 remains unchanged because deep copy made independent copies!")
# ============ EXAMPLE 2: Why Defensive Copying Matters in __init__
print("\n" + "=" * 70)
print("# ============ EXAMPLE 2: Why Defensive Copying Matters in __init__")
print("Demonstrate what happens WITHOUT defensive copying:\n")
class UnsafeBus:
"""A bus that doesn't make defensive copies (ANTI-PATTERN)"""
def __init__(self, passengers=None):
# WRONG: Direct assignment creates an alias!
# If passengers is a list, self.passengers now points to THE SAME list
if passengers is None:
self.passengers = []
else:
self.passengers = passengers # ALIASING! Shared reference!
def pick(self, name):
self.passengers.append(name)
def drop(self, name):
self.passengers.remove(name)
print("Creating a passenger list we'll reuse:")
original_passengers = ['Alice', 'Bill', 'Claire']
print(f"original_passengers = {original_passengers}")
print("\nCreating unsafe_bus with this list...")
unsafe_bus = UnsafeBus(original_passengers)
print(f"unsafe_bus.passengers = {unsafe_bus.passengers}")
print("\nNow drop 'Bill' from the bus...")
unsafe_bus.drop('Bill')
print(f"unsafe_bus.passengers = {unsafe_bus.passengers}")
print("\nBUT LOOK - the original list was also modified!")
print(f"original_passengers = {original_passengers}")
print("-> This is the ALIASING BUG! Both variables point to the same list.")
print("-> Modifying one affects the other - data corruption!")
# ============ EXAMPLE 3: Comparing Safe vs Unsafe
print("\n" + "=" * 70)
print("# ============ EXAMPLE 3: Comparing Safe vs Unsafe")
print("Side-by-side comparison of defensive vs non-defensive copying:\n")
print("Scenario: We have a list of guests and create two buses")
guests = ['Eve', 'Frank', 'Grace']
print(f"guests = {guests}")
print("\nSafe Bus (with defensive copying):")
safe_bus = Bus(guests)
print(f"safe_bus.passengers = {safe_bus.passengers}")
safe_bus.drop('Frank')
print(f"After dropping 'Frank': safe_bus.passengers = {safe_bus.passengers}")
print(f"Original list unchanged: guests = {guests}")
print("-> Our bus can't corrupt external data!")
guests = ['Eve', 'Frank', 'Grace'] # Reset
print("\nUnsafe Bus (without defensive copying):")
unsafe_bus = UnsafeBus(guests)
print(f"unsafe_bus.passengers = {unsafe_bus.passengers}")
unsafe_bus.drop('Frank')
print(f"After dropping 'Frank': unsafe_bus.passengers = {unsafe_bus.passengers}")
print(f"Original list corrupted: guests = {guests}")
print("-> The bus corrupted the external data!")
# ============ EXAMPLE 4: Understanding What list() Does
print("\n" + "=" * 70)
print("# ============ EXAMPLE 4: Understanding What list() Does")
print("The list() constructor creates a new list with the same elements:\n")
original = [1, 2, 3]
copy1 = list(original)
print(f"original = {original}")
print(f"copy1 = list(original) = {copy1}")
print(f"copy1 == original: {copy1 == original} (same contents)")
print(f"copy1 is original: {copy1 is original} (different objects)")
print("\nModifying copy1 doesn't affect original:")
copy1.append(4)
print(f"After copy1.append(4):")
print(f"original = {original}")
print(f"copy1 = {copy1}")
print("-> They're completely independent!")
# ============ EXAMPLE 5: When to Use Defensive Copying
print("\n" + "=" * 70)
print("# ============ EXAMPLE 5: When to Use Defensive Copying")
print("Best practices for defensive copying:\n")
print("""
RULE 1: If a parameter is a MUTABLE collection (list, dict, set),
and you store it as an instance variable, make a copy.
RULE 2: Use list(param) for lists, dict(param) for dicts, set(param) for sets.
These are shallow copies - good enough for most cases.
RULE 3: Only use copy.deepcopy() if your collections contain mutable objects
that also need to be protected from external modification.
WHY? Because Python objects are references. Without copying, you create
an ALIAS - two variables pointing to the same object. Changes through
one variable affect what the other sees.
WHEN? Defensive copying is mainly for:
- __init__ methods (protect your instance variables)
- Public methods that accept mutable parameters
- Any time you store a parameter as instance state
NOT NEEDED for:
- Immutable objects (strings, tuples, numbers)
- When you process and discard the parameter (not storing it)
- When aliasing is intentional and documented
""")
# ============ EXAMPLE 6: Defensive Copying with Different Collections
print("\n" + "=" * 70)
print("# ============ EXAMPLE 6: Defensive Copying with Different Collections")
print("Different collection types need different copy approaches:\n")
class Config:
"""Configuration class demonstrating defensive copying for different types"""
def __init__(self, names=None, settings=None, allowed_tags=None):
# Copy each mutable parameter appropriately
self.names = list(names) if names is not None else []
self.settings = dict(settings) if settings is not None else {}
self.allowed_tags = set(allowed_tags) if allowed_tags is not None else set()
print("Creating initial data:")
names = ['Alice', 'Bob']
settings = {'theme': 'dark', 'lang': 'en'}
tags = {'python', 'tutorial'}
config = Config(names, settings, tags)
print(f"config.names = {config.names}")
print(f"config.settings = {config.settings}")
print(f"config.allowed_tags = {config.allowed_tags}")
print("\nModifying original data:")
names.append('Charlie')
settings['theme'] = 'light'
tags.add('advanced')
print(f"names = {names}, config.names = {config.names} (unchanged)")
print(f"settings = {settings}, config.settings = {config.settings} (unchanged)")
print(f"tags = {tags}, config.allowed_tags = {config.allowed_tags} (unchanged)")
print("-> All config data protected by defensive copying!")
# ============ EXAMPLE 7: The Cost of Defensive Copying
print("\n" + "=" * 70)
print("# ============ EXAMPLE 7: The Cost of Defensive Copying")
print("Defensive copying uses memory - but it's almost always worth it:\n")
print("""
MEMORY: Copying uses extra memory proportional to the data size
SPEED: list(param) is very fast - typically negligible overhead
TRADE-OFF: A tiny bit of memory and CPU to prevent BUGS and DATA CORRUPTION
This is almost always the right choice!
EXCEPTION: In rare high-performance scenarios with huge lists where you
KNOW aliasing is safe, you might skip defensive copying.
But document it clearly!
PRINCIPLE: Be DEFENSIVE. Assume users will modify their data.
Protect your instance variables by copying mutable inputs.
""")
print("\n" + "=" * 70)
print("SUMMARY")
print("=" * 70)
print("""
KEY TAKEAWAYS:
1. ALIASING: In Python, assignment creates references, not copies.
Multiple variables can point to the same mutable object.
2. DEFENSIVE COPYING: In __init__, copy mutable parameters using:
- list(param) for lists
- dict(param) for dicts
- set(param) for sets
3. WHY: This prevents external code from corrupting your object's data
through the original reference they passed in.
4. COST: Minimal - it's a shallow copy, very fast for most cases.
5. BEST PRACTICE: Always use defensive copying for mutable parameters
unless there's a documented reason not to.
""")