Reference Counting¶
Reference counting is CPython's default memory-management mechanism: every object carries a count of the references pointing to it.
CPython Mechanism¶
1. Every Object Has a Reference Count¶
import sys
x = [1, 2, 3]
print(sys.getrefcount(x)) # 2 (x + getrefcount's arg)
2. Increment/Decrement¶
x = [1, 2, 3] # refcount = 1
y = x # refcount = 2
del y # refcount = 1
del x # refcount = 0 → freed
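The increment/decrement trace above can be checked at runtime with `sys.getrefcount` (CPython-specific; it reports one extra reference for its own argument):

```python
import sys

x = [1, 2, 3]
base = sys.getrefcount(x)         # typically 2 at module level: x + the call's argument
y = x
print(sys.getrefcount(x) - base)  # 1: the alias y added one reference
del y
print(sys.getrefcount(x) - base)  # 0: back to the baseline
```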
Automatic Management¶
1. No Manual Memory Management¶
def function():
x = [1, 2, 3]
# Automatically freed when function returns
return
# No memory leak
Advantages¶
1. Immediate Deallocation¶
x = [1, 2, 3]
del x
# Memory freed immediately (deterministic)
2. Predictable¶
You can predict exactly when memory will be released.
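This determinism can be observed with a `weakref` probe, a minimal sketch assuming CPython's reference-counting behavior:

```python
import weakref

class Resource:
    def __del__(self):
        print("resource freed")

r = Resource()
probe = weakref.ref(r)  # weak reference: does not keep r alive
del r                   # refcount hits 0, __del__ runs right here
print(probe() is None)  # True: the object is already gone
```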
Limitation: Circular References¶
Reference counting alone cannot reclaim circular references.
The Problem¶
class Node:
def __init__(self):
self.ref = None
a = Node()
b = Node()
a.ref = b
b.ref = a
# Cycle: a → b → a
┌──────────┐   .ref    ┌──────────┐
│    a     │──────────▶│    b     │
│ refcount │           │ refcount │
│   = 1    │◀──────────│   = 1    │
└──────────┘   .ref    └──────────┘
Even after `del a, b`, each object's refcount remains 1 (each is still held by the other's `.ref`), so reference counting never frees them.
Detection with GC¶
import gc
# The cyclic garbage collector finds unreachable cycles
unreachable = gc.collect()  # returns the number of objects collected
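A fuller check that the collector really reclaims the cycle, using a `weakref` probe (`Node` is the class from the example above; automatic collection is paused so the timing is explicit):

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.ref = None

gc.disable()            # pause automatic collection so the timing is explicit
a = Node()
b = Node()
a.ref = b
b.ref = a               # cycle: a -> b -> a
probe = weakref.ref(a)

del a, b
print(probe() is None)  # False: refcounts are still 1, the cycle keeps both alive
gc.collect()            # cycle detector finds and frees the unreachable pair
print(probe() is None)  # True: the objects are gone now
gc.enable()
```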
Manual Prevention¶
# Break cycle manually
a.ref = None
b.ref = None
del a, b # Now freed
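An alternative to breaking the cycle by hand is to make one direction of the link weak, so no strong cycle forms in the first place (a sketch using `weakref.ref`):

```python
import weakref

class Node:
    def __init__(self):
        self.ref = None

a = Node()
b = Node()
a.ref = b               # strong reference: a -> b
b.ref = weakref.ref(a)  # weak back-reference: no strong cycle

del a                   # a's refcount drops to 0 immediately
print(b.ref() is None)  # True: freed by reference counting alone, no GC needed
```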
Summary¶
| Feature | Reference Counting |
|---|---|
| Speed | Immediate |
| Deterministic | Yes |
| Cycles | Cannot handle |
| Overhead | Per-operation |
Runnable Example: 06_advanced_object_lifecycle.py¶
"""
06_advanced_object_lifecycle.py
TOPIC: Object Lifecycle and Memory Internals
LEVEL: Advanced
DURATION: 75-90 minutes
LEARNING OBJECTIVES:
1. Understand complete object lifecycle from creation to destruction
2. Learn about __new__, __init__, and __del__ methods
3. Explore object memory layout and PyObject structure
4. Master memory profiling tools and techniques
5. Understand CPython implementation details
KEY CONCEPTS:
- Object creation: __new__ and __init__
- Object destruction: __del__ and garbage collection
- PyObject structure and overhead
- Memory interning and optimization
- Context managers for resource management
- Memory profiling with memory_profiler
"""
import sys
import gc
import weakref
from ctypes import py_object, c_void_p, cast
# ============================================================================
# SECTION 1: Object Creation Lifecycle
# ============================================================================
if __name__ == "__main__":
print("=" * 70)
print("SECTION 1: Object Creation - __new__ and __init__")
print("=" * 70)
# Object creation is a TWO-STEP process in Python:
# 1. __new__: Allocates memory and creates the object
# 2. __init__: Initializes the object's state
class LifecycleDemo:
"""Demonstrates the complete object creation lifecycle"""
def __new__(cls, value):
"""
__new__ is called FIRST
- It's a static method (takes cls, not self)
- Allocates memory for the object
- Returns the new instance
"""
print(f" 1. __new__ called with value={value}")
print(f" Allocating memory...")
instance = super().__new__(cls)
print(f" Object created at id={id(instance)}")
return instance
def __init__(self, value):
"""
__init__ is called SECOND
- Receives the object created by __new__
- Initializes object attributes
- Returns None
"""
print(f" 2. __init__ called")
print(f" Initializing object with value={value}")
self.value = value
print(f" Initialization complete")
print("Creating object:")
obj = LifecycleDemo(42)
print(f"\nFinal object: {obj.value} at id={id(obj)}")
print("""
LIFECYCLE STAGES:
1. ALLOCATION (__new__):
- Memory allocated on heap
- PyObject structure created
- Reference count starts at 1
2. INITIALIZATION (__init__):
- Object attributes set
- Object becomes usable
- Reference count unchanged (still 1)
3. USAGE:
- Object referenced by variables
- Reference count increases/decreases
- Object may be passed to functions
4. CLEANUP (__del__ if defined):
- Called when refcount reaches 0
- Allows cleanup of resources
- Should not be relied upon for critical cleanup
5. DEALLOCATION:
- Memory returned to Python's memory pool
- Object no longer exists
""")
# ============================================================================
# SECTION 2: Object Destruction - __del__
# ============================================================================
print("\n" + "=" * 70)
print("SECTION 2: Object Destruction - __del__")
print("=" * 70)
class TrackedObject:
"""Object that tracks its lifecycle"""
count = 0 # Class variable to track instances
def __init__(self, name):
TrackedObject.count += 1
self.name = name
print(f" Created {self.name} (total: {TrackedObject.count})")
def __del__(self):
"""
__del__ is called when object is about to be destroyed
- Reference count has reached 0
- Object is being garbage collected
"""
TrackedObject.count -= 1
print(f" Destroying {self.name} (remaining: {TrackedObject.count})")
print("Creating objects:")
obj1 = TrackedObject("Object1")
obj2 = TrackedObject("Object2")
obj3 = TrackedObject("Object3")
print(f"\nCurrent count: {TrackedObject.count}")
print("\nDeleting obj1:")
del obj1 # __del__ called immediately (refcount = 0)
print("\nCreating alias for obj2:")
obj2_alias = obj2 # Both obj2 and obj2_alias reference same object
print("\nDeleting obj2:")
del obj2 # __del__ NOT called yet (obj2_alias still references it)
print(f"Object still alive! Refcount: {sys.getrefcount(obj2_alias)}")
print("\nDeleting obj2_alias:")
del obj2_alias # NOW __del__ is called (refcount = 0)
print("\nDeleting obj3:")
del obj3
print("""
__del__ CAUTIONS:
1. DON'T RELY ON __del__ for critical cleanup:
- Timing is unpredictable
- May not be called in some situations
- Can be called at interpreter shutdown
2. BETTER ALTERNATIVES:
- Use context managers (with statement)
- Explicitly call cleanup methods
- Use try/finally blocks
3. CYCLIC REFERENCES:
- __del__ can prevent garbage collection of cycles
- May cause memory leaks
4. EXCEPTIONS IN __del__:
- Are printed but otherwise ignored
- Don't propagate to caller
""")
# ============================================================================
# SECTION 3: PyObject Structure and Memory Layout
# ============================================================================
print("\n" + "=" * 70)
print("SECTION 3: PyObject Structure and Memory Overhead")
print("=" * 70)
# Every Python object has overhead from the PyObject structure:
# - ob_refcnt: Reference count (8 bytes on 64-bit)
# - ob_type: Pointer to type object (8 bytes on 64-bit)
# - + actual object data
print("Memory sizes of Python objects:\n")
# Integers
print("INTEGERS:")
for val in [0, 1, 100, 10000, 10**100]:
size = sys.getsizeof(val)
print(f" {str(val):20} : {size} bytes")
# The first few integers (usually -5 to 256) are pre-allocated
# Larger integers take more space
print("\nSTRINGS:")
strings = ["", "a", "hello", "a" * 100]
for s in strings:
size = sys.getsizeof(s)
print(f" {repr(s[:20]):22} : {size} bytes")
print("\nLISTS:")
lists = [[], [1], [1]*10, [1]*100]
for lst in lists:
size = sys.getsizeof(lst)
print(f" {len(lst):3} elements : {size} bytes")
print("\nDICTIONARIES:")
dicts = [{}, {"a": 1}, {f"k{i}": i for i in range(10)}]
for d in dicts:
size = sys.getsizeof(d)
print(f" {len(d):3} items : {size} bytes")
print("""
MEMORY OVERHEAD BREAKDOWN:
For most objects on 64-bit Python:
- PyObject header: 16 bytes (refcount + type pointer)
- Object-specific data varies by type
Examples:
- int: 28 bytes minimum (header + value)
- str: 49+ bytes (header + length + hash + chars)
- list: 56 bytes empty + 8 bytes per element (for pointers)
- dict: 64 bytes empty + overhead per key-value pair
IMPLICATIONS:
- Small objects have high overhead ratio
- Lists of integers: each int is a separate object!
- array.array or numpy can be more memory-efficient
""")
# ============================================================================
# SECTION 4: Object Identity and Equality
# ============================================================================
print("\n" + "=" * 70)
print("SECTION 4: Object Identity vs Equality")
print("=" * 70)
# Identity: Same object in memory (id)
# Equality: Same value (__eq__)
a = [1, 2, 3]
b = [1, 2, 3]
c = a
print("Three variables:")
print(f"a = {a}, id = {id(a)}")
print(f"b = {b}, id = {id(b)}")
print(f"c = {c}, id = {id(c)}")
print("\nIdentity (is operator):")
print(f" a is b: {a is b} # Different objects")
print(f" a is c: {a is c} # Same object")
print("\nEquality (== operator):")
print(f" a == b: {a == b} # Same values")
print(f" a == c: {a == c} # Same values")
# The 'is' operator is implemented as:
# a is b ⟺ id(a) == id(b)
print("\nChecking with id():")
print(f" id(a) == id(b): {id(a) == id(b)}")
print(f" id(a) == id(c): {id(a) == id(c)}")
# ============================================================================
# SECTION 5: Context Managers and Resource Management
# ============================================================================
print("\n" + "=" * 70)
print("SECTION 5: Context Managers (Better than __del__)")
print("=" * 70)
class ResourceManager:
"""Demonstrates proper resource management using context manager"""
def __init__(self, name):
self.name = name
def __enter__(self):
"""Called when entering 'with' block"""
print(f" Acquiring resource: {self.name}")
return self
def __exit__(self, exc_type, exc_val, exc_tb):
"""Called when exiting 'with' block (GUARANTEED)"""
print(f" Releasing resource: {self.name}")
# Return False to propagate exceptions, True to suppress
return False
def use(self):
print(f" Using resource: {self.name}")
print("Using context manager:")
with ResourceManager("Database Connection") as resource:
resource.use()
print(" Doing work...")
# __exit__ will be called even if exception occurs!
print("Outside with block - resource released\n")
# Compare to manual cleanup (error-prone):
print("Manual cleanup (DON'T DO THIS):")
resource = ResourceManager("File Handle")
resource.__enter__()
resource.use()
# What if exception occurs here? Resource won't be released!
resource.__exit__(None, None, None)
print("""
CONTEXT MANAGER BENEFITS:
1. GUARANTEED CLEANUP:
- __exit__ always called
- Even if exception occurs
- Even if return statement executed
2. EXPLICIT SCOPE:
- Clear where resource is acquired/released
- Easy to reason about resource lifetime
3. PYTHONIC:
- with statement is idiomatic Python
- Widely understood pattern
4. COMPOSABLE:
- Can use multiple context managers
- with open('a') as f1, open('b') as f2:
""")
# ============================================================================
# SECTION 6: Memory Profiling in Detail
# ============================================================================
print("\n" + "=" * 70)
print("SECTION 6: Advanced Memory Profiling")
print("=" * 70)
def memory_intensive_function():
"""Function that allocates significant memory"""
# Create large data structures
data = []
for i in range(1000):
data.append([i] * 100)
return data
# Using tracemalloc for detailed profiling
import tracemalloc
tracemalloc.start()
# Get baseline
snapshot_before = tracemalloc.take_snapshot()
# Run memory-intensive code
result = memory_intensive_function()
# Get snapshot after
snapshot_after = tracemalloc.take_snapshot()
# Analyze differences
top_stats = snapshot_after.compare_to(snapshot_before, 'lineno')
print("Top memory allocations:")
for i, stat in enumerate(top_stats[:5], 1):
print(f"\n#{i}: {stat}")
for line in stat.traceback.format():
print(f" {line}")
# Overall memory usage
current, peak = tracemalloc.get_traced_memory()
print(f"\nCurrent memory: {current / 1024:.1f} KB")
print(f"Peak memory: {peak / 1024:.1f} KB")
tracemalloc.stop()
# Clean up
del result
# ============================================================================
# SECTION 7: Debugging Memory Issues
# ============================================================================
print("\n" + "=" * 70)
print("SECTION 7: Debugging Memory Problems")
print("=" * 70)
# Finding objects that keep other objects alive
class Container:
def __init__(self, name):
self.name = name
self.items = []
# Create structure
container = Container("MyContainer")
item = [1, 2, 3]
container.items.append(item)
print(f"item refcount: {sys.getrefcount(item)}")
# Find what's referencing an object
referrers = gc.get_referrers(item)
print(f"\nObjects referencing item: {len(referrers)}")
for ref in referrers:
print(f" {type(ref).__name__}: {ref if len(str(ref)) < 50 else str(ref)[:50]+'...'}")
# Find what an object references
referents = gc.get_referents(container)
print(f"\nObjects referenced by container: {len(referents)}")
for ref in referents[:5]: # Limit output
print(f" {type(ref).__name__}")
# ============================================================================
# SECTION 8: Object Interning and Caching
# ============================================================================
print("\n" + "=" * 70)
print("SECTION 8: Object Interning and Caching")
print("=" * 70)
# Python caches small integers
print("INTEGER CACHING:")
a = 256
b = 256
print(f"256: a is b = {a is b} # Cached")
a = 257
b = 257
print(f"257: a is b = {a is b} # Outside the small-int cache; may still be True here because equal literals compiled in one code unit are shared")
# String interning
print("\nSTRING INTERNING:")
s1 = "hello"
s2 = "hello"
print(f"'hello': s1 is s2 = {s1 is s2} # Interned")
s1 = "hello world"
s2 = "hello world"
print(f"'hello world': s1 is s2 = {s1 is s2} # May or may not be interned")
# Explicit interning
s1 = sys.intern("hello world")
s2 = sys.intern("hello world")
print(f"sys.intern('hello world'): s1 is s2 = {s1 is s2} # Explicitly interned")
print("""
INTERNING BENEFITS:
1. MEMORY SAVINGS:
- One copy of string instead of many
- Useful for large codebases with repeated strings
2. FASTER COMPARISONS:
- Identity check (is) instead of value check (==)
- O(1) instead of O(n)
3. AUTOMATIC FOR:
- Python identifiers (variable names, function names)
- Small integers (-5 to 256)
- Some strings (compile-time constants)
WHEN TO USE sys.intern():
- Dictionary keys used repeatedly
- Configuration strings
- Strings from large datasets with repetition
""")
# ============================================================================
# SECTION 9: Memory-Efficient Data Structures
# ============================================================================
print("\n" + "=" * 70)
print("SECTION 9: Memory-Efficient Alternatives")
print("=" * 70)
import array
# Compare list vs array.array
print("STORING 1000 INTEGERS:\n")
# Using list (each int is a separate PyObject)
int_list = list(range(1000))
list_size = sys.getsizeof(int_list)
list_item_size = sum(sys.getsizeof(i) for i in int_list[:10]) # Sample
print(f"List:")
print(f" Container: {list_size} bytes")
print(f" 10 items: {list_item_size} bytes (~{list_item_size/10:.0f} bytes each)")
print(f" Total (estimated): {list_size + 1000 * (list_item_size/10):.0f} bytes")
# Using array.array (compact storage)
int_array = array.array('i', range(1000))
array_size = sys.getsizeof(int_array)
print(f"\nArray:")
print(f" Total: {array_size} bytes")
print(f" Savings: {(1 - array_size / (list_size + 1000 * (list_item_size/10))) * 100:.1f}%")
print("""
MEMORY-EFFICIENT CHOICES:
1. FOR NUMERIC DATA:
- array.array: Compact, homogeneous types
- numpy.ndarray: Best for scientific computing
- struct: Binary data packing
2. FOR STRINGS:
- str: Use for text
- bytes: Use for binary data (more compact)
- bytearray: Mutable bytes
3. FOR COLLECTIONS:
- tuple instead of list (if immutable)
- set for membership testing
- collections.deque for queues
- dict for key-value (optimized in Python 3.6+)
4. FOR CLASSES:
- __slots__: Reduces per-instance overhead
- namedtuple: Immutable, lightweight
- dataclass: Convenient, reasonable overhead
""")
# ============================================================================
# SECTION 10: Key Takeaways
# ============================================================================
print("\n" + "=" * 70)
print("KEY TAKEAWAYS")
print("=" * 70)
print("""
1. Object creation: __new__ (allocate) then __init__ (initialize)
2. Object destruction: refcount → 0, then __del__ (if defined), then deallocate
3. Use context managers (with) instead of __del__ for cleanup
4. Every Python object carries at least 16 bytes of header overhead (64-bit systems)
5. Integer caching: -5 to 256 pre-allocated
6. String interning: automatic for identifiers, manual with sys.intern()
7. Use tracemalloc and gc.get_referrers() for debugging
8. Choose appropriate data structures for memory efficiency
9. array.array and __slots__ can save significant memory
10. Profile before optimizing - measure actual memory usage!
BEST PRACTICES:
- Use 'with' for resource management
- Prefer array.array for numeric data
- Use __slots__ for classes with many instances
- Profile with tracemalloc before optimizing
- Explicitly del large objects when done
- Use generators for large sequences
- Consider namedtuple or dataclass over dict
""")
# ============================================================================
# PRACTICE EXERCISES
# ============================================================================
print("\n" + "=" * 70)
print("PRACTICE EXERCISES")
print("=" * 70)
print("""
Master object lifecycle with these exercises:
1. Create a class that implements __new__, __init__, and __del__.
Track the complete lifecycle of instances.
2. Implement a context manager for a database connection simulator.
Ensure cleanup happens even with exceptions.
3. Use tracemalloc to profile memory usage of list vs array.array
for storing 1 million integers.
4. Create a class without __slots__, then add __slots__.
Create 100,000 instances and compare memory usage.
5. Write a decorator that profiles memory usage of a function
and reports allocations and peak usage.
6. Create a circular reference and use gc.get_referrers() to
debug what's keeping objects alive.
See exercises_03_advanced.py for complete practice problems!
""")