Performance and Memory¶
Techniques for writing efficient Python code with optimal memory usage.
Mental Model
Performance and memory are two sides of the same coin. Using more memory (caching, pre-computation) often buys speed; using less memory (generators, lazy evaluation) sometimes costs speed. The art is profiling both dimensions and finding the sweet spot for your specific workload.
Measuring Performance¶
Using timeit¶
```python import timeit
Time a simple statement¶
time = timeit.timeit('sum(range(100))', number=10000) print(f"Time: {time:.4f}s")
Time a function¶
def my_function(): return [x**2 for x in range(1000)]
time = timeit.timeit(my_function, number=1000) print(f"Time: {time:.4f}s")
Compare approaches¶
list_time = timeit.timeit('[x2 for x in range(1000)]', number=1000) gen_time = timeit.timeit('list(x2 for x in range(1000))', number=1000) print(f"List comp: {list_time:.4f}s, Generator: {gen_time:.4f}s") ```
Using cProfile¶
```python import cProfile
def slow_function(): result = [] for i in range(10000): result.append(i ** 2) return result
Profile the function¶
cProfile.run('slow_function()') ```
Performance Optimizations¶
Use Comprehensions¶
Comprehensions are faster than equivalent loops:
```python
Fast: list comprehension¶
squares = [x**2 for x in range(1000)]
Slower: explicit loop¶
squares = [] for x in range(1000): squares.append(x**2) ```
Local Variable Caching¶
Accessing local variables is faster than global lookups:
```python
Slower: global lookup each iteration¶
def slow(): for i in range(10000): len([1, 2, 3]) # Looks up len() each time
Faster: cache the function locally¶
def fast(): _len = len # Local reference for i in range(10000): _len([1, 2, 3]) ```
Use Built-in Functions¶
Built-in functions are implemented in C and optimized:
```python
Fast: built-in sum¶
total = sum(numbers)
Slower: manual loop¶
total = 0 for n in numbers: total += n
Fast: built-in map¶
result = list(map(str, numbers))
Slower: list comprehension for simple transforms¶
result = [str(n) for n in numbers] ```
String Joining¶
```python
Fast: join method¶
result = ''.join(strings)
Slow: concatenation in loop¶
result = '' for s in strings: result += s # Creates new string each time ```
Use collections Module¶
```python from collections import Counter, defaultdict, deque
Fast counting¶
counts = Counter(words)
Fast grouping¶
groups = defaultdict(list) for item in items: groups[item.category].append(item)
Fast append/pop from both ends¶
queue = deque() queue.append(item) # O(1) queue.appendleft(item) # O(1) ```
Memory Efficiency¶
Generators for Lazy Evaluation¶
Generators produce values on-demand, avoiding memory allocation for entire sequences:
```python
Memory efficient: generator expression¶
gen = (x**2 for x in range(10000000))
Memory expensive: list comprehension¶
lst = [x**2 for x in range(10000000)] # All in memory at once
Generator function¶
def squares(n): for x in range(n): yield x**2
Process without loading all into memory¶
for square in squares(10000000): process(square) ```
__slots__ for Classes¶
Reduce memory overhead when creating many instances:
```python
Without slots: ~152 bytes per instance¶
class PointRegular: def init(self, x, y): self.x = x self.y = y
With slots: ~56 bytes per instance¶
class PointSlots: slots = ['x', 'y']
def __init__(self, x, y):
self.x = x
self.y = y
Significant savings with many instances¶
points = [PointSlots(i, i) for i in range(1000000)] ```
Use array for Homogeneous Data¶
```python import array
Regular list: ~8MB for 1M integers¶
lst = list(range(1000000))
Array: ~4MB for 1M integers¶
arr = array.array('i', range(1000000)) ```
Avoid Unnecessary Copies¶
```python
Creates copy (uses memory)¶
subset = large_list[100:200]
Use iterator (no copy)¶
import itertools subset = itertools.islice(large_list, 100, 200)
Process in chunks¶
def chunked(iterable, size): it = iter(iterable) while chunk := list(itertools.islice(it, size)): yield chunk
for chunk in chunked(large_data, 1000): process(chunk) ```
Delete Large Objects¶
```python def process_large_data(): data = load_huge_file() # 1GB result = compute(data)
del data # Free memory immediately
gc.collect() # Optional: force garbage collection
return result
```
Profiling Memory¶
```python import sys import tracemalloc
Check object size¶
x = [1, 2, 3] print(sys.getsizeof(x)) # Bytes
Track memory allocations¶
tracemalloc.start()
... your code ...¶
data = [x**2 for x in range(100000)]
current, peak = tracemalloc.get_traced_memory() print(f"Current: {current / 1024 / 1024:.2f} MB") print(f"Peak: {peak / 1024 / 1024:.2f} MB")
tracemalloc.stop() ```
Summary¶
Performance Tips¶
| Technique | Benefit |
|---|---|
| List comprehensions | Faster than loops |
| Local variable caching | Faster lookups |
| Built-in functions | C-optimized |
''.join() |
O(n) vs O(n²) |
collections module |
Optimized data structures |
Memory Tips¶
| Technique | Benefit |
|---|---|
| Generators | Lazy evaluation |
__slots__ |
Smaller instances |
array.array |
Compact numeric storage |
| Iterators | Avoid copies |
del + gc.collect() |
Free memory early |
Key Principles¶
- Measure before optimizing - Use
timeitandcProfile - Prefer built-ins - They're implemented in C
- Use appropriate data structures -
deque,Counter, etc. - Think about memory - Generators for large data
- Cache expensive operations - Local variables,
lru_cache
Runnable Example: time_complexity.py¶
```python """ PYTHON CODE PROFILING & OPTIMIZATION - BEGINNER LEVEL ====================================================== Module 1: Time Complexity Basics - Understanding Big O Notation
LEARNING OBJECTIVES: - Understand what time complexity means - Learn Big O notation and common complexity classes - Recognize different complexity patterns in code - Analyze simple algorithms for their time complexity
THEORY:¶
Time Complexity measures how the runtime of an algorithm grows relative to the input size (n). It's expressed using Big O notation, which describes the upper bound of growth rate, ignoring constants and lower-order terms.
Common Complexity Classes (from best to worst): 1. O(1) - Constant time 2. O(log n) - Logarithmic time 3. O(n) - Linear time 4. O(n log n) - Linearithmic time 5. O(n²) - Quadratic time 6. O(n³) - Cubic time 7. O(2ⁿ) - Exponential time 8. O(n!) - Factorial time
Author: Python Course Development Team Date: 2024 """
import time import random
============================================================================¶
SECTION 1: O(1) - CONSTANT TIME COMPLEXITY¶
============================================================================¶
Operations that take the same amount of time regardless of input size¶
def constant_time_access(my_list, index): """ Access an element by index - O(1) operation
Explanation:
- Array/list access by index is direct memory lookup
- Doesn't matter if list has 10 or 10 million items
- Time taken is constant
Args:
my_list: A list of elements
index: Index to access
Returns:
Element at the specified index
Time Complexity: O(1)
"""
# Direct index access - constant time
return my_list[index]
def constant_time_operations(): """ Demonstrate various O(1) operations
Common O(1) operations in Python:
- Accessing list/array element by index
- Accessing dictionary value by key
- Append to list
- Push/pop from stack (list)
- Basic arithmetic operations
"""
# Create sample data structures
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
my_dict = {'a': 1, 'b': 2, 'c': 3}
print("=" * 70)
print("O(1) - CONSTANT TIME OPERATIONS")
print("=" * 70)
# Operation 1: List access by index - O(1)
start = time.time()
element = numbers[5]
end = time.time()
print(f"List index access: {element}, Time: {end - start:.10f}s")
# Operation 2: Dictionary access by key - O(1) average case
start = time.time()
value = my_dict['b']
end = time.time()
print(f"Dict key access: {value}, Time: {end - start:.10f}s")
# Operation 3: List append - O(1) amortized
start = time.time()
numbers.append(11)
end = time.time()
print(f"List append, Time: {end - start:.10f}s")
# Operation 4: Arithmetic - O(1)
start = time.time()
result = 1234567 * 9876543
end = time.time()
print(f"Arithmetic: {result}, Time: {end - start:.10f}s")
print("\nKey Point: All these operations take roughly the same time")
print("regardless of data structure size.\n")
============================================================================¶
SECTION 2: O(log n) - LOGARITHMIC TIME COMPLEXITY¶
============================================================================¶
Operations that cut the problem size in half each iteration¶
def binary_search(sorted_list, target): """ Binary search on a sorted list - O(log n) operation
Explanation:
- Each comparison eliminates half of the remaining elements
- For 1000 elements, needs at most 10 comparisons (2^10 = 1024)
- For 1,000,000 elements, needs at most 20 comparisons
- Doubling input size only adds one more comparison
Args:
sorted_list: A sorted list of comparable elements
target: Element to search for
Returns:
Index of target if found, -1 otherwise
Time Complexity: O(log n)
"""
left = 0 # Left boundary of search space
right = len(sorted_list) - 1 # Right boundary
comparisons = 0 # Track number of comparisons
# Continue while search space is valid
while left <= right:
comparisons += 1
# Find middle point (avoid overflow with this formula)
mid = left + (right - left) // 2
# Check if target is at mid
if sorted_list[mid] == target:
print(f" Found {target} after {comparisons} comparisons")
return mid
# If target is greater, ignore left half
elif sorted_list[mid] < target:
left = mid + 1
# If target is smaller, ignore right half
else:
right = mid - 1
# Target not found
print(f" Target {target} not found after {comparisons} comparisons")
return -1
def demonstrate_logarithmic(): """ Demonstrate O(log n) complexity with binary search
Shows how logarithmic algorithms scale beautifully:
- Small increase in comparisons for massive increase in data size
"""
print("=" * 70)
print("O(log n) - LOGARITHMIC TIME COMPLEXITY")
print("=" * 70)
# Test with different sizes
sizes = [100, 1000, 10000, 100000]
for size in sizes:
# Create sorted list
sorted_data = list(range(size))
target = random.randint(0, size - 1)
print(f"\nSearching in list of size {size:,}:")
start = time.time()
result = binary_search(sorted_data, target)
end = time.time()
print(f" Time taken: {end - start:.10f}s")
print("\nKey Point: Time grows logarithmically with input size.")
print("10x larger input ≈ 3-4 more operations (log₂(10) ≈ 3.32)\n")
============================================================================¶
SECTION 3: O(n) - LINEAR TIME COMPLEXITY¶
============================================================================¶
Operations that must examine each element once¶
def linear_search(my_list, target): """ Linear search - O(n) operation
Explanation:
- Must potentially check every element
- In worst case, target is at end or not present
- Time grows proportionally with input size
Args:
my_list: List to search
target: Element to find
Returns:
Index of target if found, -1 otherwise
Time Complexity: O(n)
"""
# Iterate through each element
for i in range(len(my_list)):
# Check if current element matches target
if my_list[i] == target:
return i
# Target not found after checking all elements
return -1
def find_maximum(numbers): """ Find maximum value in list - O(n) operation
Explanation:
- Must examine every element to ensure we found the maximum
- Cannot skip any elements
- Doubling input size doubles the time
Args:
numbers: List of numbers
Returns:
Maximum value in the list
Time Complexity: O(n)
"""
# Initialize max with first element
max_val = numbers[0]
# Compare each element with current max
for num in numbers[1:]:
if num > max_val:
max_val = num
return max_val
def demonstrate_linear(): """ Demonstrate O(n) complexity with various algorithms """ print("=" * 70) print("O(n) - LINEAR TIME COMPLEXITY") print("=" * 70)
# Test with different sizes
sizes = [1000, 10000, 100000, 1000000]
for size in sizes:
# Create test data
data = list(range(size))
print(f"\nProcessing list of size {size:,}:")
# Linear search (worst case - element not present)
start = time.time()
result = linear_search(data, -1) # Search for non-existent element
end = time.time()
print(f" Linear search time: {end - start:.10f}s")
# Find maximum
start = time.time()
max_val = find_maximum(data)
end = time.time()
print(f" Find maximum time: {end - start:.10f}s")
print("\nKey Point: Time grows linearly with input size.")
print("10x larger input ≈ 10x more time\n")
============================================================================¶
SECTION 4: O(n²) - QUADRATIC TIME COMPLEXITY¶
============================================================================¶
Nested loops that examine all pairs of elements¶
def bubble_sort(arr): """ Bubble sort algorithm - O(n²) operation
Explanation:
- Two nested loops, each running n times
- Outer loop: n iterations
- Inner loop: n iterations for each outer iteration
- Total operations: n × n = n²
Args:
arr: List to sort
Returns:
Sorted list
Time Complexity: O(n²)
"""
# Create a copy to avoid modifying original
arr = arr.copy()
n = len(arr)
# Outer loop: n passes through the array
for i in range(n):
# Inner loop: compare adjacent elements
# After each pass, largest element "bubbles" to the end
for j in range(0, n - i - 1):
# Swap if elements are in wrong order
if arr[j] > arr[j + 1]:
arr[j], arr[j + 1] = arr[j + 1], arr[j]
return arr
def find_duplicates_naive(numbers): """ Find all duplicate pairs - O(n²) operation
Explanation:
- Compare each element with every other element
- For n elements, we make n × (n-1) / 2 comparisons
- This is still O(n²) as we ignore constants
Args:
numbers: List of numbers
Returns:
List of duplicate pairs
Time Complexity: O(n²)
"""
duplicates = []
n = len(numbers)
# Outer loop: select first element of pair
for i in range(n):
# Inner loop: compare with remaining elements
for j in range(i + 1, n):
# Check if elements are equal
if numbers[i] == numbers[j]:
duplicates.append((numbers[i], i, j))
return duplicates
def demonstrate_quadratic(): """ Demonstrate O(n²) complexity
Warning: Quadratic algorithms become very slow for large inputs!
"""
print("=" * 70)
print("O(n²) - QUADRATIC TIME COMPLEXITY")
print("=" * 70)
# Test with smaller sizes (quadratic is slow!)
sizes = [100, 500, 1000, 2000]
for size in sizes:
# Create test data
data = [random.randint(0, size) for _ in range(size)]
print(f"\nProcessing list of size {size:,}:")
# Bubble sort
start = time.time()
sorted_data = bubble_sort(data)
end = time.time()
print(f" Bubble sort time: {end - start:.6f}s")
print("\nKey Point: Time grows quadratically with input size.")
print("10x larger input ≈ 100x more time (10² = 100)")
print("WARNING: Avoid O(n²) algorithms for large datasets!\n")
============================================================================¶
SECTION 5: O(n log n) - LINEARITHMIC TIME COMPLEXITY¶
============================================================================¶
Efficient sorting algorithms like merge sort, quick sort¶
def merge_sort(arr): """ Merge sort algorithm - O(n log n) operation
Explanation:
- Divide and conquer algorithm
- Divides array into halves recursively: O(log n) divisions
- Merges sorted halves: O(n) operations per level
- Total: O(n) × O(log n) = O(n log n)
Args:
arr: List to sort
Returns:
Sorted list
Time Complexity: O(n log n)
"""
# Base case: arrays of length 0 or 1 are already sorted
if len(arr) <= 1:
return arr
# Divide: split array into two halves
mid = len(arr) // 2
left_half = arr[:mid]
right_half = arr[mid:]
# Conquer: recursively sort both halves
left_sorted = merge_sort(left_half)
right_sorted = merge_sort(right_half)
# Combine: merge the sorted halves
return merge(left_sorted, right_sorted)
def merge(left, right): """ Merge two sorted arrays - O(n) operation
Helper function for merge sort
Args:
left: First sorted array
right: Second sorted array
Returns:
Merged sorted array
"""
result = []
i = j = 0
# Merge elements from both arrays in sorted order
while i < len(left) and j < len(right):
if left[i] <= right[j]:
result.append(left[i])
i += 1
else:
result.append(right[j])
j += 1
# Append remaining elements
result.extend(left[i:])
result.extend(right[j:])
return result
def demonstrate_linearithmic(): """ Demonstrate O(n log n) complexity
Shows that O(n log n) is much better than O(n²) for large inputs
"""
print("=" * 70)
print("O(n log n) - LINEARITHMIC TIME COMPLEXITY")
print("=" * 70)
# Test with various sizes
sizes = [100, 1000, 10000, 100000]
for size in sizes:
# Create test data
data = [random.randint(0, size) for _ in range(size)]
print(f"\nSorting list of size {size:,}:")
# Merge sort - O(n log n)
start = time.time()
sorted_data = merge_sort(data)
end = time.time()
print(f" Merge sort time: {end - start:.6f}s")
# Compare with Python's built-in sort (also O(n log n) - Timsort)
data_copy = data.copy()
start = time.time()
data_copy.sort()
end = time.time()
print(f" Python sort time: {end - start:.6f}s")
print("\nKey Point: O(n log n) is optimal for comparison-based sorting.")
print("Much faster than O(n²) for large inputs!")
print("10x larger input ≈ 33x more time (10 × log₂(10) ≈ 33.2)\n")
============================================================================¶
SECTION 6: COMPARING COMPLEXITY CLASSES¶
============================================================================¶
def compare_all_complexities(): """ Compare different complexity classes side by side
Demonstrates how different algorithms scale with input size
"""
print("=" * 70)
print("COMPLEXITY CLASS COMPARISON")
print("=" * 70)
print("\nHow many operations for different input sizes?\n")
print(f"{'n':<12} {'O(1)':<12} {'O(log n)':<12} {'O(n)':<12} {'O(n log n)':<15} {'O(n²)':<12}")
print("-" * 85)
import math
# Different input sizes
sizes = [10, 100, 1000, 10000, 100000]
for n in sizes:
o_1 = 1
o_log_n = math.log2(n)
o_n = n
o_n_log_n = n * math.log2(n)
o_n2 = n * n
print(f"{n:<12} {o_1:<12} {o_log_n:<12.2f} {o_n:<12} {o_n_log_n:<15.0f} {o_n2:<12}")
print("\nKey Insights:")
print("1. O(1) is always best - constant time regardless of input")
print("2. O(log n) scales beautifully - very slow growth")
print("3. O(n) grows linearly - acceptable for most applications")
print("4. O(n log n) is optimal for sorting - reasonable growth")
print("5. O(n²) grows very fast - avoid for large inputs!")
print()
============================================================================¶
SECTION 7: SPACE COMPLEXITY¶
============================================================================¶
Space complexity measures memory usage relative to input size¶
def space_o1_example(n): """ Algorithm with O(1) space complexity
Uses constant extra space regardless of input size
Time Complexity: O(n)
Space Complexity: O(1)
"""
# Only uses a fixed number of variables
total = 0 # One integer variable
count = 0 # Another integer variable
# Process n items but don't store them
for i in range(n):
total += i
count += 1
return total / count if count > 0 else 0
def space_on_example(n): """ Algorithm with O(n) space complexity
Creates data structure proportional to input size
Time Complexity: O(n)
Space Complexity: O(n)
"""
# Creates a list of size n
result = []
# Store all values
for i in range(n):
result.append(i * 2)
return result
def demonstrate_space_complexity(): """ Demonstrate space complexity concepts """ print("=" * 70) print("SPACE COMPLEXITY") print("=" * 70)
import sys
n = 1000
print(f"\nFor n = {n}:")
# O(1) space
result1 = space_o1_example(n)
print(f"O(1) space: stores only a few variables")
print(f" Result: {result1}")
# O(n) space
result2 = space_on_example(n)
print(f"\nO(n) space: stores {len(result2)} elements")
print(f" Memory used: ~{sys.getsizeof(result2)} bytes")
print("\nKey Point: Space complexity is equally important!")
print("Memory-efficient algorithms are crucial for large datasets.\n")
============================================================================¶
MAIN EXECUTION¶
============================================================================¶
def main(): """ Main function to run all demonstrations
Runs through all complexity classes with examples and timing
"""
print("\n" + "=" * 70)
print("TIME COMPLEXITY BASICS - COMPREHENSIVE TUTORIAL")
print("=" * 70 + "\n")
# Run all demonstrations
constant_time_operations()
demonstrate_logarithmic()
demonstrate_linear()
demonstrate_quadratic()
demonstrate_linearithmic()
compare_all_complexities()
demonstrate_space_complexity()
# Summary
print("=" * 70)
print("SUMMARY")
print("=" * 70)
print("""
Best to Worst Complexity Classes:
1. O(1) - Constant - Excellent
2. O(log n) - Logarithmic - Excellent
3. O(n) - Linear - Good
4. O(n log n) - Linearithmic - Acceptable
5. O(n²) - Quadratic - Poor for large n
6. O(2ⁿ) - Exponential - Impractical for large n
Remember:
- Always analyze both time AND space complexity
- Constants matter in practice (Big O ignores them)
- Choose the right algorithm for your data size
- Profile real code to verify theoretical analysis
""")
if name == "main": main() ```
Exercises¶
Exercise 1.
Write a benchmark comparing four ways to sum a list of 1,000,000 integers: (a) a for loop with +=, (b) the built-in sum(), (c) functools.reduce with operator.add, and (d) a generator expression inside sum(). Use timeit to measure each and print a ranked table from fastest to slowest.
Solution to Exercise 1
```python
import timeit
import functools
import operator
data = list(range(1_000_000))
def sum_loop():
total = 0
for x in data:
total += x
return total
def sum_builtin():
return sum(data)
def sum_reduce():
return functools.reduce(operator.add, data)
def sum_genexpr():
return sum(x for x in data)
methods = [
("for loop", sum_loop),
("sum()", sum_builtin),
("reduce", sum_reduce),
("sum(gen)", sum_genexpr),
]
results = []
for name, func in methods:
t = min(timeit.repeat(func, repeat=3, number=10))
results.append((name, t))
results.sort(key=lambda x: x[1])
print(f"{'Method':<12} {'Time (s)':>10}")
print("-" * 24)
for name, t in results:
print(f"{name:<12} {t:>10.4f}")
```
Exercise 2.
Create a class Point with and without __slots__, each having x and y attributes. Create 500,000 instances of each, measure total memory using tracemalloc, and print the memory used and savings percentage from using __slots__.
Solution to Exercise 2
```python
import tracemalloc
class PointRegular:
def __init__(self, x, y):
self.x = x
self.y = y
class PointSlots:
__slots__ = ('x', 'y')
def __init__(self, x, y):
self.x = x
self.y = y
n = 500_000
tracemalloc.start()
regular = [PointRegular(i, i) for i in range(n)]
_, peak_regular = tracemalloc.get_traced_memory()
tracemalloc.stop()
tracemalloc.start()
slotted = [PointSlots(i, i) for i in range(n)]
_, peak_slotted = tracemalloc.get_traced_memory()
tracemalloc.stop()
savings = (1 - peak_slotted / peak_regular) * 100
print(f"Regular: {peak_regular / 1024 / 1024:.1f} MB")
print(f"Slotted: {peak_slotted / 1024 / 1024:.1f} MB")
print(f"Savings: {savings:.1f}%")
```
Exercise 3.
Write a generator-based version and a list-based version of a function that yields/returns the first n Fibonacci numbers. For n = 1,000,000, use tracemalloc to compare peak memory of iterating through all values (consuming but not storing them) for both versions.
Solution to Exercise 3
```python
import tracemalloc
def fib_generator(n):
a, b = 0, 1
for _ in range(n):
yield a
a, b = b, a + b
def fib_list(n):
result = []
a, b = 0, 1
for _ in range(n):
result.append(a)
a, b = b, a + b
return result
n = 1_000_000
# Generator version
tracemalloc.start()
total = 0
for x in fib_generator(n):
total += x
_, peak_gen = tracemalloc.get_traced_memory()
tracemalloc.stop()
# List version
tracemalloc.start()
total = 0
for x in fib_list(n):
total += x
_, peak_list = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"Generator peak: {peak_gen / 1024 / 1024:.1f} MB")
print(f"List peak: {peak_list / 1024 / 1024:.1f} MB")
```