CPU-Bound vs I/O-Bound Tasks¶
Understanding whether your task is CPU-bound or I/O-bound is crucial for choosing the right concurrency strategy.
Definitions¶
CPU-Bound¶
A task is CPU-bound when it spends most of its time doing computation — the CPU is the bottleneck.
Examples: - Mathematical calculations - Image/video processing - Data compression - Machine learning training - Cryptographic operations - Simulations
def cpu_bound_task(n):
"""Compute sum of squares — CPU is working hard."""
total = 0
for i in range(n):
total += i ** 2
return total
I/O-Bound¶
A task is I/O-bound when it spends most of its time waiting for input/output operations — the CPU is mostly idle.
Examples: - Network requests (HTTP, database queries) - File reading/writing - User input - API calls - Web scraping
import requests
def io_bound_task(url):
"""Fetch URL — CPU is mostly waiting."""
response = requests.get(url) # Waiting for network
return response.text
Visual Comparison¶
CPU-Bound Execution¶
CPU Activity:
████████████████████████████████████████ 100% busy
Timeline:
[compute][compute][compute][compute][done]
I/O-Bound Execution¶
CPU Activity:
██░░░░░░░░░░░░░░██░░░░░░░░░░░░░░██░░░░░░ ~20% busy
Timeline:
[send request][.....waiting.....][process response]
Identifying Task Type¶
Method 1: CPU Usage Monitoring¶
import time
import psutil
def monitor_cpu(func, *args):
"""Run function while monitoring CPU usage."""
process = psutil.Process()
start = time.perf_counter()
cpu_start = process.cpu_times()
result = func(*args)
cpu_end = process.cpu_times()
elapsed = time.perf_counter() - start
cpu_time = (cpu_end.user - cpu_start.user) + (cpu_end.system - cpu_start.system)
cpu_percent = (cpu_time / elapsed) * 100
print(f"Elapsed: {elapsed:.2f}s")
print(f"CPU time: {cpu_time:.2f}s")
print(f"CPU usage: {cpu_percent:.1f}%")
return result
# CPU-bound: ~100% CPU usage
monitor_cpu(cpu_bound_task, 10_000_000)
# I/O-bound: ~5% CPU usage
monitor_cpu(io_bound_task, "https://httpbin.org/delay/2")
Method 2: Analyze the Code¶
| Look for... | Task Type |
|---|---|
| Loops with computation | CPU-bound |
| Mathematical operations | CPU-bound |
requests, urllib |
I/O-bound |
File open(), read(), write() |
I/O-bound |
| Database queries | I/O-bound |
time.sleep() |
I/O-bound (simulated) |
subprocess calls |
Usually I/O-bound |
Concurrency Strategy by Task Type¶
CPU-Bound → Use Processes¶
from concurrent.futures import ProcessPoolExecutor
import os
def compute_heavy(n):
"""CPU-intensive task."""
print(f"Process {os.getpid()}: computing...")
return sum(i * i for i in range(n))
numbers = [10_000_000, 20_000_000, 30_000_000, 40_000_000]
# ProcessPoolExecutor bypasses GIL
with ProcessPoolExecutor() as executor:
results = list(executor.map(compute_heavy, numbers))
print(results)
Why processes? - Each process has its own Python interpreter - Each process has its own GIL - True parallel execution on multiple CPU cores
I/O-Bound → Use Threads¶
from concurrent.futures import ThreadPoolExecutor
import requests
def fetch_url(url):
"""I/O-intensive task."""
response = requests.get(url)
return len(response.content)
urls = [
"https://httpbin.org/delay/1",
"https://httpbin.org/delay/1",
"https://httpbin.org/delay/1",
"https://httpbin.org/delay/1",
]
# ThreadPoolExecutor works well — GIL released during I/O
with ThreadPoolExecutor(max_workers=4) as executor:
results = list(executor.map(fetch_url, urls))
print(results)
Why threads? - GIL is released during I/O operations - Threads are lighter weight than processes - Shared memory makes data passing easy
Benchmark: Threads vs Processes¶
CPU-Bound Benchmark¶
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
def cpu_task(n):
return sum(i ** 2 for i in range(n))
data = [5_000_000] * 4
# Sequential
start = time.perf_counter()
results = [cpu_task(n) for n in data]
seq_time = time.perf_counter() - start
print(f"Sequential: {seq_time:.2f}s")
# Threads (limited by GIL)
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as ex:
results = list(ex.map(cpu_task, data))
thread_time = time.perf_counter() - start
print(f"Threads: {thread_time:.2f}s")
# Processes (true parallelism)
start = time.perf_counter()
with ProcessPoolExecutor(max_workers=4) as ex:
results = list(ex.map(cpu_task, data))
proc_time = time.perf_counter() - start
print(f"Processes: {proc_time:.2f}s")
# Typical results (4-core machine):
# Sequential: 3.2s
# Threads: 3.5s ← No speedup (GIL)
# Processes: 0.9s ← ~4x speedup ✓
I/O-Bound Benchmark¶
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
def io_task(seconds):
time.sleep(seconds) # Simulate I/O
return seconds
data = [1] * 4
# Sequential
start = time.perf_counter()
results = [io_task(n) for n in data]
seq_time = time.perf_counter() - start
print(f"Sequential: {seq_time:.2f}s")
# Threads
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as ex:
results = list(ex.map(io_task, data))
thread_time = time.perf_counter() - start
print(f"Threads: {thread_time:.2f}s")
# Processes
start = time.perf_counter()
with ProcessPoolExecutor(max_workers=4) as ex:
results = list(ex.map(io_task, data))
proc_time = time.perf_counter() - start
print(f"Processes: {proc_time:.2f}s")
# Typical results:
# Sequential: 4.0s
# Threads: 1.0s ← 4x speedup ✓
# Processes: 1.1s ← Similar, but more overhead
Mixed Workloads¶
Many real applications have both CPU and I/O components.
Example: Download and Process Images¶
import requests
from PIL import Image
from io import BytesIO
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
def download_image(url):
"""I/O-bound: fetch from network."""
response = requests.get(url)
return response.content
def process_image(image_bytes):
"""CPU-bound: resize and transform."""
img = Image.open(BytesIO(image_bytes))
img = img.resize((100, 100))
# More CPU-intensive operations...
return img
urls = ["https://example.com/img1.jpg", "https://example.com/img2.jpg", ...]
# Stage 1: Download (I/O-bound) — use threads
with ThreadPoolExecutor(max_workers=10) as executor:
image_bytes_list = list(executor.map(download_image, urls))
# Stage 2: Process (CPU-bound) — use processes
with ProcessPoolExecutor() as executor:
processed_images = list(executor.map(process_image, image_bytes_list))
Example: Web Scraping with Parsing¶
import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor
def scrape_and_parse(url):
"""Mixed: I/O (fetch) + CPU (parse)."""
# I/O-bound part
response = requests.get(url)
# CPU-bound part (but usually fast enough)
soup = BeautifulSoup(response.text, 'html.parser')
return soup.find_all('a')
# For mixed tasks where I/O dominates, threads work well
urls = [...]
with ThreadPoolExecutor(max_workers=10) as executor:
results = list(executor.map(scrape_and_parse, urls))
Decision Matrix¶
| Task Type | Sequential | Threads | Processes |
|---|---|---|---|
| CPU-bound | Baseline | No speedup | ✓ Best |
| I/O-bound | Baseline | ✓ Best | Good (more overhead) |
| Mixed (I/O dominant) | Baseline | ✓ Best | Good |
| Mixed (CPU dominant) | Baseline | Some speedup | ✓ Best |
Quick Decision Guide¶
def choose_executor(task_type):
"""Choose the right executor for your task."""
if task_type == "cpu_bound":
# CPU-bound: need true parallelism
from concurrent.futures import ProcessPoolExecutor
return ProcessPoolExecutor()
elif task_type == "io_bound":
# I/O-bound: threads work, lower overhead
from concurrent.futures import ThreadPoolExecutor
return ThreadPoolExecutor(max_workers=20)
elif task_type == "mixed_io_dominant":
# Mixed but mostly waiting: threads fine
from concurrent.futures import ThreadPoolExecutor
return ThreadPoolExecutor(max_workers=10)
elif task_type == "mixed_cpu_dominant":
# Mixed but mostly computing: processes
from concurrent.futures import ProcessPoolExecutor
return ProcessPoolExecutor()
Key Takeaways¶
- CPU-bound: CPU is busy computing → use processes
- I/O-bound: CPU is waiting for I/O → use threads
- GIL blocks thread parallelism for CPU work, but releases during I/O
- Measure your task's CPU usage to determine type
- Mixed workloads: Consider which component dominates, or use two-stage pipeline
- When in doubt: Start with threads for simplicity, switch to processes if no speedup