CPU-Bound vs I/O-Bound Tasks¶

Understanding whether your task is CPU-bound or I/O-bound is crucial for choosing the right concurrency strategy.

Definitions¶

CPU-Bound¶

A task is CPU-bound when it spends most of its time doing computation — the CPU is the bottleneck.

Examples:

- Mathematical calculations
- Image/video processing
- Data compression
- Machine learning training
- Cryptographic operations
- Simulations

def cpu_bound_task(n):
    """Compute sum of squares — CPU is working hard."""
    total = 0
    for i in range(n):
        total += i ** 2
    return total

I/O-Bound¶

A task is I/O-bound when it spends most of its time waiting for input/output operations — the CPU is mostly idle.

Examples:

- Network requests (HTTP, database queries)
- File reading/writing
- User input
- API calls
- Web scraping

import requests

def io_bound_task(url):
    """Fetch URL — CPU is mostly waiting."""
    response = requests.get(url, timeout=10)  # Waiting for network; the timeout avoids hanging forever
    return response.text

Visual Comparison¶

CPU-Bound Execution¶

CPU Activity:
████████████████████████████████████████  100% busy

Timeline:
[compute][compute][compute][compute][done]

I/O-Bound Execution¶

CPU Activity:
██░░░░░░░░░░░░░░██░░░░░░░░░░░░░░██░░░░░░  ~20% busy

Timeline:
[send request][.....waiting.....][process response]

Identifying Task Type¶

Method 1: CPU Usage Monitoring¶

import time
import psutil

def monitor_cpu(func, *args):
    """Run function while monitoring CPU usage."""
    process = psutil.Process()

    start = time.perf_counter()
    cpu_start = process.cpu_times()

    result = func(*args)

    cpu_end = process.cpu_times()
    elapsed = time.perf_counter() - start

    cpu_time = (cpu_end.user - cpu_start.user) + (cpu_end.system - cpu_start.system)
    cpu_percent = (cpu_time / elapsed) * 100

    print(f"Elapsed: {elapsed:.2f}s")
    print(f"CPU time: {cpu_time:.2f}s")
    print(f"CPU usage: {cpu_percent:.1f}%")

    return result

# CPU-bound: ~100% CPU usage
monitor_cpu(cpu_bound_task, 10_000_000)

# I/O-bound: ~5% CPU usage
monitor_cpu(io_bound_task, "https://httpbin.org/delay/2")
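
If `psutil` is not installed, a similar ratio can be approximated with the standard library alone. The sketch below is an alternative to the method above, not part of it: `cpu_ratio`, `classify`, `busy`, and the 0.5 threshold are illustrative assumptions. It relies on `time.process_time()`, which reports user plus system CPU time for the current process.

```python
import time

def cpu_ratio(func, *args):
    """Return (result, cpu_time / wall_time) for one call of func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()   # user + system CPU time of this process

    result = func(*args)

    cpu_used = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    ratio = cpu_used / elapsed if elapsed > 0 else 0.0
    return result, ratio

def classify(ratio, threshold=0.5):
    """Label a task from its CPU ratio; the threshold is an arbitrary cut-off."""
    return "cpu_bound" if ratio >= threshold else "io_bound"

def busy(n):
    """Pure computation: should score close to 1.0."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    _, r = cpu_ratio(busy, 2_000_000)
    print(f"busy:  ratio={r:.2f} -> {classify(r)}")

    _, r = cpu_ratio(time.sleep, 0.5)  # sleeping stands in for waiting on I/O
    print(f"sleep: ratio={r:.2f} -> {classify(r)}")
```

Note that `time.process_time()` does not include CPU time spent in child processes, so this measurement only makes sense for work done in the calling process.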

Method 2: Analyze the Code¶

| Look for... | Task Type |
|---|---|
| Loops with computation | CPU-bound |
| Mathematical operations | CPU-bound |
| `requests`, `urllib` | I/O-bound |
| File `open()`, `read()`, `write()` | I/O-bound |
| Database queries | I/O-bound |
| `time.sleep()` | I/O-bound (simulated) |
| `subprocess` calls | Usually I/O-bound |

Concurrency Strategy by Task Type¶

CPU-Bound → Use Processes¶

from concurrent.futures import ProcessPoolExecutor
import os

def compute_heavy(n):
    """CPU-intensive task."""
    print(f"Process {os.getpid()}: computing...")
    return sum(i * i for i in range(n))

numbers = [10_000_000, 20_000_000, 30_000_000, 40_000_000]

# The guard is required where workers are started with "spawn"
# (Windows, macOS), because each worker re-imports this module.
if __name__ == "__main__":
    # Each worker process has its own interpreter and its own GIL
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(compute_heavy, numbers))

    print(results)

Why processes?

- Each process has its own Python interpreter
- Each process has its own GIL (global interpreter lock), so one worker never blocks another
- True parallel execution on multiple CPU cores
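
One consequence of separate interpreters is that workers do not share memory with the parent. The following is a minimal sketch, assuming a module-level `counter` and hypothetical `bump` and `run_demo` helpers (none of them appear in the text above): each worker increments its own copy of the global, and the parent's value is unchanged.

```python
import os
from concurrent.futures import ProcessPoolExecutor

counter = 0  # every process gets its own copy of this global

def bump(_):
    """Increment this process's copy of counter and report who did it."""
    global counter
    counter += 1
    return os.getpid(), counter

def run_demo(workers=2, tasks=4):
    """Run bump in worker processes and return their (pid, counter) pairs."""
    with ProcessPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(bump, range(tasks)))

if __name__ == "__main__":
    for pid, value in run_demo():
        print(f"worker pid={pid} saw counter={value}")
    # The workers changed only their own copies
    print(f"parent pid={os.getpid()} still has counter={counter}")
```

The exact values each worker reports depend on how tasks are distributed, but the parent's `counter` stays at 0. Anything a worker should hand back has to travel through arguments and return values, which are pickled between processes.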

I/O-Bound → Use Threads¶

from concurrent.futures import ThreadPoolExecutor
import requests

def fetch_url(url):
    """I/O-intensive task."""
    response = requests.get(url, timeout=10)
    return len(response.content)

urls = [
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
]

# ThreadPoolExecutor works well — GIL released during I/O
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(fetch_url, urls))

print(results)

Why threads?

- The GIL is released while a thread waits on blocking I/O, so other threads can run
- Threads are lighter weight than processes
- Shared memory makes data passing easy
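
The shared-memory point can be illustrated with a small sketch. The names `lengths`, `record_length`, and `fake_fetch` are hypothetical, and `time.sleep` stands in for a real network call. All threads write into one dictionary; the lock is there because updates to shared data should not be assumed to be atomic.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

lengths = {}                    # shared by every thread in the process
lengths_lock = threading.Lock()

def fake_fetch(name):
    """Stand-in for a network call: sleep, then return a payload."""
    time.sleep(0.1)
    return name * 3

def record_length(name):
    """Fetch a payload and store its length in the shared dict."""
    payload = fake_fetch(name)
    with lengths_lock:          # guard the shared structure
        lengths[name] = len(payload)

if __name__ == "__main__":
    names = ["a", "bb", "ccc"]
    with ThreadPoolExecutor(max_workers=3) as executor:
        # list() forces completion and surfaces any worker exception
        list(executor.map(record_length, names))
    print(lengths)
```

With processes, the same code would leave the parent's `lengths` empty, because each worker would fill its own copy.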

Benchmark: Threads vs Processes¶

CPU-Bound Benchmark¶

import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cpu_task(n):
    return sum(i ** 2 for i in range(n))

data = [5_000_000] * 4

if __name__ == "__main__":  # guard needed for the process pool on Windows/macOS
    # Sequential
    start = time.perf_counter()
    results = [cpu_task(n) for n in data]
    seq_time = time.perf_counter() - start
    print(f"Sequential:    {seq_time:.2f}s")

    # Threads (limited by GIL)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as ex:
        results = list(ex.map(cpu_task, data))
    thread_time = time.perf_counter() - start
    print(f"Threads:       {thread_time:.2f}s")

    # Processes (true parallelism)
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=4) as ex:
        results = list(ex.map(cpu_task, data))
    proc_time = time.perf_counter() - start
    print(f"Processes:     {proc_time:.2f}s")

# Typical results (4-core machine):
# Sequential:    3.2s
# Threads:       3.5s  ← No speedup (GIL)
# Processes:     0.9s  ← ~3.6x speedup ✓
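
To read the timings above, it can help to convert them into speedup and parallel efficiency. The helper names below are illustrative, and the inputs are the sample timings from the comments, not new measurements.

```python
def speedup(sequential_s, parallel_s):
    """How many times faster the parallel run is than the sequential one."""
    return sequential_s / parallel_s

def efficiency(sequential_s, parallel_s, workers):
    """Fraction of the ideal (linear) speedup achieved with `workers` workers."""
    return speedup(sequential_s, parallel_s) / workers

if __name__ == "__main__":
    seq, threads, procs, workers = 3.2, 3.5, 0.9, 4  # sample timings from above

    print(f"threads:   {speedup(seq, threads):.2f}x")
    print(f"processes: {speedup(seq, procs):.2f}x, "
          f"efficiency {efficiency(seq, procs, workers):.0%}")
```

With the sample timings this gives about 0.91x for threads (slightly slower than sequential) and 3.56x for processes, which is roughly 89% of the ideal 4x on four workers. The gap to 4x is the cost of starting processes and moving data between them.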

I/O-Bound Benchmark¶

import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def io_task(seconds):
    time.sleep(seconds)  # Simulate I/O
    return seconds

data = [1] * 4

if __name__ == "__main__":  # guard needed for the process pool on Windows/macOS
    # Sequential
    start = time.perf_counter()
    results = [io_task(n) for n in data]
    seq_time = time.perf_counter() - start
    print(f"Sequential:    {seq_time:.2f}s")

    # Threads
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as ex:
        results = list(ex.map(io_task, data))
    thread_time = time.perf_counter() - start
    print(f"Threads:       {thread_time:.2f}s")

    # Processes
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=4) as ex:
        results = list(ex.map(io_task, data))
    proc_time = time.perf_counter() - start
    print(f"Processes:     {proc_time:.2f}s")

# Typical results:
# Sequential:    4.0s
# Threads:       1.0s  ← 4x speedup ✓
# Processes:     1.1s  ← Similar, but more overhead

Mixed Workloads¶

Many real applications have both CPU and I/O components.

Example: Download and Process Images¶

import requests
from PIL import Image
from io import BytesIO
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def download_image(url):
    """I/O-bound: fetch from network."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail early on HTTP errors
    return response.content

def process_image(image_bytes):
    """CPU-bound: resize and transform."""
    img = Image.open(BytesIO(image_bytes))
    img = img.resize((100, 100))
    # More CPU-intensive operations...
    return img

urls = [
    "https://example.com/img1.jpg",
    "https://example.com/img2.jpg",
    # ...
]

if __name__ == "__main__":  # guard needed for the process pool on Windows/macOS
    # Stage 1: Download (I/O-bound), use threads
    with ThreadPoolExecutor(max_workers=10) as executor:
        image_bytes_list = list(executor.map(download_image, urls))

    # Stage 2: Process (CPU-bound), use processes
    with ProcessPoolExecutor() as executor:
        processed_images = list(executor.map(process_image, image_bytes_list))
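
The two stages above run back to back: processing starts only after every download has finished. A possible refinement is to hand each download to the CPU pool as soon as it completes. The sketch below assumes a hypothetical `run_pipeline` helper that takes the two stage functions and the two executors as arguments; `fake_download` and `fake_process` are stand-ins so the example runs without network access or Pillow.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_pipeline(items, download, process, io_pool, cpu_pool):
    """Download every item, processing each result as soon as it arrives.

    Items must be unique and hashable. Returns {item: processed_result}.
    """
    downloads = {io_pool.submit(download, item): item for item in items}

    processing = {}
    for future in as_completed(downloads):        # yields in completion order
        item = downloads[future]
        # Stage 2 starts for this item while other downloads are in flight
        processing[cpu_pool.submit(process, future.result())] = item

    return {item: future.result() for future, item in processing.items()}

def fake_download(name):
    """Stand-in for a network fetch."""
    time.sleep(0.1)
    return name.encode()

def fake_process(data):
    """Stand-in for CPU-heavy work."""
    return hashlib.sha256(data).hexdigest()[:8]

if __name__ == "__main__":
    # Threads for both pools keep the demo simple. For real CPU-bound work,
    # pass a ProcessPoolExecutor as cpu_pool; `process` must then be a
    # module-level function so it can be pickled.
    with ThreadPoolExecutor(4) as io_pool, ThreadPoolExecutor(2) as cpu_pool:
        print(run_pipeline(["a", "b", "c"], fake_download, fake_process,
                           io_pool, cpu_pool))
```

Passing the executors in, rather than creating them inside the function, lets the caller decide which stage gets threads and which gets processes.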

Example: Web Scraping with Parsing¶

import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor

def scrape_and_parse(url):
    """Mixed: I/O (fetch) + CPU (parse)."""
    # I/O-bound part
    response = requests.get(url, timeout=10)

    # CPU-bound part (but usually fast enough)
    soup = BeautifulSoup(response.text, 'html.parser')
    return soup.find_all('a')

# For mixed tasks where I/O dominates, threads work well
urls = [...]  # placeholder: replace with your list of URLs
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(scrape_and_parse, urls))
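
`executor.map` re-raises a worker's exception when its result is reached, which ends the iteration and discards the remaining results. For scraping, where some URLs are expected to fail, one option is to submit tasks individually and record failures per URL. This is a sketch under assumptions: `fetch_all` is a hypothetical helper, and `flaky_fetch` simulates a fetch so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_all(urls, fetch, max_workers=10):
    """Run fetch(url) in threads; return (results, errors) keyed by URL."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:      # keep going; record what failed
                errors[url] = exc
    return results, errors

def flaky_fetch(url):
    """Simulated fetch: fails for any URL containing 'bad'."""
    if "bad" in url:
        raise ConnectionError(f"cannot reach {url}")
    return f"<html>{url}</html>"

if __name__ == "__main__":
    ok, failed = fetch_all(
        ["https://example.com/a", "https://example.com/bad"], flaky_fetch
    )
    print("fetched:", sorted(ok))
    print("failed: ", {url: repr(exc) for url, exc in failed.items()})
```

In real use, `scrape_and_parse` could be passed as `fetch`, and the `errors` dict would then show which URLs to retry.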

Decision Matrix¶

| Task Type | Sequential | Threads | Processes |
|---|---|---|---|
| CPU-bound | Baseline | No speedup | ✓ Best |
| I/O-bound | Baseline | ✓ Best | Good (more overhead) |
| Mixed (I/O dominant) | Baseline | ✓ Best | Good |
| Mixed (CPU dominant) | Baseline | Some speedup | ✓ Best |

Quick Decision Guide¶

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def choose_executor(task_type):
    """Choose the right executor for your task."""

    if task_type == "cpu_bound":
        # CPU-bound: need true parallelism
        return ProcessPoolExecutor()

    elif task_type == "io_bound":
        # I/O-bound: threads work, lower overhead
        return ThreadPoolExecutor(max_workers=20)

    elif task_type == "mixed_io_dominant":
        # Mixed but mostly waiting: threads fine
        return ThreadPoolExecutor(max_workers=10)

    elif task_type == "mixed_cpu_dominant":
        # Mixed but mostly computing: processes
        return ProcessPoolExecutor()

    # Unknown task type: fail loudly instead of silently returning None
    raise ValueError(f"Unknown task type: {task_type!r}")

Key Takeaways¶

- CPU-bound: the CPU is busy computing → use processes
- I/O-bound: the CPU is waiting for I/O → use threads
- The GIL prevents threads from running Python code in parallel, but it is released during blocking I/O
- Measure your task's CPU usage to determine its type
- Mixed workloads: consider which component dominates, or use a two-stage pipeline
- When in doubt: start with threads for simplicity, and switch to processes if you see no speedup
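
The "when in doubt" advice can be turned into a quick experiment. The sketch below is one way to do it, with assumptions flagged: `thread_speedup` and `suggest` are hypothetical helpers, and the 1.5x cut-off is an arbitrary threshold rather than a rule from this guide.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def thread_speedup(func, items, max_workers=4):
    """Time func over items sequentially and with threads; return the ratio."""
    start = time.perf_counter()
    for item in items:
        func(item)
    sequential = time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        list(executor.map(func, items))
    threaded = time.perf_counter() - start

    return sequential / threaded

def suggest(ratio, threshold=1.5):
    """Keep threads if they helped; otherwise suggest trying processes."""
    return "threads" if ratio >= threshold else "processes"

if __name__ == "__main__":
    ratio = thread_speedup(time.sleep, [0.2] * 4)   # sleep simulates I/O waits
    print(f"thread speedup: {ratio:.1f}x -> {suggest(ratio)}")
```

Because the helper runs every item twice, it is best tried on a small, side-effect-free sample of the real workload.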