Introduction to Concurrency

Concurrency is about dealing with multiple things at once. This chapter covers Python's tools for concurrent and parallel execution.


Why Concurrency?

The Problem: Sequential Execution

import time

def download_file(url):
    print(f"Downloading {url}...")
    time.sleep(2)  # Simulate network delay
    print(f"Finished {url}")

# Sequential: 6 seconds total
urls = ["file1.zip", "file2.zip", "file3.zip"]
for url in urls:
    download_file(url)

Each download waits for the previous one to complete. Total time: 6 seconds.

The Solution: Concurrent Execution

import time
from concurrent.futures import ThreadPoolExecutor

def download_file(url):
    print(f"Downloading {url}...")
    time.sleep(2)
    print(f"Finished {url}")

# Concurrent: ~2 seconds total
urls = ["file1.zip", "file2.zip", "file3.zip"]
with ThreadPoolExecutor() as executor:
    executor.map(download_file, urls)

The three downloads overlap: each thread spends nearly all of its time waiting, so the total is roughly the duration of the slowest download, about 2 seconds.
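To check these timings rather than take them on faith, one could wrap both versions in a timer. The following is a minimal sketch: the `timed` helper, the `run_sequential` and `run_concurrent` names, and the shortened 0.2-second delay are assumptions made for illustration so that the script finishes quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.2  # Assumed stand-in for the 2-second network delay

def download_file(url):
    time.sleep(DELAY)  # Simulate network delay
    return url

def timed(run):
    """Return (result, elapsed_seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = run()
    return result, time.perf_counter() - start

urls = ["file1.zip", "file2.zip", "file3.zip"]

def run_sequential():
    return [download_file(url) for url in urls]

def run_concurrent():
    # One worker per URL so that all simulated downloads can overlap
    with ThreadPoolExecutor(max_workers=len(urls)) as executor:
        return list(executor.map(download_file, urls))

seq_result, seq_time = timed(run_sequential)
con_result, con_time = timed(run_concurrent)
print(f"Sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

With three tasks, the sequential time should come out near three times the delay and the concurrent time near a single delay, plus a small amount of thread startup overhead.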


Key Terminology

Concurrency vs Parallelism

| Term | Definition | Analogy |
|------|------------|---------|
| Concurrency | Managing multiple tasks at once | One chef juggling multiple dishes |
| Parallelism | Executing multiple tasks simultaneously | Multiple chefs cooking simultaneously |

Concurrency is about structure — organizing code to handle multiple tasks. Parallelism is about execution — actually running tasks at the same time.

Concurrency (single core):
Task A: ██░░██░░██
Task B: ░░██░░██░░
        Time →

Parallelism (multiple cores):
Task A: ██████████  (Core 1)
Task B: ██████████  (Core 2)
        Time →
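The single-core picture above can be approximated with two threads that each do a little work and then wait. This is only a sketch: the `task` function and the shared `events` log are illustrative names, and the exact interleaving depends on the scheduler, so the order across tasks is not guaranteed.

```python
import threading
import time

events = []                      # Shared log of (task_name, step) tuples
events_lock = threading.Lock()   # Protects the shared log

def task(name, steps):
    for step in range(steps):
        with events_lock:
            events.append((name, step))
        time.sleep(0.01)         # While this thread waits, the other one can run

threads = [threading.Thread(target=task, args=(name, 3)) for name in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(events)  # Steps from A and B are typically interleaved
```

Each task's own steps always appear in order; only the way the two tasks are woven together varies from run to run.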

Threads vs Processes

| Aspect | Thread | Process |
|--------|--------|---------|
| Memory | Shared memory space | Separate memory space |
| Creation | Fast, lightweight | Slower, heavier |
| Communication | Easy (shared variables) | Requires IPC (queues, pipes) |
| GIL impact | Affected by GIL | Not affected by GIL |
| Best for | I/O-bound tasks | CPU-bound tasks |
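The memory row is the easiest one to observe directly. In the sketch below, a thread and a child process both append to the same module-level list; `data`, `append_item`, `run_in_thread`, and `run_in_process` are hypothetical names chosen for this illustration.

```python
import multiprocessing
import threading

data = []  # Module-level state

def append_item(item):
    data.append(item)

def run_in_thread():
    t = threading.Thread(target=append_item, args=("from thread",))
    t.start()
    t.join()

def run_in_process():
    p = multiprocessing.Process(target=append_item, args=("from process",))
    p.start()
    p.join()

# The guard stops a spawned child from re-running this section on import
if __name__ == "__main__":
    run_in_thread()
    print(data)  # ['from thread']: the thread mutated the parent's list
    run_in_process()
    print(data)  # Still ['from thread']: the child changed only its own copy
```

To get a result back from the child process, it would have to be sent explicitly, for example through a `multiprocessing.Queue` or a pipe.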

Synchronous vs Asynchronous

| Mode | Description |
|------|-------------|
| Synchronous | Wait for each operation to complete before starting the next |
| Asynchronous | Start operations without waiting, handle results when ready |
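As a rough illustration of the asynchronous row, the sketch below starts three simulated operations with `asyncio` and collects their results once all are ready. The `fetch` coroutine and its delays are made up for the example; `asyncio.sleep` stands in for real I/O.

```python
import asyncio
import time

async def fetch(name, delay):
    await asyncio.sleep(delay)  # Non-blocking wait; other coroutines run meanwhile
    return f"{name} done"

async def main():
    # Start all three operations, then handle the results when they are ready
    return await asyncio.gather(
        fetch("a", 0.2),
        fetch("b", 0.1),
        fetch("c", 0.3),
    )

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

`asyncio.gather` returns results in the order the awaitables were passed in, not the order they finished, and the total time is close to the longest delay rather than the sum of all three.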

Python's Concurrency Tools

Standard Library Modules

| Module | Purpose | Use Case |
|--------|---------|----------|
| threading | Thread-based concurrency | I/O-bound tasks |
| multiprocessing | Process-based parallelism | CPU-bound tasks |
| concurrent.futures | High-level interface | Both (recommended) |
| asyncio | Async I/O | High-concurrency I/O |
| queue | Thread-safe queues | Producer-consumer patterns |
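The producer-consumer pattern listed for `queue` can be sketched with one producer thread and one consumer thread. The `SENTINEL` marker and the doubling step are assumptions made for this example; any agreed-upon stop signal and any per-item work would do.

```python
import queue
import threading

SENTINEL = None  # Assumed marker that tells the consumer to stop

def producer(q, items):
    for item in items:
        q.put(item)        # Blocks while the queue is full
    q.put(SENTINEL)

def consumer(q, out):
    while True:
        item = q.get()     # Blocks while the queue is empty
        if item is SENTINEL:
            break
        out.append(item * 2)

q = queue.Queue(maxsize=2)  # Small bound so the producer cannot race far ahead
processed = []
threads = [
    threading.Thread(target=producer, args=(q, range(5))),
    threading.Thread(target=consumer, args=(q, processed)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(processed)  # [0, 2, 4, 6, 8]
```

Because `queue.Queue` does its own locking, neither function needs an explicit lock around `put` or `get`.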

Which to Use?

Start here
    │
    ├─ Is the task CPU-intensive (computation)?
    │   │
    │   ├─ Yes → multiprocessing / ProcessPoolExecutor
    │   │
    │   └─ No → Continue
    │
    ├─ Is the task I/O-intensive (network, disk)?
    │   │
    │   ├─ Yes → threading / ThreadPoolExecutor
    │   │        or asyncio for very high concurrency
    │   │
    │   └─ No → Sequential is probably fine
    │
    └─ Simple parallel map over data?
        │
        └─ Yes → concurrent.futures (easiest)
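The decision tree above could be encoded as a small helper. This is a sketch only: `choose_executor` and `run_all` are hypothetical names, and the `"cpu"` / `"io"` labels are an assumed way of describing the task.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def choose_executor(task_kind):
    """Map a task kind ("cpu" or "io") to an executor class, or None for sequential."""
    if task_kind == "cpu":
        return ProcessPoolExecutor
    if task_kind == "io":
        return ThreadPoolExecutor
    return None

def run_all(func, items, task_kind):
    """Apply func to every item using the strategy suggested by the tree."""
    executor_cls = choose_executor(task_kind)
    if executor_cls is None:
        return [func(item) for item in items]  # Sequential is probably fine
    with executor_cls() as executor:
        return list(executor.map(func, items))

print(run_all(str.upper, ["a", "b"], "io"))  # ['A', 'B']
```

Both executor classes share the same interface, which is why the choice can be reduced to picking a class and leaving the rest of the code unchanged.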

Real-World Examples

I/O-Bound: Web Scraping

import requests
from concurrent.futures import ThreadPoolExecutor

urls = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3",
]

def fetch(url):
    # A timeout keeps one unresponsive server from blocking a worker forever
    response = requests.get(url, timeout=10)
    return len(response.content)

# Threads work well — waiting for network I/O
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(fetch, urls))
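One limitation of `executor.map` is that the first exception is re-raised while iterating the results, which hides the outcome of the remaining tasks. A possible alternative is `submit` combined with `as_completed`, sketched below. To keep the example self-contained, `fetch` here is a simulated stand-in (not a real request) that fails for one URL.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(url):
    """Simulated request: fails for one URL to demonstrate error handling."""
    if url.endswith("page2"):
        raise ConnectionError(f"could not reach {url}")
    return len(url)

urls = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3",
]

sizes, errors = {}, {}
with ThreadPoolExecutor(max_workers=10) as executor:
    # Remember which URL each future belongs to
    future_to_url = {executor.submit(fetch, url): url for url in urls}
    for future in as_completed(future_to_url):
        url = future_to_url[future]
        try:
            sizes[url] = future.result()  # Re-raises the worker's exception, if any
        except ConnectionError as exc:
            errors[url] = str(exc)

print(sizes)
print(errors)
```

Results arrive in completion order rather than input order, so they are stored in dictionaries keyed by URL.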

CPU-Bound: Number Crunching

from concurrent.futures import ProcessPoolExecutor

def compute_heavy(n):
    """CPU-intensive calculation."""
    return sum(i * i for i in range(n))

numbers = [10_000_000, 20_000_000, 30_000_000]

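# Processes work well: each worker can run on its own core
# The __main__ guard is needed wherever workers are started by spawning (Windows, macOS)
if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(compute_heavy, numbers))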

Mixed: Data Pipeline

import requests
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def download_data(url):
    """I/O-bound: fetch from network."""
    return requests.get(url).json()

def process_data(data):
    """CPU-bound: heavy computation."""
    # expensive_computation is a placeholder for your own module-level function
    return expensive_computation(data)

if __name__ == "__main__":
    # urls is assumed to be a list of JSON endpoints defined elsewhere

    # Stage 1: Download (I/O-bound), use threads
    with ThreadPoolExecutor() as executor:
        raw_data = list(executor.map(download_data, urls))

    # Stage 2: Process (CPU-bound), use processes
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(process_data, raw_data))

Performance Comparison

Benchmark: I/O-Bound Task

import time
import requests
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def fetch_url(url):
    requests.get(url)
    return url

urls = ["https://httpbin.org/delay/1"] * 5

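# Indicative results for running fetch_url over urls
# (actual timings depend on the network and the machine):
# Sequential: ~5 seconds
# ThreadPool: ~1 second  ✓ Best
# ProcessPool: ~1.5 seconds (process startup overhead)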

Benchmark: CPU-Bound Task

import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def compute(n):
    return sum(i ** 2 for i in range(n))

numbers = [5_000_000] * 4

# Indicative results on a 4-core machine:
# Sequential: ~4 seconds
# ThreadPool: ~4 seconds (the GIL lets only one thread run Python bytecode at a time)
# ProcessPool: ~1 second  ✓ Best
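The figures above can be reproduced with a small timing harness. What follows is a sketch under stated assumptions: `timed`, `run_sequential`, and `run_with` are hypothetical helpers, and the input size is reduced so that the run stays short. Absolute numbers will differ from machine to machine.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def compute(n):
    return sum(i ** 2 for i in range(n))

def run_sequential(numbers):
    return [compute(n) for n in numbers]

def run_with(pool_cls, numbers):
    with pool_cls(max_workers=4) as executor:
        return list(executor.map(compute, numbers))

def timed(label, run):
    """Print how long the zero-argument callable `run` takes, then return its result."""
    start = time.perf_counter()
    result = run()
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result

# The guard keeps worker processes from re-running the benchmark on import
if __name__ == "__main__":
    numbers = [2_000_000] * 4  # Assumed size, smaller than above to keep the run short
    timed("Sequential", lambda: run_sequential(numbers))
    timed("ThreadPool", lambda: run_with(ThreadPoolExecutor, numbers))
    timed("ProcessPool", lambda: run_with(ProcessPoolExecutor, numbers))
```

With very small inputs the process pool may come out slower than the sequential run, because starting workers and pickling arguments costs more than the computation itself.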

Common Pitfalls

1. Using Threads for CPU-Bound Work

# Bad: Threads don't speed up CPU-bound tasks (the GIL serializes Python bytecode)
with ThreadPoolExecutor() as executor:
    results = executor.map(heavy_computation, data)  # No speedup!

# Good: Use processes for CPU-bound tasks
with ProcessPoolExecutor() as executor:
    results = executor.map(heavy_computation, data)  # Real parallelism

2. Too Many Workers

# Bad: 1000 threads/processes is wasteful
with ThreadPoolExecutor(max_workers=1000) as executor:
    ...

# Good: Match workers to task type
# I/O-bound: 10-50 threads typically sufficient
# CPU-bound: Match CPU cores
import os
with ProcessPoolExecutor(max_workers=os.cpu_count()) as executor:
    ...
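The sizing advice above could be turned into a small helper. This is a sketch only: `suggest_workers` is a hypothetical function, and the thread limit `min(32, cores + 4)` is borrowed from the shape of the default that `ThreadPoolExecutor` uses in recent CPython versions, not a rule that fits every workload.

```python
import os

def suggest_workers(task_kind, n_tasks):
    """Suggest a worker count for "cpu" or "io" work over n_tasks items."""
    cores = os.cpu_count() or 1      # cpu_count() may return None
    if task_kind == "cpu":
        limit = cores                # More processes than cores rarely helps
    else:
        limit = min(32, cores + 4)   # Threads mostly wait, so allow a few extra
    # Never create more workers than tasks, and always at least one
    return max(1, min(limit, n_tasks))

print(suggest_workers("io", 3))      # 3: no point in idle threads
print(suggest_workers("cpu", 1000))  # Number of cores on this machine
```

The right number for I/O-bound work depends mostly on how long each task waits, so measuring with a few different values is usually more reliable than any formula.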

3. Shared State Without Synchronization

# Bad: Race condition
counter = 0
def increment():
    global counter
    counter += 1  # Not thread-safe!

# Good: Use synchronization
import threading
counter = 0
lock = threading.Lock()
def increment():
    global counter
    with lock:
        counter += 1
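The locked version can be exercised with several threads to confirm that no updates are lost. In this sketch, `increment_many`, the thread count, and the iteration count are arbitrary choices made for the demonstration.

```python
import threading

counter = 0
lock = threading.Lock()

def increment_many(times):
    global counter
    for _ in range(times):
        with lock:        # Only one thread at a time may update the counter
            counter += 1

threads = [threading.Thread(target=increment_many, args=(10_000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 80000
```

Without the lock, the read-modify-write in `counter += 1` can interleave between threads, and the final value may fall short of the expected total.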

Chapter Overview

This chapter covers:

  1. Concurrency Concepts — GIL, CPU vs I/O bound, threads vs processes
  2. threading Module — Creating threads, synchronization, communication
  3. multiprocessing Module — Processes, pools, sharing state
  4. concurrent.futures — Modern, high-level API (recommended)
  5. Practical Patterns — Decision guide, common patterns, error handling

Key Takeaways

  • Concurrency = managing multiple tasks; Parallelism = running simultaneously
  • Threads share memory, affected by GIL — best for I/O-bound tasks
  • Processes have separate memory, bypass GIL — best for CPU-bound tasks
  • concurrent.futures provides the cleanest API for most use cases
  • Match your concurrency strategy to your task type
  • Always consider synchronization when sharing state