When to Use What¶
A practical guide for choosing the right concurrency approach in Python.
Quick Decision Flowchart¶
Start
│
├─ Is your task CPU-intensive?
│ │
│ ├─ Yes ──────────────────────────► ProcessPoolExecutor
│ │ (or multiprocessing.Pool)
│ │
│ └─ No (I/O-intensive or waiting)
│ │
│ ├─ Many concurrent connections (1000+)?
│ │ │
│ │ └─ Yes ──────────────────► asyncio
│ │
│ └─ Moderate concurrency (10-100)?
│ │
│ └─ Yes ──────────────────► ThreadPoolExecutor
│ (or threading)
│
└─ Simple parallel map over data?
│
└─ Yes ──────────────────────────► concurrent.futures
(easiest choice)
Decision Matrix¶
| Task Type | Recommended Approach | Why |
|---|---|---|
| CPU-bound computation | ProcessPoolExecutor |
Bypasses GIL, true parallelism |
| File I/O | ThreadPoolExecutor |
GIL released during I/O |
| Network requests | ThreadPoolExecutor |
GIL released during I/O |
| Database queries | ThreadPoolExecutor |
GIL released during I/O |
| Web scraping | ThreadPoolExecutor |
Mostly waiting for network |
| Image processing | ProcessPoolExecutor |
CPU-intensive pixel operations |
| Data transformation | ProcessPoolExecutor |
CPU-intensive computation |
| API calls | ThreadPoolExecutor |
Network I/O dominant |
| High-concurrency server | asyncio |
Handles thousands of connections |
| Mixed I/O + CPU | Both (pipeline) | Separate stages appropriately |
Detailed Guidelines¶
Use ThreadPoolExecutor When:¶
from concurrent.futures import ThreadPoolExecutor
# ✅ Network requests
def fetch_url(url):
return requests.get(url).text
with ThreadPoolExecutor(max_workers=20) as executor:
results = executor.map(fetch_url, urls)
# ✅ File I/O
def read_file(path):
return open(path).read()
with ThreadPoolExecutor(max_workers=10) as executor:
contents = executor.map(read_file, file_paths)
# ✅ Database queries
def query_db(sql):
return db.execute(sql)
with ThreadPoolExecutor(max_workers=5) as executor:
results = executor.map(query_db, queries)
Characteristics: - Task spends most time waiting (I/O) - Low CPU usage per task - Need shared memory/state - Quick startup needed
Use ProcessPoolExecutor When:¶
from concurrent.futures import ProcessPoolExecutor
# ✅ Heavy computation
def compute(n):
return sum(i**2 for i in range(n))
with ProcessPoolExecutor() as executor:
results = executor.map(compute, large_numbers)
# ✅ Image/video processing
def process_image(image_path):
img = load_image(image_path)
return apply_filters(img)
with ProcessPoolExecutor() as executor:
results = executor.map(process_image, image_paths)
# ✅ Data transformation
def transform(chunk):
return chunk.apply(complex_operation)
with ProcessPoolExecutor() as executor:
results = executor.map(transform, data_chunks)
Characteristics: - Task is CPU-intensive - High CPU usage per task - Can tolerate memory copying overhead - Objects are picklable
Use asyncio When:¶
import asyncio
import aiohttp
# ✅ Many concurrent connections
async def fetch(session, url):
async with session.get(url) as response:
return await response.text()
async def main():
async with aiohttp.ClientSession() as session:
tasks = [fetch(session, url) for url in urls]
results = await asyncio.gather(*tasks)
asyncio.run(main())
Characteristics: - Very high concurrency (thousands) - All I/O operations - Need fine-grained control - Single-threaded is acceptable
Use Raw threading/multiprocessing When:¶
import threading
from multiprocessing import Process, Queue
# ✅ Long-running background tasks
def background_worker():
while True:
process_queue()
thread = threading.Thread(target=background_worker, daemon=True)
thread.start()
# ✅ Need fine-grained control over processes
def worker(queue):
while True:
item = queue.get()
if item is None:
break
process(item)
queue = Queue()
processes = [Process(target=worker, args=(queue,)) for _ in range(4)]
Characteristics: - Need long-running workers - Custom synchronization required - Complex communication patterns - Pool pattern doesn't fit
Anti-Patterns to Avoid¶
❌ Threads for CPU-Bound Work¶
# Bad: No speedup due to GIL
with ThreadPoolExecutor() as executor:
results = executor.map(heavy_computation, data)
# Good: Use processes
with ProcessPoolExecutor() as executor:
results = executor.map(heavy_computation, data)
❌ Processes for Quick I/O Tasks¶
# Bad: Process overhead dominates
with ProcessPoolExecutor() as executor:
results = executor.map(quick_api_call, urls)
# Good: Use threads
with ThreadPoolExecutor() as executor:
results = executor.map(quick_api_call, urls)
❌ Too Many Workers¶
# Bad: Resource waste
with ThreadPoolExecutor(max_workers=1000) as executor:
...
# Good: Match to workload
# I/O-bound: 10-50 threads typically sufficient
# CPU-bound: os.cpu_count() processes
❌ Shared Mutable State Without Locks¶
# Bad: Race condition
results = []
def worker(x):
results.append(x ** 2) # Not thread-safe!
# Good: Use thread-safe structures
from queue import Queue
result_queue = Queue()
def worker(x):
result_queue.put(x ** 2)
Performance Comparison¶
Benchmark Template¶
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
def benchmark(name, executor_class, func, data, workers=None):
start = time.perf_counter()
with executor_class(max_workers=workers) as executor:
list(executor.map(func, data))
elapsed = time.perf_counter() - start
print(f"{name}: {elapsed:.2f}s")
# Test your specific workload
data = [...]
# Sequential baseline
start = time.perf_counter()
results = [func(x) for x in data]
print(f"Sequential: {time.perf_counter() - start:.2f}s")
# Threads
benchmark("Threads", ThreadPoolExecutor, func, data, workers=10)
# Processes
benchmark("Processes", ProcessPoolExecutor, func, data)
Typical Results¶
| Workload | Sequential | Threads | Processes |
|---|---|---|---|
| CPU-bound (4 items) | 4.0s | 4.2s ❌ | 1.1s ✅ |
| I/O-bound (4 items) | 4.0s | 1.1s ✅ | 1.2s |
| Mixed (4 items) | 4.0s | 2.5s | 1.5s ✅ |
Choosing Worker Count¶
ThreadPoolExecutor¶
import os
# I/O-bound: more threads than CPUs
# Most time waiting, not computing
max_workers = 20 # or even 50-100 for pure I/O
# Mixed workload
max_workers = os.cpu_count() * 2
# Default (Python 3.8+)
# min(32, os.cpu_count() + 4)
ProcessPoolExecutor¶
import os
# CPU-bound: match CPU cores
max_workers = os.cpu_count()
# Leave headroom for system
max_workers = max(1, os.cpu_count() - 1)
# Default
# os.cpu_count()
Summary Table¶
| Approach | Best For | GIL | Overhead | Memory |
|---|---|---|---|---|
ThreadPoolExecutor |
I/O-bound | Blocked | Low | Shared |
ProcessPoolExecutor |
CPU-bound | Bypassed | High | Isolated |
asyncio |
High concurrency I/O | Blocked | Lowest | Shared |
Raw threading |
Custom thread control | Blocked | Low | Shared |
Raw multiprocessing |
Custom process control | Bypassed | High | Isolated |
Key Takeaways¶
- I/O-bound (network, files, database) → Threads
- CPU-bound (computation, processing) → Processes
- Very high concurrency (10,000+ connections) → asyncio
- Simple parallel map → concurrent.futures (start here)
- Mixed workloads → Pipeline with appropriate executor per stage
- When in doubt → Start with
ThreadPoolExecutor, measure, adjust - Always measure your specific workload before deciding
Runnable Example: decision_guide_tutorial.py¶
"""
Topic 45.6 - When to Use Threading vs Multiprocessing
Complete decision guide with practical examples and benchmarks to help
you choose the right concurrency approach for your specific use case.
Learning Objectives:
- Decision criteria for threading vs multiprocessing
- Performance characteristics of each approach
- Common use cases and patterns
- Hybrid approaches
- Real-world examples
Author: Python Educator
Date: 2024
"""
import threading
import multiprocessing
import time
import requests
import json
from queue import Queue
from multiprocessing import Pool, cpu_count
# ============================================================================
# PART 1: BEGINNER - Core Decision Criteria
# ============================================================================
def explain_core_differences():
"""
Fundamental differences between threading and multiprocessing.
"""
print("=" * 70)
print("BEGINNER: Core Differences")
print("=" * 70)
print("\n" + "─" * 70)
print("│ ASPECT │ THREADING │ MULTIPROCESSING │")
print("─" * 70)
print("│ GIL Impact │ Limited by GIL │ No GIL! │")
print("│ Memory │ Shared │ Separate copies │")
print("│ Startup Cost │ Fast (~1ms) │ Slow (~10-50ms) │")
print("│ Communication │ Easy (shared) │ IPC required │")
print("│ CPU-Bound Tasks │ ❌ Bad │ ✓ Excellent │")
print("│ I/O-Bound Tasks │ ✓ Excellent │ ✓ Good │")
print("│ Debugging │ Easier │ Harder │")
print("│ Resource Usage │ Light │ Heavy │")
print("─" * 70)
print("\n📚 KEY RULE OF THUMB:")
print(" • CPU-Bound (computation) → MULTIPROCESSING")
print(" • I/O-Bound (waiting) → THREADING or ASYNCIO")
print("\n💡 CPU-Bound Examples:")
print(" - Mathematical calculations")
print(" - Data processing and analysis")
print(" - Image/video processing")
print(" - Machine learning training")
print(" - Compression/encryption")
print("\n💡 I/O-Bound Examples:")
print(" - Network requests (API calls)")
print(" - File operations (read/write)")
print(" - Database queries")
print(" - Web scraping")
print(" - User input waiting")
print("\n" + "=" * 70 + "\n")
def cpu_bound_comparison():
"""
Direct comparison: CPU-bound task with threading vs multiprocessing.
"""
print("=" * 70)
print("BEGINNER: CPU-Bound Task Comparison")
print("=" * 70)
def compute_fibonacci(n):
"""
CPU-intensive recursive Fibonacci.
Args:
n: Fibonacci number to compute
Returns:
nth Fibonacci number
"""
if n <= 1:
return n
return compute_fibonacci(n - 1) + compute_fibonacci(n - 2)
numbers = [35, 35, 35, 35] # 4 heavy computations
# Sequential baseline
print("\n⏱️ Sequential (baseline):")
start = time.time()
results_seq = [compute_fibonacci(n) for n in numbers]
seq_time = time.time() - start
print(f" Time: {seq_time:.2f}s")
# Threading (limited by GIL)
print("\n⏱️ Threading (4 threads):")
start = time.time()
results_threading = []
threads = []
def worker(n, index, results):
results[index] = compute_fibonacci(n)
results_threading = [None] * len(numbers)
for i, n in enumerate(numbers):
thread = threading.Thread(target=worker, args=(n, i, results_threading))
threads.append(thread)
thread.start()
for thread in threads:
thread.join()
threading_time = time.time() - start
print(f" Time: {threading_time:.2f}s")
print(f" Speedup: {seq_time/threading_time:.2f}x")
# Multiprocessing (true parallelism)
print("\n⏱️ Multiprocessing (4 processes):")
start = time.time()
with Pool(4) as pool:
results_mp = pool.map(compute_fibonacci, numbers)
mp_time = time.time() - start
print(f" Time: {mp_time:.2f}s")
print(f" Speedup: {seq_time/mp_time:.2f}x")
# Analysis
print("\n📊 Summary:")
print(f" Sequential: {seq_time:.2f}s (1.00x)")
print(f" Threading: {threading_time:.2f}s ({seq_time/threading_time:.2f}x)")
print(f" Multiprocessing: {mp_time:.2f}s ({seq_time/mp_time:.2f}x)")
print("\n✓ Winner: MULTIPROCESSING")
print(" Threading showed minimal improvement due to GIL")
print(" Multiprocessing achieved near-linear speedup")
print("\n" + "=" * 70 + "\n")
def io_bound_comparison():
"""
Direct comparison: I/O-bound task with threading vs multiprocessing.
"""
print("=" * 70)
print("BEGINNER: I/O-Bound Task Comparison")
print("=" * 70)
def simulate_io_operation(task_id):
"""
Simulate I/O operation (network request, file read, etc).
Args:
task_id: Task identifier
Returns:
Task result
"""
# Simulate I/O wait (GIL is released during sleep!)
time.sleep(0.5)
return f"Task {task_id} completed"
task_ids = list(range(20)) # 20 I/O operations
# Sequential baseline
print("\n⏱️ Sequential (baseline):")
start = time.time()
results_seq = [simulate_io_operation(tid) for tid in task_ids]
seq_time = time.time() - start
print(f" Time: {seq_time:.2f}s")
# Threading
print("\n⏱️ Threading (10 threads):")
start = time.time()
results_threading = []
threads = []
def worker(tid, results, lock):
result = simulate_io_operation(tid)
with lock:
results.append(result)
results_threading = []
lock = threading.Lock()
for tid in task_ids:
thread = threading.Thread(target=worker, args=(tid, results_threading, lock))
threads.append(thread)
thread.start()
for thread in threads:
thread.join()
threading_time = time.time() - start
print(f" Time: {threading_time:.2f}s")
print(f" Speedup: {seq_time/threading_time:.2f}x")
# Multiprocessing
print("\n⏱️ Multiprocessing (10 processes):")
start = time.time()
with Pool(10) as pool:
results_mp = pool.map(simulate_io_operation, task_ids)
mp_time = time.time() - start
print(f" Time: {mp_time:.2f}s")
print(f" Speedup: {seq_time/mp_time:.2f}x")
# Analysis
print("\n📊 Summary:")
print(f" Sequential: {seq_time:.2f}s (1.00x)")
print(f" Threading: {threading_time:.2f}s ({seq_time/threading_time:.2f}x)")
print(f" Multiprocessing: {mp_time:.2f}s ({seq_time/mp_time:.2f}x)")
print("\n✓ Winner: THREADING")
print(" Both achieved similar speedup")
print(" Threading has lower overhead and is preferred for I/O")
print("\n" + "=" * 70 + "\n")
# ============================================================================
# PART 2: INTERMEDIATE - Real-World Use Cases
# ============================================================================
def use_case_web_scraping():
"""
Web scraping: I/O-bound → Use Threading
"""
print("=" * 70)
print("INTERMEDIATE: Use Case - Web Scraping")
print("=" * 70)
def fetch_url_simulation(url):
"""
Simulate fetching URL content.
Args:
url: URL to fetch
Returns:
Simulated response
"""
# Simulate network delay
time.sleep(0.3)
return {"url": url, "status": 200, "size": 1024}
urls = [f"https://example.com/page{i}" for i in range(20)]
print(f"\n📝 Task: Scrape {len(urls)} web pages")
print(" Characteristic: I/O-bound (network requests)")
print(" Recommendation: THREADING\n")
# Threading approach
print("⏱️ Using Threading:")
start = time.time()
results = []
threads = []
lock = threading.Lock()
def fetch_worker(url):
result = fetch_url_simulation(url)
with lock:
results.append(result)
for url in urls:
thread = threading.Thread(target=fetch_worker, args=(url,))
threads.append(thread)
thread.start()
for thread in threads:
thread.join()
elapsed = time.time() - start
print(f" Fetched {len(results)} pages in {elapsed:.2f}s")
print(f" Throughput: {len(results)/elapsed:.1f} pages/sec")
print("\n💡 Why Threading?")
print(" ✓ Threads released GIL during network I/O")
print(" ✓ Low overhead (can handle 100+ threads)")
print(" ✓ Easy to share results")
print(" ✗ Multiprocessing would add unnecessary overhead")
print("\n" + "=" * 70 + "\n")
def use_case_image_processing():
"""
Image processing: CPU-bound → Use Multiprocessing
"""
print("=" * 70)
print("INTERMEDIATE: Use Case - Image Processing")
print("=" * 70)
def process_image_simulation(image_id):
"""
Simulate CPU-intensive image processing.
Args:
image_id: Image identifier
Returns:
Processed image info
"""
# Simulate CPU-intensive operations
total = 0
for i in range(1_000_000):
total += i ** 2
return {
"image_id": image_id,
"processed": True,
"checksum": total % 10000
}
image_ids = list(range(20))
print(f"\n📝 Task: Process {len(image_ids)} images")
print(" Operations: Resize, filter, compress (CPU-intensive)")
print(" Recommendation: MULTIPROCESSING\n")
# Multiprocessing approach
print("⏱️ Using Multiprocessing:")
start = time.time()
with Pool(cpu_count()) as pool:
results = pool.map(process_image_simulation, image_ids)
elapsed = time.time() - start
print(f" Processed {len(results)} images in {elapsed:.2f}s")
print(f" Throughput: {len(results)/elapsed:.1f} images/sec")
print(f" Using {cpu_count()} CPU cores")
print("\n💡 Why Multiprocessing?")
print(" ✓ True parallel execution on multiple cores")
print(" ✓ No GIL interference")
print(" ✓ Scales with CPU count")
print(" ✗ Threading would be serialized by GIL")
print("\n" + "=" * 70 + "\n")
def use_case_database_operations():
"""
Database operations: I/O-bound → Use Threading
"""
print("=" * 70)
print("INTERMEDIATE: Use Case - Database Operations")
print("=" * 70)
def execute_query_simulation(query_id):
"""
Simulate database query execution.
Args:
query_id: Query identifier
Returns:
Query result
"""
# Simulate database I/O
time.sleep(0.2)
return {
"query_id": query_id,
"rows": 100,
"duration_ms": 200
}
queries = list(range(30))
print(f"\n📝 Task: Execute {len(queries)} database queries")
print(" Characteristic: I/O-bound (waiting for DB)")
print(" Recommendation: THREADING\n")
print("⏱️ Using Threading with connection pool:")
start = time.time()
# Simulate connection pool with 5 connections
semaphore = threading.Semaphore(5)
results = []
threads = []
lock = threading.Lock()
def query_worker(qid):
with semaphore: # Limit concurrent connections
result = execute_query_simulation(qid)
with lock:
results.append(result)
for qid in queries:
thread = threading.Thread(target=query_worker, args=(qid,))
threads.append(thread)
thread.start()
for thread in threads:
thread.join()
elapsed = time.time() - start
print(f" Executed {len(results)} queries in {elapsed:.2f}s")
print(f" Throughput: {len(results)/elapsed:.1f} queries/sec")
print("\n💡 Why Threading?")
print(" ✓ Database I/O releases GIL")
print(" ✓ Can use connection pool (Semaphore)")
print(" ✓ Lower memory overhead")
print(" ✓ Easier to manage shared state")
print("\n" + "=" * 70 + "\n")
# ============================================================================
# PART 3: ADVANCED - Hybrid Approaches and Edge Cases
# ============================================================================
def hybrid_approach_example():
"""
Combine threading and multiprocessing for complex workloads.
"""
print("=" * 70)
print("ADVANCED: Hybrid Approach (Threading + Multiprocessing)")
print("=" * 70)
def fetch_data(url_id):
"""I/O-bound: Fetch data from network"""
time.sleep(0.2) # Network I/O
return list(range(100)) # Simulated data
def process_data(data):
"""CPU-bound: Process the fetched data"""
# Heavy computation
return sum(x ** 2 for x in data)
num_tasks = 8
print(f"\n📝 Task: Fetch data (I/O) then process it (CPU)")
print(f" {num_tasks} tasks total")
print(" Strategy: Threading for I/O, then Multiprocessing for CPU\n")
start = time.time()
# Phase 1: Fetch data with threading
print("⏱️ Phase 1: Fetching data with threading...")
fetch_start = time.time()
fetched_data = []
threads = []
lock = threading.Lock()
def fetch_worker(url_id):
data = fetch_data(url_id)
with lock:
fetched_data.append(data)
for i in range(num_tasks):
thread = threading.Thread(target=fetch_worker, args=(i,))
threads.append(thread)
thread.start()
for thread in threads:
thread.join()
fetch_time = time.time() - fetch_start
print(f" Fetched {len(fetched_data)} datasets in {fetch_time:.2f}s")
# Phase 2: Process data with multiprocessing
print("\n⏱️ Phase 2: Processing data with multiprocessing...")
process_start = time.time()
with Pool(cpu_count()) as pool:
results = pool.map(process_data, fetched_data)
process_time = time.time() - process_start
print(f" Processed {len(results)} datasets in {process_time:.2f}s")
total_time = time.time() - start
print(f"\n📊 Performance:")
print(f" Fetch time: {fetch_time:.2f}s (threading)")
print(f" Process time: {process_time:.2f}s (multiprocessing)")
print(f" Total time: {total_time:.2f}s")
print("\n💡 Hybrid Approach Benefits:")
print(" ✓ Use best tool for each phase")
print(" ✓ Maximize resource utilization")
print(" ✓ Common in data pipelines")
print(" ✓ ETL workflows, ML pipelines")
print("\n" + "=" * 70 + "\n")
def decision_flowchart():
"""
Interactive decision flowchart for choosing approach.
"""
print("=" * 70)
print("ADVANCED: Decision Flowchart")
print("=" * 70)
print("""
╔════════════════════════════════════════════════════════════════╗
║ THREADING vs MULTIPROCESSING ║
║ DECISION GUIDE ║
╚════════════════════════════════════════════════════════════════╝
START: What kind of task do you have?
│
├─► CPU-Bound (computation, data processing)
│ │
│ ├─► Many independent tasks?
│ │ YES → Use multiprocessing.Pool
│ │ NO → Use multiprocessing.Process
│ │
│ └─► Need shared state?
│ Consider threading if updates are rare
│ Otherwise use multiprocessing with Manager
│
└─► I/O-Bound (network, disk, database)
│
├─► Simple parallel operations?
│ YES → Use threading.Thread or ThreadPoolExecutor
│
├─► High concurrency (100+ operations)?
│ YES → Consider asyncio (async/await)
│
└─► Need blocking I/O with simple code?
YES → Use threading
SPECIAL CASES:
• Mixed workload (I/O + CPU)
→ Hybrid: Threading for I/O, Multiprocessing for CPU
• Real-time requirements
→ Threading (lower latency)
• Memory constraints
→ Threading (shared memory)
• Need true parallelism + I/O
→ Multiprocessing
• Complex state management
→ Threading (easier synchronization)
• Maximum performance on multi-core
→ Multiprocessing (no GIL)
• Quick prototyping
→ Threading (simpler debugging)
""")
print("=" * 70 + "\n")
def performance_characteristics_summary():
"""
Comprehensive performance characteristics table.
"""
print("=" * 70)
print("ADVANCED: Performance Characteristics Summary")
print("=" * 70)
print("""
╔═══════════════════════════════════════════════════════════════════════╗
║ DETAILED COMPARISON TABLE ║
╚═══════════════════════════════════════════════════════════════════════╝
┌───────────────────┬────────────────────┬───────────────────────────┐
│ METRIC │ THREADING │ MULTIPROCESSING │
├───────────────────┼────────────────────┼───────────────────────────┤
│ Startup Time │ ~1 ms │ ~10-50 ms (spawn) │
│ │ │ ~2-5 ms (fork on Unix) │
├───────────────────┼────────────────────┼───────────────────────────┤
│ Memory Overhead │ ~50 KB per thread │ ~10 MB per process │
├───────────────────┼────────────────────┼───────────────────────────┤
│ Communication │ Instant (shared) │ Serialization overhead │
│ │ │ (pickle + IPC) │
├───────────────────┼────────────────────┼───────────────────────────┤
│ Max Workers │ 100-1000+ │ Usually ≤ CPU count × 2 │
├───────────────────┼────────────────────┼───────────────────────────┤
│ CPU Usage │ Limited by GIL │ Full multi-core usage │
│ │ (one core max) │ │
├───────────────────┼────────────────────┼───────────────────────────┤
│ I/O Performance │ Excellent │ Good │
├───────────────────┼────────────────────┼───────────────────────────┤
│ CPU Performance │ Poor (GIL) │ Excellent │
├───────────────────┼────────────────────┼───────────────────────────┤
│ Debugging │ Easier │ Harder │
│ │ (single process) │ (multiple processes) │
├───────────────────┼────────────────────┼───────────────────────────┤
│ Crash Impact │ Crashes all │ Isolated per process │
└───────────────────┴────────────────────┴───────────────────────────┘
PERFORMANCE RANGES (approximate):
Threading Speedup:
• CPU-bound: 1.0x - 1.2x (GIL limited)
• I/O-bound: Nx (N = number of threads, up to 100+)
Multiprocessing Speedup:
• CPU-bound: Nx (N = CPU cores, near linear)
• I/O-bound: Nx (but higher overhead)
RECOMMENDATIONS BY TASK SIZE:
┌──────────────────────┬───────────────┬─────────────────────┐
│ Task Duration │ Task Count │ Best Choice │
├──────────────────────┼───────────────┼─────────────────────┤
│ < 1ms (very quick) │ Many │ Sequential │
│ 1-10ms │ Many │ Threading (chunked) │
│ 10-100ms │ 10-100 │ Threading │
│ > 100ms (I/O) │ Any │ Threading │
│ > 100ms (CPU) │ Any │ Multiprocessing │
│ > 1s (I/O) │ Many │ Async/Threading │
│ > 1s (CPU) │ Few │ Multiprocessing │
└──────────────────────┴───────────────┴─────────────────────┘
""")
print("=" * 70 + "\n")
# ============================================================================
# MAIN EXECUTION
# ============================================================================
def main():
"""Run all demonstrations."""
print("\n" + "=" * 70)
print(" " * 12 + "THREADING vs MULTIPROCESSING")
print(" " * 20 + "Decision Guide")
print("=" * 70 + "\n")
# Beginner level
explain_core_differences()
cpu_bound_comparison()
io_bound_comparison()
# Intermediate level
use_case_web_scraping()
use_case_image_processing()
use_case_database_operations()
# Advanced level
hybrid_approach_example()
decision_flowchart()
performance_characteristics_summary()
print("\n" + "=" * 70)
print("Decision Guide Complete!")
print("=" * 70)
print("\n💡 Quick Decision Rules:")
print("1. CPU-intensive → Multiprocessing")
print("2. I/O-intensive → Threading")
print("3. Mixed workload → Hybrid approach")
print("4. High concurrency I/O → asyncio")
print("5. Simple tasks → ThreadPoolExecutor or Pool")
print("6. When in doubt → Profile and measure!")
print("=" * 70 + "\n")
if __name__ == "__main__":
main()