# GIL and Hardware

## What is the GIL?
In CPython, the Global Interpreter Lock (GIL) is a mutex that protects the interpreter's internal state, such as object reference counts. It does this by allowing only one thread at a time to execute Python bytecode.
Without the GIL (hypothetical, with no other protection):

```
┌──────────────────────────────────────────────────────────────────────┐
│ Thread 1: refcount++          Thread 2: refcount++                   │
│                                                                      │
│ Both read refcount = 1                                               │
│ Both compute 1 + 1 = 2                                               │
│ Both write refcount = 2                                              │
│                                                                      │
│ Expected: 3, Got: 2 → Memory corruption!                             │
└──────────────────────────────────────────────────────────────────────┘
```

With the GIL:

```
┌──────────────────────────────────────────────────────────────────────┐
│ Thread 1: [acquire GIL][refcount++][release GIL]                     │
│ Thread 2: [wait................................][acquire GIL][ref++] │
│                                                                      │
│ Operations are serialized → Safe, but not parallel                   │
└──────────────────────────────────────────────────────────────────────┘
```
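The counter in these diagrams is real. Every CPython object carries a reference count, and `sys.getrefcount()` reports it. The sketch below binds a second name to a list and compares the counts before and after; the names `data` and `alias` are purely illustrative. It looks at the difference rather than the absolute values because `getrefcount` also counts the temporary reference created by its own argument.

```python
import sys

data = []                        # a fresh list object
before = sys.getrefcount(data)   # includes one temporary reference for the call itself
alias = data                     # bind a second name to the same object
after = sys.getrefcount(data)    # the extra name shows up as one more reference
print(f"refcount grew by {after - before}")
```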
## GIL and Multi-Core CPUs

Modern CPUs have multiple cores, but within a single CPython process only one thread can execute Python bytecode at any given moment:

```
8-Core CPU with Python Threads:
Core 0: [Python][Python][Python][Python][Python]  ← All Python here
Core 1: [idle  ][idle  ][idle  ][idle  ][idle  ]
Core 2: [idle  ][idle  ][idle  ][idle  ][idle  ]
Core 3: [idle  ][idle  ][idle  ][idle  ][idle  ]
Core 4: [idle  ][idle  ][idle  ][idle  ][idle  ]
Core 5: [idle  ][idle  ][idle  ][idle  ][idle  ]
Core 6: [idle  ][idle  ][idle  ][idle  ][idle  ]
Core 7: [idle  ][idle  ][idle  ][idle  ][idle  ]
```

Seven cores sit idle despite multiple threads. In practice the OS may move the GIL-holding thread between cores, but at any instant only one core is executing Python bytecode.
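You can observe this yourself by comparing the CPU time consumed by the whole process with the elapsed wall-clock time while several threads spin. The sketch below does this with a hypothetical helper, `cores_busy`. On a standard (GIL) build we would expect a ratio close to 1.0, meaning about one core busy, regardless of thread count. The exact figure depends on the machine and the OS scheduler.

```python
import threading
import time

def spin(n):
    """Pure-Python busy loop: the thread holds the GIL while it runs."""
    count = 0
    for _ in range(n):
        count += 1
    return count

def cores_busy(num_threads=2, n=2_000_000):
    """Return process CPU time / wall time while num_threads threads spin.

    ~1.0 means roughly one core was executing at a time;
    ~num_threads would indicate truly parallel execution.
    """
    threads = [threading.Thread(target=spin, args=(n,)) for _ in range(num_threads)]
    wall_start = time.perf_counter()
    cpu_start = time.process_time()   # CPU time summed over all threads of this process
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall

ratio = cores_busy()
print(f"Cores effectively busy: {ratio:.2f}")
```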
## Demonstrating the GIL

```python
import threading
import time

def cpu_bound_task(n):
    """CPU-intensive: count to n in pure Python."""
    count = 0
    for _ in range(n):
        count += 1
    return count

n = 50_000_000

# Single-threaded
start = time.perf_counter()
cpu_bound_task(n)
single_time = time.perf_counter() - start
print(f"Single thread: {single_time:.2f}s")

# Two threads, each doing half the work (should be 2x faster, right?)
start = time.perf_counter()
t1 = threading.Thread(target=cpu_bound_task, args=(n // 2,))
t2 = threading.Thread(target=cpu_bound_task, args=(n // 2,))
t1.start()
t2.start()
t1.join()
t2.join()
two_thread_time = time.perf_counter() - start
print(f"Two threads: {two_thread_time:.2f}s")
print(f"Speedup: {single_time/two_thread_time:.2f}x")
```
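Typical output on a standard (GIL) build (exact timings vary by machine and Python version):

```
Single thread: 2.50s
Two threads: 2.80s   ← Actually SLOWER!
Speedup: 0.89x
```

The two threads are no faster, and here they are slightly slower. Only one thread can hold the GIL at a time, so the work is still serialized, and handing the GIL back and forth adds overhead.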
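Part of that overhead is the hand-off itself. CPython periodically asks the running thread to drop the GIL so that a waiting thread gets a turn. This period is the *switch interval*: 5 ms by default, exposed through `sys.getswitchinterval()` and `sys.setswitchinterval()`. The sketch below only inspects the value and changes it temporarily. A different interval changes how often threads take turns, not whether CPU-bound Python code runs in parallel.

```python
import sys

default = sys.getswitchinterval()   # seconds; CPython's default is 0.005
print(f"Default switch interval: {default * 1000:.1f} ms")

# Request more frequent hand-offs (trades throughput for responsiveness)
sys.setswitchinterval(0.001)
print(f"Temporary switch interval: {sys.getswitchinterval() * 1000:.1f} ms")

sys.setswitchinterval(default)      # restore the original value
```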
## When the GIL is Released

CPython releases the GIL in the following cases:

### 1. I/O Operations
```python
import threading
import time
import urllib.request

def download(url):
    """I/O-bound: fetch a URL and read the response body."""
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read()

urls = ['http://example.com'] * 4

# Sequential
start = time.perf_counter()
for url in urls:
    download(url)
sequential_time = time.perf_counter() - start

# Threaded: each thread releases the GIL while it waits on the network
start = time.perf_counter()
threads = [threading.Thread(target=download, args=(url,)) for url in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
parallel_time = time.perf_counter() - start

print(f"Sequential: {sequential_time:.2f}s")
print(f"Parallel: {parallel_time:.2f}s")
print(f"Speedup: {sequential_time/parallel_time:.2f}x")
```
Illustrative output (network latency varies):

```
Sequential: 2.00s
Parallel: 0.55s
Speedup: 3.64x   ← Threading works for I/O!
```
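To try this without network access, substitute another blocking call: `time.sleep()` also releases the GIL. The sketch below replaces the download with a hypothetical `fake_download` that sleeps for 0.2 s. It runs four of them through `concurrent.futures.ThreadPoolExecutor`, the standard library's higher-level thread pool. With four workers, the total should come to roughly 0.2 s rather than 0.8 s.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(url):
    """Hypothetical stand-in for network I/O: sleeping releases the GIL."""
    time.sleep(0.2)
    return url

urls = [f"https://example.com/page{i}" for i in range(4)]   # hypothetical URLs

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_download, urls))   # map preserves input order
elapsed = time.perf_counter() - start
print(f"4 simulated downloads took {elapsed:.2f}s")
```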
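### 2. NumPy Operations

```python
import threading
import time

import numpy as np

def numpy_operation(arr):
    """np.dot runs in compiled (BLAS) code with the GIL released."""
    for _ in range(100):
        np.dot(arr, arr)

arr = np.random.rand(1000, 1000)

# These NumPy calls CAN run in parallel because NumPy releases the GIL.
# Note: np.dot may already use several cores via a multithreaded BLAS.
start = time.perf_counter()
threads = [threading.Thread(target=numpy_operation, args=(arr,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"Four threads: {time.perf_counter() - start:.2f}s")
```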
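### 3. C Extensions That Release the GIL

A C extension can release the GIL explicitly around work that does not touch Python objects:

```c
// Inside a C extension function
Py_BEGIN_ALLOW_THREADS
// ... long computation without Python objects ...
Py_END_ALLOW_THREADS
```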
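Several standard-library C modules use this pattern. For example, the `hashlib` documentation states that the GIL is released while hashing data larger than 2047 bytes, so several threads can hash large buffers concurrently. The sketch below demonstrates this; the buffer size and thread count are arbitrary choices.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def digest(buf):
    """SHA-256 of a buffer; hashlib releases the GIL for inputs over 2047 bytes."""
    return hashlib.sha256(buf).hexdigest()

# Four 8 MiB buffers with different contents (sizes chosen for illustration)
buffers = [bytes([i]) * (8 * 1024 * 1024) for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, buffers))

# Same results as hashing sequentially; only the scheduling differs
sequential = [digest(b) for b in buffers]
print(threaded == sequential)
```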
## Working Around the GIL

### Solution 1: Multiprocessing
Use separate processes instead of threads:
```python
from multiprocessing import Pool
import time

def cpu_bound_task(n):
    count = 0
    for _ in range(n):
        count += 1
    return count

# The __main__ guard is required when worker processes are started with
# "spawn" (the default on Windows and macOS)
if __name__ == "__main__":
    n = 50_000_000

    # Single process
    start = time.perf_counter()
    cpu_bound_task(n)
    single_time = time.perf_counter() - start

    # Four processes, each with its own interpreter and its own GIL
    start = time.perf_counter()
    with Pool(4) as pool:
        pool.map(cpu_bound_task, [n // 4] * 4)
    multi_time = time.perf_counter() - start

    print(f"Single process: {single_time:.2f}s")
    print(f"Four processes: {multi_time:.2f}s")
    print(f"Speedup: {single_time/multi_time:.2f}x")
```
Illustrative output on a machine with at least four free cores:

```
Single process: 2.50s
Four processes: 0.70s
Speedup: 3.57x   ← Real parallelism!
```
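Two refinements are worth considering. First, `concurrent.futures.ProcessPoolExecutor` provides the same process-based parallelism behind the executor interface also used for threads, and sizing the pool from `os.cpu_count()` adapts it to the machine. Second, the split `[n // 4] * 4` above silently drops the remainder when `n` is not divisible by 4; it happens to be exact for 50,000,000. The hypothetical helper `chunk_sizes` spreads any remainder across the chunks. This is a sketch, not a tuned implementation.

```python
import os
from concurrent.futures import ProcessPoolExecutor

def cpu_bound_task(n):
    count = 0
    for _ in range(n):
        count += 1
    return count

def chunk_sizes(n, workers):
    """Hypothetical helper: split n into `workers` chunks differing by at most 1."""
    base, extra = divmod(n, workers)
    return [base + 1 if i < extra else base for i in range(workers)]

if __name__ == "__main__":
    n = 50_000_000
    workers = os.cpu_count() or 1      # cpu_count() can return None
    with ProcessPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(cpu_bound_task, chunk_sizes(n, workers)))
    print(total == n)                  # every iteration accounted for
```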
Multiprocessing vs Threading:

```
Threading:
┌──────────┐
│ Process  │
│ ┌──────┐ │  GIL
│ │Thread│◀┼──────────▶ Only one runs Python at a time
│ │Thread│ │
│ │Thread│ │
│ └──────┘ │
└──────────┘

Multiprocessing:
┌──────────┐  ┌──────────┐  ┌──────────┐
│ Process  │  │ Process  │  │ Process  │
│ ┌──────┐ │  │ ┌──────┐ │  │ ┌──────┐ │
│ │ GIL  │ │  │ │ GIL  │ │  │ │ GIL  │ │  Each has its own GIL
│ └──────┘ │  │ └──────┘ │  │ └──────┘ │
└──────────┘  └──────────┘  └──────────┘
      ▲             ▲             ▲
      └─────────────┼─────────────┘
            True parallelism!
```
### Solution 2: Use NumPy/SciPy

```python
import time

import numpy as np

n = 10_000_000

# Pure Python (GIL-bound, interpreted loop)
data = list(range(n))
start = time.perf_counter()
result = sum(x**2 for x in data)
python_time = time.perf_counter() - start

# NumPy: the loop runs in compiled C (vectorized, may use SIMD) and can release the GIL.
# float64 avoids the silent int64 overflow: the sum (~3.3e20) exceeds the int64 maximum.
arr = np.arange(n, dtype=np.float64)
start = time.perf_counter()
result = np.sum(arr**2)
numpy_time = time.perf_counter() - start

print(f"Python: {python_time:.2f}s")
print(f"NumPy: {numpy_time:.3f}s")
print(f"Speedup: {python_time/numpy_time:.0f}x")
```
### Solution 3: Numba with nogil

```python
import time

import numpy as np
from numba import jit, prange   # third-party package: pip install numba

@jit(nopython=True, parallel=True, nogil=True)
def parallel_sum_squares(arr):
    """nogil=True releases the GIL; parallel=True spreads prange iterations across cores."""
    total = 0.0
    for i in prange(len(arr)):   # prange = parallel range; Numba treats `total +=` as a reduction
        total += arr[i] ** 2
    return total

arr = np.random.rand(10_000_000)

# Warm up: the first call triggers JIT compilation
parallel_sum_squares(arr)

# Benchmark
start = time.perf_counter()
result = parallel_sum_squares(arr)
elapsed = time.perf_counter() - start
print(f"Time: {elapsed:.4f}s")
```
## GIL and Hardware Utilization

| Task type | Threading | Multiprocessing | NumPy |
|---|---|---|---|
| CPU-bound Python | ✗ No gain | ✓ Full parallelism | N/A |
| CPU-bound NumPy | ✓ Can help | ✓ Full parallelism | ✓ Built-in |
| I/O-bound | ✓ Works | ✓ Works | N/A |
| Memory-bound | ✗ Limited | ✗ Limited | ✓ Optimized |
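## The Future: Free-threaded Python

Python 3.13 introduced an optional free-threaded build that runs without the GIL (PEP 703). It is experimental in 3.13 and officially supported, though still optional, from Python 3.14:

```shell
# Build CPython with: ./configure --disable-gil
# or install a free-threaded build, whose interpreter has a 't' suffix
python3.13t script.py
```

In free-threaded Python, true thread parallelism is possible:

```python
import threading

def cpu_task(n=10_000_000):
    """Pure-Python busy loop standing in for real CPU-bound work."""
    count = 0
    for _ in range(n):
        count += 1

# On a free-threaded build these threads can actually use multiple cores
threads = [threading.Thread(target=cpu_task) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```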
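To check which kind of interpreter a script is running on, the free-threading HOWTO in the Python docs points to two hooks. One is the `Py_GIL_DISABLED` build variable, read via `sysconfig`; the other is `sys._is_gil_enabled()`, available from 3.13. A free-threaded build can still re-enable the GIL at runtime, for example when it imports a C extension that has not declared itself safe to run without it, so the sketch below reports both. `gil_status` is a hypothetical helper name.

```python
import sys
import sysconfig

def gil_status():
    """Hypothetical helper: (is free-threaded build, is GIL enabled right now)."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists from Python 3.13; older versions always have a GIL
    is_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    return free_threaded_build, bool(is_enabled())

build, gil_on = gil_status()
print(f"Free-threaded build: {build}, GIL enabled at runtime: {gil_on}")
```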
## Summary
| Scenario | GIL Impact | Solution |
|---|---|---|
| CPU-bound Python | Serialized | multiprocessing |
| I/O-bound | Released during I/O | threading works |
| NumPy computation | Released | threading can help |
| C extensions | Can release | depends on extension |
Key points:
- The GIL prevents threads from executing Python bytecode in parallel
- The GIL is released during blocking I/O and by many C extensions
- Use `multiprocessing` for CPU-bound parallelism
- Use `threading` for I/O-bound concurrency
- NumPy releases the GIL in many operations, so NumPy-heavy threads can run in parallel
- Free-threaded builds of Python 3.13+ can run without the GIL (experimental in 3.13)
Decision Tree:

```
Is your code CPU-bound?
├── Yes: Pure Python?
│   ├── Yes → Use multiprocessing
│   └── No (NumPy) → Threading may help; BLAS-backed NumPy ops may already use several cores
└── No (I/O-bound) → Use threading or asyncio
```
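The decision tree maps naturally onto the two executors in `concurrent.futures`. Below is a minimal sketch with a hypothetical `make_executor` helper and made-up workload labels. asyncio, the other option for I/O-bound code, is left out to keep the sketch short.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def make_executor(kind, max_workers=None):
    """Hypothetical helper: return an executor suited to the workload.

    kind: "cpu"   -> pure-Python CPU-bound work -> separate processes
          "io"    -> I/O-bound work             -> threads
          "numpy" -> work in GIL-releasing code -> threads
    """
    if kind == "cpu":
        return ProcessPoolExecutor(max_workers=max_workers)
    if kind in ("io", "numpy"):
        return ThreadPoolExecutor(max_workers=max_workers)
    raise ValueError(f"unknown workload kind: {kind!r}")

# Usage: callers submit work the same way whichever pool they receive
with make_executor("io", max_workers=4) as pool:
    lengths = list(pool.map(len, ["a", "bb", "ccc"]))
print(lengths)  # [1, 2, 3]
```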