CPU Cores and Threads¶

Modern processors contain multiple independent execution units called cores. Operating systems schedule program execution onto these cores using threads.

Understanding how cores, threads, and processes interact is essential for writing efficient concurrent and parallel programs, especially in Python.

Many performance issues in Python arise not from CPU speed but from how work is distributed across cores and how the Python runtime interacts with hardware.

1. CPU Cores¶

A CPU core is an independent hardware execution unit capable of running its own instruction stream.

Each core contains:

arithmetic and logical execution units
registers
instruction pipelines
private caches (L1 and often L2)

Multiple cores allow a processor to execute several programs or tasks simultaneously.

Example CPU configuration¶

Component	Example
CPU package	1
Physical cores	4
Logical cores	8 (with SMT)

Multi-core CPU visualization¶

flowchart LR
    CPU --> Core1
    CPU --> Core2
    CPU --> Core3
    CPU --> Core4

Each core can independently fetch and execute instructions.

2. Processes¶

A process is an isolated execution environment created by the operating system.

Each process has:

its own virtual address space
its own heap and at least one stack (one per thread)
its own system resources

Processes are isolated from one another for security and stability.

Process structure¶

flowchart TD
    Process --> Code
    Process --> Heap
    Process --> Stack

Because processes have separate address spaces, they cannot directly access each other’s memory.

Communication between processes typically occurs through inter-process communication (IPC) mechanisms such as pipes, sockets, or shared memory.
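As a minimal sketch of pipe-based IPC, the parent below starts a second Python interpreter with the standard `subprocess` module and exchanges text with it over the child's stdin and stdout pipes. The child program (`CHILD_CODE`), the helper name `ask_child`, and the message are illustrative assumptions.

```python
import subprocess
import sys

# Hypothetical child program: read all of stdin, upper-case it, write it to stdout.
CHILD_CODE = "import sys; sys.stdout.write(sys.stdin.read().upper())"

def ask_child(message):
    """Send `message` to a child process through a pipe and return its reply."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=message,         # written to the child's stdin pipe
        capture_output=True,   # stdout and stderr are read back through pipes
        text=True,             # exchange str instead of bytes
        check=True,            # raise CalledProcessError if the child fails
    )
    return result.stdout

reply = ask_child("hello from the parent")
print(reply)
```

The two processes never touch each other's memory: the only data that crosses the boundary is what is explicitly written to and read from the pipes.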

3. Threads¶

A thread is a lightweight execution unit within a process.

Threads share the process memory but maintain their own execution state.

Each thread has:

its own stack
its own program counter
its own registers

However, threads share:

the process heap
global variables
open files

Thread structure¶

flowchart LR
    Process --> Thread1
    Process --> Thread2
    Process --> Thread3

    Thread1 --> Stack1
    Thread2 --> Stack2
    Thread3 --> Stack3

Because threads share memory, communication between them is faster than between processes.

However, shared memory also introduces risks such as race conditions.
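A minimal sketch of guarding shared state against a race condition, assuming a hypothetical shared `counter` that four threads increment. The `threading.Lock` makes each read-modify-write step atomic with respect to the other threads.

```python
import threading

counter = 0                      # shared by all threads in the process
counter_lock = threading.Lock()  # protects every update to `counter`

def add_many(n):
    """Increment the shared counter n times, holding the lock for each update."""
    global counter
    for _ in range(n):
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

Without the lock, two threads can read the same old value and both write back the same new value, so some increments may be lost.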

4. Simultaneous Multithreading (SMT)¶

Many modern CPUs support Simultaneous Multithreading (SMT).

Intel markets its implementation as Hyper-Threading.

SMT allows one physical core to run multiple hardware threads, which the operating system sees as separate logical cores.

How SMT works¶

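A single core keeps a separate architectural state (registers and program counter) for each hardware thread, while the threads share the core's execution units and caches. The core can issue instructions from more than one thread in the same cycle, so execution slots left idle by one thread can be filled by another.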

For example, if one thread is waiting for memory, another thread can use the core’s execution units.

SMT visualization¶

flowchart LR
    Core --> ThreadA
    Core --> ThreadB

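SMT improves utilization of CPU resources but does not double performance, because the hardware threads compete for the same execution units and caches.

Gains of roughly 10% to 30% are commonly cited, but the result depends heavily on the workload, and some workloads see no benefit at all.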

5. Concurrency vs Parallelism¶

Two important concepts often confused in programming are concurrency and parallelism.

Concurrency¶

Concurrency refers to a program structure in which multiple tasks can make progress independently.

Tasks may be interleaved on a single CPU core.

Example:

Task A
Task B
Task A
Task B
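The interleaving above can be reproduced on a single thread with `asyncio`. This is a minimal sketch: the task names and step count are illustrative, and `asyncio.sleep(0)` is used only to hand control back to the event loop after each step.

```python
import asyncio

async def task(name, steps, log):
    """Record one entry per step, yielding to the event loop between steps."""
    for _ in range(steps):
        log.append(f"Task {name}")
        await asyncio.sleep(0)  # give other tasks a turn

async def main():
    log = []
    # Both tasks run on one thread; the event loop alternates between them.
    await asyncio.gather(task("A", 2, log), task("B", 2, log))
    return log

log = asyncio.run(main())
print(log)  # ['Task A', 'Task B', 'Task A', 'Task B']
```

Both tasks make progress, yet at no point do two of them execute at the same instant. That is concurrency without parallelism.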

Parallelism¶

Parallelism refers to tasks executing simultaneously on different CPU cores.

Example:

Core 1 → Task A
Core 2 → Task B

Visualization¶

flowchart LR
    Concurrency --> Interleaving
    Parallelism --> SimultaneousExecution

Concurrency is necessary to exploit parallel hardware, but concurrency alone does not guarantee parallel execution.

6. The Global Interpreter Lock (GIL)¶

One important constraint in CPython is the Global Interpreter Lock (GIL).

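In standard CPython builds, the GIL ensures that only one thread executes Python bytecode at a time within a single process. (CPython 3.13 added an optional, experimental free-threaded build without the GIL; this section describes the default build.)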

Why the GIL exists¶

The GIL simplifies memory management in CPython by protecting shared data structures such as reference counts.

However, it also prevents Python threads from achieving true parallelism for CPU-bound tasks.

Implication¶

Python threads cannot parallelize CPU-bound work that is written in pure Python.

Example:

total = 0
for i in range(10_000_000):
    total += i

Running this loop in several Python threads at once does not speed it up: the threads take turns holding the GIL instead of running on multiple cores in parallel.
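A rough way to observe this is to time the same pure-Python loop run four times sequentially and then in four threads. This is a sketch, not a benchmark: `count_up` and `N` are illustrative names, timings vary by machine, and a free-threaded build would behave differently.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def count_up(n):
    """Pure-Python CPU-bound loop: sum the integers below n."""
    total = 0
    for i in range(n):
        total += i
    return total

N = 1_000_000

# Run the loop four times, one after another.
start = time.perf_counter()
sequential = [count_up(N) for _ in range(4)]
t_sequential = time.perf_counter() - start

# Run the same four loops in four threads.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(count_up, [N] * 4))
t_threaded = time.perf_counter() - start

print(f"sequential: {t_sequential:.2f} s, threaded: {t_threaded:.2f} s")
```

On a GIL-enabled build the two timings are typically similar, since the threads execute bytecode one at a time.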

When the GIL is released¶

The GIL is temporarily released during:

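blocking I/O operations (file, socket, and pipe reads and writes)
blocking calls such as time.sleep() or waiting on a lock
long-running native code in many C extensions (for example NumPy and SciPy routines, including those that call BLAS)

This allows other threads to run while one thread waits on I/O or executes native code.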
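The effect can be sketched with `time.sleep`, used here as a stand-in for a blocking I/O call. The 0.2-second delay and the helper name `wait_for_io` are illustrative assumptions.

```python
import threading
import time

def wait_for_io(seconds):
    # time.sleep stands in for a blocking I/O call; the GIL is released while it blocks.
    time.sleep(seconds)

start = time.perf_counter()
threads = [threading.Thread(target=wait_for_io, args=(0.2,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(f"4 waits of 0.2 s finished in {elapsed:.2f} s")
```

Because the waits overlap, the elapsed time should be close to a single wait rather than the 0.8 s a sequential version would need.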

7. Amdahl’s Law¶

Even with many CPU cores, the speedup of a program is limited by the portion of the code that cannot be parallelized.

This relationship is described by Amdahl’s Law.

\[ S(n) = \frac{1}{s + \frac{1-s}{n}} \]

Where:

\(S(n)\) = speedup using \(n\) cores
\(s\) = fraction of execution time that is serial
\(n\) = number of cores

Example¶

If 10% of a program is serial:

s = 0.10

Even with infinite cores:

\[ S_{\max} = \lim_{n \to \infty} S(n) = \frac{1}{s} = \frac{1}{0.10} = 10 \]

Thus the maximum speedup is 10×, no matter how many cores are added.
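The formula is easy to evaluate directly. The helper below is a small sketch (the name `amdahl_speedup` is an assumption, not a library function) that prints the predicted speedup for a 10% serial fraction at several core counts.

```python
def amdahl_speedup(serial_fraction, cores):
    """Predicted speedup S(n) = 1 / (s + (1 - s) / n) under Amdahl's Law."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for n in (1, 2, 4, 8, 16, 1024):
    print(f"{n:>5} cores -> {amdahl_speedup(0.10, n):.2f}x")
```

The printed values rise quickly at first and then flatten, approaching but never reaching 10×.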

Speedup visualization¶

flowchart LR
    SerialPart --> LimitsSpeedup
    ParallelPart --> UsesCores

Amdahl’s Law highlights the importance of minimizing serial sections of code.

8. Choosing the Right Parallelism Strategy¶

Different workloads benefit from different parallel programming techniques.

CPU-bound workloads¶

Use multiprocessing.

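Each process has its own Python interpreter and its own GIL, so the operating system can run the processes on separate CPU cores in parallel. The cost is that arguments and results must be serialized and copied between processes.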

I/O-bound workloads¶

Use threading or asyncio.

Threads can overlap I/O waits even with the GIL.
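For the `asyncio` option, a minimal sketch looks like this. `asyncio.sleep` stands in for awaiting a network response, and the coroutine name `fake_request` and the request labels are hypothetical.

```python
import asyncio
import time

async def fake_request(name, delay):
    # asyncio.sleep stands in for awaiting a real network response.
    await asyncio.sleep(delay)
    return name

async def main():
    # Start all four "requests" at once and wait for every result.
    return await asyncio.gather(*(fake_request(f"req-{i}", 0.2) for i in range(4)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(results, f"{elapsed:.2f} s")
```

All four waits overlap on a single thread, and `asyncio.gather` returns the results in the order the coroutines were passed in.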

Numerical computation¶

Use NumPy, SciPy, or BLAS libraries.

These libraries run their heavy computation in native code, release the GIL for many operations, and often use multiple threads internally.

Strategy summary¶

Workload	Recommended Tool
CPU-bound Python	multiprocessing
I/O-bound	threading / asyncio
numerical workloads	NumPy / SciPy

9. Example: Counting CPU Cores¶

import os

print(os.cpu_count())

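This returns the number of logical CPUs the operating system reports, or None if the count cannot be determined. The number of CPUs the current process is actually allowed to use may be smaller, for example inside a container or under a CPU affinity mask.

For example, an output of 8 may correspond to a 4-core CPU with SMT (4 physical cores with 2 hardware threads each).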
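When sizing a worker pool, the number of CPUs the current process may use is often more relevant than the machine-wide count. The helper below is a sketch under that assumption: `usable_cpu_count` is a hypothetical name, and `os.sched_getaffinity` is only available on some platforms (such as Linux), so the code falls back to `os.cpu_count()`.

```python
import os

def usable_cpu_count():
    """Best-effort count of logical CPUs this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        # Respects CPU affinity masks where the platform supports them.
        return len(os.sched_getaffinity(0))
    # os.cpu_count() may return None, so fall back to 1.
    return os.cpu_count() or 1

print(usable_cpu_count())
```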

10. Example: Parallel Processing with Multiprocessing¶

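import multiprocessing

def compute(x):
    # Stand-in for CPU-bound work
    return x * x

if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on import
    with multiprocessing.Pool(4) as pool:
        results = pool.map(compute, range(100))

    # Print inside the guard: `results` exists only in the parent process
    print(results[:5])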

Each worker process has its own interpreter and GIL, so the operating system can schedule the four workers on different CPU cores at the same time.

11. Example: Threading for I/O¶

from concurrent.futures import ThreadPoolExecutor
import urllib.request

def fetch(url):
    with urllib.request.urlopen(url) as resp:
        return len(resp.read())

urls = ["https://example.com"] * 4

with ThreadPoolExecutor(max_workers=4) as executor:
    sizes = list(executor.map(fetch, urls))

print(sizes)

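Here the threads wait on the network concurrently, so the total time is close to that of the slowest request rather than the sum of all four.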

12. Summary¶

Concept	Explanation
Core	independent CPU execution unit
Thread	lightweight execution context within a process
Process	isolated execution environment
SMT	multiple logical threads per core
Concurrency	tasks make progress independently
Parallelism	tasks execute simultaneously
GIL	allows only one thread at a time to execute Python bytecode in CPython
Amdahl’s Law	limits achievable parallel speedup

Modern CPUs contain many cores capable of executing multiple threads simultaneously.

However, achieving high performance requires understanding:

how operating systems schedule threads
how Python interacts with hardware
how parallel algorithms scale

By structuring programs to minimize serial work and using appropriate parallel tools, developers can effectively utilize modern multi-core processors.