
CPU Cores and Threads

Modern processors contain multiple independent execution units called cores. Operating systems schedule program execution onto these cores using threads.

Understanding how cores, threads, and processes interact is essential for writing efficient concurrent and parallel programs, especially in Python.

Many performance issues in Python arise not from CPU speed but from how work is distributed across cores and how the Python runtime interacts with hardware.


1. CPU Cores

A CPU core is an independent hardware execution unit capable of running its own instruction stream.

Each core contains:

  • arithmetic and logical execution units
  • registers
  • instruction pipelines
  • private caches (L1 and often L2)

Multiple cores allow a processor to execute several programs or tasks simultaneously.


Example CPU configuration

| Component | Example |
| --- | --- |
| CPU package | 1 |
| Physical cores | 4 |
| Logical cores | 8 (with SMT) |

Multi-core CPU visualization

```mermaid
flowchart LR
    CPU --> Core1
    CPU --> Core2
    CPU --> Core3
    CPU --> Core4
```

Each core can independently fetch and execute instructions.


2. Processes

A process is an isolated execution environment created by the operating system.

Each process has:

  • its own virtual address space
  • its own heap and stack
  • its own system resources

Processes are isolated from one another for security and stability.


Process structure

```mermaid
flowchart TD
    Process --> Code
    Process --> Heap
    Process --> Stack
```

Because processes have separate address spaces, they cannot directly access each other’s memory.

Communication between processes typically occurs through inter-process communication (IPC) mechanisms such as pipes, sockets, or shared memory.


3. Threads

A thread is a lightweight execution unit within a process.

Threads share the process memory but maintain their own execution state.

Each thread has:

  • its own stack
  • its own program counter
  • its own registers

However, threads share:

  • the process heap
  • global variables
  • open files

Thread structure

```mermaid
flowchart LR
    Process --> Thread1
    Process --> Thread2
    Process --> Thread3

    Thread1 --> Stack1
    Thread2 --> Stack2
    Thread3 --> Stack3
```

Because threads share memory, communication between them is faster than between processes.

However, shared memory also introduces risks such as race conditions.
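A minimal illustration of both properties: several threads increment a shared global, and a `threading.Lock` guards the read-modify-write so no updates are lost (the function name `increment` is ours, for this sketch):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # The lock makes read-modify-write atomic; without it,
        # two threads could interleave the read and the write
        # and lose updates (a race condition).
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000
```

Because all four threads see the same `counter`, no IPC is needed; the cost of that convenience is the locking discipline.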


4. Simultaneous Multithreading (SMT)

Many modern CPUs support Simultaneous Multithreading (SMT).

Intel refers to this technology as Hyper-Threading.

SMT allows one physical core to support multiple logical threads.


How SMT works

A single core maintains multiple register states so that it can switch between threads when one stalls.

For example, if one thread is waiting for memory, another thread can use the core’s execution units.


SMT visualization

```mermaid
flowchart LR
    Core --> ThreadA
    Core --> ThreadB
```

SMT improves utilization of CPU resources but does not double performance.

Typical gains range from 10% to 30%, depending on workload.


5. Concurrency vs Parallelism

Two important concepts often confused in programming are concurrency and parallelism.


Concurrency

Concurrency refers to a program structure in which multiple tasks can make progress independently.

Tasks may be interleaved on a single CPU core.

Example:

Task A
Task B
Task A
Task B
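The same interleaving can be sketched with `asyncio`: each `await` yields control to the event loop, so the two tasks alternate on a single core without ever running in parallel (the task names `A` and `B` are illustrative):

```python
import asyncio

order = []

async def task(name, pause):
    for _ in range(2):
        order.append(name)
        # Awaiting suspends this task and lets the other one run
        # on the same core: concurrency without parallelism.
        await asyncio.sleep(pause)

async def main():
    await asyncio.gather(task("A", 0.01), task("B", 0.01))

asyncio.run(main())
print(order)  # e.g. ['A', 'B', 'A', 'B']
```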

Parallelism

Parallelism refers to tasks executing simultaneously on different CPU cores.

Example:

Core 1 → Task A
Core 2 → Task B

Visualization

```mermaid
flowchart LR
    Concurrency --> Interleaving
    Parallelism --> SimultaneousExecution
```

Concurrency is necessary to exploit parallel hardware, but concurrency alone does not guarantee parallel execution.


6. The Global Interpreter Lock (GIL)

One important constraint in CPython is the Global Interpreter Lock (GIL).

The GIL ensures that only one thread executes Python bytecode at a time within a single process.


Why the GIL exists

The GIL simplifies memory management in CPython by protecting shared data structures such as reference counts.

However, it also prevents Python threads from achieving true parallelism for CPU-bound tasks.


Implication

Python threads cannot parallelize CPU-bound computations.

Example:

```python
total = 0
for i in range(10_000_000):
    total += i
```

Running this loop in multiple Python threads will not use multiple CPU cores.
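One way to see this is to time the loop run serially versus split across two threads. On a standard CPython build with the GIL, the threaded version typically takes about as long as the serial one; exact times vary by machine, so the sketch below only prints them rather than asserting a ratio:

```python
import threading
import time

def count(n):
    total = 0
    for i in range(n):
        total += i
    return total

N = 2_000_000

# Serial: two calls back to back on one thread.
t0 = time.perf_counter()
count(N)
count(N)
serial = time.perf_counter() - t0

# Threaded: two threads, but the GIL serializes the bytecode,
# so only one thread computes at any instant.
t0 = time.perf_counter()
threads = [threading.Thread(target=count, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - t0

print(f"serial: {serial:.3f}s  threaded: {threaded:.3f}s")
```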


When the GIL is released

The GIL is temporarily released during:

  • blocking I/O operations
  • system calls
  • execution of many C extensions (NumPy, SciPy, BLAS)

This allows threads to run concurrently during I/O waits.


7. Amdahl’s Law

Even with many CPU cores, the speedup of a program is limited by the portion of the code that cannot be parallelized.

This relationship is described by Amdahl’s Law.

S(n) = 1 / (s + (1 - s) / n)

Where:

  • S(n) = speedup using n cores
  • s = fraction of execution time that is serial
  • n = number of cores

Example

If 10% of a program is serial:

s = 0.10

Even with infinite cores:

S_max = 1 / 0.10 = 10

Thus the maximum speedup is 10×, regardless of hardware.
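The formula is easy to check numerically. A small helper (the name `amdahl_speedup` is ours) shows how quickly the returns diminish for s = 0.10:

```python
def amdahl_speedup(serial_fraction, cores):
    # S(n) = 1 / (s + (1 - s) / n)
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for n in (2, 4, 16, 1000):
    print(n, round(amdahl_speedup(0.10, n), 2))
# 2 1.82
# 4 3.08
# 16 6.4
# 1000 9.91
```

Even at 1000 cores the speedup is below 10x, as the limit predicts.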


Speedup visualization

```mermaid
flowchart LR
    SerialPart --> LimitsSpeedup
    ParallelPart --> UsesCores
```

Amdahl’s Law highlights the importance of minimizing serial sections of code.


8. Choosing the Right Parallelism Strategy

Different workloads benefit from different parallel programming techniques.


CPU-bound workloads

Use multiprocessing.

Each process runs on a separate CPU core and bypasses the GIL.


I/O-bound workloads

Use threading or asyncio.

Threads can overlap I/O waits even with the GIL.


Numerical computation

Use NumPy, SciPy, or BLAS libraries.

These libraries release the GIL and often use parallel native code internally.


Strategy summary

| Workload | Recommended Tool |
| --- | --- |
| CPU-bound Python | multiprocessing |
| I/O-bound | threading / asyncio |
| Numerical workloads | NumPy / SciPy |

9. Example: Counting CPU Cores

```python
import os

print(os.cpu_count())
```

This returns the number of logical cores in the system, or None if it cannot be determined.

For example:

8

may correspond to a 4-core CPU with SMT.
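Note that the number of cores actually usable by the current process can be smaller than `os.cpu_count()`, for example under container CPU limits or `taskset`. On Linux this can be checked with `os.sched_getaffinity` (not available on all platforms, hence the `hasattr` guard):

```python
import os

logical = os.cpu_count()
print("logical cores in system:", logical)

# os.sched_getaffinity(0) returns the set of CPUs the current
# process is allowed to run on (Linux-specific).
if hasattr(os, "sched_getaffinity"):
    usable = len(os.sched_getaffinity(0))
    print("cores usable by this process:", usable)
```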


10. Example: Parallel Processing with Multiprocessing

```python
import multiprocessing

def compute(x):
    return x * x

if __name__ == "__main__":
    with multiprocessing.Pool(4) as pool:
        results = pool.map(compute, range(100))

    print(results[:5])
```

Each worker process runs independently and can be scheduled onto a separate CPU core, bypassing the GIL. Note that the `print` belongs inside the `if __name__ == "__main__":` guard: with the `spawn` start method, each worker re-imports the module, and unguarded module-level code would run in every worker.


11. Example: Threading for I/O

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

def fetch(url):
    with urllib.request.urlopen(url) as resp:
        return len(resp.read())

urls = ["https://example.com"] * 4

with ThreadPoolExecutor(max_workers=4) as executor:
    sizes = list(executor.map(fetch, urls))

print(sizes)
```

Here threads overlap network latency.


12. Summary

| Concept | Explanation |
| --- | --- |
| Core | independent CPU execution unit |
| Thread | lightweight execution context within a process |
| Process | isolated execution environment |
| SMT | multiple logical threads per core |
| Concurrency | tasks make progress independently |
| Parallelism | tasks execute simultaneously |
| GIL | allows only one Python thread to execute bytecode at a time |
| Amdahl's Law | limits achievable parallel speedup |

Modern CPUs contain many cores capable of executing multiple threads simultaneously.

However, achieving high performance requires understanding:

  • how operating systems schedule threads
  • how Python interacts with hardware
  • how parallel algorithms scale

By structuring programs to minimize serial work and using appropriate parallel tools, developers can effectively utilize modern multi-core processors.