Skip to content

System Bus

What is a Bus?

A bus is a communication pathway that transfers data between computer components. Think of it as a highway system connecting different parts of the computer.

┌─────────────────────────────────────────────────────────────┐
│                        System Bus                           │
│  ═══════════════════════════════════════════════════════   │
│       │           │           │           │                 │
│   ┌───┴───┐   ┌───┴───┐   ┌───┴───┐   ┌───┴───┐           │
│   │  CPU  │   │  RAM  │   │  GPU  │   │  I/O  │           │
│   └───────┘   └───────┘   └───────┘   └───────┘           │
└─────────────────────────────────────────────────────────────┘

Bus Components

A bus consists of three types of lines:

1. Address Bus

Specifies where to read/write data:

Address Bus (unidirectional: CPU → Memory)
┌─────┐                              ┌─────────┐
│ CPU │ ══════ Address Lines ══════▶ │ Memory  │
└─────┘    (e.g., 0x7FFF0000)        └─────────┘

Width determines addressable memory:
  32-bit address bus → 2³² = 4 GB addressable
  64-bit address bus → 2⁶⁴ = 16 EB addressable (theoretical)

2. Data Bus

Transfers actual data between components:

Data Bus (bidirectional)
┌─────┐                              ┌─────────┐
│ CPU │ ◀══════ Data Lines ═══════▶ │ Memory  │
└─────┘     (e.g., 64 bits wide)    └─────────┘

Width determines transfer size per cycle:
  32-bit data bus → 4 bytes per transfer
  64-bit data bus → 8 bytes per transfer

3. Control Bus

Carries command signals:

Control Bus (various directions)
┌─────┐                              ┌─────────┐
│ CPU │ ◀═══ Control Signals ══════▶ │ Memory  │
└─────┘                              └─────────┘

Signals include:
  - Read/Write
  - Clock
  - Interrupt
  - Bus Request/Grant

Bus Operation

Read Operation

CPU wants to read from address 0x1000:

1. CPU places 0x1000 on Address Bus      ────▶
2. CPU sets Read signal on Control Bus   ────▶
3. Memory reads data at 0x1000
4. Memory places data on Data Bus        ◀────
5. CPU reads data from Data Bus

Write Operation

CPU wants to write 42 to address 0x1000:

1. CPU places 0x1000 on Address Bus      ────▶
2. CPU places 42 on Data Bus             ────▶
3. CPU sets Write signal on Control Bus  ────▶
4. Memory stores 42 at address 0x1000

Bus Hierarchy

Modern computers have multiple buses at different speeds:

┌─────────────────────────────────────────────────────────────┐
│                          CPU                                │
│  ┌─────────────────────────────────────────────────────┐   │
│  │                Internal Bus (fastest)                │   │
│  │         Registers ←→ ALU ←→ Cache                   │   │
│  └─────────────────────────────────────────────────────┘   │
└────────────────────────────┬────────────────────────────────┘
                             │
              ┌──────────────┴──────────────┐
              │     Front-Side Bus / QPI     │  (~25 GB/s)
              │     (CPU ↔ Memory Controller)│
              └──────────────┬──────────────┘
                             │
              ┌──────────────┴──────────────┐
              │        Memory Bus            │  (~50 GB/s)
              │       (DDR4/DDR5)            │
              └──────────────┬──────────────┘
                             │
                         ┌───┴───┐
                         │  RAM  │
                         └───────┘

              ┌──────────────────────────────┐
              │         PCIe Bus             │  (~32 GB/s per x16)
              │   (CPU ↔ GPU, NVMe, etc.)    │
              └──────────────────────────────┘

Bus Speed and Bandwidth

Calculating Bus Bandwidth

Bandwidth = Bus Width × Clock Speed × Transfers per Clock

Example DDR4-3200:
  Width: 64 bits = 8 bytes
  Speed: 1600 MHz (base clock)
  Transfers: 2 per clock (DDR = Double Data Rate)

  Bandwidth = 8 × 1600 × 2 = 25,600 MB/s ≈ 25 GB/s per channel

Common Bus Bandwidths

Bus Type Bandwidth Use
CPU Internal ~1 TB/s Register ↔ ALU
L1 Cache ~500 GB/s L1 ↔ CPU
L3 Cache ~200 GB/s L3 ↔ L2
Memory (DDR4) ~25 GB/s RAM ↔ CPU
PCIe 4.0 x16 ~32 GB/s GPU ↔ CPU
SATA III ~600 MB/s SSD ↔ CPU
USB 3.0 ~625 MB/s Peripherals

PCIe: Modern Expansion Bus

PCIe (Peripheral Component Interconnect Express) is the primary expansion bus:

PCIe Lane Configuration

x1:  [──────]           ~2 GB/s (PCIe 4.0)
x4:  [──────────────]   ~8 GB/s
x8:  [──────────────────────────]  ~16 GB/s
x16: [──────────────────────────────────────────]  ~32 GB/s

PCIe Generations

Generation Per-Lane Bandwidth x16 Total
PCIe 3.0 ~1 GB/s ~16 GB/s
PCIe 4.0 ~2 GB/s ~32 GB/s
PCIe 5.0 ~4 GB/s ~64 GB/s
PCIe 6.0 ~8 GB/s ~128 GB/s

Bus Contention

When multiple components need the bus simultaneously:

Problem: Bus Contention

Time →
Device A: [Request]────[Wait]────[Wait]────[Transfer]
Device B: ─────────[Request]────[Wait]────[Wait]────[Transfer]
Device C: ─────────────────[Request]────[Wait]────[Wait]────[Transfer]

Only one device can use the bus at a time!

Bus Arbitration

A bus arbiter decides who gets access:

Arbitration Methods:

1. Priority-based: Higher priority devices go first
2. Round-robin: Fair rotation among devices
3. First-come-first-served: Queue-based

Python Perspective

Why Bus Speed Matters

import numpy as np
import time

# Memory bandwidth limits computation
def memory_bound_operation():
    # 1 GB array
    arr = np.random.rand(125_000_000)  # 1 GB of float64

    start = time.perf_counter()
    # Simple operation - limited by memory bus
    result = np.sum(arr)
    elapsed = time.perf_counter() - start

    bandwidth = arr.nbytes / elapsed / 1e9
    print(f"Achieved bandwidth: {bandwidth:.1f} GB/s")
    # Typically ~30-40 GB/s, limited by memory bus

memory_bound_operation()

PCIe and GPU Operations

import torch

# GPU data transfer goes over PCIe
data = torch.randn(1000, 1000)

# CPU → GPU (over PCIe)
start = time.perf_counter()
data_gpu = data.to('cuda')
torch.cuda.synchronize()
transfer_time = time.perf_counter() - start

bytes_transferred = data.numel() * 4  # float32
bandwidth = bytes_transferred / transfer_time / 1e9
print(f"CPU→GPU bandwidth: {bandwidth:.1f} GB/s")
# Typically ~12-15 GB/s (PCIe limited)

Summary

Component Function Direction
Address Bus Specifies memory location CPU → Memory
Data Bus Transfers actual data Bidirectional
Control Bus Command signals Various

Key points:

  • Bus bandwidth often limits system performance
  • Multiple bus levels with different speeds
  • PCIe connects high-speed devices (GPU, NVMe)
  • Bus contention can create bottlenecks
  • Memory bandwidth (~50 GB/s) often limits Python/NumPy performance
  • GPU transfer bandwidth (~15 GB/s) limits data movement