Hardware and Interpreted Languages¶
The Language Spectrum¶
Programming languages exist on a spectrum from hardware to abstraction:
Hardware Distance Spectrum:
Machine Code Assembly C/C++ Java Python
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────────┐
│░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│
└─────────────────────────────────────────────────────────────┘
Close to Hardware Close to Human
Fast execution Fast development
Manual memory Automatic memory
Hardware-specific Portable
Harder to write Easier to write
Compiled vs Interpreted¶
Compiled Languages (C, C++, Rust)¶
Compilation Process:
Source Code Machine Code
┌─────────────┐ ┌─────────────┐
│ int x = 5; │ │ 10111010... │
│ x = x + 1; │ ──Compiler──▶ │ 01001101... │
│ return x; │ │ 11100010... │
└─────────────┘ └─────────────┘
│
▼
┌──────────┐
│ CPU │
│ (direct) │
└──────────┘
Characteristics:
✓ Compiled once, run many times
✓ Direct CPU execution
✓ Optimizations at compile time
✗ Platform-specific binaries
✗ Slower development cycle
Interpreted Languages (Python, JavaScript, Ruby)¶
Interpretation Process:
Source Code Interpreter CPU
┌─────────────┐ ┌─────────────┐ ┌──────────┐
│ x = 5 │ │ │ │ │
│ x = x + 1 │ ────────▶ │ Interpret │ ────▶ │ Execute │
│ return x │ │ each line │ │ │
└─────────────┘ └─────────────┘ └──────────┘
│
Read → Parse → Execute
Read → Parse → Execute
Read → Parse → Execute
│
Every time!
Characteristics:
✓ No compilation step
✓ Platform independent
✓ Dynamic and flexible
✗ Interpretation overhead
✗ Slower execution
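The read → parse → execute cycle can be made concrete with a toy line-by-line interpreter. This is a sketch, not how CPython works: the mini-language (bare `name = expression` assignments) and the `run` function are hypothetical, and the naive `eval`-based "execute" step stands in for a real evaluator.

```python
def run(source: str) -> dict:
    """Interpret a tiny assignment-only language, one line at a time."""
    env = {}
    for line in source.splitlines():          # read one line
        name, _, expr = line.partition("=")   # parse (very naively)
        env[name.strip()] = eval(expr, {}, env)  # execute against current state
    return env

print(run("x = 5\ny = x + 1"))  # {'x': 5, 'y': 6}
```

Note that every line is re-read and re-parsed on every run; a compiler would have done that work once, ahead of time.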
Bytecode Compiled (Python, Java)¶
Python actually uses a hybrid approach:
Python's Execution Model:
Source (.py) Bytecode (.pyc) Execution
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ x = 5 │ │ LOAD_CONST 5│ │ │
│ x = x + 1 │ ───▶ │ STORE_NAME x│ ───▶ │ PVM │
│ print(x) │ │ LOAD_NAME x │ │ (interpret) │
└─────────────┘ │ LOAD_CONST 1│ └─────────────┘
│ BINARY_ADD │
.py file │ STORE_NAME x│ Python Virtual
└─────────────┘ Machine
.pyc file (cached)
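You can inspect the bytecode Python generates yourself with the standard-library `dis` module. The exact opcode names vary between Python versions (for example, `BINARY_ADD` became `BINARY_OP` in 3.11), so treat any listing as illustrative:

```python
import dis

source = "x = 5\nx = x + 1\nprint(x)"
code = compile(source, "<example>", "exec")

# Human-readable disassembly of the compiled bytecode
dis.dis(code)

# Programmatic access to the same instruction stream
opnames = [instr.opname for instr in dis.Bytecode(code)]
print(opnames)
```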
Hardware Interaction Layers¶
Direct Hardware Access (C)¶
// C code - directly controls memory
int* arr = malloc(1000 * sizeof(int)); // Direct allocation
arr[0] = 42; // Direct memory write
free(arr); // Manual deallocation
// Compiles to roughly (x86-64):
//   mov  edi, 4000        ; size in bytes (1000 * sizeof(int))
//   call malloc           ; allocate; pointer returned in rax
//   mov  dword [rax], 42  ; write to the allocated address
//   mov  rdi, rax         ; pointer argument
//   call free             ; deallocate
Abstracted Hardware Access (Python)¶
# Python code - hardware abstracted away
arr = [0] * 1000 # Python handles allocation
arr[0] = 42 # Python handles the write
# Garbage collector handles deallocation
# Internally involves:
# - Create a list object
# - Store 1000 references in the list (all pointing at the same int 0)
# - Reference counting on every object
# - Type checking at runtime
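A quick way to observe this machinery from Python itself (note that `sys.getrefcount` and small-integer caching are CPython implementation details, not language guarantees):

```python
import sys

arr = [0] * 1000

# The list holds 1000 references, not 1000 distinct integer objects:
# list repetition copies the reference, so every slot is the same object.
print(arr[0] is arr[500])  # True

# getrefcount reports an object's reference count; the count includes
# the temporary reference created by passing the object as an argument.
x = []
print(sys.getrefcount(x))  # typically 2: the name 'x' plus the call's argument
```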
The Abstraction Cost¶
What Happens in x = x + 1¶
In C:
1 CPU instruction (approximately):
ADD [x_address], 1
In Python:
~100+ operations:
1. Look up 'x' in local namespace (dict lookup)
2. Get PyObject* for x
3. Check type of x (is it int? float? custom?)
4. Look up '__add__' method
5. Load the constant 1 (small ints are cached, so no new object is created)
6. Call __add__(x, 1)
7. Inside __add__:
- Unbox x to C long
- Unbox 1 to C long
- Add them
- Create new PyObject for result
- Set reference count
8. Bind result to name 'x'
9. Decrement old x's reference count
10. Maybe trigger garbage collection
Visualization¶
C: x = x + 1
┌───────────────┐
│ ADD [x], 1 │ ~1 CPU cycle
└───────────────┘
Python: x = x + 1
┌────────────────────────────────────────────────────────────┐
│ LOAD_NAME │ dict lookup, type check │
│ LOAD_CONST │ create int object │
│ BINARY_ADD │ type check, method lookup, unbox, add, │
│ │ create new object, set refcount │
│ STORE_NAME │ dict update, decref old │
└────────────────────────────────────────────────────────────┘
~100-1000 CPU cycles
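You can get a rough feel for this overhead with the standard-library `timeit` module. The absolute numbers depend entirely on your machine and Python version, so no expected output is given:

```python
import timeit

# Time a single increment statement, averaged over many runs
n_runs = 1_000_000
total = timeit.timeit("x = x + 1", setup="x = 0", number=n_runs)
per_op_ns = total / n_runs * 1e9

print(f"x = x + 1 takes roughly {per_op_ns:.0f} ns per operation")
# On a multi-GHz CPU, even a few tens of nanoseconds is dozens to
# hundreds of cycles, versus ~1 cycle for a compiled ADD instruction.
```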
Why Use Interpreted Languages?¶
Despite the performance cost, interpreted languages dominate:
Development Speed¶
# Python: Write and run immediately
def analyze(data):
    return sum(x**2 for x in data) / len(data)
# vs C: Write, compile, link, run, debug memory issues...
Flexibility¶
# Dynamic typing - decide at runtime
def process(x):
    if isinstance(x, list):
        return [i * 2 for i in x]
    elif isinstance(x, dict):
        return {k: v * 2 for k, v in x.items()}
    else:
        return x * 2
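Calling such a function with different types shows the runtime dispatch in action. The definition is repeated here so the snippet runs on its own:

```python
def process(x):
    # The branch taken is decided at runtime, per call, based on x's type
    if isinstance(x, list):
        return [i * 2 for i in x]
    elif isinstance(x, dict):
        return {k: v * 2 for k, v in x.items()}
    else:
        return x * 2

print(process([1, 2, 3]))   # [2, 4, 6]
print(process({"a": 1}))    # {'a': 2}
print(process(5))           # 10
```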
Rich Ecosystem¶
# One line to do complex operations
import pandas as pd
df = pd.read_csv('data.csv').groupby('category').mean()
The Best of Both Worlds¶
Modern Python leverages compiled code where it matters:
Python Ecosystem Strategy:
┌─────────────────────────────────────────────────────────────┐
│ Python Code │
│ (easy to write) │
│ │
│ data_processing() # Pure Python - slow but flexible │
│ result = np.dot(A, B) # Calls C code - fast! │
│ model.fit(X, y) # Calls C/Fortran - fast! │
│ │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────┴─────────────┐
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Python Layer │ │ C/Fortran Layer │
│ (orchestrate) │ │ (compute) │
│ ~10% of time │ │ ~90% of time │
└─────────────────┘ └─────────────────┘
Performance Comparison¶
import time
import numpy as np
# Pure Python
def python_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total
# NumPy (calls C)
def numpy_sum(n):
    return np.sum(np.arange(n))
n = 10_000_000
start = time.perf_counter()
python_sum(n)
python_time = time.perf_counter() - start
start = time.perf_counter()
numpy_sum(n)
numpy_time = time.perf_counter() - start
print(f"Python: {python_time:.3f}s")
print(f"NumPy: {numpy_time:.3f}s")
print(f"Speedup: {python_time/numpy_time:.0f}x")
Typical output:
Python: 0.850s
NumPy: 0.012s
Speedup: 70x
Summary¶
| Aspect | Compiled (C) | Interpreted (Python) |
|---|---|---|
| Execution | Direct to CPU | Via interpreter |
| Speed | Fast | Slower |
| Development | Slower | Faster |
| Memory | Manual | Automatic |
| Typing | Static | Dynamic |
| Portability | Compile per platform | Run anywhere |
Key insight:
Python's strategy is to be the "glue" language: easy-to-write Python code orchestrates fast compiled libraries (NumPy, TensorFlow, etc.). This gives you the best of both worlds: productivity and performance.