Hardware and Interpreted Languages¶
Mental Model
Compiled languages hand the CPU a finished recipe; interpreted languages read and translate the recipe one step at a time while cooking. Python sits in between -- it compiles to bytecode once, then interprets that bytecode. The translation overhead is the price you pay for flexibility, dynamic typing, and rapid development.
The Language Spectrum¶
Programming languages exist on a spectrum from hardware to abstraction:
``` Hardware Distance Spectrum:
Machine Code Assembly C/C++ Java Python │ │ │ │ │ ▼ ▼ ▼ ▼ ▼ ┌─────────────────────────────────────────────────────────────┐ │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│ └─────────────────────────────────────────────────────────────┘ Close to Hardware Close to Human
Fast execution Fast development Manual memory Automatic memory Hardware-specific Portable Harder to write Easier to write ```
Compiled vs Interpreted¶
Compiled Languages (C, C++, Rust)¶
``` Compilation Process:
Source Code Machine Code ┌─────────────┐ ┌─────────────┐ │ int x = 5; │ │ 10111010... │ │ x = x + 1; │ ──Compiler──▶ │ 01001101... │ │ return x; │ │ 11100010... │ └─────────────┘ └─────────────┘ │ ▼ ┌──────────┐ │ CPU │ │ (direct) │ └──────────┘
Characteristics: ✓ Compiled once, run many times ✓ Direct CPU execution ✓ Optimizations at compile time ✗ Platform-specific binaries ✗ Slower development cycle ```
Interpreted Languages (Python, JavaScript, Ruby)¶
``` Interpretation Process:
Source Code Interpreter CPU ┌─────────────┐ ┌─────────────┐ ┌──────────┐ │ x = 5 │ │ │ │ │ │ x = x + 1 │ ────────▶ │ Interpret │ ────▶ │ Execute │ │ return x │ │ each line │ │ │ └─────────────┘ └─────────────┘ └──────────┘ │ Read → Parse → Execute Read → Parse → Execute Read → Parse → Execute │ Every time!
Characteristics: ✓ No compilation step ✓ Platform independent ✓ Dynamic and flexible ✗ Interpretation overhead ✗ Slower execution ```
Bytecode Compiled (Python, Java)¶
Python actually uses a hybrid approach:
``` Python's Execution Model:
Source (.py) Bytecode (.pyc) Execution ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ x = 5 │ │ LOAD_CONST 5│ │ │ │ x = x + 1 │ ───▶ │ STORE_NAME x│ ───▶ │ PVM │ │ print(x) │ │ LOAD_NAME x │ │ (interpret) │ └─────────────┘ │ LOAD_CONST 1│ └─────────────┘ │ BINARY_ADD │ .py file │ STORE_NAME x│ Python Virtual └─────────────┘ Machine
.pyc file (cached)
```
Hardware Interaction Layers¶
Direct Hardware Access (C)¶
```c // C code - directly controls memory int* arr = malloc(1000 * sizeof(int)); // Direct allocation arr[0] = 42; // Direct memory write free(arr); // Manual deallocation
// Compiles to roughly: // MOV eax, 4000 ; Calculate size // CALL malloc ; System call // MOV [eax], 42 ; Write to address // CALL free ; Deallocate ```
Abstracted Hardware Access (Python)¶
```python
Python code - hardware abstracted away¶
arr = [0] * 1000 # Python handles allocation arr[0] = 42 # Python handles the write
Garbage collector handles deallocation¶
Internally involves:¶
- Create list object¶
- Create 1000 integer objects¶
- Store references in list¶
- Reference counting¶
- Type checking at runtime¶
```
The Abstraction Cost¶
What Happens in x = x + 1¶
In C:
1 CPU instruction (approximately):
ADD [x_address], 1
In Python:
~100+ operations:
1. Look up 'x' in local namespace (dict lookup)
2. Get PyObject* for x
3. Check type of x (is it int? float? custom?)
4. Look up '__add__' method
5. Look up '1' - create new int object
6. Call __add__(x, 1)
7. Inside __add__:
- Unbox x to C long
- Unbox 1 to C long
- Add them
- Create new PyObject for result
- Set reference count
8. Bind result to name 'x'
9. Decrement old x's reference count
10. Maybe trigger garbage collection
Visualization¶
``` C: x = x + 1 ┌───────────────┐ │ ADD [x], 1 │ ~1 CPU cycle └───────────────┘
Python: x = x + 1 ┌────────────────────────────────────────────────────────────┐ │ LOAD_NAME │ dict lookup, type check │ │ LOAD_CONST │ create int object │ │ BINARY_ADD │ type check, method lookup, unbox, add, │ │ │ create new object, set refcount │ │ STORE_NAME │ dict update, decref old │ └────────────────────────────────────────────────────────────┘ ~100-1000 CPU cycles ```
Why Use Interpreted Languages?¶
Despite the performance cost, interpreted languages dominate:
Development Speed¶
```python
Python: Write and run immediately¶
def analyze(data): return sum(x**2 for x in data) / len(data)
vs C: Write, compile, link, run, debug memory issues...¶
```
Flexibility¶
```python
Dynamic typing - decide at runtime¶
def process(x): if isinstance(x, list): return [i * 2 for i in x] elif isinstance(x, dict): return {k: v * 2 for k, v in x.items()} else: return x * 2 ```
Rich Ecosystem¶
```python
One line to do complex operations¶
import pandas as pd df = pd.read_csv('data.csv').groupby('category').mean() ```
The Best of Both Worlds¶
Modern Python leverages compiled code where it matters:
``` Python Ecosystem Strategy:
┌─────────────────────────────────────────────────────────────┐ │ Python Code │ │ (easy to write) │ │ │ │ data_processing() # Pure Python - slow but flexible │ │ result = np.dot(A, B) # Calls C code - fast! │ │ model.fit(X, y) # Calls C/Fortran - fast! │ │ │ └─────────────────────────────────────────────────────────────┘ │ ┌─────────────┴─────────────┐ ▼ ▼ ┌─────────────────┐ ┌─────────────────┐ │ Python Layer │ │ C/Fortran Layer │ │ (orchestrate) │ │ (compute) │ │ ~10% of time │ │ ~90% of time │ └─────────────────┘ └─────────────────┘ ```
Performance Comparison¶
```python import time import numpy as np
Pure Python¶
def python_sum(n): total = 0 for i in range(n): total += i return total
NumPy (calls C)¶
def numpy_sum(n): return np.sum(np.arange(n))
n = 10_000_000
start = time.perf_counter() python_sum(n) python_time = time.perf_counter() - start
start = time.perf_counter() numpy_sum(n) numpy_time = time.perf_counter() - start
print(f"Python: {python_time:.3f}s") print(f"NumPy: {numpy_time:.3f}s") print(f"Speedup: {python_time/numpy_time:.0f}x") ```
Typical output:
Python: 0.850s
NumPy: 0.012s
Speedup: 70x
Summary¶
| Aspect | Compiled (C) | Interpreted (Python) |
|---|---|---|
| Execution | Direct to CPU | Via interpreter |
| Speed | Fast | Slower |
| Development | Slower | Faster |
| Memory | Manual | Automatic |
| Typing | Static | Dynamic |
| Portability | Compile per platform | Run anywhere |
Key insight:
Python's strategy is to be the "glue" language—easy to write Python code that orchestrates fast compiled libraries (NumPy, TensorFlow, etc.). This gives you the best of both worlds: productivity AND performance.
Exercises¶
Exercise 1. Explain the difference between compiled and interpreted languages. Where does Python fall on this spectrum?
Solution to Exercise 1
```python
Conceptual solution - see page content for details¶
import sys import platform
print(f"Python version: {sys.version}") print(f"Platform: {platform.platform()}") print(f"Architecture: {platform.machine()}") ```
Exercise 2. Explain what bytecode is in Python. How can you view the bytecode of a function using the dis module?
Solution to Exercise 2
See the main content for the detailed explanation. The key concept involves understanding the hardware-software interaction and how it affects Python performance.
Exercise 3. Write Python code using the dis module to disassemble a simple function and print its bytecode instructions.
Solution to Exercise 3
```python import time
Simple benchmark¶
n = 10_000_000 start = time.perf_counter() total = sum(range(n)) elapsed = time.perf_counter() - start print(f"Sum of {n} integers: {total}") print(f"Time: {elapsed:.4f} seconds") ```
Exercise 4. Explain what JIT (Just-In-Time) compilation is. Name a Python implementation that uses JIT compilation.
Solution to Exercise 4
```python import numpy as np import time
n = 1_000_000
Python loop¶
start = time.perf_counter() result_py = sum(i * i for i in range(n)) time_py = time.perf_counter() - start
NumPy vectorized¶
arr = np.arange(n) start = time.perf_counter() result_np = np.sum(arr * arr) time_np = time.perf_counter() - start
print(f"Python: {time_py:.4f}s, NumPy: {time_np:.4f}s") print(f"Speedup: {time_py / time_np:.1f}x") ```