Hardware and Interpreted Languages¶

Mental Model

Compiled languages hand the CPU a finished recipe; interpreted languages read and translate the recipe one step at a time while cooking. Python sits in between -- it compiles to bytecode once, then interprets that bytecode. The translation overhead is the price you pay for flexibility, dynamic typing, and rapid development.

The Language Spectrum¶

Programming languages exist on a spectrum from hardware to abstraction:

``` Hardware Distance Spectrum:

Machine Code Assembly C/C++ Java Python │ │ │ │ │ ▼ ▼ ▼ ▼ ▼ ┌─────────────────────────────────────────────────────────────┐ │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│ └─────────────────────────────────────────────────────────────┘ Close to Hardware Close to Human

Fast execution Fast development Manual memory Automatic memory Hardware-specific Portable Harder to write Easier to write ```

Compiled vs Interpreted¶

Compiled Languages (C, C++, Rust)¶

``` Compilation Process:

Source Code Machine Code ┌─────────────┐ ┌─────────────┐ │ int x = 5; │ │ 10111010... │ │ x = x + 1; │ ──Compiler──▶ │ 01001101... │ │ return x; │ │ 11100010... │ └─────────────┘ └─────────────┘ │ ▼ ┌──────────┐ │ CPU │ │ (direct) │ └──────────┘

Characteristics: ✓ Compiled once, run many times ✓ Direct CPU execution ✓ Optimizations at compile time ✗ Platform-specific binaries ✗ Slower development cycle ```

Interpreted Languages (Python, JavaScript, Ruby)¶

``` Interpretation Process:

Source Code Interpreter CPU ┌─────────────┐ ┌─────────────┐ ┌──────────┐ │ x = 5 │ │ │ │ │ │ x = x + 1 │ ────────▶ │ Interpret │ ────▶ │ Execute │ │ return x │ │ each line │ │ │ └─────────────┘ └─────────────┘ └──────────┘ │ Read → Parse → Execute Read → Parse → Execute Read → Parse → Execute │ Every time!

Characteristics: ✓ No compilation step ✓ Platform independent ✓ Dynamic and flexible ✗ Interpretation overhead ✗ Slower execution ```

Bytecode Compiled (Python, Java)¶

Python actually uses a hybrid approach:

``` Python's Execution Model:

Source (.py) Bytecode (.pyc) Execution ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ x = 5 │ │ LOAD_CONST 5│ │ │ │ x = x + 1 │ ───▶ │ STORE_NAME x│ ───▶ │ PVM │ │ print(x) │ │ LOAD_NAME x │ │ (interpret) │ └─────────────┘ │ LOAD_CONST 1│ └─────────────┘ │ BINARY_ADD │ .py file │ STORE_NAME x│ Python Virtual └─────────────┘ Machine

                 .pyc file (cached)

```

Hardware Interaction Layers¶

Direct Hardware Access (C)¶

```c // C code - directly controls memory int* arr = malloc(1000 * sizeof(int)); // Direct allocation arr[0] = 42; // Direct memory write free(arr); // Manual deallocation

// Compiles to roughly: // MOV eax, 4000 ; Calculate size // CALL malloc ; System call // MOV [eax], 42 ; Write to address // CALL free ; Deallocate ```

Abstracted Hardware Access (Python)¶

```python

Python code - hardware abstracted away¶

arr = [0] * 1000 # Python handles allocation arr[0] = 42 # Python handles the write

Garbage collector handles deallocation¶

Internally involves:¶

- Create list object¶

- Create 1000 integer objects¶

- Store references in list¶

- Reference counting¶

- Type checking at runtime¶

```

The Abstraction Cost¶

What Happens in `x = x + 1`¶

In C: 1 CPU instruction (approximately): ADD [x_address], 1

In Python: ~100+ operations: 1. Look up 'x' in local namespace (dict lookup) 2. Get PyObject* for x 3. Check type of x (is it int? float? custom?) 4. Look up '__add__' method 5. Look up '1' - create new int object 6. Call __add__(x, 1) 7. Inside __add__: - Unbox x to C long - Unbox 1 to C long - Add them - Create new PyObject for result - Set reference count 8. Bind result to name 'x' 9. Decrement old x's reference count 10. Maybe trigger garbage collection

Visualization¶

``` C: x = x + 1 ┌───────────────┐ │ ADD [x], 1 │ ~1 CPU cycle └───────────────┘

Python: x = x + 1 ┌────────────────────────────────────────────────────────────┐ │ LOAD_NAME │ dict lookup, type check │ │ LOAD_CONST │ create int object │ │ BINARY_ADD │ type check, method lookup, unbox, add, │ │ │ create new object, set refcount │ │ STORE_NAME │ dict update, decref old │ └────────────────────────────────────────────────────────────┘ ~100-1000 CPU cycles ```

Why Use Interpreted Languages?¶

Despite the performance cost, interpreted languages dominate:

Development Speed¶

```python

Python: Write and run immediately¶

def analyze(data): return sum(x**2 for x in data) / len(data)

vs C: Write, compile, link, run, debug memory issues...¶

```

Flexibility¶

```python

Dynamic typing - decide at runtime¶

def process(x): if isinstance(x, list): return [i * 2 for i in x] elif isinstance(x, dict): return {k: v * 2 for k, v in x.items()} else: return x * 2 ```

Rich Ecosystem¶

```python

One line to do complex operations¶

import pandas as pd df = pd.read_csv('data.csv').groupby('category').mean() ```

The Best of Both Worlds¶

Modern Python leverages compiled code where it matters:

``` Python Ecosystem Strategy:

┌─────────────────────────────────────────────────────────────┐ │ Python Code │ │ (easy to write) │ │ │ │ data_processing() # Pure Python - slow but flexible │ │ result = np.dot(A, B) # Calls C code - fast! │ │ model.fit(X, y) # Calls C/Fortran - fast! │ │ │ └─────────────────────────────────────────────────────────────┘ │ ┌─────────────┴─────────────┐ ▼ ▼ ┌─────────────────┐ ┌─────────────────┐ │ Python Layer │ │ C/Fortran Layer │ │ (orchestrate) │ │ (compute) │ │ ~10% of time │ │ ~90% of time │ └─────────────────┘ └─────────────────┘ ```

Performance Comparison¶

```python import time import numpy as np

Pure Python¶

def python_sum(n): total = 0 for i in range(n): total += i return total

NumPy (calls C)¶

def numpy_sum(n): return np.sum(np.arange(n))

n = 10_000_000

start = time.perf_counter() python_sum(n) python_time = time.perf_counter() - start

start = time.perf_counter() numpy_sum(n) numpy_time = time.perf_counter() - start

print(f"Python: {python_time:.3f}s") print(f"NumPy: {numpy_time:.3f}s") print(f"Speedup: {python_time/numpy_time:.0f}x") ```

Typical output: Python: 0.850s NumPy: 0.012s Speedup: 70x

Summary¶

Aspect	Compiled (C)	Interpreted (Python)
Execution	Direct to CPU	Via interpreter
Speed	Fast	Slower
Development	Slower	Faster
Memory	Manual	Automatic
Typing	Static	Dynamic
Portability	Compile per platform	Run anywhere

Key insight:

Python's strategy is to be the "glue" language—easy to write Python code that orchestrates fast compiled libraries (NumPy, TensorFlow, etc.). This gives you the best of both worlds: productivity AND performance.

Exercises¶

Exercise 1. Explain the difference between compiled and interpreted languages. Where does Python fall on this spectrum?

Solution to Exercise 1

```python

Conceptual solution - see page content for details¶

import sys import platform

print(f"Python version: {sys.version}") print(f"Platform: {platform.platform()}") print(f"Architecture: {platform.machine()}") ```

Exercise 2. Explain what bytecode is in Python. How can you view the bytecode of a function using the dis module?

Solution to Exercise 2

See the main content for the detailed explanation. The key concept involves understanding the hardware-software interaction and how it affects Python performance.

Exercise 3. Write Python code using the dis module to disassemble a simple function and print its bytecode instructions.

Solution to Exercise 3

```python import time

Simple benchmark¶

n = 10_000_000 start = time.perf_counter() total = sum(range(n)) elapsed = time.perf_counter() - start print(f"Sum of {n} integers: {total}") print(f"Time: {elapsed:.4f} seconds") ```

Exercise 4. Explain what JIT (Just-In-Time) compilation is. Name a Python implementation that uses JIT compilation.

Solution to Exercise 4

```python import numpy as np import time

n = 1_000_000

Python loop¶

start = time.perf_counter() result_py = sum(i * i for i in range(n)) time_py = time.perf_counter() - start

NumPy vectorized¶

arr = np.arange(n) start = time.perf_counter() result_np = np.sum(arr * arr) time_np = time.perf_counter() - start

print(f"Python: {time_py:.4f}s, NumPy: {time_np:.4f}s") print(f"Speedup: {time_py / time_np:.1f}x") ```

Hardware and Interpreted Languages¶

The Language Spectrum¶

Compiled vs Interpreted¶

Compiled Languages (C, C++, Rust)¶

Interpreted Languages (Python, JavaScript, Ruby)¶

Bytecode Compiled (Python, Java)¶

Hardware Interaction Layers¶

Direct Hardware Access (C)¶

Abstracted Hardware Access (Python)¶

Python code - hardware abstracted away¶

Garbage collector handles deallocation¶

Internally involves:¶

- Create list object¶

- Create 1000 integer objects¶

- Store references in list¶

- Reference counting¶

- Type checking at runtime¶

The Abstraction Cost¶

What Happens in x = x + 1¶

Visualization¶

Why Use Interpreted Languages?¶

Development Speed¶

Python: Write and run immediately¶

vs C: Write, compile, link, run, debug memory issues...¶

Flexibility¶

Dynamic typing - decide at runtime¶

Rich Ecosystem¶

One line to do complex operations¶

The Best of Both Worlds¶

Performance Comparison¶

Pure Python¶

NumPy (calls C)¶

Summary¶

Exercises¶

Conceptual solution - see page content for details¶

Simple benchmark¶

Python loop¶

NumPy vectorized¶

What Happens in `x = x + 1`¶