Skip to content

dict Internals (Hash Tables)

Python dictionaries are implemented as hash tables, using hash functions to map keys to values with O(1) average lookup time. Understanding dict internals explains performance characteristics and behavioral quirks.

Mental Model

A dict is an array of slots indexed by hash values. To look up a key, Python computes its hash, jumps straight to the corresponding slot, and checks for a match. This is why lookups are O(1) on average and why keys must be hashable -- the hash is the address in the internal array.


Hash Functions

How Hashing Works

```python

Hash of objects

print(f"Hash of 'key': {hash('key')}") print(f"Hash of 42: {hash(42)}") print(f"Hash of (1,2): {hash((1, 2))}")

Lists can't be hashed (mutable)

try: hash([1, 2, 3]) except TypeError as e: print(f"Error: {e}") ```

Output: Hash of 'key': 4567822475321 Hash of 42: 42 Hash of (1,2): 3713081631934410656 Error: unhashable type: 'list'

Hash Collisions

```python

Multiple keys can hash to same bucket

d = {} d['a'] = 1 d['b'] = 2 d['c'] = 3

print(f"Dict size: {d}") ```

Output: Dict size: {'a': 1, 'b': 2, 'c': 3}

Dictionary Growth

Dynamic Resizing

```python d = {} print(f"Initial capacity hint")

for i in range(10): d[f'key_{i}'] = i

print(f"Keys: {len(d)}") print(f"Dict: {d}") ```

Output: Initial capacity hint Keys: 10 Dict: {'key_0': 0, 'key_1': 1, 'key_2': 2, 'key_3': 3, 'key_4': 4, 'key_5': 5, 'key_6': 6, 'key_7': 7, 'key_8': 8, 'key_9': 9}

Performance Characteristics

O(1) Lookup

```python import time

d = {i: i**2 for i in range(100000)}

Fast lookup

start = time.time() for _ in range(1000): value = d[50000] lookup_time = time.time() - start

print(f"Lookup time: {lookup_time:.6f}s") ```

Output: Lookup time: 0.000045s

Key Behavior

Keys Must Be Hashable

```python d = {(1, 2): "tuple_key"} print(d[(1, 2)])

This fails - lists aren't hashable

try: d[[1, 2]] = "list_key" except TypeError: print("Lists cannot be dict keys") ```

Output: tuple_key Lists cannot be dict keys


Exercises

Exercise 1. Create a dictionary with keys 1, 1.0, and True. How many entries does the dictionary have? Explain why.

Solution to Exercise 1
```python
d = {1: "int", 1.0: "float", True: "bool"}
print(d)       # {1: 'bool'}
print(len(d))  # 1
```

The dictionary has only 1 entry because 1 == 1.0 == True and hash(1) == hash(1.0) == hash(True). Python treats them as the same key, and each subsequent assignment overwrites the value.


Exercise 2. Write a function find_hash_collision() that finds two different strings (among the first 100000 integers converted to strings) that have the same hash modulo 1000. Print both strings and their hash values.

Solution to Exercise 2
```python
def find_hash_collision():
    seen = {}
    for i in range(100000):
        s = str(i)
        h = hash(s) % 1000
        if h in seen:
            print(f"'{seen[h]}' and '{s}' both hash to {h}")
            return
        seen[h] = s

find_hash_collision()
```

Hash collisions are common when mapping to a small range. The function finds two strings whose hashes collide modulo 1000.


Exercise 3. Demonstrate that looking up a key in a dictionary is O(1) by timing lookups in dictionaries of size 1000 and 1000000 and showing that the times are approximately equal.

Solution to Exercise 3
```python
import timeit

setup_small = "d = {i: i for i in range(1_000)}"
setup_large = "d = {i: i for i in range(1_000_000)}"

t_small = timeit.timeit("999 in d", setup=setup_small, number=100000)
t_large = timeit.timeit("999999 in d", setup=setup_large, number=100000)

print(f"Small dict: {t_small:.4f}s")
print(f"Large dict: {t_large:.4f}s")
```

Both times should be approximately equal because dictionary lookup is O(1) regardless of size.