defaultdict

A defaultdict is a dict subclass that supplies a value for a missing key by calling a factory function, then stores that value under the key.

Mental Model

A defaultdict is a dictionary whose subscript lookup, d[key], does not raise KeyError for a missing key. Instead it calls the factory you specified (list for an empty list, int for zero, set for a new set), inserts the result under that key, and returns it. This eliminates the "check-then-initialize" pattern that clutters grouping and counting loops. Other lookups are unaffected: key in d and d.get(key) neither call the factory nor insert anything.
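A short sketch of what "access" means here may help, since only subscript lookup triggers the factory. The variable names are illustrative.

```python
from collections import defaultdict

counts = defaultdict(int)

# Membership tests and .get() do not call the factory or insert anything.
has_x = 'x' in counts
got_x = counts.get('x')
size_before = len(counts)

# Subscript access on a missing key calls int(), stores 0, and returns it.
value = counts['x']
size_after = len(counts)

print(has_x, got_x, size_before)  # False None 0
print(value, size_after)          # 0 1
```

This is why a plain read such as counts['x'] can grow the dictionary, while counts.get('x') cannot.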

The Problem

Regular dicts raise KeyError for missing keys:

```python
# Sample input: (name, category) pairs
data = [('apple', 'fruit'), ('carrot', 'vegetable'), ('banana', 'fruit')]

# Regular dict: must check or use setdefault
groups = {}
for name, category in data:
    if category not in groups:
        groups[category] = []
    groups[category].append(name)

# Or using setdefault (verbose)
groups = {}
for name, category in data:
    groups.setdefault(category, []).append(name)
```

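The Solution

```python
from collections import defaultdict

# Sample input: (name, category) pairs
data = [('apple', 'fruit'), ('carrot', 'vegetable'), ('banana', 'fruit')]

groups = defaultdict(list)
for name, category in data:
    groups[category].append(name)  # Auto-creates empty list!
```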

How It Works

```python
from collections import defaultdict

d = defaultdict(list)  # Factory function: list

# Accessing a missing key with d[key]:
# 1. Calls list() to create []
# 2. Assigns d[key] = []
# 3. Returns the empty list

d['fruits'].append('apple')
print(d)  # defaultdict(<class 'list'>, {'fruits': ['apple']})
```
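The three steps above correspond to the `__missing__` hook that dict calls from d[key] when a key is absent. The following is a rough sketch of that behavior, not the actual CPython implementation; the class name `MyDefaultDict` is made up for illustration.

```python
class MyDefaultDict(dict):
    """Illustrative stand-in for collections.defaultdict (hypothetical name)."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__ calls this hook only when the key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        value = self.default_factory()  # 1. call the factory
        self[key] = value               # 2. store the result under the key
        return value                    # 3. return it to the caller


d = MyDefaultDict(list)
d['fruits'].append('apple')
print(d)  # {'fruits': ['apple']}
```

Because the hook is reached only through d[key], methods such as get() and the in operator never trigger it.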

Common Factory Functions

`list`: Grouping

```python
from collections import defaultdict

data = [('apple', 'fruit'), ('carrot', 'vegetable'),
        ('banana', 'fruit'), ('broccoli', 'vegetable')]

groups = defaultdict(list)
for item, category in data:
    groups[category].append(item)

print(dict(groups))
# {'fruit': ['apple', 'banana'], 'vegetable': ['carrot', 'broccoli']}
```

`int`: Counting

```python
from collections import defaultdict

counts = defaultdict(int)  # int() returns 0

for char in 'mississippi':
    counts[char] += 1

print(dict(counts))
# {'m': 1, 'i': 4, 's': 4, 'p': 2}
```

`set`: Unique Grouping

```python
from collections import defaultdict

tags = defaultdict(set)

data = [('doc1', 'python'), ('doc1', 'tutorial'),
        ('doc2', 'python'), ('doc1', 'python')]  # last pair is a duplicate

for doc, tag in data:
    tags[doc].add(tag)

print(dict(tags))
# {'doc1': {'python', 'tutorial'}, 'doc2': {'python'}}  (set element order may vary)
```

`lambda`: Custom Default

```python
from collections import defaultdict

# Default value 'N/A'
d = defaultdict(lambda: 'N/A')
d['name'] = 'Alice'
print(d['name'])  # Alice
print(d['age'])   # N/A

# Default value 0.0
prices = defaultdict(lambda: 0.0)
prices['apple'] = 1.50
print(prices['banana'])  # 0.0
```
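One detail worth illustrating: the factory is called with no arguments each time a key is missing, so every key gets its own fresh object, and the argument must be a callable rather than a value. The `[count, sum]` layout below is just an assumed example.

```python
from collections import defaultdict

# The factory runs once per missing key, so each key gets its own list.
totals = defaultdict(lambda: [0, 0])  # [count, sum] pairs, an illustrative layout
totals['a'][0] += 1
totals['a'][1] += 10
totals['b'][0] += 1

print(dict(totals))                # {'a': [1, 10], 'b': [1, 0]}
print(totals['a'] is totals['b'])  # False

# Passing a value instead of a callable is rejected when the dict is created.
try:
    defaultdict(0)
except TypeError as exc:
    error = str(exc)
    print(error)
```

Wrapping the value in a lambda, as in the examples above, is the usual way to turn a constant into a factory.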

Nested defaultdict

Two Levels

```python
from collections import defaultdict

# year -> month -> count
stats = defaultdict(lambda: defaultdict(int))

stats['2024']['Jan'] += 100
stats['2024']['Feb'] += 200
stats['2025']['Jan'] += 150

print(stats['2024']['Jan'])  # 100
print(stats['2024']['Mar'])  # 0 (auto-created)
```

Three Levels

```python
from collections import defaultdict

# country -> city -> category -> count
data = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

data['USA']['NYC']['sales'] += 1000
data['USA']['NYC']['returns'] += 50
data['USA']['LA']['sales'] += 800
```
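When the nesting depth is not known in advance, a common variation is a factory that refers to itself. This is a minimal sketch; the function name `tree` and the sample keys are assumptions for illustration.

```python
from collections import defaultdict

def tree():
    """Return a defaultdict whose missing values are themselves trees."""
    return defaultdict(tree)

config = tree()
config['db']['primary']['host'] = 'localhost'
config['db']['primary']['port'] = 5432
config['cache']['ttl'] = 60

print(config['db']['primary']['port'])  # 5432
print(sorted(config))                   # ['cache', 'db']
```

The trade-off is that a missing leaf is another tree rather than 0, so counting with += needs the fixed-depth form shown above with int at the innermost level.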

Converting to Regular Dict

```python
from collections import defaultdict
import json

d = defaultdict(list)
d['a'].append(1)
d['b'].append(2)

# Convert to regular dict (top level only; nested defaultdicts are not converted)
regular = dict(d)
print(regular)  # {'a': [1], 'b': [2]}

# JSON serialization of the converted dict
print(json.dumps(dict(d)))  # {"a": [1], "b": [2]}
```
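Note that dict(d) copies only the outermost level, so a nested defaultdict still holds defaultdict values inside. A recursive helper is one way to convert every level; the name `to_plain_dict` is hypothetical, and this sketch only descends into dict values, not lists or other containers.

```python
from collections import defaultdict
import json

def to_plain_dict(obj):
    """Recursively copy nested dicts (including defaultdicts) into plain dicts."""
    if isinstance(obj, dict):
        return {key: to_plain_dict(value) for key, value in obj.items()}
    return obj

stats = defaultdict(lambda: defaultdict(int))
stats['2024']['Jan'] += 100
stats['2025']['Jan'] += 150

shallow = dict(stats)
deep = to_plain_dict(stats)

print(type(shallow['2024']).__name__)  # defaultdict
print(type(deep['2024']).__name__)     # dict
print(json.dumps(deep))                # {"2024": {"Jan": 100}, "2025": {"Jan": 150}}
```

After the deep conversion, a lookup such as deep['2024']['Mar'] raises KeyError instead of silently inserting 0.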

defaultdict vs setdefault

| Aspect | `defaultdict` | `setdefault` |
| --- | --- | --- |
| Syntax | `d[key].append(x)` | `d.setdefault(key, []).append(x)` |
| Readability | ✅ Clean | ❌ Verbose |
| Creates key on read (`d[key]`) | ✅ Yes | ❌ No |
| Works on a regular dict | ❌ No | ✅ Yes |

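```python
from collections import defaultdict

# defaultdict: creates key even on read
d = defaultdict(list)
_ = d['key']       # Creates empty list
print('key' in d)  # True

# Regular dict: get() never creates the key; only an explicit setdefault() call does
d = {}
_ = d.get('key', [])     # Does NOT create
print('key' in d)        # False
d.setdefault('key', [])  # Creates
print('key' in d)        # True
```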
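If key creation on read is unwanted once a structure has been built, one option is to clear the `default_factory` attribute, which makes later lookups behave like a regular dict. A small sketch with illustrative names:

```python
from collections import defaultdict

groups = defaultdict(list)
groups['fruit'].append('apple')

# Once building is done, clearing default_factory makes lookups strict again.
groups.default_factory = None

try:
    groups['vegetable']
    raised = False
except KeyError:
    raised = True

print(raised)                 # True
print('vegetable' in groups)  # False
print(groups['fruit'])        # ['apple']
```

This keeps the object a defaultdict but prevents typos in later reads from silently adding empty entries.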

Practical Examples

Word Index

```python
from collections import defaultdict

text = "the quick brown fox jumps over the lazy dog"
index = defaultdict(list)

for pos, word in enumerate(text.split()):
    index[word].append(pos)

print(dict(index))
# {'the': [0, 6], 'quick': [1], 'brown': [2], ...}
```

Graph Adjacency List

```python
from collections import defaultdict

graph = defaultdict(list)

edges = [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D')]
for src, dst in edges:
    graph[src].append(dst)
    graph[dst].append(src)  # Undirected

print(dict(graph))
# {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['A', 'B', 'D'], 'D': ['C']}
```

Frequency Table

```python
from collections import defaultdict

scores = [85, 90, 85, 78, 90, 90, 85]
freq = defaultdict(int)

for score in scores:
    freq[score] += 1

print(dict(freq))  # {85: 3, 90: 3, 78: 1}
```

Key Takeaways

- defaultdict(factory) auto-creates missing keys on d[key] access
- Common factories: list, int, set, lambda
- Cleaner than setdefault() for grouping/counting
- Use dict(d) to convert to a regular dict (top level only)
- Accessing a missing key with d[key] creates it (unlike a regular dict); key in d and d.get(key) do not

Exercises

Exercise 1. Write a function invert_dict that takes a regular dictionary and returns a defaultdict(list) where each value from the original dict becomes a key, and each key from the original dict is appended to the corresponding list. For example, invert_dict({"a": 1, "b": 2, "c": 1}) should return {1: ["a", "c"], 2: ["b"]}.

Solution to Exercise 1

```python
from collections import defaultdict

def invert_dict(d):
    result = defaultdict(list)
    for key, value in d.items():
        result[value].append(key)
    return result

# Test
original = {"a": 1, "b": 2, "c": 1}
inverted = invert_dict(original)
print(dict(inverted))
# {1: ['a', 'c'], 2: ['b']}
```

Exercise 2. Write a function nested_group that takes a list of (department, team, employee) tuples and returns a nested defaultdict structure where you can access employees as result[department][team] (a list). For example, given [("eng", "backend", "Alice"), ("eng", "backend", "Bob"), ("eng", "frontend", "Carol")], result["eng"]["backend"] should return ["Alice", "Bob"].

Solution to Exercise 2

```python
from collections import defaultdict

def nested_group(records):
    result = defaultdict(lambda: defaultdict(list))
    for department, team, employee in records:
        result[department][team].append(employee)
    return result

# Test
data = [
    ("eng", "backend", "Alice"),
    ("eng", "backend", "Bob"),
    ("eng", "frontend", "Carol"),
    ("sales", "west", "Dave"),
]
groups = nested_group(data)
print(groups["eng"]["backend"])   # ['Alice', 'Bob']
print(groups["eng"]["frontend"])  # ['Carol']
print(groups["sales"]["west"])    # ['Dave']
```

Exercise 3. Write a function word_positions that takes a sentence string and returns a defaultdict(list) mapping each lowercase word to a list of its 0-based positions in the sentence. For example, word_positions("the cat and the dog") should return {"the": [0, 3], "cat": [1], "and": [2], "dog": [4]}.

Solution to Exercise 3

```python
from collections import defaultdict

def word_positions(sentence):
    result = defaultdict(list)
    for pos, word in enumerate(sentence.lower().split()):
        result[word].append(pos)
    return result

# Test
positions = word_positions("the cat and the dog")
print(dict(positions))
# {'the': [0, 3], 'cat': [1], 'and': [2], 'dog': [4]}
```
