defaultdict
A defaultdict is a dict subclass that automatically creates missing keys using a factory function.
Mental Model
A defaultdict is a dictionary that never says "key not found" when you index it. When you read a missing key with d[key], it calls the factory you specified (list, int, set, or any zero-argument callable), inserts the result under that key, and returns it. This eliminates the "check-then-initialize" pattern that clutters grouping and counting loops.
The Problem
Regular dicts raise KeyError when you read a missing key, so grouping code has to initialize each key before using it:
```python
data = [('apple', 'fruit'), ('carrot', 'vegetable'), ('banana', 'fruit')]

# Regular dict: must check before appending...
groups = {}
for name, category in data:
    if category not in groups:
        groups[category] = []
    groups[category].append(name)

# ...or use setdefault (works, but verbose)
groups = {}
for name, category in data:
    groups.setdefault(category, []).append(name)
```
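For reference, here is the failure both workarounds above avoid. In this minimal sketch (the 'fruit' key is just an example), appending to a key that was never set raises KeyError before append ever runs:

```python
groups = {}
try:
    groups['fruit'].append('apple')  # 'fruit' was never assigned
except KeyError as exc:
    missing_key = exc.args[0]        # the key that was looked up
    print('KeyError:', exc)          # KeyError: 'fruit'
```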
The Solution
```python
from collections import defaultdict

data = [('apple', 'fruit'), ('carrot', 'vegetable'), ('banana', 'fruit')]

groups = defaultdict(list)
for name, category in data:
    groups[category].append(name)  # Auto-creates an empty list on first access
```
How It Works
```python
from collections import defaultdict

d = defaultdict(list)  # Factory function: list

# Accessing a missing key:
# 1. Calls list() to create []
# 2. Assigns d['fruits'] = []
# 3. Returns the empty list
d['fruits'].append('apple')

print(d)  # defaultdict(<class 'list'>, {'fruits': ['apple']})
```
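The three steps above hang off dict's __missing__ hook: for a dict subclass that defines __missing__, dict.__getitem__ calls it when the key is absent, and defaultdict implements that hook using its default_factory attribute. MiniDefaultDict below is a hypothetical, simplified re-implementation meant only to illustrate the mechanism; it is not CPython's actual code.

```python
class MiniDefaultDict(dict):
    """Hypothetical, simplified stand-in for collections.defaultdict."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__ calls this only when `key` is not present
        if self.default_factory is None:
            raise KeyError(key)
        value = self.default_factory()  # 1. call the factory
        self[key] = value               # 2. store the new value
        return value                    # 3. return it to the caller


d = MiniDefaultDict(list)
d['fruits'].append('apple')
print(d)                 # {'fruits': ['apple']}
print(d.get('veggies'))  # None: .get() does not call __missing__
```

Because only d[key] goes through __missing__, methods like .get() and the in operator behave exactly as on a plain dict.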
Common Factory Functions
list (Grouping)
```python
from collections import defaultdict

data = [('apple', 'fruit'), ('carrot', 'vegetable'),
        ('banana', 'fruit'), ('broccoli', 'vegetable')]

groups = defaultdict(list)
for item, category in data:
    groups[category].append(item)

print(dict(groups))
# {'fruit': ['apple', 'banana'], 'vegetable': ['carrot', 'broccoli']}
```
int (Counting)
```python
from collections import defaultdict

counts = defaultdict(int)  # int() returns 0
for char in 'mississippi':
    counts[char] += 1

print(dict(counts))
# {'m': 1, 'i': 4, 's': 4, 'p': 2}
```
set (Unique Grouping)
```python
from collections import defaultdict

tags = defaultdict(set)
data = [('doc1', 'python'), ('doc1', 'tutorial'),
        ('doc2', 'python'), ('doc1', 'python')]  # last pair is a duplicate

for doc, tag in data:
    tags[doc].add(tag)

print(dict(tags))
# {'doc1': {'python', 'tutorial'}, 'doc2': {'python'}}  (order inside a set may vary)
```
lambda (Custom Default)
```python
from collections import defaultdict

# Default value 'N/A'
d = defaultdict(lambda: 'N/A')
d['name'] = 'Alice'
print(d['name'])  # Alice
print(d['age'])   # N/A (and 'age' is now stored in d)

# Default value 0.0 (defaultdict(float) is equivalent)
prices = defaultdict(lambda: 0.0)
prices['apple'] = 1.50
print(prices['banana'])  # 0.0
```
Nested defaultdict
Two Levels
```python
from collections import defaultdict

# year -> month -> count
stats = defaultdict(lambda: defaultdict(int))

stats['2024']['Jan'] += 100
stats['2024']['Feb'] += 200
stats['2025']['Jan'] += 150

print(stats['2024']['Jan'])  # 100
print(stats['2024']['Mar'])  # 0 (auto-created)
```
Three Levels
```python
from collections import defaultdict

# country -> city -> category -> count
data = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

data['USA']['NYC']['sales'] += 1000
data['USA']['NYC']['returns'] += 50
data['USA']['LA']['sales'] += 800

print(data['USA']['NYC']['sales'])  # 1000
```
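When the depth isn't known in advance, a common recipe is a defaultdict whose factory is the function that builds it, so every missing value is itself another auto-vivifying level. The helper name tree and the sample keys are illustrative assumptions, not part of the examples above.

```python
import json
from collections import defaultdict


def tree():
    """Hypothetical helper: a defaultdict whose missing values are new trees."""
    return defaultdict(tree)


sales = tree()
sales['USA']['NYC']['sales'] = 1000
sales['USA']['LA']['sales'] = 800
sales['Canada']['Toronto']['sales'] = 500

print(json.dumps(sales))
# {"USA": {"NYC": {"sales": 1000}, "LA": {"sales": 800}}, "Canada": {"Toronto": {"sales": 500}}}
```

The trade-off is that leaves have no numeric default: sales['x']['y'] += 1 raises TypeError because the missing value is another tree, not 0. The fixed-depth lambda version above is the better fit for counters.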
Converting to Regular Dict
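```python
import json
from collections import defaultdict

d = defaultdict(list)
d['a'].append(1)
d['b'].append(2)

# Convert to a regular dict (top level only; nested defaultdicts stay as they are)
regular = dict(d)
print(regular)  # {'a': [1], 'b': [2]}

# JSON: defaultdict is a dict subclass, so json.dumps accepts it with or without conversion
print(json.dumps(regular))  # {"a": [1], "b": [2]}
```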
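Since dict(d) is shallow, a nested structure needs a recursive conversion. The sketch below uses a hypothetical helper, to_dict, that only descends through defaultdict values; lists and plain dicts are returned unchanged.

```python
from collections import defaultdict


def to_dict(obj):
    """Hypothetical helper: recursively turn nested defaultdicts into dicts."""
    if isinstance(obj, defaultdict):
        return {key: to_dict(value) for key, value in obj.items()}
    return obj  # non-defaultdict values are left as-is


stats = defaultdict(lambda: defaultdict(int))
stats['2024']['Jan'] += 100
stats['2025']['Jan'] += 150

shallow = dict(stats)
deep = to_dict(stats)

print(type(shallow['2024']).__name__)  # defaultdict
print(type(deep['2024']).__name__)     # dict
print(deep)  # {'2024': {'Jan': 100}, '2025': {'Jan': 150}}
```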
defaultdict vs setdefault¶
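| Aspect | defaultdict | setdefault |
|---|---|---|
| Syntax | `d[key].append(x)` | `d.setdefault(key, []).append(x)` |
| Readability | ✅ Clean | ❌ Verbose |
| Default object built | Only when the key is missing | On every call, even if the key exists |
| Plain `d[key]` read inserts a missing key | Yes | No (raises `KeyError`) |
| Works on a regular dict | ❌ No | ✅ Yes |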
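```python
from collections import defaultdict

# defaultdict: a plain read creates the key
d = defaultdict(list)
_ = d['key']       # Inserts an empty list
print('key' in d)  # True

# Regular dict: .get() returns a default without inserting it...
d = {}
_ = d.get('key', [])
print('key' in d)  # False

# ...while setdefault inserts only when you call it explicitly
_ = d.setdefault('key', [])
print('key' in d)  # True
```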
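The create-on-read behavior only applies to d[key]. A minimal sketch of the lookups that leave a defaultdict untouched, plus one way to stop auto-creation after building it: setting the default_factory attribute to None makes missing keys raise KeyError again, as on a plain dict. The keys used here are arbitrary examples.

```python
from collections import defaultdict

d = defaultdict(list)
d['a'].append(1)

# Lookups that do NOT call the factory
print(d.get('missing'))  # None
print('missing' in d)    # False
print(sorted(d))         # ['a']

# Disable auto-creation once the dict is built
d.default_factory = None
try:
    d['typo']
except KeyError:
    print('KeyError raised')  # KeyError raised
print(sorted(d))  # ['a']
```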
Practical Examples
Word Index
```python
from collections import defaultdict

text = "the quick brown fox jumps over the lazy dog"
index = defaultdict(list)

for pos, word in enumerate(text.split()):
    index[word].append(pos)

print(dict(index))
# {'the': [0, 6], 'quick': [1], 'brown': [2], ...}
```
Graph Adjacency List
```python
from collections import defaultdict

graph = defaultdict(list)
edges = [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D')]

for src, dst in edges:
    graph[src].append(dst)
    graph[dst].append(src)  # Undirected: store both directions

print(dict(graph))
# {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['A', 'B', 'D'], 'D': ['C']}
```
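An adjacency list built this way plugs directly into graph traversals. Below is a minimal breadth-first search sketch over the same edges; hop_distances is a hypothetical helper, and it reads neighbors with graph.get(node, []) so that querying a start node with no edges does not silently add an empty entry to the graph.

```python
from collections import defaultdict, deque

graph = defaultdict(list)
edges = [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D')]
for src, dst in edges:
    graph[src].append(dst)
    graph[dst].append(src)


def hop_distances(graph, start):
    """Hypothetical helper: BFS edge count from `start` to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # .get() avoids inserting `node` if it has no adjacency entry
        for neighbor in graph.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


print(hop_distances(graph, 'A'))  # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```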
Frequency Table
```python
from collections import defaultdict

scores = [85, 90, 85, 78, 90, 90, 85]
freq = defaultdict(int)

for score in scores:
    freq[score] += 1

print(dict(freq))  # {85: 3, 90: 3, 78: 1}
```
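For pure counting, the standard library's collections.Counter produces the same table in one call. Unlike defaultdict(int), reading a missing key from a Counter returns 0 without inserting it, and most_common() ranks the entries (ties keep the order in which values were first seen).

```python
from collections import Counter

scores = [85, 90, 85, 78, 90, 90, 85]
freq = Counter(scores)

print(dict(freq))           # {85: 3, 90: 3, 78: 1}
print(freq[100])            # 0 (missing keys read as 0 but are NOT inserted)
print(freq.most_common(1))  # [(85, 3)]
```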
Key Takeaways
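- `defaultdict(factory)` auto-creates missing keys by calling `factory()`
- Common factories: `list`, `int`, `set`, or a `lambda` for custom defaults
- Cleaner than `setdefault()` for grouping and counting
- Use `dict(d)` to convert to a regular dict (top level only)
- Reading a missing key with `d[key]` creates it (a regular dict raises `KeyError`); `.get()` and `in` do not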
Exercises
Exercise 1.
Write a function invert_dict that takes a regular dictionary and returns a defaultdict(list) where each value from the original dict becomes a key, and each key from the original dict is appended to the corresponding list. For example, invert_dict({"a": 1, "b": 2, "c": 1}) should return {1: ["a", "c"], 2: ["b"]}.
Solution to Exercise 1
```python
from collections import defaultdict

def invert_dict(d):
    result = defaultdict(list)
    for key, value in d.items():
        result[value].append(key)
    return result

# Test
original = {"a": 1, "b": 2, "c": 1}
inverted = invert_dict(original)
print(dict(inverted))
# {1: ['a', 'c'], 2: ['b']}
```
Exercise 2.
Write a function nested_group that takes a list of (department, team, employee) tuples and returns a nested defaultdict structure where you can access employees as result[department][team] (a list). For example, given [("eng", "backend", "Alice"), ("eng", "backend", "Bob"), ("eng", "frontend", "Carol")], result["eng"]["backend"] should return ["Alice", "Bob"].
Solution to Exercise 2
```python
from collections import defaultdict

def nested_group(records):
    result = defaultdict(lambda: defaultdict(list))
    for department, team, employee in records:
        result[department][team].append(employee)
    return result

# Test
data = [
    ("eng", "backend", "Alice"),
    ("eng", "backend", "Bob"),
    ("eng", "frontend", "Carol"),
    ("sales", "west", "Dave"),
]
groups = nested_group(data)
print(groups["eng"]["backend"])   # ['Alice', 'Bob']
print(groups["eng"]["frontend"])  # ['Carol']
print(groups["sales"]["west"])    # ['Dave']
```
Exercise 3.
Write a function word_positions that takes a sentence string and returns a defaultdict(list) mapping each lowercase word to a list of its 0-based positions in the sentence. For example, word_positions("the cat and the dog") should return {"the": [0, 3], "cat": [1], "and": [2], "dog": [4]}.
Solution to Exercise 3
```python
from collections import defaultdict

def word_positions(sentence):
    result = defaultdict(list)
    for pos, word in enumerate(sentence.lower().split()):
        result[word].append(pos)
    return result

# Test
positions = word_positions("the cat and the dog")
print(dict(positions))
# {'the': [0, 3], 'cat': [1], 'and': [2], 'dog': [4]}
```