Sets¶
A set is an unordered collection of unique elements. Sets trade order for fast membership testing and uniqueness guarantees.
Sets are useful when the main concern is not order, but membership and uniqueness.
flowchart TD
A[set]
A --> B[unordered]
A --> C[unique elements]
A --> D[fast membership testing]
Mental Model
A set is a bag of unique items with no particular order. You cannot access elements by position, but you can ask "is this item in the bag?" extremely fast. Sets also support mathematical operations like union, intersection, and difference, making them ideal for comparing groups of items.
1. Creating Sets¶
Sets can be written with braces.
python
colors = {"red", "green", "blue"}
An empty set must be created with set().
python
empty = set()
Using {} creates an empty dictionary, not a set.
python
print(type({}))
print(type(set()))
Output:
text
<class 'dict'>
<class 'set'>
2. Uniqueness¶
Sets automatically remove duplicates.
python
data = {1, 2, 2, 3, 3, 3}
print(data)
Output:
text
{1, 2, 3}
This makes sets very useful for eliminating repeated values.
3. Membership Testing¶
Sets are especially good for membership checks.
```python vowels = {"a", "e", "i", "o", "u"}
print("a" in vowels) print("z" in vowels) ```
Output:
text
True
False
4. Set Operations¶
Sets support important mathematical operations.
| Operation | Symbol | Meaning |
|---|---|---|
| union | \| |
all elements from both sets |
| intersection | & |
common elements |
| difference | - |
elements in first not second |
| symmetric difference | ^ |
elements in either set, but not both |
Example:
```python a = {1, 2, 3} b = {3, 4, 5}
print(a | b) print(a & b) print(a - b) print(a ^ b) ```
Output:
text
{1, 2, 3, 4, 5}
{3}
{1, 2}
{1, 2, 4, 5}
Each operator also has a method form: union(), intersection(), difference(), symmetric_difference().
In-place variants update the set directly instead of creating a new one. The pattern extends to all operators.
python
a = {1, 2, 3}
a |= {4, 5}
print(a)
Output:
text
{1, 2, 3, 4, 5}
5. Common Set Methods¶
| Method | Purpose |
|---|---|
add(x) |
add element |
remove(x) |
remove element |
discard(x) |
remove if present |
pop() |
remove arbitrary element |
clear() |
remove all elements |
remove(x) raises KeyError if x is not in the set. discard(x) does nothing if x is absent. pop() raises KeyError on an empty set.
Example:
```python s = {1, 2, 3}
s.remove(2) print(s)
s.discard(99) print(s)
s.discard(3) print(s) ```
Output:
text
{1, 3}
{1, 3}
{1}
6. Worked Examples¶
Example 1: remove duplicates¶
python
nums = [1, 2, 2, 3, 3]
unique = set(nums)
print(unique)
Output:
text
{1, 2, 3}
Example 2: membership test¶
```python allowed = {"admin", "editor"}
if "admin" in allowed: print("granted") ```
Output:
text
granted
Example 3: intersection¶
```python a = {"red", "green"} b = {"green", "blue"}
print(a & b) ```
Output:
text
{'green'}
7. Common Pitfalls¶
Expecting order¶
Sets are unordered collections. Do not rely on iteration order.
python
s = {3, 1, 2}
print(s)
On CPython, small integer sets may appear sorted due to hash values, but this is an implementation detail and must not be relied upon.
Using {} for an empty set¶
{} creates a dictionary, not a set. Always use set() for an empty set, as shown in Section 1.
Assuming all objects can be stored in a set¶
Set elements must be hashable. Mutable types like lists and dictionaries cannot be added to a set.
python
s = set()
s.add([1, 2])
Output:
text
TypeError: unhashable type: 'list'
Python also provides frozenset, an immutable variant that is itself hashable and can be stored inside another set. Hashing is covered in more detail in a later chapter.
8. When to Use a Set¶
Use a set when:
- uniqueness matters --- duplicates should be automatically eliminated
- membership testing is frequent ---
inon a set is O(1), versus O(n) on a list - order is not semantically important --- you care about what is present, not where
Use a list instead when order matters or when you need to store duplicate values.
9. Summary¶
Key ideas:
- sets store unique elements
- sets are unordered
- membership testing is a major strength of sets
- set operations reflect mathematical set ideas
frozensetprovides an immutable, hashable alternative
Sets are especially useful for uniqueness, filtering, and fast membership logic.
Notebook Examples¶
python
a = {'1', "2", '1'} # set
b = {'1', "3"}
c = a + b
print(c)
python
a = {1,2,3,"four",4,4,4,4,4} # set
print(a)
python
a = {1,2,3,"four",4,4,4,4,4} # set
a.add("five")
print(a)
python
a = {1,2,3,"four",4,4,4,4,4} # set
a = a.add("five")
print(a) # None
python
a = {1,2,3,"four",4,4,4,4,4} # set
a.remove("four")
print(a)
a.remove("four")
print(a)
python
a = {1,2,3,"four",4,4,4,4,4} # set
a.discard("four")
print(a)
a.discard("four")
print(a)
```python a = {1,2,3,"four",4,4,4,4,4} # set
for _ in range(3): a.pop() print(a)
print() print(a) print(_) ```
```python a = {1,2,3,"four",4,4,4,4,4} # set
for _ in range(3): popped = a.pop() print(a, popped) ```
```python a = [1,2,3,"four",4,4,4,4,4] # list
for _ in range(10): popped = a.pop() print(a, popped) ```
```python a = [1,2,3,"four",4,4,4,4,4] # list
for _ in range(10): popped = a.pop(0) print(a, popped) ```
python
nums = [1, 2, 2, 3, 3]
unique = set(nums)
print(unique)
```python allowed = {"admin", "editor"}
if "admin" in allowed: print("granted") ```
```python allowed = {"admin", "editor"}
if "Bob" in allowed: print("granted") ```
python
a = {1,2,3}
b = {3,4,5}
c = a | b
d = a.union(b)
print(c)
print(d)
python
a = {1,2,3}
b = {3,4,5}
c = a & b
d = a.intersection(b)
print(c)
print(d)
python
a = {1,2,3}
b = {3,4,5}
c = a - b
d = a.difference(b)
print(c)
print(d)
python
a = {1,2,3}
b = {3,4,5}
c = a ^ b
d = a.symmetric_difference(b)
e = set.symmetric_difference(a,b)
print(c)
print(d)
print(e)
Exercises¶
Exercise 1.
Checking membership with in is dramatically faster for sets than for lists. Explain why from a data structure perspective. If you have a list of 1 million items and need to check whether 10,000 values exist in it, why is converting to a set first (and then checking) much faster than checking the list directly?
Solution to Exercise 1
A list in check scans elements one by one until a match is found -- O(n) time on average. For 10,000 checks against a 1 million-item list: \(10{,}000 \times 1{,}000{,}000 / 2 = 5 \times 10^9\) comparisons.
A set uses a hash table. in computes the hash of the value, looks up the corresponding bucket, and checks a small number of entries -- O(1) average time. For 10,000 checks: \(10{,}000 \times O(1) = O(10{,}000)\) operations.
Converting the list to a set costs O(n) -- one pass through the data. After that, each in check is O(1). Total: \(O(1{,}000{,}000) + O(10{,}000) = O(1{,}010{,}000)\), which is roughly 5000x faster than the list approach.
This is why converting to a set is worthwhile whenever you need repeated membership checks.
Exercise 2.
Explain why {} creates an empty dictionary rather than an empty set. If you need an empty set, how do you create one? Why did Python's designers make this choice, given that {1, 2, 3} creates a set?
Solution to Exercise 2
{} creates an empty dictionary because dictionaries existed in Python before sets. The {key: value} syntax was already established for dictionaries. When set literals were added (Python 2.7 / 3.1), the non-empty form {1, 2, 3} was unambiguous (no colons = set), but {} was already taken.
To create an empty set: set().
This is a historical artifact. If Python were designed from scratch with both types, the syntax might be different. The key distinction: {...} without colons and with at least one element is a set literal; {key: value, ...} is a dict literal; {} is always a dict.
Exercise 3. Predict the output and explain:
python
a = {1, 2, 3}
b = {3, 4, 5}
print(a | b)
print(a & b)
print(a - b)
print(a ^ b)
How do these set operations correspond to mathematical set concepts? Why are sets the natural data structure for these operations, rather than lists?
Solution to Exercise 3
Output:
text
{1, 2, 3, 4, 5}
{3}
{1, 2}
{1, 2, 4, 5}
a | bis union: all elements in either set. Corresponds to \(A \cup B\).a & bis intersection: elements in both sets. Corresponds to \(A \cap B\).a - bis difference: elements inabut not inb. Corresponds to \(A \setminus B\).a ^ bis symmetric difference: elements in either set but not both. Corresponds to \(A \triangle B\).
Sets are natural for these operations because they enforce uniqueness and are backed by hash tables, making each operation efficient (O(n) where n is the size of the sets). Lists would require manual deduplication and nested loops, making the same operations O(n*m) and more error-prone.