Integer Caching¶
Mental Model
CPython pre-allocates integer objects for -5 through 256 at startup, so every use of 42 points to the same object. Outside this range, each expression creates a fresh object. This is a performance shortcut, not a language rule -- always compare integers with ==, never is.
CPython Behavior¶
1. Small Integers¶
CPython caches [-5, 256]:
```python
Cached range¶
a = 100 b = 100 print(a is b) # True print(id(a) == id(b)) # True ```
2. Outside Range¶
```python
Not cached¶
x = 1000 y = 1000 print(x is y) # Usually False print(x == y) # True ```
Why Cache¶
1. Performance¶
```python
Without caching:¶
Every loop creates new objects¶
for i in range(100): total += i # 100 objects
With caching:¶
Reuses same 100 objects¶
```
2. Memory¶
```python
Without caching¶
numbers = [1, 2, 3, 1, 2, 3]
6 separate objects¶
With caching¶
Only 3 objects shared¶
```
Best Practices¶
1. Never Rely On¶
```python
Bad: assumes caching¶
def bad(x): if x is 42: # Don't! return True
Good: use ==¶
def good(x): if x == 42: # Correct return True ```
2. Singletons Only¶
```python
Use 'is' only for:¶
if x is None: # OK pass
if x is True: # OK pass
Not for integers:¶
if x is 0: # Bad!¶
```
Summary¶
1. CPython¶
- Caches [-5, 256]
- Automatic optimization
- Not part of language spec
2. Write Portable¶
```python
Always use ==¶
if count == 0: pass
Only is for singletons¶
if result is None: pass ```
Exercises¶
Exercise 1. Integer caching behavior depends on context in CPython. Predict the output:
```python a = 256 b = 256 print(a is b)
c = 257 d = 257 print(c is d)
e, f = 1000, 1000 print(e is f) ```
Why might e is f return True even though 1000 is outside the cached range? What compile-time optimization can cause this?
Solution to Exercise 1
Output (CPython):
text
True
False
True
a is b is True because 256 is within the cached range [-5, 256]. c is d is False because 257 is outside this range, so two separate objects are created.
e is f is likely True because CPython's peephole optimizer (or AST optimizer) recognizes that e and f are assigned in the same statement from the same constant. The compiler stores 1000 once in the code object's constant pool, so both names reference the same object. This is a compile-time optimization, separate from the runtime small-integer cache.
This demonstrates why is for integers is unreliable: the result depends on compile-time optimizations that vary between Python versions and contexts.
Exercise 2. Integer identity can change with how the integer is produced. Predict the output:
```python a = 100 b = 50 + 50 c = int("100")
print(a is b) print(a is c) print(a == b == c) ```
Why is a is b likely True while a is c may or may not be True? What does this tell you about the relationship between integer value and object identity?
Solution to Exercise 2
Output (CPython):
text
True
True
True
All three are True in CPython because 100 is within the cached range [-5, 256]. The expression 50 + 50 evaluates to the integer 100, and the cache ensures the same object is returned. Similarly, int("100") produces the cached 100 object.
However, for values outside the cache range, a = 500 and c = int("500") would likely produce different objects (a is c would be False). The key lesson: value equality (==) is always reliable, but identity (is) depends on whether the implementation happens to reuse objects. Never use is for integer comparison.
Exercise 3. Caching exists for performance. Predict which is faster and explain why:
```python import sys
How much memory does a single int use?¶
print(sys.getsizeof(0)) print(sys.getsizeof(1)) print(sys.getsizeof(2**30))
Without caching, this loop would create 100 new int objects per iteration:¶
total = 0 for i in range(100): total += i ```
Why does Python cache small integers but not large ones? What is the trade-off between caching more integers and the memory cost of pre-allocating them?
Solution to Exercise 3
Output (approximate, varies by platform):
text
28
28
32
Even a small integer like 0 takes 28 bytes in CPython (object header: reference count + type pointer + value). Larger integers need more space for additional digits.
Python caches integers [-5, 256] because these appear extremely frequently in typical programs (loop counters, indices, return codes, flag values). Pre-allocating 262 integer objects costs about 262 * 28 = ~7 KB, which is negligible. Without caching, a simple for i in range(100) would allocate and deallocate 100 integer objects per iteration.
The trade-off: caching more integers saves allocation time but costs memory upfront. The range [-5, 256] was chosen empirically to cover the vast majority of commonly used integers. Caching up to, say, 10,000 would cost ~280 KB with diminishing returns, since larger integers appear less frequently.