Skip to content

String Interning

What Is Interning

1. Concept

Share memory for identical strings:

s1 = "hello"
s2 = "hello"
print(s1 is s2)         # Often True
print(id(s1) == id(s2)) # Often True

2. Why Intern

# Without interning:
names = ["hello"] * 1000
# 1000 separate objects

# With interning:
# All share one object

CPython Rules

1. Identifier-Like

Automatically interned:

# Identifier-like
s1 = "hello"
s2 = "hello"
print(s1 is s2)         # True

s1 = "hello_world"
s2 = "hello_world"
print(s1 is s2)         # True

2. Not Interned

# Contains spaces/special chars
s1 = "hello world"
s2 = "hello world"
print(s1 is s2)         # May be False

s1 = "hello!"
s2 = "hello!"
print(s1 is s2)         # May be False

Force Interning

1. sys.intern()

import sys

# Force interning
s1 = sys.intern("hello world")
s2 = sys.intern("hello world")
print(s1 is s2)         # True

2. Use Case

# Many duplicate strings
data = ["user_id"] * 10000

# Intern keys
data_interned = [
    sys.intern("user_id")
    for _ in range(10000)
]

# All same object
print(data_interned[0] is data_interned[9999])

Best Practices

1. Never Rely On

# Bad: assumes interning
def bad(s):
    if s is "hello":    # Don't!
        return True

# Good: use ==
def good(s):
    if s == "hello":    # Correct
        return True

2. Explicit Intern

# When many duplicates
import sys

class Config:
    def __init__(self):
        self.USER_ID = sys.intern("user_id")

    def check(self, key):
        if key is self.USER_ID:  # Fast
            return "id"

Summary

1. CPython

  • Auto-interns identifier-like
  • Can force with sys.intern()
  • Performance optimization
  • Not guaranteed by spec

2. Portable Code

# Always use ==
if name == "admin":
    pass

# Never use is for strings
# except explicit interning
admin_key = sys.intern("admin")
if name is admin_key:   # OK
    pass