Practical Patterns
Real-world dataclass patterns that solve common problems: configuration objects, API models, and domain entities. These patterns show dataclasses as engineering tools, not just syntactic sugar: they replace boilerplate while remaining fully compatible with the rest of the Python ecosystem.
Mental Model
A dataclass is a structured bag of data with batteries included -- __init__, __repr__, __eq__, and more are generated for free. The practical patterns on this page show how to combine that foundation with properties, __post_init__, and field() to build production-ready configuration objects, API models, and domain entities without writing boilerplate.
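As a minimal sketch of what the decorator generates, the hypothetical `Point` class below (not one of the patterns on this page) gets a constructor, a readable repr, and field-by-field equality without any hand-written methods:

```python
from dataclasses import dataclass


@dataclass
class Point:  # hypothetical example class
    x: int
    y: int = 0


p = Point(1, 2)
print(p)                 # Point(x=1, y=2)
print(p == Point(1, 2))  # True: __eq__ compares the fields
print(Point(3))          # Point(x=3, y=0)
```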
Configuration Objects
```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DatabaseConfig:
    host: str
    # Fields without defaults must come before fields with defaults;
    # otherwise @dataclass raises TypeError when the class is created.
    password: str = field(repr=False)  # Don't show in repr
    port: int = 5432
    username: str = "admin"
    options: Dict[str, str] = field(default_factory=dict)

    @property
    def connection_string(self) -> str:
        return f"postgres://{self.username}@{self.host}:{self.port}"


config = DatabaseConfig("localhost", password="secret")
print(f"Connecting to {config.connection_string}")
```
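Configuration objects are often treated as read-only once loaded. One possible variation, sketched here with a hypothetical `CacheConfig` class, is to pass `frozen=True` so that accidental reassignment fails loudly:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class CacheConfig:  # hypothetical config class for illustration
    host: str
    port: int = 6379
    ttl_seconds: int = 300

    @property
    def address(self) -> str:
        return f"{self.host}:{self.port}"


cache = CacheConfig("localhost")
print(cache.address)  # localhost:6379

try:
    cache.port = 6380  # assignment on a frozen instance raises
except FrozenInstanceError as exc:
    print(f"Rejected: {exc}")
```

To derive a modified copy instead, `dataclasses.replace(cache, port=6380)` returns a new instance and leaves the original untouched.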
API Request/Response Models
```python
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Optional


@dataclass
class User:
    id: int
    name: str
    email: str
    created_at: Optional[datetime] = None  # filled in by __post_init__

    def __post_init__(self):
        if self.created_at is None:
            self.created_at = datetime.now()

    def to_dict(self):
        return asdict(self)


user = User(1, "Alice", "alice@example.com")
print(user.to_dict())
```
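The dictionary returned by `asdict()` still contains a `datetime` object, which `json.dumps` cannot encode on its own. A minimal sketch of one way to handle this, assuming a hypothetical `Event` model and a hypothetical `encode_extra` fallback function:

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime


@dataclass
class Event:  # hypothetical model for illustration
    name: str
    created_at: datetime = field(default_factory=datetime.now)


def encode_extra(value):
    """Fallback for json.dumps: convert datetime to an ISO 8601 string."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


event = Event("signup", created_at=datetime(2024, 1, 15, 9, 30))
payload = json.dumps(asdict(event), default=encode_extra)
print(payload)  # {"name": "signup", "created_at": "2024-01-15T09:30:00"}
```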
Domain Model with Validation
```python
from dataclasses import dataclass
from enum import Enum


class OrderStatus(Enum):
    PENDING = "pending"
    SHIPPED = "shipped"
    DELIVERED = "delivered"


@dataclass
class Order:
    order_id: str
    customer: str
    amount: float
    status: OrderStatus = OrderStatus.PENDING

    def __post_init__(self):
        # Validate invariants once, right after the generated __init__ runs
        if self.amount <= 0:
            raise ValueError("Amount must be positive")
        if not self.order_id:
            raise ValueError("Order ID required")

    def ship(self):
        if self.status != OrderStatus.PENDING:
            raise ValueError("Can only ship pending orders")
        self.status = OrderStatus.SHIPPED

    def deliver(self):
        if self.status != OrderStatus.SHIPPED:
            raise ValueError("Can only deliver shipped orders")
        self.status = OrderStatus.DELIVERED


order = Order("ORD-001", "Alice", 99.99)
order.ship()
order.deliver()
print(f"Order {order.order_id}: {order.status.value}")
```
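One caveat worth illustrating: `__post_init__` runs only when an instance is constructed, so later attribute assignment is not re-validated. The sketch below uses a hypothetical `Payment` class to show the difference between plain assignment and `dataclasses.replace()`, which builds a new instance and therefore runs the checks again:

```python
from dataclasses import dataclass, replace


@dataclass
class Payment:  # hypothetical entity for illustration
    amount: float

    def __post_init__(self):
        if self.amount <= 0:
            raise ValueError("Amount must be positive")


payment = Payment(10.0)
payment.amount = -5.0  # plain assignment skips __post_init__
print(payment.amount)  # -5.0

try:
    replace(payment, amount=-1.0)  # new instance, so validation runs
except ValueError as exc:
    print(f"Rejected: {exc}")      # Rejected: Amount must be positive
```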
Nested Dataclasses
```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    city: str
    country: str


@dataclass
class Contact:
    email: str
    phone: str


@dataclass
class Company:
    name: str
    address: Address
    contact: Contact
    employees: int = 0


company = Company(
    name="TechCorp",
    address=Address("123 Main St", "San Francisco", "USA"),
    contact=Contact("info@techcorp.com", "555-1234"),
    employees=50,
)

print(f"{company.name} in {company.address.city}")
```
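Nesting also affects serialization: `asdict()` recurses into nested dataclasses, while rebuilding objects from the resulting dictionaries has to be done by hand. A small sketch with hypothetical `GeoPoint` and `Store` classes:

```python
from dataclasses import dataclass, asdict


@dataclass
class GeoPoint:  # hypothetical classes for illustration
    lat: float
    lon: float


@dataclass
class Store:
    name: str
    location: GeoPoint


store = Store("Downtown", GeoPoint(37.77, -122.42))
data = asdict(store)
print(data)
# {'name': 'Downtown', 'location': {'lat': 37.77, 'lon': -122.42}}

# Rebuilding is manual: nested dicts must be converted back explicitly.
restored = Store(name=data["name"], location=GeoPoint(**data["location"]))
print(restored == store)  # True
```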
Builder Pattern with Dataclasses
```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QueryBuilder:
    table: str
    where_conditions: List[str] = field(default_factory=list)
    select_fields: List[str] = field(default_factory=lambda: ["*"])
    limit_value: Optional[int] = None

    def where(self, condition: str):
        self.where_conditions.append(condition)
        return self  # returning self enables method chaining

    def select(self, *fields):
        self.select_fields = list(fields)
        return self

    def limit(self, count: int):
        self.limit_value = count
        return self

    def build(self) -> str:
        # Conditions are interpolated verbatim; use parameterized queries
        # for any untrusted input.
        query = f"SELECT {', '.join(self.select_fields)} FROM {self.table}"
        if self.where_conditions:
            query += " WHERE " + " AND ".join(self.where_conditions)
        if self.limit_value is not None:
            query += f" LIMIT {self.limit_value}"
        return query


query = (
    QueryBuilder("users")
    .select("id", "name", "email")
    .where("age > 18")
    .where("country = 'USA'")
    .limit(10)
    .build()
)

print(query)
```
Ecosystem Context: dataclass vs pydantic vs attrs
For simple internal data containers, @dataclass from the standard library is
sufficient. When your models sit at a system boundary — parsing JSON from an API,
reading user config files, or feeding data into an ORM — consider these alternatives:
| Concern | dataclass | pydantic | attrs |
|---|---|---|---|
| Validation | Manual (__post_init__) | Declarative, automatic | Declarative (validator= argument or @<attribute>.validator decorator) |
| Type coercion | None | Automatic | Via converter |
| Serialization | asdict() (recursive, returns plain dicts) | .model_dump() / .model_dump_json() | attrs.asdict() |
| Dependency | Standard library | External (pip install pydantic) | External (pip install attrs) |
| Performance | Fast creation | Slower (validation overhead) | Fast (slots by default) |
Use @dataclass when validation is simple or unnecessary. Reach for pydantic when
you need schema validation at API boundaries, and attrs when you want rich validators
without pydantic's heavier runtime.
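The "Manual" and "None" entries in the dataclass column can be seen directly with the standard library alone. In this sketch, the hypothetical `Port` class shows that annotations are not enforced at runtime, and the hypothetical `CoercedPort` class shows one way to add coercion and a range check by hand:

```python
from dataclasses import dataclass


@dataclass
class Port:  # hypothetical class: annotations are not checked at runtime
    number: int


print(Port("8080"))  # Port(number='8080'), the string is stored as-is


@dataclass
class CoercedPort:  # hypothetical class with manual coercion and validation
    number: int

    def __post_init__(self):
        self.number = int(self.number)  # raises ValueError on bad input
        if not 0 < self.number < 65536:
            raise ValueError(f"Port out of range: {self.number}")


print(CoercedPort("8080"))  # CoercedPort(number=8080)
```

When many fields need this treatment, the hand-written checks start to add up, which is roughly the point at which a validation library becomes worth its dependency.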
Anti-patterns to avoid
- Business-logic-heavy dataclasses: A dataclass that grows dozens of methods and complex state transitions is no longer a data container — it is a domain object pretending to be one. Extract behavior into separate service classes and keep the dataclass as a plain data carrier.
- God objects: A single dataclass with 20+ fields that represents an entire domain (user + preferences + billing + permissions) should be decomposed into smaller, focused dataclasses composed together.
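A rough sketch of the decomposition described above, with all class and function names hypothetical: small dataclasses are composed into a larger one, and behavior lives in a separate function rather than on the data carrier.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:  # all names here are hypothetical
    name: str
    email: str


@dataclass
class Billing:
    plan: str = "free"
    balance: float = 0.0


@dataclass
class Account:
    profile: Profile
    billing: Billing = field(default_factory=Billing)


def charge(account: Account, amount: float) -> None:
    """Service function: behavior lives outside the data carriers."""
    if amount <= 0:
        raise ValueError("Charge must be positive")
    account.billing.balance += amount


account = Account(Profile("Alice", "alice@example.com"))
charge(account, 9.99)
print(account.billing)  # Billing(plan='free', balance=9.99)
```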
Data Validation with __post_init__
```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EmailList:
    emails: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Validate and normalize emails; entries without '@' are dropped
        validated = []
        for email in self.emails:
            if '@' in email:
                validated.append(email.lower())
        self.emails = validated

    def add_email(self, email: str):
        if '@' in email:
            self.emails.append(email.lower())


email_list = EmailList(["Alice@Example.com", "Bob@Test.com"])
print(email_list.emails)  # ['alice@example.com', 'bob@test.com']
```
Exercises
Exercise 1.
Create a DatabaseConfig dataclass with fields host (str), port (int, default 5432), database (str), user (str), and password (str). Because fields with defaults must follow fields without them, declare port last. Add a connection_string property that returns a formatted connection URL. Add a class method from_env() that creates a config from environment variables (simulate with a dictionary). Use field(repr=False) on password.
Solution to Exercise 1
from dataclasses import dataclass, field


@dataclass
class DatabaseConfig:
    host: str
    database: str
    user: str
    password: str = field(repr=False)
    port: int = 5432

    @property
    def connection_string(self):
        return f"postgresql://{self.user}:{self.password}@{self.host}:{self.port}/{self.database}"

    @classmethod
    def from_env(cls, env: dict):
        return cls(
            host=env.get("DB_HOST", "localhost"),
            port=int(env.get("DB_PORT", "5432")),
            database=env.get("DB_NAME", "mydb"),
            user=env.get("DB_USER", "admin"),
            password=env.get("DB_PASS", "secret"),
        )


env = {"DB_HOST": "prod-server", "DB_NAME": "app", "DB_USER": "root", "DB_PASS": "s3cret"}
config = DatabaseConfig.from_env(env)
print(config)  # password hidden in repr
print(config.connection_string)
Exercise 2.
Design an APIResponse dataclass with fields status_code (int), body (dict), headers (dict with default_factory), and timestamp (auto-set in __post_init__). Add an is_success property and a json() method that returns the body as a JSON string. Create responses for success (200) and error (404) cases.
Solution to Exercise 2
from dataclasses import dataclass, field
from datetime import datetime
import json


@dataclass
class APIResponse:
    status_code: int
    body: dict
    headers: dict = field(default_factory=dict)
    timestamp: str = field(init=False)

    def __post_init__(self):
        self.timestamp = datetime.now().isoformat()

    @property
    def is_success(self):
        return 200 <= self.status_code < 300

    def json(self):
        return json.dumps(self.body, indent=2)


ok = APIResponse(200, {"data": [1, 2, 3]})
err = APIResponse(404, {"error": "Not found"})
print(ok.is_success)   # True
print(err.is_success)  # False
print(ok.json())
Exercise 3.
Build a TaskList dataclass that manages a list of Task dataclasses. Task has title, done (bool, default False), and priority (int, default 0). TaskList has a tasks field using default_factory. Add methods add(title, priority), complete(title), pending() (returns incomplete tasks sorted by priority), and summary(). Demonstrate the full workflow.
Solution to Exercise 3
from dataclasses import dataclass, field


@dataclass
class Task:
    title: str
    done: bool = False
    priority: int = 0


@dataclass
class TaskList:
    tasks: list = field(default_factory=list)

    def add(self, title, priority=0):
        self.tasks.append(Task(title, priority=priority))

    def complete(self, title):
        for task in self.tasks:
            if task.title == title:
                task.done = True
                return
        raise ValueError(f"Task not found: {title}")

    def pending(self):
        # Highest priority first
        return sorted(
            [t for t in self.tasks if not t.done],
            key=lambda t: t.priority,
            reverse=True,
        )

    def summary(self):
        total = len(self.tasks)
        done = sum(1 for t in self.tasks if t.done)
        return f"{done}/{total} tasks completed"


tl = TaskList()
tl.add("Write tests", priority=3)
tl.add("Fix bug", priority=5)
tl.add("Update docs", priority=1)
tl.complete("Fix bug")
print(tl.summary())  # 1/3 tasks completed
for t in tl.pending():
    print(f"  [{t.priority}] {t.title}")