eval() Method¶
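The eval() method evaluates a string expression over a DataFrame's columns. When NumExpr is installed, pandas uses it as the default engine, which can cut memory use and run time on large frames; otherwise pandas falls back to a plain Python engine and the benefit largely disappears.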
Basic Syntax¶
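DataFrame.eval(expr, inplace=False, **kwargs)
Parameters:
- expr: String expression to evaluate
- inplace: If True, modify the DataFrame in place and return None; if False (the default), return the result
- **kwargs: Passed through to pd.eval(), for example engine or parser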
Why Use eval()?¶
Problem: Standard Operations Create Intermediate Arrays¶
import pandas as pd
import numpy as np
df = pd.DataFrame({
'A': np.random.randn(1_000_000),
'B': np.random.randn(1_000_000),
'C': np.random.randn(1_000_000)
})
# Standard approach: creates temporary arrays
df['D'] = df['A'] + df['B'] * df['C']
The product df['B'] * df['C'] is materialized as a full-length temporary array before the addition runs, so peak memory is higher than the inputs and the result alone require.
Solution: eval() Does One-Pass Evaluation¶
# eval approach: with NumExpr installed, the whole expression is evaluated in chunks, without full-size temporaries
df.eval('D = A + B * C', inplace=True)
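Before switching an expression to eval(), it is worth confirming that both forms give the same numbers. The following is a minimal sketch of such a check; the frame name `small`, the seed, and the row count are illustrative choices, not part of the example above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, used only for reproducibility
small = pd.DataFrame({          # `small` is a hypothetical stand-in for df
    "A": rng.standard_normal(1_000),
    "B": rng.standard_normal(1_000),
    "C": rng.standard_normal(1_000),
})

expected = small["A"] + small["B"] * small["C"]  # standard pandas
actual = small.eval("A + B * C")                 # same expression as a string

# The two results should agree up to floating-point rounding.
print(np.allclose(expected, actual))
```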
Creating New Columns¶
# Single column
df.eval('D = A + B', inplace=True)
# Multiple columns in one call
df.eval('''
D = A + B
E = A - B
F = A * B
''', inplace=True)
Supported Operations¶
Arithmetic¶
df.eval('result = A + B - C * D / E', inplace=True)
df.eval('power = A ** 2', inplace=True)
df.eval('floor_div = A // B', inplace=True)
df.eval('modulo = A % B', inplace=True)
Comparisons¶
# Returns boolean Series
mask = df.eval('A > B')
high_values = df.eval('A > 0.5 and B > 0.5')
Boolean Logic¶
# 'and', 'or', 'not' work inside eval(), as do their bitwise forms &, |, ~
df.eval('flag = (A > 0) and (B < 0)', inplace=True)
df.eval('either = (A > 0) or (B > 0)', inplace=True)
df.eval('neither = not ((A > 0) or (B > 0))', inplace=True)
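The pandas expression parser accepts the bitwise operators `&`, `|`, and `~` as well as the keyword forms. The sketch below compares the two spellings on a small frame; `df_bool` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical four-row frame chosen so each row exercises a different case.
df_bool = pd.DataFrame({
    "A": [1.0, -1.0, 2.0, -2.0],
    "B": [-1.0, -1.0, 3.0, 3.0],
})

with_words = df_bool.eval("(A > 0) and (B < 0)")    # keyword form
with_symbols = df_bool.eval("(A > 0) & (B < 0)")    # bitwise form
negated = df_bool.eval("~((A > 0) | (B > 0))")      # neither column positive

print(with_words.equals(with_symbols))
print(negated.tolist())
```

Keeping the parentheses around each comparison makes the expression read the same way in both spellings.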
Parentheses for Complex Expressions¶
df.eval('result = (A + B) / (C - D)', inplace=True)
df.eval('complex = ((A + B) * C) / (D - E)', inplace=True)
Using Local Variables¶
Reference local Python variables with @:
threshold = 0.5
multiplier = 2.0
# Use @ to reference local variables
df.eval('scaled = A * @multiplier', inplace=True)
df.eval('above_threshold = A > @threshold', inplace=True)
# Multiple local variables
mean_a = df['A'].mean()
std_a = df['A'].std()
df.eval('z_score = (A - @mean_a) / @std_a', inplace=True)
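The @ prefix also resolves name clashes: a bare name refers to a column, while the prefixed name refers to the local variable. A small sketch, with a hypothetical frame `frame` and a local variable deliberately named like the column:

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2, 3]})  # illustrative data
A = 10  # local variable that shares its name with the column

# Bare A is the column; @A is the local Python variable.
shifted = frame.eval("A + @A")
print(shifted.tolist())
```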
Combining with query()¶
Use eval() for computation, query() for filtering:
# Compute
df.eval('ratio = A / B', inplace=True)
# Filter
high_ratio = df.query('ratio > 2')
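When the computed column is only needed for the filter, query() can take the arithmetic directly, which avoids storing it. The sketch below shows both routes on a hypothetical three-row frame named `trades`.

```python
import pandas as pd

trades = pd.DataFrame({"A": [10.0, 3.0, 8.0], "B": [2.0, 3.0, 4.0]})  # made-up values

# Route 1: store the ratio, then filter on it.
trades.eval("ratio = A / B", inplace=True)
high_ratio = trades.query("ratio > 2")

# Route 2: filter on the expression without keeping a column.
direct = trades.query("A / B > 2")

print(high_ratio.index.tolist(), direct.index.tolist())
```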
Performance Comparison¶
import time
n = 5_000_000
df = pd.DataFrame({
'A': np.random.randn(n),
'B': np.random.randn(n),
'C': np.random.randn(n)
})
# Standard approach
# perf_counter() is a monotonic, high-resolution clock intended for timing
start = time.perf_counter()
df['D'] = df['A'] + df['B'] * df['C'] - df['A'] / df['B']
standard_time = time.perf_counter() - start
# Reset
df = df.drop('D', axis=1)
# eval approach
start = time.perf_counter()
df.eval('D = A + B * C - A / B', inplace=True)
eval_time = time.perf_counter() - start
print(f"Standard: {standard_time:.3f}s")
print(f"eval(): {eval_time:.3f}s")
print(f"Speedup: {standard_time/eval_time:.1f}x")
Example output from one run (timings vary with hardware, pandas version, and whether NumExpr is installed):
Standard: 0.089s
eval(): 0.034s
Speedup: 2.6x
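A single timed run is noisy, so the ratio above should be read as indicative only. The sketch below is one way to get steadier numbers: it repeats each variant with `timeit` and also reports whether NumExpr is importable. The frame name `bench`, the helper functions, and the reduced row count are assumptions made for this illustration, and no particular speedup is implied.

```python
import importlib.util
import timeit

import numpy as np
import pandas as pd

n = 200_000  # smaller than the example above so the sketch runs quickly
rng = np.random.default_rng(0)
bench = pd.DataFrame({
    "A": rng.standard_normal(n),
    "B": rng.standard_normal(n),
    "C": rng.standard_normal(n),
})

def standard():
    # Plain pandas arithmetic, one temporary per operation.
    return bench["A"] + bench["B"] * bench["C"] - bench["A"] / bench["B"]

def with_eval():
    # Same expression handed to eval() as a string.
    return bench.eval("A + B * C - A / B")

has_numexpr = importlib.util.find_spec("numexpr") is not None

# Best of 3 repeats, 5 calls each, reported per call.
standard_best = min(timeit.repeat(standard, number=5, repeat=3)) / 5
eval_best = min(timeit.repeat(with_eval, number=5, repeat=3)) / 5

print(f"NumExpr installed: {has_numexpr}")
print(f"Standard: {standard_best:.4f}s per call")
print(f"eval():   {eval_best:.4f}s per call")
```

On small frames the parsing overhead of eval() can outweigh any gain, so it helps to rerun this with a row count close to the real workload.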
Limitations¶
Cannot Use¶
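# The pandas documentation lists function calls other than the built-in
# math functions (sin, cos, abs, sqrt, ...) as unsupported. Whether the
# lines below raise depends on the pandas version and engine, so avoid
# relying on them.
# Method calls
# df.eval('D = A.abs()')
# String operations
# df.eval('D = A.str.upper()')
# Custom functions (a bare name that is not a column is undefined)
# df.eval('D = my_func(A)') # Error
# Aggregations
# df.eval('D = A.sum()')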
Workarounds¶
# For method calls, compute first
abs_A = df['A'].abs()
df.eval('D = @abs_A + B', inplace=True)
# Or use standard pandas
df['D'] = df['A'].abs() + df['B']
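For the absolute-value case specifically, the pandas expression parser recognizes a fixed set of math functions (including `abs`, `sqrt`, `log`, and `exp`), so the call can stay inside the expression. A minimal sketch, assuming a reasonably recent pandas; the frame `vals` and its values are illustrative.

```python
import pandas as pd

vals = pd.DataFrame({"A": [-1.5, 2.0, -3.0], "B": [1.0, 1.0, 1.0]})  # made-up data

# abs() here is the math function known to the expression parser,
# not the Series.abs() method.
vals.eval("D = abs(A) + B", inplace=True)

print(vals["D"].tolist())
```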
pd.eval() for Non-DataFrame Operations¶
Use pd.eval() for standalone expressions:
# Evaluate expression on Series
a = pd.Series(np.random.randn(1000))
b = pd.Series(np.random.randn(1000))
result = pd.eval('a + b * 2')
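pd.eval() looks up bare names in the calling scope, so no @ prefix is needed here. It also accepts an `engine` argument, which is handy for checking that a result does not depend on NumExpr. The sketch below uses small fixed Series so the values are easy to verify by hand.

```python
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0])
b = pd.Series([10.0, 20.0, 30.0])

result = pd.eval("a + b * 2")                      # default engine
result_py = pd.eval("a + b * 2", engine="python")  # force the pure-Python engine

print(result.tolist())
print(result.equals(result_py))
```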
Return vs Inplace¶
# Return new value (default)
result = df.eval('A + B') # Returns Series
# Modify in place
df.eval('D = A + B', inplace=True) # Returns None, modifies df
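There is a third case worth knowing: an assignment expression with the default `inplace=False` returns a new DataFrame that contains the extra column and leaves the original untouched. A short sketch, using a hypothetical two-row frame `base`:

```python
import pandas as pd

base = pd.DataFrame({"A": [1, 2], "B": [3, 4]})  # illustrative data

# Assignment without inplace: a new frame comes back, base is unchanged.
with_d = base.eval("D = A + B")
cols_after_copy = list(base.columns)

# Assignment with inplace=True: base is modified and None is returned.
returned = base.eval("E = A - B", inplace=True)

print(cols_after_copy, list(with_d.columns), returned)
```

The non-inplace form fits method chains, since each step hands a new frame to the next.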
Practical Examples¶
Financial Calculations¶
portfolio = pd.DataFrame({
'price': np.random.uniform(10, 100, 100000),
'shares': np.random.randint(1, 1000, 100000),
'cost_basis': np.random.uniform(5, 50, 100000)
})
# Calculate multiple metrics efficiently
portfolio.eval('''
market_value = price * shares
total_cost = cost_basis * shares
profit = market_value - total_cost
return_pct = (profit / total_cost) * 100
''', inplace=True)
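In a multi-line expression, each line is an assignment and later lines can use columns created by earlier ones. The sketch below repeats the same four formulas on a two-row frame (the name `tiny` and its values are made up) so the arithmetic can be checked by hand.

```python
import pandas as pd

tiny = pd.DataFrame({
    "price": [10.0, 20.0],
    "shares": [1, 2],
    "cost_basis": [5.0, 10.0],
})

tiny.eval("""
    market_value = price * shares
    total_cost = cost_basis * shares
    profit = market_value - total_cost
    return_pct = (profit / total_cost) * 100
""", inplace=True)

print(tiny[["market_value", "total_cost", "profit", "return_pct"]])
```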
Conditional Calculations¶
# Combine with numpy where for conditionals
df['sign'] = np.where(df.eval('A > 0'), 1, -1)
# Or use boolean eval
df.eval('is_positive = A > 0', inplace=True)
df.eval('is_both_positive = (A > 0) and (B > 0)', inplace=True)
Rolling Metrics with eval¶
# Compute rolling mean first, then use in eval
df['rolling_mean'] = df['A'].rolling(20).mean()
df.eval('deviation = A - rolling_mean', inplace=True)
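The first `window - 1` rows of a rolling mean are NaN, and that carries through to anything eval() computes from it. A minimal sketch with a window of 3 instead of 20, on a hypothetical frame `walk`, makes this visible:

```python
import pandas as pd

walk = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0]})  # illustrative data

# Rolling mean is computed with regular pandas, then used as a column in eval().
walk["rolling_mean"] = walk["A"].rolling(3).mean()
walk.eval("deviation = A - rolling_mean", inplace=True)

print(walk["deviation"].tolist())
```

Dropping or filling those leading NaN rows is left to the caller, since the right choice depends on the analysis.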
Summary¶
| Feature | Standard Pandas | eval() |
|---|---|---|
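| Memory usage | Creates full-size intermediates | Avoids full-size intermediates when NumExpr is used |
| Speed (large data) | Baseline | Often faster (about 2.6x in the example run above) |
| Readability | Verbose | Compact |
| Method calls | Supported | Documented as unsupported, apart from built-in math functions |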
| Local variables | Direct use | Use @ prefix |
When to use eval():
- Large DataFrames (>100K rows)
- Complex arithmetic expressions
- Memory-constrained environments
- Multiple column operations
When to use standard pandas:
- Small DataFrames
- Need method calls (.abs(), .str, etc.)
- Custom functions required
- Debugging (easier to step through)