eval() Method

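The eval() method evaluates a string expression over a DataFrame's columns. When NumExpr is installed, pandas uses it as the default engine, which can reduce both runtime and temporary memory on large frames. Without NumExpr, pandas falls back to a plain Python engine that offers no such benefit.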

Basic Syntax

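DataFrame.eval(expr, *, inplace=False, **kwargs)

Parameters:
- expr: String expression to evaluate
- inplace: If True, modify the DataFrame in place and return None
- **kwargs: Passed through to pd.eval() (for example engine or parser)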

Why Use eval()?

Problem: Standard Operations Create Intermediate Arrays

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': np.random.randn(1_000_000),
    'B': np.random.randn(1_000_000),
    'C': np.random.randn(1_000_000)
})

# Standard approach: creates temporary arrays
df['D'] = df['A'] + df['B'] * df['C']

The right-hand side is evaluated one operator at a time: df['B'] * df['C'] is materialized as a full-size temporary array before the addition runs, consuming extra memory.

Solution: eval() Does One-Pass Evaluation

# eval approach: single-pass, less memory
df.eval('D = A + B * C', inplace=True)
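
A quick way to gain confidence in the rewrite is to check that both forms give the same numbers. The following is a minimal sketch on a small frame; the names small, expected, and via_eval are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
small = pd.DataFrame({
    'A': rng.standard_normal(1_000),
    'B': rng.standard_normal(1_000),
    'C': rng.standard_normal(1_000),
})

# Standard pandas expression
expected = small['A'] + small['B'] * small['C']

# Same expression through eval(); without an assignment it returns a Series
via_eval = small.eval('A + B * C')

# The two results should agree up to floating-point rounding
same = np.allclose(expected, via_eval)
print(same)
```

The comparison uses np.allclose rather than exact equality because the two engines may round intermediate values slightly differently.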

Creating New Columns

# Single column
df.eval('D = A + B', inplace=True)

# Multiple columns in one call
df.eval('''
    D = A + B
    E = A - B
    F = A * B
''', inplace=True)
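
In a multi-line expression, a later line can refer to a column assigned on an earlier line. A small sketch, using a hypothetical frame named demo and made-up column names total and share:

```python
import pandas as pd

demo = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [10.0, 20.0, 30.0]})

# 'share' uses 'total', which is created one line earlier in the same call.
# Without inplace=True the result is a new DataFrame and demo is untouched.
out = demo.eval('''
    total = A + B
    share = A / total
''')

print(out)
```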

Supported Operations

Arithmetic

df.eval('result = A + B - C * D / E', inplace=True)
df.eval('power = A ** 2', inplace=True)
df.eval('floor_div = A // B', engine='python', inplace=True)  # the NumExpr engine raises TypeError on //
df.eval('modulo = A % B', inplace=True)

Comparisons

# Returns boolean Series
mask = df.eval('A > B')
high_values = df.eval('A > 0.5 and B > 0.5')

Boolean Logic

# 'and', 'or', 'not' are accepted alongside &, |, ~ (default 'pandas' parser)
df.eval('flag = (A > 0) and (B < 0)', inplace=True)
df.eval('either = (A > 0) or (B > 0)', inplace=True)
df.eval('neither = not ((A > 0) or (B > 0))', inplace=True)
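
To illustrate that the word and symbol forms are interchangeable under the default parser, the sketch below evaluates the same conditions both ways on a tiny hypothetical frame (demo is an illustrative name):

```python
import pandas as pd

demo = pd.DataFrame({'A': [1, -1, 2, -2], 'B': [-1, -1, 3, 3]})

# Word operators and symbol operators should give identical masks
and_words = demo.eval('(A > 0) and (B < 0)')
and_symbols = demo.eval('(A > 0) & (B < 0)')

or_words = demo.eval('(A > 0) or (B > 0)')
or_symbols = demo.eval('(A > 0) | (B > 0)')

print(and_words.equals(and_symbols), or_words.equals(or_symbols))
```

Parenthesizing each comparison keeps the expression unambiguous whichever spelling is used.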

Parentheses for Complex Expressions

df.eval('result = (A + B) / (C - D)', inplace=True)
df.eval('complex = ((A + B) * C) / (D - E)', inplace=True)

Using Local Variables

Reference local Python variables with @:

threshold = 0.5
multiplier = 2.0

# Use @ to reference local variables
df.eval('scaled = A * @multiplier', inplace=True)
df.eval('above_threshold = A > @threshold', inplace=True)

# Multiple local variables
mean_a = df['A'].mean()
std_a = df['A'].std()
df.eval('z_score = (A - @mean_a) / @std_a', inplace=True)
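
The @ prefix also resolves name clashes: a bare name refers to the column, while the prefixed name refers to the Python variable. A minimal sketch, assuming a local variable that deliberately shares its name with a column:

```python
import pandas as pd

demo = pd.DataFrame({'A': [1.0, 2.0, 3.0]})
A = 10.0  # local variable with the same name as the column

# Bare A is the column; @A is the local variable defined above
shifted = demo.eval('A + @A')

print(shifted.tolist())
```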

Combining with query()

Use eval() for computation, query() for filtering:

# Compute
df.eval('ratio = A / B', inplace=True)

# Filter
high_ratio = df.query('ratio > 2')
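
When the intermediate column is not needed on the original frame, the two steps can be chained, because eval() with an assignment and no inplace=True returns a new DataFrame. A sketch with illustrative data:

```python
import pandas as pd

demo = pd.DataFrame({'A': [6.0, 1.0, 9.0], 'B': [2.0, 4.0, 3.0]})

# eval() returns a new DataFrame with 'ratio' added, which query() then filters
high_ratio = demo.eval('ratio = A / B').query('ratio > 2')

print(high_ratio)
```

The original demo frame keeps its two columns, since nothing was modified in place.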

Performance Comparison

import time

n = 5_000_000
df = pd.DataFrame({
    'A': np.random.randn(n),
    'B': np.random.randn(n),
    'C': np.random.randn(n)
})

# Standard approach
start = time.perf_counter()  # perf_counter is better suited to timing short intervals
df['D'] = df['A'] + df['B'] * df['C'] - df['A'] / df['B']
standard_time = time.perf_counter() - start

# Reset
df = df.drop(columns='D')

# eval approach
start = time.perf_counter()
df.eval('D = A + B * C - A / B', inplace=True)
eval_time = time.perf_counter() - start

print(f"Standard: {standard_time:.3f}s")
print(f"eval(): {eval_time:.3f}s")
print(f"Speedup: {standard_time/eval_time:.1f}x")

Example output (timings vary with hardware, pandas version, and whether NumExpr is installed):

Standard: 0.089s
eval(): 0.034s
Speedup: 2.6x
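
A single timed run is sensitive to noise, so a repeated measurement may give a steadier picture. The sketch below uses the standard library's timeit module; the frame size, repeat counts, and function names are arbitrary choices for illustration, and no particular speedup is implied.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200_000  # kept small for a quick run; increase to match the comparison above
bench = pd.DataFrame({
    'A': rng.standard_normal(n),
    'B': rng.standard_normal(n),
    'C': rng.standard_normal(n),
})

def standard():
    return bench['A'] + bench['B'] * bench['C'] - bench['A'] / bench['B']

def with_eval():
    return bench.eval('A + B * C - A / B')

# Taking the best of several repeats reduces noise from other processes
standard_best = min(timeit.repeat(standard, number=5, repeat=3))
eval_best = min(timeit.repeat(with_eval, number=5, repeat=3))

print(f"Standard: {standard_best:.4f}s for 5 runs")
print(f"eval():   {eval_best:.4f}s for 5 runs")
```

On small frames the parsing overhead of eval() can outweigh any gain, so it is worth measuring at the size you actually work with.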

Limitations

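Cannot Use

The pandas documentation lists function calls other than a small set of built-in math functions (sin, cos, exp, log, sqrt, abs, and similar) as unsupported. Some of the calls below may still evaluate on particular pandas versions or engines, but they should not be relied on:

# Method calls
# df.eval('D = A.abs()')  # Unsupported

# String operations
# df.eval('D = A.str.upper()')  # Unsupported

# Custom functions
# df.eval('D = my_func(A)')  # Error: not a supported function

# Aggregations
# df.eval('D = A.sum()')  # Unsupported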

Workarounds

# For method calls, compute first
abs_A = df['A'].abs()
df.eval('D = @abs_A + B', inplace=True)

# Or use standard pandas
df['D'] = df['A'].abs() + df['B']
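
For absolute value specifically, precomputing may be unnecessary: pandas documents a small set of math functions (including abs and sqrt) that can be called by name inside the expression. A minimal sketch on a hypothetical frame named demo:

```python
import pandas as pd

demo = pd.DataFrame({'A': [-4.0, 9.0, -16.0], 'B': [1.0, 4.0, 9.0]})

# abs() and sqrt() are among the math functions eval() recognizes by name
out = demo.eval('D = abs(A) + B')
roots = demo.eval('sqrt(B)')

print(out['D'].tolist(), roots.tolist())
```

Anything outside that documented list still needs one of the workarounds above.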

pd.eval() for Non-DataFrame Operations

Use pd.eval() for standalone expressions:

# Evaluate expression on Series
a = pd.Series(np.random.randn(1000))
b = pd.Series(np.random.randn(1000))

result = pd.eval('a + b * 2')
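
By default pd.eval() looks names up in the calling scope. If you would rather make the inputs explicit, the local_dict parameter accepts a mapping of names to objects. A short sketch, where x and y are placeholder names chosen for illustration:

```python
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0])
b = pd.Series([10.0, 20.0, 30.0])

# Names used in the expression are resolved from local_dict
result = pd.eval('x + y * 2', local_dict={'x': a, 'y': b})

print(result.tolist())
```

Passing the mapping explicitly can make the expression easier to reuse inside helper functions, where the surrounding variable names differ.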

Return vs Inplace

# Return new value (default)
result = df.eval('A + B')  # Returns Series

# Modify in place
df.eval('D = A + B', inplace=True)  # Returns None, modifies df
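
There is a third case worth knowing: an assignment without inplace=True returns a new DataFrame and leaves the original unchanged. The sketch below shows all three return types side by side on a small illustrative frame:

```python
import pandas as pd

demo = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# No assignment: the computed values come back as a Series
series_result = demo.eval('A + B')

# Assignment, inplace left at False: a new DataFrame with the extra column
frame_result = demo.eval('D = A + B')

# Assignment with inplace=True: demo itself is modified and None is returned
returned = demo.eval('E = A * B', inplace=True)

print(type(series_result).__name__, type(frame_result).__name__, returned)
```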

Practical Examples

Financial Calculations

portfolio = pd.DataFrame({
    'price': np.random.uniform(10, 100, 100000),
    'shares': np.random.randint(1, 1000, 100000),
    'cost_basis': np.random.uniform(5, 50, 100000)
})

# Calculate multiple metrics efficiently
portfolio.eval('''
    market_value = price * shares
    total_cost = cost_basis * shares
    profit = market_value - total_cost
    return_pct = (profit / total_cost) * 100
''', inplace=True)

Conditional Calculations

# Combine with numpy where for conditionals
df['sign'] = np.where(df.eval('A > 0'), 1, -1)

# Or use boolean eval
df.eval('is_positive = A > 0', inplace=True)
df.eval('is_both_positive = (A > 0) and (B > 0)', inplace=True)

Rolling Metrics with eval

# Compute rolling mean first, then use in eval
df['rolling_mean'] = df['A'].rolling(20).mean()
df.eval('deviation = A - rolling_mean', inplace=True)

Summary

| Feature | Standard Pandas | eval() |
|---|---|---|
| Memory usage | Creates intermediates | Single-pass |
| Speed (large data) | Baseline | 2-3x faster |
| Readability | Verbose | Compact |
| Method calls | Supported | Not supported |
| Local variables | Direct use | Use @ prefix |

When to use eval():
- Large DataFrames (>100K rows)
- Complex arithmetic expressions
- Memory-constrained environments
- Multiple column operations

When to use standard pandas:
- Small DataFrames
- Need method calls (.abs(), .str, etc.)
- Custom functions required
- Debugging (easier to step through)