Skip to content

eval() Method

The eval() method provides fast, memory-efficient expression evaluation for column operations. It uses NumExpr under the hood when available, enabling optimized computation.

Mental Model

eval() takes a string expression like "A + B * C" and evaluates it without creating intermediate arrays. For large DataFrames this saves memory because the NumExpr engine fuses operations into a single pass. Think of it as a compiled calculator for column arithmetic -- same result as normal pandas math, but more memory-efficient at scale.

Basic Syntax

python DataFrame.eval(expr, inplace=False)

Parameters:

  • expr: String expression to evaluate
  • inplace: If True, modify DataFrame in place

Why Use eval()?

Problem: Standard Operations Create Intermediate Arrays

```python import pandas as pd import numpy as np

df = pd.DataFrame({ 'A': np.random.randn(1_000_000), 'B': np.random.randn(1_000_000), 'C': np.random.randn(1_000_000) })

Standard approach: creates temporary arrays

df['D'] = df['A'] + df['B'] * df['C'] ```

This creates intermediate arrays for df['B'] * df['C'], consuming extra memory.

Solution: eval() Does One-Pass Evaluation

```python

eval approach: single-pass, less memory

df.eval('D = A + B * C', inplace=True) ```

Creating New Columns

```python

Single column

df.eval('D = A + B', inplace=True)

Multiple columns in one call

df.eval(''' D = A + B E = A - B F = A * B ''', inplace=True) ```

Supported Operations

Arithmetic

python df.eval('result = A + B - C * D / E', inplace=True) df.eval('power = A ** 2', inplace=True) df.eval('floor_div = A // B', inplace=True) df.eval('modulo = A % B', inplace=True)

Comparisons

```python

Returns boolean Series

mask = df.eval('A > B') high_values = df.eval('A > 0.5 and B > 0.5') ```

Boolean Logic

```python

Use 'and', 'or', 'not' (not &, |, ~)

df.eval('flag = (A > 0) and (B < 0)', inplace=True) df.eval('either = (A > 0) or (B > 0)', inplace=True) df.eval('neither = not ((A > 0) or (B > 0))', inplace=True) ```

Parentheses for Complex Expressions

python df.eval('result = (A + B) / (C - D)', inplace=True) df.eval('complex = ((A + B) * C) / (D - E)', inplace=True)

Using Local Variables

Reference local Python variables with @:

```python threshold = 0.5 multiplier = 2.0

Use @ to reference local variables

df.eval('scaled = A * @multiplier', inplace=True) df.eval('above_threshold = A > @threshold', inplace=True)

Multiple local variables

mean_a = df['A'].mean() std_a = df['A'].std() df.eval('z_score = (A - @mean_a) / @std_a', inplace=True) ```

Combining with query()

Use eval() for computation, query() for filtering:

```python

Compute

df.eval('ratio = A / B', inplace=True)

Filter

high_ratio = df.query('ratio > 2') ```

Performance Comparison

```python import time

n = 5_000_000 df = pd.DataFrame({ 'A': np.random.randn(n), 'B': np.random.randn(n), 'C': np.random.randn(n) })

Standard approach

start = time.time() df['D'] = df['A'] + df['B'] * df['C'] - df['A'] / df['B'] standard_time = time.time() - start

Reset

df = df.drop('D', axis=1)

eval approach

start = time.time() df.eval('D = A + B * C - A / B', inplace=True) eval_time = time.time() - start

print(f"Standard: {standard_time:.3f}s") print(f"eval(): {eval_time:.3f}s") print(f"Speedup: {standard_time/eval_time:.1f}x") ```

Typical results on large DataFrames: Standard: 0.089s eval(): 0.034s Speedup: 2.6x

Limitations

Cannot Use

```python

Method calls

df.eval('D = A.abs()') # Error

String operations

df.eval('D = A.str.upper()') # Error

Custom functions

df.eval('D = my_func(A)') # Error

Aggregations

df.eval('D = A.sum()') # Error

```

Workarounds

```python

For method calls, compute first

abs_A = df['A'].abs() df.eval('D = @abs_A + B', inplace=True)

Or use standard pandas

df['D'] = df['A'].abs() + df['B'] ```

pd.eval() for Non-DataFrame Operations

Use pd.eval() for standalone expressions:

```python

Evaluate expression on Series

a = pd.Series(np.random.randn(1000)) b = pd.Series(np.random.randn(1000))

result = pd.eval('a + b * 2') ```

Return vs Inplace

```python

Return new value (default)

result = df.eval('A + B') # Returns Series

Modify in place

df.eval('D = A + B', inplace=True) # Returns None, modifies df ```

Practical Examples

Financial Calculations

```python portfolio = pd.DataFrame({ 'price': np.random.uniform(10, 100, 100000), 'shares': np.random.randint(1, 1000, 100000), 'cost_basis': np.random.uniform(5, 50, 100000) })

Calculate multiple metrics efficiently

portfolio.eval(''' market_value = price * shares total_cost = cost_basis * shares profit = market_value - total_cost return_pct = (profit / total_cost) * 100 ''', inplace=True) ```

Conditional Calculations

```python

Combine with numpy where for conditionals

df['sign'] = np.where(df.eval('A > 0'), 1, -1)

Or use boolean eval

df.eval('is_positive = A > 0', inplace=True) df.eval('is_both_positive = (A > 0) and (B > 0)', inplace=True) ```

Rolling Metrics with eval

```python

Compute rolling mean first, then use in eval

df['rolling_mean'] = df['A'].rolling(20).mean() df.eval('deviation = A - rolling_mean', inplace=True) ```

Summary

Feature Standard Pandas eval()
Memory usage Creates intermediates Single-pass
Speed (large data) Baseline 2-3x faster
Readability Verbose Compact
Method calls Supported Not supported
Local variables Direct use Use @ prefix

When to use eval():

  • Large DataFrames (>100K rows)
  • Complex arithmetic expressions
  • Memory-constrained environments
  • Multiple column operations

When to use standard pandas:

  • Small DataFrames
  • Need method calls (.abs(), .str, etc.)
  • Custom functions required
  • Debugging (easier to step through)

Exercises

Exercise 1. Create a DataFrame with columns 'revenue' and 'cost'. Use df.eval('profit = revenue - cost') to add a new 'profit' column without modifying the original DataFrame (use inplace=False).

Solution to Exercise 1

Use eval to create a computed column.

import pandas as pd

df = pd.DataFrame({
    'revenue': [100, 200, 300],
    'cost': [60, 120, 180]
})
result = df.eval('profit = revenue - cost')
print(result)

Exercise 2. Use pd.eval() to compute an expression involving two DataFrames: given df1 and df2 each with a column 'value', compute df1['value'] + df2['value'] using pd.eval('df1.value + df2.value').

Solution to Exercise 2

Use pd.eval() for cross-DataFrame operations.

import pandas as pd

df1 = pd.DataFrame({'value': [10, 20, 30]})
df2 = pd.DataFrame({'value': [1, 2, 3]})
result = pd.eval('df1.value + df2.value')
print(result)

Exercise 3. Use df.eval() with the @ syntax to reference a local variable. Given a tax rate stored in a Python variable, compute after_tax = salary * (1 - @tax_rate) inside eval.

Solution to Exercise 3

Reference local variables with the @ prefix.

import pandas as pd

df = pd.DataFrame({'salary': [50000, 60000, 70000]})
tax_rate = 0.25
result = df.eval('after_tax = salary * (1 - @tax_rate)')
print(result)