Skip to content

apply with axis

The axis parameter in apply() determines whether the function is applied along rows or columns. Understanding axis behavior is crucial for correct DataFrame operations.

axis=0 Column-wise

Apply function to each column (default behavior).

1. Default Behavior

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

result = df.apply(np.sum, axis=0)
print(result)
A     6
B    15
C    24
dtype: int64

2. Custom Function

def column_stats(col):
    return pd.Series({
        'mean': col.mean(),
        'std': col.std(),
        'min': col.min(),
        'max': col.max()
    })

df.apply(column_stats, axis=0)

3. Column Normalization

df.apply(lambda col: (col - col.mean()) / col.std(), axis=0)

axis=1 Row-wise

Apply function to each row.

1. Row Sum

result = df.apply(np.sum, axis=1)
print(result)
0    12
1    15
2    18
dtype: int64

2. Row Maximum

df.apply(lambda row: row.max(), axis=1)

3. Conditional Row Logic

df.apply(
    lambda row: 'High' if row['A'] > 2 else 'Low',
    axis=1
)

Multiple Column Operations

Common patterns using axis=1.

1. Weighted Calculation

df = pd.DataFrame({
    'quantity': [10, 20, 30],
    'price': [100, 200, 150],
    'discount': [0.1, 0.2, 0.15]
})

df['total'] = df.apply(
    lambda row: row['quantity'] * row['price'] * (1 - row['discount']),
    axis=1
)

2. String Concatenation

df = pd.DataFrame({
    'first_name': ['John', 'Jane'],
    'last_name': ['Doe', 'Smith']
})

df['full_name'] = df.apply(
    lambda row: f"{row['first_name']} {row['last_name']}",
    axis=1
)

3. Conditional Assignment

df['status'] = df.apply(
    lambda row: 'Pass' if row['score'] >= 60 else 'Fail',
    axis=1
)

result_type Parameter

Control the output format when applying row-wise.

1. result_type='expand'

def extract_parts(row):
    return [row['A'], row['B'], row['A'] + row['B']]

df.apply(extract_parts, axis=1, result_type='expand')

Returns a DataFrame with columns 0, 1, 2.

2. result_type='reduce'

# Return a Series (default)
df.apply(lambda row: row.sum(), axis=1, result_type='reduce')

3. result_type='broadcast'

# Same shape as original DataFrame
df.apply(lambda row: row - row.mean(), axis=1, result_type='broadcast')

LeetCode Example: Quality Metrics

Calculate metrics with groupby and apply.

1. Sample Data

queries = pd.DataFrame({
    'query_name': ['Query1', 'Query1', 'Query2', 'Query2'],
    'rating': [5, 4, 3, 2],
    'position': [2, 1, 3, 2]
})

queries['quality'] = queries['rating'] / queries['position']
queries['poor_query'] = (queries['rating'] < 3).astype(int) * 100

2. Round Function

round2 = lambda x: round(x, 2)

result = (queries
    .groupby('query_name')[['quality', 'poor_query']]
    .mean()
    .apply(round2)
    .reset_index())
print(result)

3. Result

  query_name  quality  poor_query
0     Query1     3.25        0.00
1     Query2     1.00       50.00

Performance Comparison

Row-wise apply vs vectorized operations.

1. Slow Row-wise

%%timeit
df.apply(lambda row: row['A'] + row['B'], axis=1)

2. Fast Vectorized

%%timeit
df['A'] + df['B']

3. When to Use axis=1

  • Complex conditional logic
  • Operations requiring multiple columns
  • Non-vectorizable custom functions