Basic Aggregations

Aggregation functions summarize data by computing statistics like sum, mean, and count. They reduce multiple values to a single result.

Column Aggregations

Apply aggregations to DataFrame columns.

1. Single Column Mean

import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000]
})

print(df['Age'].mean())  # 32.5

2. Single Column Sum

print(df['Salary'].sum())  # 260000

3. Multiple Aggregations

print(df['Age'].min())    # 25
print(df['Age'].max())    # 40
print(df['Age'].std())    # 6.45
print(df['Age'].count())  # 4
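
The four separate calls above can also be collected in one step. The following is a minimal sketch, assuming the same `df` as in the earlier example; it uses `Series.agg` with a list of method names, and `age_stats` is a name chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000]
})

# Pass a list of method names to compute several statistics at once.
# The result is a Series indexed by the statistic names.
age_stats = df['Age'].agg(['min', 'max', 'std', 'count'])
print(age_stats)
```

Because the result is a Series, an individual statistic can be read by label, for example `age_stats['max']`.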

DataFrame Aggregations

Apply aggregations across the entire DataFrame.

1. All Columns Mean

print(df.mean())
Age          32.5
Salary    65000.0
dtype: float64

2. All Columns Sum

print(df.sum())
Age          130
Salary    260000
dtype: int64

3. Summary Statistics

print(df.describe())
             Age        Salary
count   4.000000      4.000000
mean   32.500000  65000.000000
std     6.454972  12909.944487
min    25.000000  50000.000000
25%    28.750000  57500.000000
50%    32.500000  65000.000000
75%    36.250000  72500.000000
max    40.000000  80000.000000
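
Since `describe()` returns an ordinary DataFrame, individual statistics can be read back from it by label. A short sketch, assuming the same `df` as above; `summary` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000]
})

summary = df.describe()

# Rows are statistics, columns are the original numeric columns.
print(summary.loc['mean', 'Age'])     # 32.5
print(summary.loc['50%', 'Salary'])   # 65000.0

# A whole row gives one statistic for every column.
print(summary.loc['max'])
```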

Common Aggregation Methods

The following methods are available on both Series and DataFrame.

1. Central Tendency

df['Age'].mean()    # Arithmetic mean
df['Age'].median()  # Middle value
df['Age'].mode()    # Most frequent value(s), returned as a Series
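
Unlike `mean()` and `median()`, `mode()` returns a Series, because several values can tie for most frequent. The sketch below uses made-up age values (not from the example above) to show a tie and the case where every value is unique.

```python
import pandas as pd

# Illustrative data: 30 and 35 both appear twice.
ages = pd.Series([25, 30, 30, 35, 35, 40])

modes = ages.mode()
print(modes.tolist())  # [30, 35]

# Take the first entry when a single scalar is needed.
first_mode = modes.iloc[0]
print(first_mode)  # 30

# When every value is unique, all of them are returned.
all_unique = pd.Series([25, 30, 35, 40]).mode()
print(len(all_unique))  # 4
```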

2. Dispersion

df['Age'].std()     # Sample standard deviation (ddof=1 by default)
df['Age'].var()     # Sample variance (ddof=1 by default)
df['Age'].sem()     # Standard error of the mean
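
pandas computes the sample versions of these statistics (`ddof=1`), whereas NumPy's `np.std` defaults to the population version (`ddof=0`). The sketch below compares the two on the `Age` values used earlier; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

ages = pd.Series([25, 30, 35, 40])

sample_std = ages.std()            # divides by n - 1 (ddof=1)
population_std = ages.std(ddof=0)  # divides by n
numpy_std = np.std(ages.to_numpy())  # NumPy default is ddof=0

print(round(sample_std, 6))      # 6.454972
print(round(population_std, 6))  # 5.59017

# The standard error of the mean is the sample std divided by sqrt(n).
standard_error = ages.sem()
print(round(standard_error, 6))  # 3.227486
```

The difference matters most for small samples, so it is worth stating which version is reported.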

3. Quantiles

df['Age'].quantile(0.25)       # First quartile
df['Age'].quantile([0.25, 0.75])  # Multiple quantiles
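
A scalar argument to `quantile()` returns a single number, while a list returns a Series indexed by the requested quantiles. As a small sketch on the same `Age` values, the interquartile range can be derived from the two quartiles; `iqr` is a name introduced here.

```python
import pandas as pd

ages = pd.Series([25, 30, 35, 40])

q1 = ages.quantile(0.25)
print(q1)  # 28.75

# A list of quantiles returns a Series indexed by the quantile levels.
quartiles = ages.quantile([0.25, 0.75])

# Interquartile range: spread of the middle 50% of the data.
iqr = quartiles.loc[0.75] - quartiles.loc[0.25]
print(iqr)  # 7.5
```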

Numeric Only Aggregations

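When a DataFrame mixes numeric and non-numeric columns, restrict the aggregation to the numeric ones. In pandas 2.0 and later, df.mean() raises a TypeError on such a DataFrame unless the non-numeric columns are excluded.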

1. numeric_only Parameter

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'Salary': [50000, 60000]
})

df.mean(numeric_only=True)  # Exclude 'Name'

2. Select Numeric Columns

df.select_dtypes(include='number').mean()

3. Specific Columns

df[['Age', 'Salary']].mean()
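
The three approaches above give the same result on this DataFrame. The sketch below runs them side by side to make that explicit; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'Salary': [50000, 60000]
})

by_flag = df.mean(numeric_only=True)                   # skip non-numeric columns
by_dtype = df.select_dtypes(include='number').mean()   # filter by dtype first
by_name = df[['Age', 'Salary']].mean()                 # name the columns explicitly

print(by_flag.equals(by_dtype) and by_dtype.equals(by_name))  # True
print(by_flag)
```

Naming the columns explicitly is the most defensive choice when a numeric column such as an ID should not be averaged.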

Axis Parameter

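The axis parameter selects the direction of aggregation: axis=0 collapses the rows and returns one value per column, while axis=1 collapses the columns and returns one value per row.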

1. axis=0 (Default)

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

df.sum(axis=0)  # Sum each column
A     6
B    15
dtype: int64

2. axis=1

df.sum(axis=1)  # Sum each row
0    5
1    7
2    9
dtype: int64

3. Row Mean

df['RowMean'] = df.mean(axis=1)
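
Once a derived column such as `RowMean` has been added, a later `df.mean(axis=1)` would include it in the calculation. One way to avoid that, sketched below, is to aggregate over an explicit list of columns; `value_cols` and `RowSum` are hypothetical names introduced for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# Assumption: only the original columns should feed row-wise statistics.
value_cols = ['A', 'B']

df['RowMean'] = df[value_cols].mean(axis=1)
# axis='columns' is an alias for axis=1.
df['RowSum'] = df[value_cols].sum(axis='columns')

print(df)
```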

Handling Missing Values

Aggregation methods skip NaN values by default; pass skipna=False to make a missing value propagate to the result.

1. skipna=True (Default)

import numpy as np

s = pd.Series([1, 2, np.nan, 4])
s.mean()  # 2.333... (ignores NaN)

2. skipna=False

s.mean(skipna=False)  # NaN (the missing value propagates)

3. Count Non-NaN

s.count()  # 3 (only non-NaN values)
len(s)     # 4 (all values including NaN)
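
One consequence of skipping NaN is that `sum()` of an all-NaN Series is 0.0 rather than NaN. The sketch below shows this and the `min_count` parameter, which requires a minimum number of valid values before a sum is reported; the Series names are illustrative.

```python
import numpy as np
import pandas as pd

all_missing = pd.Series([np.nan, np.nan])

print(all_missing.sum())             # 0.0 (nothing left to add after skipping NaN)
print(all_missing.sum(min_count=1))  # nan (fewer than 1 valid value)
print(all_missing.mean())            # nan

# Counting the missing values directly.
s = pd.Series([1, 2, np.nan, 4])
n_missing = s.isna().sum()
print(n_missing)  # 1
```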