Iteration with GroupBy¶
GroupBy objects support iteration, allowing you to process each group individually.
Basic Iteration¶
Iterate through groups as (name, group) pairs.
1. For Loop¶
import pandas as pd
data = {
'day': ['1/1/20', '1/2/20', '1/1/20', '1/2/20', '1/1/20', '1/2/20'],
'city': ['NY', 'NY', 'SF', 'SF', 'LA', 'LA'],
'temperature': [21, 14, 25, 32, 36, 42],
'humidity': [31, 15, 36, 22, 16, 29],
}
df = pd.DataFrame(data)
for city, df_city in df.groupby("city"):
print(city)
print(df_city)
print()
2. Tuple Unpacking¶
# city: group key (string)
# df_city: DataFrame for that group
3. Output¶
LA
day city temperature humidity
4 1/1/20 LA 36 16
5 1/2/20 LA 42 29
NY
day city temperature humidity
0 1/1/20 NY 21 31
1 1/2/20 NY 14 15
SF
day city temperature humidity
2 1/1/20 SF 25 36
3 1/2/20 SF 32 22
Multiple Group Keys¶
Iterate with multiple grouping columns.
1. Tuple Keys¶
for (city, day), group in df.groupby(['city', 'day']):
print(f"City: {city}, Day: {day}")
print(group)
print()
2. Named Tuple¶
# Keys are returned as tuple (city, day)
3. Access Individual Keys¶
for keys, group in df.groupby(['city', 'day']):
city, day = keys
print(f"Processing {city} on {day}")
Custom Processing¶
Apply custom logic to each group.
1. Compute Statistics¶
results = []
for city, group in df.groupby('city'):
results.append({
'city': city,
'mean_temp': group['temperature'].mean(),
'max_temp': group['temperature'].max()
})
summary = pd.DataFrame(results)
2. Conditional Logic¶
for city, group in df.groupby('city'):
if group['temperature'].mean() > 30:
print(f"{city} is hot!")
3. Save to Files¶
for city, group in df.groupby('city'):
group.to_csv(f'{city}_data.csv', index=False)
When to Use Iteration¶
Guidelines for choosing iteration vs aggregation.
1. Prefer Built-in Methods¶
# Fast and optimized
df.groupby('city')['temperature'].mean()
2. Use Iteration When¶
# Complex custom logic
# Need to output multiple files
# Debugging group contents
3. Performance¶
# Built-in aggregations are much faster
# Iteration is slower but more flexible