Iteration with GroupBy
GroupBy objects support iteration, allowing you to process each group individually.
Mental Model
Iterating over a GroupBy yields (key, sub_dataframe) pairs, one per group. It is the escape hatch for custom logic that does not fit into agg, transform, or filter. Use it sparingly -- vectorized group operations are faster -- but it is invaluable for debugging or applying complex per-group logic.
Basic Iteration
Iterate through groups as (name, group) pairs.
1. For Loop
```python
import pandas as pd

data = {
    'day': ['1/1/20', '1/2/20', '1/1/20', '1/2/20', '1/1/20', '1/2/20'],
    'city': ['NY', 'NY', 'SF', 'SF', 'LA', 'LA'],
    'temperature': [21, 14, 25, 32, 36, 42],
    'humidity': [31, 15, 36, 22, 16, 29],
}
df = pd.DataFrame(data)

# Each iteration yields the group key and the rows belonging to that key
for city, df_city in df.groupby("city"):
    print(city)
    print(df_city)
    print()
```
2. Tuple Unpacking
```python
# city: group key (string)
# df_city: DataFrame for that group
```
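To make the unpacking concrete, the sketch below records the type of each element of the pair and the row labels each group carries. It rebuilds a reduced version of the example frame so it runs on its own; the list name `seen` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['NY', 'NY', 'SF', 'SF', 'LA', 'LA'],
    'temperature': [21, 14, 25, 32, 36, 42],
})

seen = []  # illustrative: (key type, group type, row labels) per group
for city, df_city in df.groupby('city'):
    seen.append((
        type(city).__name__,      # the key is the plain value from the 'city' column
        type(df_city).__name__,   # the group is an ordinary DataFrame
        df_city.index.tolist(),   # rows keep their labels from the original frame
    ))

for entry in seen:
    print(entry)
```

Because each group keeps its original row labels, results computed per group can be aligned back to the source frame.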
3. Output
```
LA
      day city  temperature  humidity
4  1/1/20   LA           36        16
5  1/2/20   LA           42        29

NY
      day city  temperature  humidity
0  1/1/20   NY           21        31
1  1/2/20   NY           14        15

SF
      day city  temperature  humidity
2  1/1/20   SF           25        36
3  1/2/20   SF           32        22
```
Multiple Group Keys
Iterate with multiple grouping columns.
1. Tuple Keys
```python
for (city, day), group in df.groupby(['city', 'day']):
    print(f"City: {city}, Day: {day}")
    print(group)
    print()
```
2. Key Type
```python
# Each key is a plain tuple (city, day), not a namedtuple
```
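A quick way to confirm the shape of the keys is to collect them while iterating. This is a minimal sketch on a reduced copy of the example data; `keys_seen` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'day': ['1/1/20', '1/2/20', '1/1/20', '1/2/20'],
    'city': ['NY', 'NY', 'SF', 'SF'],
    'temperature': [21, 14, 25, 32],
})

keys_seen = []  # illustrative: every key yielded by the iteration
for keys, group in df.groupby(['city', 'day']):
    # With two grouping columns, each key is a 2-tuple in column order
    keys_seen.append(keys)

print(keys_seen)
print(all(isinstance(k, tuple) and len(k) == 2 for k in keys_seen))
```

The tuple elements follow the order of the columns passed to `groupby`, so `(city, day)` unpacks correctly only when the list is `['city', 'day']`.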
3. Access Individual Keys
```python
for keys, group in df.groupby(['city', 'day']):
    city, day = keys
    print(f"Processing {city} on {day}")
```
Custom Processing
Apply custom logic to each group.
1. Compute Statistics
```python
results = []
for city, group in df.groupby('city'):
    results.append({
        'city': city,
        'mean_temp': group['temperature'].mean(),
        'max_temp': group['temperature'].max()
    })

summary = pd.DataFrame(results)
```
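For comparison, the same summary can usually be produced without a loop. The sketch below builds it both ways on a reduced copy of the example data, using pandas named aggregation for the built-in version; the names `summary_loop` and `summary_agg` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['NY', 'NY', 'SF', 'SF', 'LA', 'LA'],
    'temperature': [21, 14, 25, 32, 36, 42],
})

# Loop version: one dictionary per group
results = []
for city, group in df.groupby('city'):
    results.append({
        'city': city,
        'mean_temp': group['temperature'].mean(),
        'max_temp': group['temperature'].max(),
    })
summary_loop = pd.DataFrame(results)

# Built-in version: named aggregation, then move 'city' back to a column
summary_agg = (
    df.groupby('city')
      .agg(mean_temp=('temperature', 'mean'), max_temp=('temperature', 'max'))
      .reset_index()
)

print(summary_loop)
print(summary_agg)
```

When a statistic can be expressed this way, the built-in form is generally the better default; the loop remains useful when each group needs logic that `agg` cannot express.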
2. Conditional Logic
```python
for city, group in df.groupby('city'):
    if group['temperature'].mean() > 30:
        print(f"{city} is hot!")
```
3. Save to Files
```python
for city, group in df.groupby('city'):
    group.to_csv(f'{city}_data.csv', index=False)
```
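The loop above writes into the current working directory. A variant that targets an explicit directory might look like the following sketch; the temporary directory and the names `out_dir` and `written` are assumptions made for illustration, not part of the original example.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({
    'city': ['NY', 'NY', 'SF', 'SF', 'LA', 'LA'],
    'temperature': [21, 14, 25, 32, 36, 42],
})

# Hypothetical output location; substitute a real directory as needed
out_dir = Path(tempfile.mkdtemp())

written = []  # illustrative: file names created by the loop
for city, group in df.groupby('city'):
    path = out_dir / f'{city}_data.csv'
    group.to_csv(path, index=False)  # index=False omits the row labels
    written.append(path.name)

print(sorted(written))
```

If group keys can contain characters that are invalid in file names, they would need sanitizing before being used in the path.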
When to Use Iteration
Guidelines for choosing iteration vs aggregation.
1. Prefer Built-in Methods
```python
# Fast and optimized
df.groupby('city')['temperature'].mean()
```
2. Use Iteration When
```python
# Complex custom logic
# Need to output multiple files
# Debugging group contents
```
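For the debugging case, a full loop is not always necessary. As a hedged sketch, the GroupBy attributes `groups`, `ngroups`, and the method `get_group` let you inspect group membership or pull out a single group; the variable names below are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['NY', 'NY', 'SF', 'SF', 'LA', 'LA'],
    'temperature': [21, 14, 25, 32, 36, 42],
})

grouped = df.groupby('city')

# Mapping of group key -> row labels belonging to that group
group_labels = {key: labels.tolist() for key, labels in grouped.groups.items()}
print(group_labels)

# Number of distinct groups
print(grouped.ngroups)

# Pull out one group by key without iterating over all of them
ny = grouped.get_group('NY')
print(ny)
```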
3. Performance
```python
# Built-in aggregations run in optimized compiled code and are usually much faster
# Iteration runs a Python-level loop over the groups: slower but more flexible
```
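One way to check the difference on your own machine is to time both approaches. The sketch below uses synthetic random data (an assumption made for illustration, not data from this page) and the standard-library `timeit` module; actual timings depend on hardware, pandas version, and the number of groups, so no figures are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 100,000 rows spread over up to 1,000 integer keys
rng = np.random.default_rng(0)
n_rows, n_groups = 100_000, 1_000
big = pd.DataFrame({
    'key': rng.integers(0, n_groups, size=n_rows),
    'value': rng.random(n_rows),
})

def mean_builtin():
    # Built-in aggregation over all groups at once
    return big.groupby('key')['value'].mean()

def mean_loop():
    # Same statistic computed with a Python loop over the groups
    return pd.Series({k: g['value'].mean() for k, g in big.groupby('key')})

t_builtin = timeit.timeit(mean_builtin, number=3)
t_loop = timeit.timeit(mean_loop, number=3)
print(f"built-in: {t_builtin:.4f}s  loop: {t_loop:.4f}s")
```

Both functions return the same per-group means, so the comparison isolates the cost of the iteration itself.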
Exercises
Exercise 1. Iterate over a GroupBy object and print each group name and the number of rows in that group. Use tuple unpacking in the for loop.
Solution to Exercise 1
Unpack group name and DataFrame in the for loop.
```python
import pandas as pd

df = pd.DataFrame({
    'city': ['NY', 'NY', 'LA', 'SF', 'SF', 'SF'],
    'value': [10, 20, 30, 40, 50, 60]
})

for city, group in df.groupby('city'):
    print(f"{city}: {len(group)} rows")
```
Exercise 2. Group a sales DataFrame by 'region' and iterate to build a list of dictionaries, each containing the region name, total sales, and average order value. Convert the list to a DataFrame.
Solution to Exercise 2
Build summary statistics via iteration.
```python
import pandas as pd

df = pd.DataFrame({
    'region': ['East', 'East', 'West', 'West', 'North'],
    'sales': [100, 200, 150, 250, 300]
})

results = []
for region, group in df.groupby('region'):
    results.append({
        'region': region,
        'total_sales': group['sales'].sum(),
        'avg_order': group['sales'].mean()
    })

summary = pd.DataFrame(results)
print(summary)
```
Exercise 3. Group by ['year', 'quarter'] and iterate using tuple unpacking for the composite key. Print each (year, quarter) combination along with the group's row count.
Solution to Exercise 3
Unpack composite keys from multi-column groupby.
```python
import pandas as pd

df = pd.DataFrame({
    'year': [2023, 2023, 2023, 2024, 2024],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q1', 'Q2'],
    'revenue': [100, 200, 150, 180, 220]
})

for (year, quarter), group in df.groupby(['year', 'quarter']):
    print(f"{year}-{quarter}: {len(group)} rows")
```