Cross Merge¶
A cross merge produces the Cartesian product of two DataFrames, pairing every row from the left with every row from the right.
Basic Concept¶
Pass how='cross' to pd.merge to create all possible combinations of rows. This option is available in pandas 1.2 and later.
1. Cross Join¶
import pandas as pd
students = pd.DataFrame({
    'student_id': [1, 2],
    'student_name': ['Alice', 'Bob']
})
subjects = pd.DataFrame({
    'subject_name': ['Math', 'Science']
})
result = pd.merge(students, subjects, how='cross')
print(result)
   student_id student_name subject_name
0           1        Alice         Math
1           1        Alice      Science
2           2          Bob         Math
3           2          Bob      Science
2. Result Size¶
A cross merge always produces exactly len(left) × len(right) rows. In the example above, 2 students × 2 subjects gives 4 rows.
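The size rule is easy to check directly. The following minimal sketch reuses the `students` and `subjects` frames from the example above and compares the result length with the product of the input lengths.

```python
import pandas as pd

students = pd.DataFrame({
    'student_id': [1, 2],
    'student_name': ['Alice', 'Bob']
})
subjects = pd.DataFrame({
    'subject_name': ['Math', 'Science']
})

result = pd.merge(students, subjects, how='cross')

# Rows multiply; columns from both frames are placed side by side
expected_rows = len(students) * len(subjects)
print(len(result), expected_rows)  # 4 4
print(result.shape)                # (4, 3)
```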
3. No Join Key¶
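A cross merge does not take a join key. Passing on, left_on, right_on, left_index, or right_index together with how='cross' raises a MergeError.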
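As a sketch of what happens when a key is supplied anyway: pandas rejects the call with `pandas.errors.MergeError`. The `left` and `right` frames below are hypothetical and share a column name on purpose, to show that without a key the shared column is kept twice with the default `_x` and `_y` suffixes.

```python
import pandas as pd

# Hypothetical frames that happen to share a column called 'key'
left = pd.DataFrame({'key': [1, 2], 'a': ['x', 'y']})
right = pd.DataFrame({'key': [1, 2], 'b': ['p', 'q']})

# Supplying a join key together with how='cross' is rejected
try:
    pd.merge(left, right, on='key', how='cross')
    raised = False
except pd.errors.MergeError as exc:
    raised = True
    print(f"MergeError: {exc}")

# Without a key, 'key' is treated as an ordinary overlapping column
result = pd.merge(left, right, how='cross')
print(result.columns.tolist())  # ['key_x', 'a', 'key_y', 'b']
print(len(result))              # 4
```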
LeetCode Example: Students and Examinations¶
Create all student-subject combinations.
1. Sample Data¶
students = pd.DataFrame({
    'student_id': [1, 2],
    'student_name': ['Alice', 'Bob']
})
subjects = pd.DataFrame({
    'subject_name': ['Math', 'Science']
})
2. Cross Merge¶
student_subject = pd.merge(students, subjects, how='cross')
print(student_subject)
3. Use with Left Join¶
# After creating all combinations, left join with actual data
examination_count = pd.DataFrame({
    'student_id': [1, 1, 2],
    'subject_name': ['Math', 'Science', 'Math'],
    'attended_exams': [2, 1, 3]
})
result = pd.merge(
    student_subject,
    examination_count,
    on=['student_id', 'subject_name'],
    how='left'
)
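After the left join, combinations with no matching row in `examination_count` carry NaN in `attended_exams`. The sketch below repeats the steps above end to end and then replaces those gaps with 0. Filling with 0 and casting back to integers is one reasonable choice here, not the only one.

```python
import pandas as pd

students = pd.DataFrame({
    'student_id': [1, 2],
    'student_name': ['Alice', 'Bob']
})
subjects = pd.DataFrame({
    'subject_name': ['Math', 'Science']
})
examination_count = pd.DataFrame({
    'student_id': [1, 1, 2],
    'subject_name': ['Math', 'Science', 'Math'],
    'attended_exams': [2, 1, 3]
})

student_subject = pd.merge(students, subjects, how='cross')
result = pd.merge(
    student_subject,
    examination_count,
    on=['student_id', 'subject_name'],
    how='left'
)

# Bob has no Science row, so exactly one combination is unmatched
print(result['attended_exams'].isna().sum())  # 1

# NaN forces a float column; fill the gap and restore integers
result['attended_exams'] = result['attended_exams'].fillna(0).astype(int)
print(result['attended_exams'].tolist())  # [2, 1, 3, 0]
```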
Practical Applications¶
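Use a cross merge when you need every combination of two sets of rows, typically as a template onto which actual data is joined.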
1. Generate All Combinations¶
# All product-store combinations
products = pd.DataFrame({'product': ['A', 'B', 'C']})
stores = pd.DataFrame({'store': ['X', 'Y']})
all_combos = pd.merge(products, stores, how='cross')
2. Time Period Analysis¶
# All month-category combinations
months = pd.DataFrame({'month': ['Jan', 'Feb', 'Mar']})
categories = pd.DataFrame({'category': ['Food', 'Drinks']})
template = pd.merge(months, categories, how='cross')
3. Fill Missing Combinations¶
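# Create complete grid, then merge with actual data
# (dates_df, products_df, and sales_data are assumed to already exist)
template = pd.merge(dates_df, products_df, how='cross')
# With no `on`, the left merge joins on every column the two frames share
result = pd.merge(template, sales_data, how='left').fillna(0)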
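The snippet above relies on frames that are not defined in this section. A self-contained sketch, with small hypothetical versions of `dates_df`, `products_df`, and `sales_data` (the column names `date`, `product`, and `units` are assumptions made for illustration), might look like this:

```python
import pandas as pd

# Hypothetical inputs standing in for the undefined frames
dates_df = pd.DataFrame({'date': ['2024-01-01', '2024-01-02']})
products_df = pd.DataFrame({'product': ['A', 'B']})
sales_data = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-02'],
    'product': ['A', 'B'],
    'units': [5, 3]
})

# Complete date-product grid
template = pd.merge(dates_df, products_df, how='cross')

# Naming the keys explicitly avoids surprises if the frames share other columns
result = pd.merge(template, sales_data, on=['date', 'product'], how='left')

# Fill only the measure column, so unrelated columns are left alone
result['units'] = result['units'].fillna(0).astype(int)
print(result['units'].tolist())  # [5, 0, 0, 3]
```

Filling a single column rather than calling `fillna(0)` on the whole frame keeps missing values in other columns visible.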
Performance Warning¶
The result size grows multiplicatively with the input sizes, so a cross merge of two moderately sized DataFrames can produce a very large one.
1. Size Calculation¶
left_size = len(df1)
right_size = len(df2)
result_size = left_size * right_size
print(f"Result will have {result_size} rows")
2. Memory Considerations¶
# 1000 × 1000 = 1,000,000 rows
# Be cautious with large DataFrames
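One way to act on this warning is to check the projected row count before merging. The helper below, `checked_cross_merge`, is hypothetical (it is not part of pandas), and its default limit of one million rows is an arbitrary assumption.

```python
import pandas as pd

def checked_cross_merge(left, right, max_rows=1_000_000):
    """Hypothetical helper: cross merge only if the result stays within max_rows."""
    n_rows = len(left) * len(right)
    if n_rows > max_rows:
        raise ValueError(
            f"Cross merge would create {n_rows:,} rows (limit {max_rows:,})"
        )
    return pd.merge(left, right, how='cross')

df1 = pd.DataFrame({'a': range(1000)})
df2 = pd.DataFrame({'b': range(1000)})

# 10 × 10 rows is well under the limit
small = checked_cross_merge(df1.head(10), df2.head(10))
print(len(small))  # 100

# 1000 × 1000 rows exceeds a 500,000-row limit, so nothing is allocated
try:
    checked_cross_merge(df1, df2, max_rows=500_000)
except ValueError as exc:
    print(exc)  # Cross merge would create 1,000,000 rows (limit 500,000)
```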
3. Filter After Merge¶
# Filter immediately after the cross merge to keep only the rows you need
cross_result = pd.merge(df1, df2, how='cross')
# 'condition' stands in for a boolean column (or any boolean mask)
filtered = cross_result[cross_result['condition']]
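To make the placeholder concrete, here is a minimal sketch with hypothetical `customers` and `products` frames: every customer is paired with every product, then only the pairs the customer can afford are kept.

```python
import pandas as pd

# Hypothetical data for illustration
customers = pd.DataFrame({'customer': ['Ann', 'Ben'], 'budget': [50, 120]})
products = pd.DataFrame({'product': ['Pen', 'Bag', 'Lamp'], 'price': [5, 80, 150]})

# All customer-product pairs
pairs = pd.merge(customers, products, how='cross')

# Boolean mask built from columns of both original frames
affordable = pairs[pairs['price'] <= pairs['budget']]

print(len(pairs), len(affordable))  # 6 3
print(affordable[['customer', 'product']].values.tolist())
# [['Ann', 'Pen'], ['Ben', 'Pen'], ['Ben', 'Bag']]
```

Note that the full cross product is still materialized before the filter runs, so this pattern does not reduce peak memory.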
Alternative Methods¶
Other ways to create Cartesian products.
1. itertools.product¶
from itertools import product
combos = list(product(df1['col'], df2['col']))
2. MultiIndex.from_product¶
idx = pd.MultiIndex.from_product([
    df1['col'].unique(),
    df2['col'].unique()
])
3. Comparison¶
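# pd.merge(how='cross') is the most readable and keeps all columns of both DataFrames
# itertools.product suits plain Python iterables outside pandas
# MultiIndex.from_product builds an index of combinations, e.g. for reindexing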
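The three approaches can be run side by side to confirm that they yield the same pairs. In this sketch `df1` and `df2` are small hypothetical frames that each have a column named `col`, as in the snippets above. The level names `left` and `right` are also assumptions.

```python
from itertools import product

import pandas as pd

df1 = pd.DataFrame({'col': ['a', 'b']})
df2 = pd.DataFrame({'col': [1, 2, 3]})

# 1. itertools.product: a plain list of tuples
combos = list(product(df1['col'], df2['col']))

# 2. MultiIndex.from_product: an index, converted here to a DataFrame
idx = pd.MultiIndex.from_product(
    [df1['col'].unique(), df2['col'].unique()],
    names=['left', 'right']
)
from_index = idx.to_frame(index=False)

# 3. Cross merge: the shared name 'col' becomes col_x and col_y
crossed = pd.merge(df1, df2, how='cross')

print(len(combos), len(from_index), len(crossed))  # 6 6 6
print(combos == list(zip(crossed['col_x'], crossed['col_y'])))  # True
print(combos == list(idx))  # True
```

Unlike the cross merge, `unique()` in the MultiIndex variant drops duplicate values, so the row counts only agree when the columns have no repeats.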
Complete Example¶
Build an examination matrix with a cross merge.
1. Create Template¶
students = pd.DataFrame({
    'student_id': [1, 2, 3],
    'student_name': ['Alice', 'Bob', 'Carol']
})
subjects = pd.DataFrame({
    'subject_name': ['Math', 'Science', 'History']
})
template = pd.merge(students, subjects, how='cross')
2. Merge with Data¶
exam_data = pd.DataFrame({
    'student_id': [1, 1, 2, 3],
    'subject_name': ['Math', 'Science', 'Math', 'History'],
    'score': [85, 90, 78, 92]
})
result = pd.merge(
    template,
    exam_data,
    on=['student_id', 'subject_name'],
    how='left'
)
3. Fill Missing¶
# Combinations with no exam record have NaN scores; replace them with 0
result['score'] = result['score'].fillna(0)
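The three steps leave the data in long form, one row per student-subject pair. To arrive at an actual matrix with students as rows and subjects as columns, one option is `DataFrame.pivot`. The sketch below repeats the steps above and adds that reshaping. The pivot step is an addition for illustration, not part of the original walkthrough.

```python
import pandas as pd

students = pd.DataFrame({
    'student_id': [1, 2, 3],
    'student_name': ['Alice', 'Bob', 'Carol']
})
subjects = pd.DataFrame({
    'subject_name': ['Math', 'Science', 'History']
})
exam_data = pd.DataFrame({
    'student_id': [1, 1, 2, 3],
    'subject_name': ['Math', 'Science', 'Math', 'History'],
    'score': [85, 90, 78, 92]
})

# Steps 1-3: template, left merge, fill missing
template = pd.merge(students, subjects, how='cross')
result = pd.merge(
    template,
    exam_data,
    on=['student_id', 'subject_name'],
    how='left'
)
result['score'] = result['score'].fillna(0)

# Reshape: one row per student, one column per subject
matrix = result.pivot(index='student_name', columns='subject_name', values='score')

print(matrix.shape)                 # (3, 3)
print(matrix.loc['Alice', 'Math'])  # 85.0
print(matrix.loc['Bob', 'History']) # 0.0
```

The scores print as floats because the left merge introduced NaN before the fill. Append `.astype(int)` to the fill step if integers are preferred.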