Cross Merge¶
A cross merge produces the Cartesian product of two DataFrames, pairing every row from the left with every row from the right.
Mental Model
A cross merge is "every possible combination." If the left has M rows and the right has N rows, the result has M x N rows. It is SQL's CROSS JOIN -- useful for generating all pairings (e.g., students x exams) but dangerous on large tables because the output size explodes multiplicatively.
Basic Concept¶
Create all possible combinations of rows.
1. Cross Join¶
```python import pandas as pd
students = pd.DataFrame({ 'student_id': [1, 2], 'student_name': ['Alice', 'Bob'] })
subjects = pd.DataFrame({ 'subject_name': ['Math', 'Science'] })
result = pd.merge(students, subjects, how='cross') print(result) ```
student_id student_name subject_name
0 1 Alice Math
1 1 Alice Science
2 2 Bob Math
3 2 Bob Science
2. Result Size¶
Cross merge produces len(left) × len(right) rows.
3. No Join Key¶
Cross merge does not use on parameter.
LeetCode Example: Student Examinations¶
Create all student-subject combinations.
1. Sample Data¶
```python students = pd.DataFrame({ 'student_id': [1, 2], 'student_name': ['Alice', 'Bob'] })
subjects = pd.DataFrame({ 'subject_name': ['Math', 'Science'] }) ```
2. Cross Merge¶
python
student_subject = pd.merge(students, subjects, how='cross')
print(student_subject)
3. Use with Left Join¶
```python
After creating all combinations, left join with actual data¶
examination_count = pd.DataFrame({ 'student_id': [1, 1, 2], 'subject_name': ['Math', 'Science', 'Math'], 'attended_exams': [2, 1, 3] })
result = pd.merge( student_subject, examination_count, on=['student_id', 'subject_name'], how='left' ) ```
Practical Applications¶
When to use cross merge.
1. Generate All Combinations¶
```python
All product-store combinations¶
products = pd.DataFrame({'product': ['A', 'B', 'C']}) stores = pd.DataFrame({'store': ['X', 'Y']})
all_combos = pd.merge(products, stores, how='cross') ```
2. Time Period Analysis¶
```python
All month-category combinations¶
months = pd.DataFrame({'month': ['Jan', 'Feb', 'Mar']}) categories = pd.DataFrame({'category': ['Food', 'Drinks']})
template = pd.merge(months, categories, how='cross') ```
3. Fill Missing Combinations¶
```python
Create complete grid, then merge with actual data¶
template = pd.merge(dates_df, products_df, how='cross') result = pd.merge(template, sales_data, how='left').fillna(0) ```
Performance Warning¶
Cross merge can create very large DataFrames.
1. Size Calculation¶
```python left_size = len(df1) right_size = len(df2) result_size = left_size * right_size
print(f"Result will have {result_size} rows") ```
2. Memory Considerations¶
```python
1000 × 1000 = 1,000,000 rows¶
Be cautious with large DataFrames¶
```
3. Filter After Merge¶
```python
Consider filtering immediately after cross merge¶
cross_result = pd.merge(df1, df2, how='cross') filtered = cross_result[cross_result['condition']] ```
Alternative Methods¶
Other ways to create Cartesian products.
1. itertools.product¶
```python from itertools import product
combos = list(product(df1['col'], df2['col'])) ```
2. MultiIndex.from_product¶
python
idx = pd.MultiIndex.from_product([
df1['col'].unique(),
df2['col'].unique()
])
3. Comparison¶
```python
pd.merge(how='cross') is most readable¶
itertools.product for non-DataFrame use¶
```
Complete Example¶
Build examination matrix with cross merge.
1. Create Template¶
```python students = pd.DataFrame({ 'student_id': [1, 2, 3], 'student_name': ['Alice', 'Bob', 'Carol'] })
subjects = pd.DataFrame({ 'subject_name': ['Math', 'Science', 'History'] })
template = pd.merge(students, subjects, how='cross') ```
2. Merge with Data¶
```python exam_data = pd.DataFrame({ 'student_id': [1, 1, 2, 3], 'subject_name': ['Math', 'Science', 'Math', 'History'], 'score': [85, 90, 78, 92] })
result = pd.merge( template, exam_data, on=['student_id', 'subject_name'], how='left' ) ```
3. Fill Missing¶
python
result['score'] = result['score'].fillna(0)
Exercises¶
Exercise 1.
Create a colors DataFrame (['red', 'blue']) and a sizes DataFrame (['S', 'M', 'L']). Use a cross merge to generate all 6 possible color-size combinations.
Solution to Exercise 1
Generate all color-size combinations.
import pandas as pd
colors = pd.DataFrame({'color': ['red', 'blue']})
sizes = pd.DataFrame({'size': ['S', 'M', 'L']})
result = pd.merge(colors, sizes, how='cross')
print(result)
assert len(result) == 6
Exercise 2.
Create a students DataFrame and an exams DataFrame. Use a cross merge to create a row for every student-exam combination, then verify the result has len(students) * len(exams) rows.
Solution to Exercise 2
Create all student-exam combinations.
import pandas as pd
students = pd.DataFrame({'student': ['Alice', 'Bob', 'Carol']})
exams = pd.DataFrame({'exam': ['Midterm', 'Final']})
result = pd.merge(students, exams, how='cross')
print(result)
assert len(result) == len(students) * len(exams)
Exercise 3. Use a cross merge to create a multiplication table: merge a DataFrame of numbers 1-5 with itself to get all 25 pairs, then compute the product for each pair.
Solution to Exercise 3
Build a multiplication table using cross merge.
import pandas as pd
nums = pd.DataFrame({'a': range(1, 6)})
result = pd.merge(nums, nums.rename(columns={'a': 'b'}), how='cross')
result['product'] = result['a'] * result['b']
print(result)
assert len(result) == 25