Skip to content

Cross Merge

A cross merge produces the Cartesian product of two DataFrames, pairing every row from the left with every row from the right.

Basic Concept

Create all possible combinations of rows.

1. Cross Join

import pandas as pd

students = pd.DataFrame({
    'student_id': [1, 2],
    'student_name': ['Alice', 'Bob']
})

subjects = pd.DataFrame({
    'subject_name': ['Math', 'Science']
})

result = pd.merge(students, subjects, how='cross')
print(result)
   student_id student_name subject_name
0           1        Alice         Math
1           1        Alice      Science
2           2          Bob         Math
3           2          Bob      Science

2. Result Size

Cross merge produces len(left) × len(right) rows.

3. No Join Key

Cross merge does not use on parameter.

LeetCode Example: Student Examinations

Create all student-subject combinations.

1. Sample Data

students = pd.DataFrame({
    'student_id': [1, 2],
    'student_name': ['Alice', 'Bob']
})

subjects = pd.DataFrame({
    'subject_name': ['Math', 'Science']
})

2. Cross Merge

student_subject = pd.merge(students, subjects, how='cross')
print(student_subject)

3. Use with Left Join

# After creating all combinations, left join with actual data
examination_count = pd.DataFrame({
    'student_id': [1, 1, 2],
    'subject_name': ['Math', 'Science', 'Math'],
    'attended_exams': [2, 1, 3]
})

result = pd.merge(
    student_subject,
    examination_count,
    on=['student_id', 'subject_name'],
    how='left'
)

Practical Applications

When to use cross merge.

1. Generate All Combinations

# All product-store combinations
products = pd.DataFrame({'product': ['A', 'B', 'C']})
stores = pd.DataFrame({'store': ['X', 'Y']})

all_combos = pd.merge(products, stores, how='cross')

2. Time Period Analysis

# All month-category combinations
months = pd.DataFrame({'month': ['Jan', 'Feb', 'Mar']})
categories = pd.DataFrame({'category': ['Food', 'Drinks']})

template = pd.merge(months, categories, how='cross')

3. Fill Missing Combinations

# Create complete grid, then merge with actual data
template = pd.merge(dates_df, products_df, how='cross')
result = pd.merge(template, sales_data, how='left').fillna(0)

Performance Warning

Cross merge can create very large DataFrames.

1. Size Calculation

left_size = len(df1)
right_size = len(df2)
result_size = left_size * right_size

print(f"Result will have {result_size} rows")

2. Memory Considerations

# 1000 × 1000 = 1,000,000 rows
# Be cautious with large DataFrames

3. Filter After Merge

# Consider filtering immediately after cross merge
cross_result = pd.merge(df1, df2, how='cross')
filtered = cross_result[cross_result['condition']]

Alternative Methods

Other ways to create Cartesian products.

1. itertools.product

from itertools import product

combos = list(product(df1['col'], df2['col']))

2. MultiIndex.from_product

idx = pd.MultiIndex.from_product([
    df1['col'].unique(),
    df2['col'].unique()
])

3. Comparison

# pd.merge(how='cross') is most readable
# itertools.product for non-DataFrame use

Complete Example

Build examination matrix with cross merge.

1. Create Template

students = pd.DataFrame({
    'student_id': [1, 2, 3],
    'student_name': ['Alice', 'Bob', 'Carol']
})

subjects = pd.DataFrame({
    'subject_name': ['Math', 'Science', 'History']
})

template = pd.merge(students, subjects, how='cross')

2. Merge with Data

exam_data = pd.DataFrame({
    'student_id': [1, 1, 2, 3],
    'subject_name': ['Math', 'Science', 'Math', 'History'],
    'score': [85, 90, 78, 92]
})

result = pd.merge(
    template,
    exam_data,
    on=['student_id', 'subject_name'],
    how='left'
)

3. Fill Missing

result['score'] = result['score'].fillna(0)