merge Method¶

The merge() function combines DataFrames based on column values, similar to SQL joins. It is the most flexible method for combining datasets.

Mental Model

merge() is SQL JOIN for DataFrames. You specify which columns to match on (on, left_on, right_on), how to handle mismatches (how), and what to do with name collisions (suffixes). It defaults to an inner join on all shared column names -- always set on explicitly to avoid surprises.

Basic Syntax¶

Merge two DataFrames on a common column.

1. Simple Merge¶

```python import pandas as pd

df1 = pd.DataFrame({ 'city': ['NY', 'SF', 'LA'], 'temperature': [21, 14, 35] })

df2 = pd.DataFrame({ 'city': ['SF', 'NY', 'ICN'], 'humidity': [65, 68, 75] })

df = pd.merge(df1, df2, on='city') print(df) ```

city temperature humidity 0 NY 21 68 1 SF 14 65

2. Default Inner Join¶

By default, merge performs an inner join, keeping only matching rows.

3. Method Syntax¶

```python

Function syntax¶

pd.merge(df1, df2, on='key')

Method syntax (equivalent)¶

df1.merge(df2, on='key') ```

LeetCode Example: Person and Address¶

Combine person information with address.

1. Sample Data¶

```python person = pd.DataFrame({ 'personId': [1, 2, 3], 'firstName': ['John', 'Jane', 'Jake'], 'lastName': ['Doe', 'Smith', 'Brown'] })

address = pd.DataFrame({ 'personId': [1, 3], 'city': ['New York', 'Los Angeles'], 'state': ['NY', 'CA'] }) ```

2. Left Merge¶

python merged_df = person.merge(address, on='personId', how='left') print(merged_df)

3. Result¶

personId firstName lastName city state 0 1 John Doe New York NY 1 2 Jane Smith NaN NaN 2 3 Jake Brown Los Angeles CA

LeetCode Example: Project Employees¶

Join projects with employee experience.

1. Sample Data¶

```python project = pd.DataFrame({ 'project_id': [101, 102, 103], 'project_name': ['Project A', 'Project B', 'Project C'] })

employee = pd.DataFrame({ 'employee_id': [1, 2, 3, 4, 5], 'project_id': [101, 101, 102, 102, 103], 'experience_years': [5, 7, 3, 4, 6] }) ```

2. Merge with Default¶

python merged_df = pd.merge(project, employee, how='left') print(merged_df)

3. Auto-detected Key¶

When on=None, merge uses common column names automatically.

Common Column Detection¶

Merge automatically finds common columns when on is not specified.

1. Auto Detection¶

```python

Both DataFrames have 'project_id' column¶

pd.merge(project, employee) # on='project_id' inferred ```

2. Multiple Common Columns¶

```python

If both have ['key1', 'key2'], merges on both¶

pd.merge(df1, df2) ```

3. No Common Columns¶

```python

Raises ValueError if no common columns exist¶

Must specify left_on and right_on¶

```

Merge Result Structure¶

Understanding the output DataFrame.

1. Column Order¶

```python

Left DataFrame columns first, then right¶

Common column appears once (unless using suffixes)¶

```

2. Row Order¶

```python

Order depends on join type and matching¶

```

3. Index Reset¶

```python

Merge resets index to RangeIndex¶

Original indices are discarded¶

```

Runnable Example: `merge_join_tutorial.py`¶

```python """ Pandas Tutorial: Merging, Joining, and Concatenating DataFrames.

Covers different ways to combine DataFrames. """

import pandas as pd import numpy as np

=============================================================================¶

Main¶

=============================================================================¶

if name == "main":

print("="*70)
print("MERGING, JOINING, AND CONCATENATING")
print("="*70)

# Create sample DataFrames
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 28]
})

df2 = pd.DataFrame({
    'ID': [1, 2, 3, 5],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [50000, 60000, 75000, 55000]
})

print("\nDataFrame 1 (Employees):")
print(df1)
print("\nDataFrame 2 (Departments):")
print(df2)

# Inner join (default)
print("\n1. Inner Join (intersection):")
inner_merged = pd.merge(df1, df2, on='ID', how='inner')
print(inner_merged)

# Left join
print("\n2. Left Join (keep all from left):")
left_merged = pd.merge(df1, df2, on='ID', how='left')
print(left_merged)

# Right join
print("\n3. Right Join (keep all from right):")
right_merged = pd.merge(df1, df2, on='ID', how='right')
print(right_merged)

# Outer join
print("\n4. Outer Join (keep all from both):")
outer_merged = pd.merge(df1, df2, on='ID', how='outer')
print(outer_merged)

# Merge on different column names
df3 = pd.DataFrame({
    'EmpID': [1, 2, 3],
    'Project': ['A', 'B', 'C']
})

print("\n5. Merge on different column names:")
merged_diff = pd.merge(df1, df3, left_on='ID', right_on='EmpID')
print(merged_diff)

# Concatenate DataFrames vertically
df_top = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df_bottom = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

print("\n6. Concatenate vertically (stack rows):")
print("Top:")
print(df_top)
print("Bottom:")
print(df_bottom)
print("Result:")
vertical_concat = pd.concat([df_top, df_bottom], ignore_index=True)
print(vertical_concat)

# Concatenate horizontally
df_left = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df_right = pd.DataFrame({'C': [7, 8, 9], 'D': [10, 11, 12]})

print("\n7. Concatenate horizontally (add columns):")
horizontal_concat = pd.concat([df_left, df_right], axis=1)
print(horizontal_concat)

# Join (using index)
df_indexed1 = df1.set_index('ID')
df_indexed2 = df2.set_index('ID')

print("\n8. Join using index:")
joined = df_indexed1.join(df_indexed2, how='inner')
print(joined)

print("\nKEY TAKEAWAYS:")
print("- merge(): SQL-style joins on columns")
print("- concat(): Stack DataFrames vertically or horizontally")
print("- join(): Merge on index")
print("- Join types: inner, left, right, outer")
print("- Use on= for same column names, left_on=/right_on= for different names")

```

Exercises¶

Exercise 1. Create two DataFrames sharing a 'key' column with partial overlap. Perform a default merge (inner join) and count the resulting rows. Then perform an outer merge and compare.

Solution to Exercise 1

Compare inner and outer merge row counts.

import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['b', 'c', 'd'], 'val2': [4, 5, 6]})
inner = pd.merge(df1, df2, on='key')
outer = pd.merge(df1, df2, on='key', how='outer')
print(f"Inner: {len(inner)} rows")
print(f"Outer: {len(outer)} rows")

Exercise 2. Create an orders DataFrame with a 'product_id' column and a products DataFrame with a 'product_id' and 'product_name' column. Use pd.merge() to look up the product name for each order.

Solution to Exercise 2

Look up product names for orders.

import pandas as pd

orders = pd.DataFrame({'order_id': [1, 2, 3], 'product_id': [101, 102, 101]})
products = pd.DataFrame({'product_id': [101, 102, 103], 'product_name': ['Widget', 'Gadget', 'Doohickey']})
result = pd.merge(orders, products, on='product_id')
print(result)

Exercise 3. Create two DataFrames and merge them using both the function syntax pd.merge(df1, df2) and the method syntax df1.merge(df2). Verify both produce identical results.

Solution to Exercise 3

Verify function syntax and method syntax produce the same result.

import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b'], 'x': [1, 2]})
df2 = pd.DataFrame({'key': ['a', 'b'], 'y': [3, 4]})
result_func = pd.merge(df1, df2, on='key')
result_method = df1.merge(df2, on='key')
assert result_func.equals(result_method)
print("Both approaches produce identical results.")

merge Method¶

Basic Syntax¶

1. Simple Merge¶

2. Default Inner Join¶

3. Method Syntax¶

Function syntax¶

Method syntax (equivalent)¶

LeetCode Example: Person and Address¶

1. Sample Data¶

2. Left Merge¶

3. Result¶

LeetCode Example: Project Employees¶

1. Sample Data¶

2. Merge with Default¶

3. Auto-detected Key¶

Common Column Detection¶

1. Auto Detection¶

Both DataFrames have 'project_id' column¶

2. Multiple Common Columns¶

If both have ['key1', 'key2'], merges on both¶

3. No Common Columns¶

Raises ValueError if no common columns exist¶

Must specify left_on and right_on¶

Merge Result Structure¶

1. Column Order¶

Left DataFrame columns first, then right¶

Common column appears once (unless using suffixes)¶

2. Row Order¶

Order depends on join type and matching¶

3. Index Reset¶

Merge resets index to RangeIndex¶

Original indices are discarded¶

Runnable Example: merge_join_tutorial.py¶

=============================================================================¶

Main¶

=============================================================================¶

Exercises¶

Runnable Example: `merge_join_tutorial.py`¶