Skip to content

Keyword - how

The how parameter specifies the type of join to perform. It determines which rows are included in the result based on key matching.

Inner Join

Keep only rows with matching keys in both DataFrames.

1. Default Behavior

import pandas as pd

df1 = pd.DataFrame({
    'city': ['NY', 'SF', 'LA'],
    'temperature': [21, 14, 35]
})

df2 = pd.DataFrame({
    'city': ['SF', 'NY', 'ICN'],
    'humidity': [65, 68, 75]
})

df = pd.merge(df1, df2, on='city', how='inner')
print(df)
  city  temperature  humidity
0   NY           21        68
1   SF           14        65

2. Intersection of Keys

Only SF and NY exist in both DataFrames.

3. Most Restrictive

Inner join produces the smallest result set.

Left Join

Keep all rows from the left DataFrame.

1. Left Join Example

df = pd.merge(df1, df2, on='city', how='left')
print(df)
  city  temperature  humidity
0   NY           21      68.0
1   SF           14      65.0
2   LA           35       NaN

2. NaN for Missing

LA has no match in df2, so humidity is NaN.

3. Preserve Left Rows

All rows from df1 are preserved.

Right Join

Keep all rows from the right DataFrame.

1. Right Join Example

df = pd.merge(df1, df2, on='city', how='right')
print(df)
  city  temperature  humidity
0   SF         14.0        65
1   NY         21.0        68
2  ICN          NaN        75

2. NaN for Missing

ICN has no match in df1, so temperature is NaN.

3. Preserve Right Rows

All rows from df2 are preserved.

Outer Join

Keep all rows from both DataFrames.

1. Outer Join Example

df = pd.merge(df1, df2, on='city', how='outer')
print(df)
  city  temperature  humidity
0   NY         21.0      68.0
1   SF         14.0      65.0
2   LA         35.0       NaN
3  ICN          NaN      75.0

2. Union of Keys

All unique cities from both DataFrames.

3. Most Inclusive

Outer join produces the largest result set.

Visual Comparison

Side-by-side comparison of join types.

1. Sample Data

left = pd.DataFrame({'key': ['A', 'B', 'C'], 'left_val': [1, 2, 3]})
right = pd.DataFrame({'key': ['B', 'C', 'D'], 'right_val': [4, 5, 6]})

2. Results Summary

how Keys in Result Missing Values
inner B, C None
left A, B, C right_val for A
right B, C, D left_val for D
outer A, B, C, D Both sides

3. Row Counts

print(len(pd.merge(left, right, how='inner')))  # 2
print(len(pd.merge(left, right, how='left')))   # 3
print(len(pd.merge(left, right, how='right')))  # 3
print(len(pd.merge(left, right, how='outer')))  # 4

Practical Guidelines

Choosing the right join type.

1. Use Inner When

  • Only need matched records
  • Data quality requires both sides present
  • Computing ratios or comparisons

2. Use Left When

  • Preserving primary table structure
  • Optional enrichment data
  • Most common in practice

3. Use Outer When

  • Need complete picture of both sources
  • Finding mismatches
  • Data reconciliation