Skip to content

Keyword - how

The how parameter in join() specifies the type of join, controlling which rows are included based on index matching.

Left Join (Default)

Keep all rows from the calling DataFrame.

1. Default Behavior

import pandas as pd
import yfinance as yf

def download(ticker):
    return yf.Ticker(ticker).history(period="max")

tickers = ['META', 'AAPL', 'AMZN', 'NFLX', 'GOOG']

for i, ticker in enumerate(tickers):
    if i == 0:
        df = download(ticker).rename(columns={'Close': ticker})[[ticker]]
    else:
        dg = download(ticker).rename(columns={'Close': ticker})[[ticker]]
        df = df.join(dg, how="left")  # Default

print(df.head(3))

2. Preserves Left Index

All dates from the first stock are kept.

3. NaN for Missing

Stocks without data for certain dates have NaN.

Right Join

Keep all rows from the passed DataFrame.

1. Right Join Example

for i, ticker in enumerate(tickers):
    if i == 0:
        df = download(ticker).rename(columns={'Close': ticker})[[ticker]]
    else:
        dg = download(ticker).rename(columns={'Close': ticker})[[ticker]]
        df = df.join(dg, how="right")

print(df.head(3))

2. Preserves Right Index

All dates from the joined stock are kept.

3. Use Case

When the right DataFrame has the authoritative index.

Inner Join

Keep only rows with matching indices.

1. Inner Join Example

for i, ticker in enumerate(tickers):
    if i == 0:
        df = download(ticker).rename(columns={'Close': ticker})[[ticker]]
    else:
        dg = download(ticker).rename(columns={'Close': ticker})[[ticker]]
        df = df.join(dg, how="inner")

print(df.head(3))
print(df.tail(3))

2. Intersection of Indices

Only dates present in all stocks.

3. No Missing Values

Inner join produces complete data without NaN.

Outer Join

Keep all rows from both DataFrames.

1. Outer Join Example

for i, ticker in enumerate(tickers):
    if i == 0:
        df = download(ticker).rename(columns={'Close': ticker})[[ticker]]
    else:
        dg = download(ticker).rename(columns={'Close': ticker})[[ticker]]
        df = df.join(dg, how="outer")

print(df.head(3))
print(df.tail(3))

2. Union of Indices

All dates from any stock are included.

3. Most Missing Values

Outer join may have many NaN values.

Comparison

Summary of join types.

1. Row Counts

# Given df1 (100 dates) and df2 (80 dates) with 60 overlap:
# how='left':  100 rows (all from df1)
# how='right':  80 rows (all from df2)
# how='inner':  60 rows (intersection)
# how='outer': 120 rows (union)

2. Best Practices

# Inner: When you need complete data
# Left: When preserving primary DataFrame structure
# Outer: When you need all dates for analysis

3. Financial Context

# Inner join for synchronized analysis
# Outer join for data completeness check
# Left join for preserving benchmark dates