shift and diff Methods¶
The shift() and diff() methods are essential for time series analysis, computing lagged values and differences.
shift Method¶
Move data by specified number of periods.
1. Basic shift¶
import pandas as pd
import yfinance as yf
df = yf.Ticker('WMT').history(start='2020-01-01', end='2020-01-10')
df = df[['Close']]
df['shift_1'] = df['Close'].shift(1)
df['shift_2'] = df['Close'].shift(2)
print(df)
Close shift_1 shift_2
Date
2020-01-02 116.459999 NaN NaN
2020-01-03 116.279999 116.459999 NaN
2020-01-06 116.230003 116.279999 116.459999
2020-01-07 116.849998 116.230003 116.279999
2020-01-08 116.220001 116.849998 116.230003
2. Positive Shift¶
# shift(1): current row gets previous value
df['previous_close'] = df['Close'].shift(1)
3. Negative Shift¶
# shift(-1): current row gets next value
df['next_close'] = df['Close'].shift(-1)
shift for Comparisons¶
Compare current values to previous.
1. Day-over-Day Change¶
df['change'] = df['Close'] - df['Close'].shift(1)
2. Conditional Logic¶
# Did price increase?
df['increased'] = df['Close'] > df['Close'].shift(1)
3. LeetCode: Rising Temperature¶
# Temperature rose AND consecutive day
weather[
(weather['temperature'] > weather['temperature'].shift(1)) &
(weather['id'] == weather['id'].shift(1) + 1)
]
diff Method¶
Calculate difference between consecutive elements.
1. Basic diff¶
df = yf.Ticker('WMT').history(start='2020-01-01', end='2020-01-10')
df = df[['Close']]
df['diff_1'] = df['Close'].diff(1)
df['diff_2'] = df['Close'].diff(2)
print(df)
Close diff_1 diff_2
Date
2020-01-02 116.459999 NaN NaN
2020-01-03 116.279999 -0.180000 NaN
2020-01-06 116.230003 -0.049996 -0.229996
2020-01-07 116.849998 0.619995 0.569999
2020-01-08 116.220001 -0.629997 0.009998
2. diff Equivalent¶
# diff(1) is equivalent to:
df['Close'] - df['Close'].shift(1)
3. Higher Order Differences¶
# Second difference (change in change)
df['second_diff'] = df['Close'].diff().diff()
LeetCode Example: Rising Temperature¶
Find days with temperature increase.
1. Sample Data¶
weather = pd.DataFrame({
'id': [1, 2, 3, 4],
'recordDate': pd.to_datetime(['2024-01-01', '2024-01-02',
'2024-01-03', '2024-01-04']),
'temperature': [10, 25, 20, 30]
})
weather = weather.sort_values('recordDate')
2. Using diff¶
# Temperature increased AND consecutive dates
result = weather[
(weather['temperature'].diff() > 0) &
(weather['recordDate'].diff().dt.days == 1)
]
3. Using shift¶
result = weather[
(weather['temperature'] > weather['temperature'].shift(1)) &
(weather['recordDate'] == weather['recordDate'].shift(1) + pd.Timedelta(days=1))
]
LeetCode Example: Consecutive Numbers¶
Find numbers appearing 3+ times consecutively.
1. Sample Data¶
logs = pd.DataFrame({
'id': [1, 2, 3, 4, 5, 6],
'num': [1, 1, 1, 2, 1, 2]
})
logs = logs.sort_values('id')
2. Check Consecutive¶
consecutive = logs[
(logs['num'] == logs['num'].shift(1)) &
(logs['num'] == logs['num'].shift(2)) &
(logs['id'] == logs['id'].shift(1) + 1) &
(logs['id'] == logs['id'].shift(2) + 2)
]
3. Get Unique Numbers¶
result = consecutive.drop_duplicates('num')[['num']]
Date Differences¶
shift and diff with dates.
1. Date shift¶
df['prev_date'] = df['date'].shift(1)
2. Date diff¶
df['days_between'] = df['date'].diff().dt.days
3. Check Consecutive Dates¶
df['consecutive'] = df['date'].diff().dt.days == 1
LeetCode: Human Traffic¶
Find 3+ consecutive high traffic days.
1. Sample Data¶
stadium = pd.DataFrame({
'id': [1, 2, 3, 5, 6, 7, 8],
'people': [100, 200, 150, 300, 250, 400, 350]
})
2. Check Consecutive IDs¶
# Current and previous two rows are consecutive
consecutive = (
(stadium['id'].diff() == 1) &
(stadium['id'].diff().shift(1) == 1)
)
3. Alternative¶
# Marks the third row of each consecutive sequence
consecutive = (
(stadium['id'] == stadium['id'].shift(1) + 1) &
(stadium['id'] == stadium['id'].shift(2) + 2)
)
Financial Applications¶
Daily returns and changes.
1. Price Change¶
df['price_change'] = df['Close'].diff()
2. Daily Return¶
df['daily_return'] = df['Close'].diff() / df['Close'].shift(1)
# or
df['daily_return'] = df['Close'].pct_change()
3. Cumulative Return¶
df['cum_return'] = (1 + df['daily_return']).cumprod() - 1