Skip to content

shift and diff Methods

The shift() and diff() methods are essential for time series analysis, computing lagged values and differences.

Mental Model

shift(n) slides the data up or down by n positions without changing the index -- it creates lagged or lead columns. diff(n) computes value - value_shifted_by_n, giving the absolute change. Together they are the building blocks for returns, growth rates, and any "compare this period to the previous" analysis.

shift Method

Move data by specified number of periods.

1. Basic shift

```python import pandas as pd import yfinance as yf

df = yf.Ticker('WMT').history(start='2020-01-01', end='2020-01-10') df = df[['Close']]

df['shift_1'] = df['Close'].shift(1) df['shift_2'] = df['Close'].shift(2) print(df) ```

Close shift_1 shift_2 Date 2020-01-02 116.459999 NaN NaN 2020-01-03 116.279999 116.459999 NaN 2020-01-06 116.230003 116.279999 116.459999 2020-01-07 116.849998 116.230003 116.279999 2020-01-08 116.220001 116.849998 116.230003

2. Positive Shift

```python

shift(1): current row gets previous value

df['previous_close'] = df['Close'].shift(1) ```

3. Negative Shift

```python

shift(-1): current row gets next value

df['next_close'] = df['Close'].shift(-1) ```

shift for Comparisons

Compare current values to previous.

1. Day-over-Day Change

python df['change'] = df['Close'] - df['Close'].shift(1)

2. Conditional Logic

```python

Did price increase?

df['increased'] = df['Close'] > df['Close'].shift(1) ```

3. LeetCode: Rising Temperature

```python

Temperature rose AND consecutive day

weather[ (weather['temperature'] > weather['temperature'].shift(1)) & (weather['id'] == weather['id'].shift(1) + 1) ] ```

diff Method

Calculate difference between consecutive elements.

1. Basic diff

```python df = yf.Ticker('WMT').history(start='2020-01-01', end='2020-01-10') df = df[['Close']]

df['diff_1'] = df['Close'].diff(1) df['diff_2'] = df['Close'].diff(2) print(df) ```

Close diff_1 diff_2 Date 2020-01-02 116.459999 NaN NaN 2020-01-03 116.279999 -0.180000 NaN 2020-01-06 116.230003 -0.049996 -0.229996 2020-01-07 116.849998 0.619995 0.569999 2020-01-08 116.220001 -0.629997 0.009998

2. diff Equivalent

```python

diff(1) is equivalent to:

df['Close'] - df['Close'].shift(1) ```

3. Higher Order Differences

```python

Second difference (change in change)

df['second_diff'] = df['Close'].diff().diff() ```

LeetCode Example: Rising Temperature

Find days with temperature increase.

1. Sample Data

python weather = pd.DataFrame({ 'id': [1, 2, 3, 4], 'recordDate': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04']), 'temperature': [10, 25, 20, 30] }) weather = weather.sort_values('recordDate')

2. Using diff

```python

Temperature increased AND consecutive dates

result = weather[ (weather['temperature'].diff() > 0) & (weather['recordDate'].diff().dt.days == 1) ] ```

3. Using shift

python result = weather[ (weather['temperature'] > weather['temperature'].shift(1)) & (weather['recordDate'] == weather['recordDate'].shift(1) + pd.Timedelta(days=1)) ]

LeetCode Example: Consecutive Numbers

Find numbers appearing 3+ times consecutively.

1. Sample Data

python logs = pd.DataFrame({ 'id': [1, 2, 3, 4, 5, 6], 'num': [1, 1, 1, 2, 1, 2] }) logs = logs.sort_values('id')

2. Check Consecutive

python consecutive = logs[ (logs['num'] == logs['num'].shift(1)) & (logs['num'] == logs['num'].shift(2)) & (logs['id'] == logs['id'].shift(1) + 1) & (logs['id'] == logs['id'].shift(2) + 2) ]

3. Get Unique Numbers

python result = consecutive.drop_duplicates('num')[['num']]

Date Differences

shift and diff with dates.

1. Date shift

python df['prev_date'] = df['date'].shift(1)

2. Date diff

python df['days_between'] = df['date'].diff().dt.days

3. Check Consecutive Dates

python df['consecutive'] = df['date'].diff().dt.days == 1

LeetCode: Human Traffic

Find 3+ consecutive high traffic days.

1. Sample Data

python stadium = pd.DataFrame({ 'id': [1, 2, 3, 5, 6, 7, 8], 'people': [100, 200, 150, 300, 250, 400, 350] })

2. Check Consecutive IDs

```python

Current and previous two rows are consecutive

consecutive = ( (stadium['id'].diff() == 1) & (stadium['id'].diff().shift(1) == 1) ) ```

3. Alternative

```python

Marks the third row of each consecutive sequence

consecutive = ( (stadium['id'] == stadium['id'].shift(1) + 1) & (stadium['id'] == stadium['id'].shift(2) + 2) ) ```

Financial Applications

Daily returns and changes.

1. Price Change

python df['price_change'] = df['Close'].diff()

2. Daily Return

```python df['daily_return'] = df['Close'].diff() / df['Close'].shift(1)

or

df['daily_return'] = df['Close'].pct_change() ```

3. Cumulative Return

python df['cum_return'] = (1 + df['daily_return']).cumprod() - 1


Exercises

Exercise 1. Write code that uses s.shift(1) and s.diff(1) on a price Series. Explain the relationship between them.

Solution to Exercise 1

```python import pandas as pd import numpy as np

Solution for the specific exercise

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(10), 'B': np.random.randn(10)}) print(df.head()) ```


Exercise 2. Explain the shift() method. What does a positive vs negative shift value do?

Solution to Exercise 2

See the main content for the detailed explanation. The key concept involves understanding the Pandas API and its behavior for this specific operation.


Exercise 3. Write code that computes log returns using np.log(s / s.shift(1)). Compare with pct_change().

Solution to Exercise 3

```python import pandas as pd import numpy as np

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(20), 'B': np.random.randn(20)}) result = df.describe() print(result) ```


Exercise 4. Create a DataFrame and use shift() to create a lagged feature column for time series forecasting.

Solution to Exercise 4

```python import pandas as pd import numpy as np

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(50), 'group': np.random.choice(['X', 'Y'], 50)}) result = df.groupby('group').mean() print(result) ```