Skip to content

DatetimeIndex

pandas provides rich support for time series data through the DatetimeIndex, which enables label-based time alignment and slicing.

Mental Model

A DatetimeIndex is a sorted array of timestamps that serves as a time-aware row label. It unlocks partial-string indexing (df['2024'] for all of 2024), resampling to different frequencies, and automatic time-based alignment. Any DataFrame with a DatetimeIndex becomes a first-class time series.

Creating DatetimeIndex

Generate datetime indices for time series.

1. date_range Function

```python import pandas as pd

dates = pd.date_range("2020-01-01", periods=5, freq="D") print(dates) ```

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04', '2020-01-05'], dtype='datetime64[ns]', freq='D')

2. With End Date

python dates = pd.date_range(start="2024-01-01", end="2024-01-10")

3. Nanosecond Precision

DatetimeIndex stores timestamps with nanosecond precision.

Converting to Datetime

Parse strings and other formats to datetime.

1. to_datetime Function

python pd.to_datetime(["2021-01-01", "2021-01-05"])

2. Common with CSV Loading

python df = pd.read_csv('data.csv', parse_dates=['date_column'])

3. Format Specification

python pd.to_datetime("01-15-2024", format="%m-%d-%Y")

Date Range Frequencies

Common frequency strings.

1. Daily and Sub-daily

python pd.date_range("2024-01-01", periods=5, freq="D") # Daily pd.date_range("2024-01-01", periods=5, freq="H") # Hourly pd.date_range("2024-01-01", periods=5, freq="T") # Minute

2. Weekly and Monthly

python pd.date_range("2024-01-01", periods=5, freq="W") # Weekly pd.date_range("2024-01-01", periods=5, freq="M") # Month end pd.date_range("2024-01-01", periods=5, freq="MS") # Month start

3. Business Days

python pd.date_range("2024-01-01", periods=5, freq="B") # Business days

Indexing with Dates

Date-based slicing and selection.

1. String-based Selection

python s = pd.Series(range(5), index=pd.date_range("2020-01-01", periods=5)) s["2020-01-02":"2020-01-04"] # Inclusive slicing

2. Partial String Indexing

python s["2020-01"] # All of January 2020 s["2020"] # All of 2020

3. loc with Dates

python s.loc["2020-01-02"] s.loc["2020-01-02":"2020-01-04"]

Time Zone Handling

Work with time zones.

1. Localize to UTC

python dates = pd.date_range("2020-01-01", periods=5) dates_utc = dates.tz_localize("UTC")

2. Convert Time Zone

python dates_eastern = dates_utc.tz_convert("US/Eastern")

3. Financial Data

Time zones are essential for global financial data.

Financial Context

DatetimeIndex underlies financial analysis.

1. Price Time Series

python prices = pd.Series([100, 101, 102], index=pd.date_range("2024-01-01", periods=3))

2. Returns Calculation

python returns = prices.pct_change()

3. Event Studies

```python

Select specific date ranges for analysis

event_window = prices["2024-01-01":"2024-01-31"] ```


Runnable Example: time_series_tutorial.py

```python """ Pandas Tutorial: Time Series Analysis.

Covers datetime operations, resampling, rolling windows, time zones. """

import pandas as pd import numpy as np

=============================================================================

Main

=============================================================================

if name == "main":

print("="*70)
print("TIME SERIES ANALYSIS")
print("="*70)

# Create time series data
dates = pd.date_range('2024-01-01', periods=100, freq='D')
np.random.seed(42)
ts_data = pd.DataFrame({
    'Date': dates,
    'Sales': np.random.randint(100, 500, 100),
    'Temperature': np.random.uniform(15, 35, 100)
})
ts_data.set_index('Date', inplace=True)

print("\nTime Series Data:")
print(ts_data.head())

# Date parsing
print("\n1. Parse dates from strings:")
date_strings = ['2024-01-01', '2024-02-15', '2024-03-30']
parsed_dates = pd.to_datetime(date_strings)
print(parsed_dates)

# Date ranges
print("\n2. Create date ranges:")
print("Daily:", pd.date_range('2024-01-01', periods=5, freq='D'))
print("Weekly:", pd.date_range('2024-01-01', periods=5, freq='W'))
print("Monthly:", pd.date_range('2024-01-01', periods=5, freq='MS'))

# Accessing datetime components
print("\n3. Extract datetime components:")
ts_data['Year'] = ts_data.index.year
ts_data['Month'] = ts_data.index.month
ts_data['Day'] = ts_data.index.day
ts_data['DayOfWeek'] = ts_data.index.dayofweek
print(ts_data[['Sales', 'Year', 'Month', 'Day', 'DayOfWeek']].head())

# Resampling (changing frequency)
print("\n4. Resample to weekly frequency (sum):")
weekly = ts_data['Sales'].resample('W').sum()
print(weekly.head())

print("\n5. Resample to monthly (mean):")
monthly = ts_data['Sales'].resample('MS').mean()
print(monthly.head())

# Rolling windows
print("\n6. Rolling mean (7-day window):")
ts_data['Sales_MA7'] = ts_data['Sales'].rolling(window=7).mean()
print(ts_data[['Sales', 'Sales_MA7']].head(10))

print("\n7. Rolling statistics:")
rolling_stats = ts_data['Sales'].rolling(window=7).agg(['mean', 'std', 'min', 'max'])
print(rolling_stats.head(10))

# Expanding windows
print("\n8. Expanding mean (cumulative):")
ts_data['Cumulative_Mean'] = ts_data['Sales'].expanding().mean()
print(ts_data[['Sales', 'Cumulative_Mean']].head(10))

# Shift and lag
print("\n9. Shift values (lag/lead):")
ts_data['Sales_Yesterday'] = ts_data['Sales'].shift(1)
ts_data['Sales_Tomorrow'] = ts_data['Sales'].shift(-1)
print(ts_data[['Sales', 'Sales_Yesterday', 'Sales_Tomorrow']].head())

# Percentage change
print("\n10. Percentage change:")
ts_data['Sales_Pct_Change'] = ts_data['Sales'].pct_change()
print(ts_data[['Sales', 'Sales_Pct_Change']].head())

# Time zones
print("\n11. Time zone operations:")
utc_dates = pd.date_range('2024-01-01', periods=3, freq='D', tz='UTC')
print("UTC:", utc_dates)

# Convert time zone
eastern = utc_dates.tz_convert('US/Eastern')
print("Eastern:", eastern)

print("\nKEY TAKEAWAYS:")
print("- pd.to_datetime(): Parse dates from strings")
print("- pd.date_range(): Create date sequences")
print("- resample(): Change time frequency")
print("- rolling(): Moving window calculations")
print("- shift(): Lag/lead values")
print("- pct_change(): Percentage change")
print("- Time zone handling with tz parameter")

```


Exercises

Exercise 1. Write code that creates a DataFrame with a DatetimeIndex and accesses rows for a specific month using .loc['2024-06'].

Solution to Exercise 1

```python import pandas as pd import numpy as np

Solution for the specific exercise

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(10), 'B': np.random.randn(10)}) print(df.head()) ```


Exercise 2. Explain three advantages of having a DatetimeIndex on a DataFrame.

Solution to Exercise 2

See the main content for the detailed explanation. The key concept involves understanding the Pandas API and its behavior for this specific operation.


Exercise 3. Write code that extracts the year, month, and day from a DatetimeIndex using .year, .month, .day.

Solution to Exercise 3

```python import pandas as pd import numpy as np

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(20), 'B': np.random.randn(20)}) result = df.describe() print(result) ```


Exercise 4. Create a DatetimeIndex with hourly frequency and demonstrate slicing by date and time.

Solution to Exercise 4

```python import pandas as pd import numpy as np

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(50), 'group': np.random.choice(['X', 'Y'], 50)}) result = df.groupby('group').mean() print(result) ```