Skip to content

Financial Data Workflow

Mental Model

The yfinance workflow is a three-stage pipeline: download raw OHLCV data, transform it with pandas (compute returns, resample, merge tickers), and visualize the results. Each stage maps directly to pandas operations you already know -- the financial domain just provides the context.

Data Download

1. Single Ticker

```python import yfinance as yf

df = yf.download("AAPL", start="2023-01-01", end="2023-12-31") print(type(df)) # DataFrame print(df.columns) # ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'] ```

2. Index Types

python print(type(df.index)) # DatetimeIndex print(df.index[0]) # Timestamp('2023-01-03')

3. Access Patterns

```python

By label

df.loc['2023-01-03', 'Close']

By position

df.iloc[0, 3]

Column

df['Close'] ```

Multi-Asset Analysis

1. Multiple Tickers

python tickers = ["AAPL", "MSFT", "GOOGL"] df = yf.download(tickers, start="2023-01-01")

2. MultiIndex Columns

```python print(df.columns)

MultiIndex([('Open', 'AAPL'), ('Open', 'MSFT'), ...])

Access

df['Close']['AAPL'] # Apple closing prices df[('Close', 'AAPL')] # Alternative ```

3. Comparison

```python

Normalize to first day

normalized = df['Close'] / df['Close'].iloc[0] normalized.plot(figsize=(12, 6)) ```

Visualization

1. Price Series

```python import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 6)) df['Close'].plot(ax=ax) ax.set_ylabel('Price ($)') ax.set_title('AAPL Closing Price') plt.show() ```

2. Multiple Assets

python fig, ax = plt.subplots(figsize=(12, 6)) for ticker in ['AAPL', 'MSFT', 'GOOGL']: (df['Close'][ticker] / df['Close'][ticker].iloc[0]).plot(ax=ax, label=ticker) ax.legend() ax.set_ylabel('Normalized Price') plt.show()

3. Volume

python fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8), sharex=True) df['Close'].plot(ax=ax1) df['Volume'].plot(ax=ax2, color='gray', alpha=0.5) ax1.set_ylabel('Price') ax2.set_ylabel('Volume')

Analysis

1. Returns

```python

Daily returns

returns = df['Close'].pct_change() returns.plot(kind='hist', bins=50, alpha=0.6) ```

2. Moving Averages

```python df['MA20'] = df['Close'].rolling(20).mean() df['MA50'] = df['Close'].rolling(50).mean()

fig, ax = plt.subplots(figsize=(12, 6)) df[['Close', 'MA20', 'MA50']].plot(ax=ax) ax.set_title('Price with Moving Averages') ```

3. Volatility

```python

20-day rolling volatility

volatility = returns.rolling(20).std() * np.sqrt(252) volatility.plot(figsize=(12, 4)) ```


Exercises

Exercise 1. Write code that creates a synthetic stock price DataFrame with columns ['Open', 'High', 'Low', 'Close', 'Volume'] and a DatetimeIndex.

Solution to Exercise 1

```python import pandas as pd

df = pd.DataFrame({ 'name': ['Alice', 'Bob', 'Charlie', 'David'], 'salary': [70000, 80000, 60000, 90000], 'department': ['IT', 'IT', 'HR', 'HR'] }) result = df.groupby('department')['salary'].max() print(result) ```


Exercise 2. Explain the typical workflow for downloading, cleaning, and analyzing financial data with Pandas.

Solution to Exercise 2

See the main content for the relevant patterns and API calls. The solution involves understanding how to combine Pandas operations to solve data manipulation problems.


Exercise 3. Write code that computes the cumulative return of a price series. Start from the formula: cumulative return = (price / first_price) - 1.

Solution to Exercise 3

```python import pandas as pd import numpy as np

np.random.seed(42) df = pd.DataFrame({ 'value': np.random.randint(0, 100, 20), 'group': np.random.choice(['A', 'B'], 20) }) result = df.groupby('group')['value'].transform('sum') print(result) ```


Exercise 4. Create a function that takes a DataFrame of daily prices and returns a summary with: total return, annualized return, and volatility.

Solution to Exercise 4

```python import pandas as pd import numpy as np

np.random.seed(42) s = pd.Series(np.random.randn(100)) s_clean = s.clip(lower=0) print(s_clean.describe()) ```