
SP500 Analysis

A comprehensive example demonstrating pandas operations for analyzing S&P 500 stock data.

Mental Model

This example ties together the full pandas workflow: download data, inspect it, filter and group, compute statistics, and visualize results. Think of it as a capstone that shows how individual pandas verbs chain together into a real analysis pipeline, with an OOP wrapper for reusability.
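The workflow above can be sketched end to end on a tiny hand-made table. The tickers, sectors, and numbers below are invented for illustration and stand in for the download step; they are not real market data.

```python
import pandas as pd

# Hypothetical sample standing in for downloaded data
df = pd.DataFrame({
    'Ticker': ['AAA', 'BBB', 'CCC', 'DDD'],
    'Sector': ['Tech', 'Tech', 'Energy', 'Energy'],
    'Market Cap': [300.0, 100.0, 50.0, 150.0],
})

# Inspect
print(df.head())

# Filter, then group and aggregate
large = df[df['Market Cap'] >= 100]
cap_by_sector = large.groupby('Sector')['Market Cap'].sum()
print(cap_by_sector)
```

Each step returns a new pandas object, which is what lets the verbs chain.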

SP500 Class Design

Build a class to download and analyze S&P 500 data.

1. Class Structure

```python
import pandas as pd
import yfinance as yf


class SP500:
    """Class to download SP500 companies' fundamental and stock price data."""

    def __init__(self):
        self.tickers = []
        self.data = pd.DataFrame()
        self.price_data = pd.DataFrame()
        self.fundamental_data = pd.DataFrame()
```

2. Fetch Tickers

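```python
def fetch_sp500_tickers(self):
    # read_html returns a list of DataFrames, one per HTML table on the page
    table = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    self.tickers = table[0]['Symbol'].tolist()
    # Adjust ticker symbols: Yahoo Finance uses '-' where the table uses '.' (BRK.B -> BRK-B)
    self.tickers = [ticker.replace('.', '-') for ticker in self.tickers]
```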
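The symbol adjustment can be tried offline. The sketch below uses a three-row stand-in for `table[0]` (an assumption; the real table has many more rows and columns) and the vectorized string method instead of a list comprehension.

```python
import pandas as pd

# Small stand-in for table[0]; only the 'Symbol' column matters here
table = pd.DataFrame({'Symbol': ['AAPL', 'BRK.B', 'BF.B']})

# regex=False makes '.' a literal dot rather than "any character"
tickers = table['Symbol'].str.replace('.', '-', regex=False).tolist()
print(tickers)  # ['AAPL', 'BRK-B', 'BF-B']
```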

3. Fetch Data

```python
def fetch_data(self):
    infos = []
    closes = []

    for ticker in self.tickers:
        try:
            stock = yf.Ticker(ticker)
            info = stock.info
            df = stock.history(period='1y')
            # Keep only the close, renamed so each ticker becomes one column
            closes.append(df[['Close']].rename(columns={'Close': ticker}))
            # .get() returns None when a field is missing instead of raising
            infos.append({
                'Ticker': ticker,
                'PER (Trailing)': info.get('trailingPE'),
                'PBR': info.get('priceToBook'),
                'Market Cap': info.get('marketCap')
            })
        except Exception as e:
            # Skip this ticker and move on to the next one
            print(f"Failed to fetch {ticker}: {e}")

    # Align all close series on their date index, one column per ticker
    self.price_data = pd.concat(closes, axis=1)
    self.fundamental_data = pd.DataFrame(infos)
```
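How `pd.concat(..., axis=1)` lines up the per-ticker frames is easier to see on small inputs. This sketch uses two invented close-price frames with partly overlapping dates, and adds a guard (not in the method above) for the case where every download failed, since `pd.concat` raises `ValueError` on an empty list.

```python
import pandas as pd

# Two hypothetical close-price frames with partly overlapping dates
a = pd.DataFrame({'AAA': [10.0, 11.0, 12.0]},
                 index=pd.to_datetime(['2024-01-02', '2024-01-03', '2024-01-04']))
b = pd.DataFrame({'BBB': [20.0, 21.0]},
                 index=pd.to_datetime(['2024-01-03', '2024-01-04']))

closes = [a, b]
# Fall back to an empty frame if nothing was collected
price_data = pd.concat(closes, axis=1) if closes else pd.DataFrame()
print(price_data)
```

Dates missing from one frame show up as NaN in that ticker's column rather than being dropped.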

Data Merging

Combine price and fundamental data.

1. Merge Method

```python
def merge_data(self):
    self.data = self.fundamental_data.set_index('Ticker').join(
        self.price_data.transpose(), how='left'
    )
```
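The shape of the merged result is worth checking on a toy case. In this sketch the two input frames are invented; after the transpose, tickers form the index and every date becomes its own column next to the fundamentals.

```python
import pandas as pd

# Hypothetical inputs mirroring fundamental_data and price_data
fundamental_data = pd.DataFrame({
    'Ticker': ['AAA', 'BBB'],
    'PBR': [1.5, 2.0],
})
price_data = pd.DataFrame(
    {'AAA': [10.0, 11.0], 'BBB': [20.0, 21.0]},
    index=pd.to_datetime(['2024-01-02', '2024-01-03']),
)

# Left join on the ticker index: one row per ticker, one column per date
merged = fundamental_data.set_index('Ticker').join(price_data.transpose(), how='left')
print(merged)
```

With a year of daily prices this layout produces roughly 250 date columns, so keeping `price_data` separate for time-series work may be more convenient.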

2. Get Data

```python
def get_data(self):
    return self.data
```

3. Usage Example

```python
# Example Usage
sp500 = SP500()
sp500.fetch_sp500_tickers()
sp500.fetch_data()
sp500.merge_data()
data = sp500.get_data()
print(data.head())
```

Analysis Operations

Common analysis patterns with the data.

1. Filter by Sector

```python
# Assumes `data` has a 'Sector' column; fetch_data() above does not collect one
finance_df = data[data['Sector'] == 'Finance']
```
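One possible way to obtain a sector column is to reuse the constituents table that supplied the tickers. The sketch below assumes that table carries a 'GICS Sector' column alongside 'Symbol', and uses invented rows in place of the real table and of the merged data. Sector labels depend on the source; GICS calls the finance sector 'Financials'.

```python
import pandas as pd

# Stand-in for the constituents table (column names are an assumption)
constituents = pd.DataFrame({
    'Symbol': ['AAA', 'BBB.B', 'CCC'],
    'GICS Sector': ['Financials', 'Financials', 'Energy'],
})
# Stand-in for the merged data, indexed by Yahoo-style ticker
data = pd.DataFrame({'Market Cap': [100.0, 200.0, 300.0]},
                    index=pd.Index(['AAA', 'BBB-B', 'CCC'], name='Ticker'))

# Build a ticker -> sector lookup using the same '.' -> '-' adjustment
sector_map = constituents.assign(
    Symbol=constituents['Symbol'].str.replace('.', '-', regex=False)
).set_index('Symbol')['GICS Sector']
data['Sector'] = data.index.map(sector_map)

finance_df = data[data['Sector'] == 'Financials']
print(finance_df)
```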

2. Group Statistics

```python
sector_stats = data.groupby('Sector').agg({
    'Market Cap': 'sum',
    'PER (Trailing)': 'mean',
    'PBR': ['mean', 'std']
})
```
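Because one entry in the dict is a list, the result above has two-level column labels such as ('PBR', 'mean'). Named aggregation is an alternative that produces flat column names. The sketch below runs on invented data, and the output names (`total_cap`, `mean_pbr`, `std_pbr`) are arbitrary choices.

```python
import pandas as pd

# Hypothetical data with the columns used in the grouping
data = pd.DataFrame({
    'Sector': ['Tech', 'Tech', 'Energy', 'Energy'],
    'Market Cap': [300.0, 100.0, 50.0, 150.0],
    'PBR': [4.0, 6.0, 1.0, 3.0],
})

# Each keyword is an output column: (input column, aggregation)
sector_stats = data.groupby('Sector').agg(
    total_cap=('Market Cap', 'sum'),
    mean_pbr=('PBR', 'mean'),
    std_pbr=('PBR', 'std'),
)
print(sector_stats)
```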

3. Top Performers

```python
# Top 10 by market cap
top_10 = data.nlargest(10, 'Market Cap')
```

Visualization Integration

Combine pandas with matplotlib.

1. Sector Distribution

```python
import matplotlib.pyplot as plt

sector_caps = data.groupby('Sector')['Market Cap'].sum()
sector_caps.plot(kind='bar', figsize=(12, 6))
plt.title('Market Cap by Sector')
plt.ylabel('Market Cap ($)')
plt.show()
```

2. Price Correlation

```python
# Correlation matrix of prices
price_corr = sp500.price_data.corr()
```
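Price levels that share a trend can look highly correlated even when their day-to-day moves are not, so a common alternative is to correlate returns instead. A minimal sketch on invented prices:

```python
import pandas as pd

# Hypothetical daily closes for three tickers
prices = pd.DataFrame({
    'AAA': [100.0, 101.0, 103.0, 102.0, 105.0],
    'BBB': [50.0, 50.5, 51.5, 51.0, 52.5],
    'CCC': [20.0, 19.0, 21.0, 20.0, 22.0],
})

# Drop the first row, which has no previous price to compare against
returns = prices.pct_change().dropna()
return_corr = returns.corr()
print(return_corr)
```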

3. Returns Analysis

```python
returns = sp500.price_data.pct_change()
returns.mean().nlargest(10).plot(kind='bar')
plt.title('Top 10 Average Daily Returns')
plt.show()
```

Best Practices

Guidelines for large-scale data analysis.

1. Error Handling

```python
# Always handle API errors gracefully
try:
    data = yf.download(ticker)
except Exception as e:
    print(f"Error: {e}")
```
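Transient network failures often succeed on a second attempt, so one option is to wrap the call in a small retry helper. In this sketch `fetch_with_retry` and `flaky_fetch` are hypothetical names; the fake fetcher stands in for a real download so the example runs offline.

```python
import time


def fetch_with_retry(fetch, ticker, retries=3, delay=0.0):
    """Call fetch(ticker); retry on any exception, return None if all attempts fail."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(ticker)
        except Exception as e:
            print(f"Attempt {attempt} for {ticker} failed: {e}")
            time.sleep(delay)
    return None


# Hypothetical fetcher that fails twice, then succeeds
calls = {'n': 0}

def flaky_fetch(ticker):
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError("temporary failure")
    return f"data for {ticker}"


result = fetch_with_retry(flaky_fetch, 'AAA')
print(result)
```

A non-zero `delay` gives the remote service time to recover between attempts.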

2. Incremental Loading

```python
# For large datasets, process in batches
batch_size = 50
for i in range(0, len(tickers), batch_size):
    batch = tickers[i:i+batch_size]
    # Process batch
```
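The slicing loop can be wrapped in a generator so the batching logic lives in one place. `batched` is a hypothetical helper name, and the ticker list is made up.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


# 120 made-up tickers split into batches of 50
tickers = [f"T{i}" for i in range(120)]
sizes = [len(batch) for batch in batched(tickers, 50)]
print(sizes)  # [50, 50, 20]
```

The last batch is simply shorter when the list length is not a multiple of the batch size.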

3. Caching Results

```python
# Save intermediate results
data.to_pickle('sp500_data.pkl')

# Load later
data = pd.read_pickle('sp500_data.pkl')
```
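Saving and loading can be combined into a load-or-build helper so the expensive step runs only when no cache file exists. `load_or_build` is a hypothetical name, and the sketch writes to a temporary directory rather than the working directory.

```python
import os
import tempfile

import pandas as pd


def load_or_build(path, build):
    """Return the cached DataFrame at path, or call build() and cache its result."""
    if os.path.exists(path):
        return pd.read_pickle(path)
    df = build()
    df.to_pickle(path)
    return df


cache_path = os.path.join(tempfile.mkdtemp(), 'sp500_data.pkl')
# First call builds and saves; second call reads the file and ignores its builder
first = load_or_build(cache_path, lambda: pd.DataFrame({'Market Cap': [1.0, 2.0]}))
second = load_or_build(cache_path, lambda: pd.DataFrame())
print(second)
```

Pickle files can execute code when loaded, so only read ones you created or trust.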


Exercises

Exercise 1. Write code that creates a DataFrame of stock prices (date, close) and computes the 20-day and 50-day rolling means.

Solution to Exercise 1

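```python
import numpy as np
import pandas as pd

# Synthetic prices: a seeded random walk over 100 business days
np.random.seed(42)
dates = pd.date_range('2024-01-01', periods=100, freq='B')
df = pd.DataFrame({'date': dates, 'close': 100 + np.random.randn(100).cumsum()})

# Rolling means are NaN until the window holds 20 (or 50) observations
df['ma_20'] = df['close'].rolling(window=20).mean()
df['ma_50'] = df['close'].rolling(window=50).mean()
print(df.tail())
```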


Exercise 2. Explain how to compute daily returns from a price series using pct_change(). Write code demonstrating this.

Solution to Exercise 2

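pct_change() returns the fractional change between each element and the one before it, (p_t - p_{t-1}) / p_{t-1}, so calling it on a daily price series gives daily returns. The first value is NaN because it has no previous price. The Returns Analysis section applies the same call to sp500.price_data.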
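A minimal sketch of the daily-return calculation for this exercise, using three invented prices and a manual check with shift():

```python
import pandas as pd

# Hypothetical closing prices on three consecutive trading days
prices = pd.Series(
    [100.0, 110.0, 99.0],
    index=pd.to_datetime(['2024-01-02', '2024-01-03', '2024-01-04']),
)

# Daily returns: NaN, then +10%, then -10%
returns = prices.pct_change()
print(returns)

# The same calculation written out by hand
manual = prices / prices.shift(1) - 1
```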


Exercise 3. Write code that resamples daily stock data to monthly frequency, taking the last close price of each month.

Solution to Exercise 3

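```python
import numpy as np
import pandas as pd

# Synthetic daily closes: a seeded random walk over 90 calendar days
np.random.seed(42)
dates = pd.date_range('2024-01-01', periods=90, freq='D')
df = pd.DataFrame({'close': 100 + np.random.randn(90).cumsum()}, index=dates)

# 'ME' means month end; on pandas older than 2.2 use 'M' instead
monthly = df['close'].resample('ME').last()
print(monthly)
```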


Exercise 4. Create a function that takes a price DataFrame and returns the maximum drawdown (largest peak-to-trough decline).

Solution to Exercise 4

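```python
import pandas as pd


def max_drawdown(prices):
    """Return the largest peak-to-trough decline per column, as a negative fraction."""
    running_peak = prices.cummax()          # highest price seen so far
    drawdown = prices / running_peak - 1    # decline relative to that peak
    return drawdown.min()


# close falls from a peak of 120 to 80, a drawdown of about -0.333
prices = pd.DataFrame({'close': [100.0, 120.0, 90.0, 110.0, 80.0]})
print(max_drawdown(prices))
```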