Saving DataFrames¶

Export DataFrames to various file formats for storage and sharing.

Mental Model

Every to_* method is the mirror of a read_* function. to_csv serializes to text, to_parquet serializes to efficient binary, to_excel writes spreadsheets. The key decision is whether you need human-readable output (CSV) or fast, type-preserving storage (Parquet). Always check index=False if you do not want the index written as a column.

to_csv¶

Save DataFrame to CSV file.

1. Basic Save¶

```python import pandas as pd

df.to_csv('output.csv') ```

2. Without Index¶

python df.to_csv('output.csv', index=False)

3. Custom Separator¶

python df.to_csv('output.tsv', sep='\t')

to_csv Keywords¶

Customize CSV output.

1. Select Columns¶

python df.to_csv('output.csv', columns=['Name', 'Age'])

2. Handle Missing¶

python df.to_csv('output.csv', na_rep='NULL')

3. Float Format¶

python df.to_csv('output.csv', float_format='%.2f')

to_excel¶

Save DataFrame to Excel file.

1. Basic Save¶

python df.to_excel('output.xlsx')

2. Specify Sheet Name¶

python df.to_excel('output.xlsx', sheet_name='Data')

3. Without Index¶

python df.to_excel('output.xlsx', index=False)

Multiple Sheets¶

Save multiple DataFrames to one Excel file.

1. ExcelWriter¶

python with pd.ExcelWriter('output.xlsx') as writer: df1.to_excel(writer, sheet_name='Sheet1') df2.to_excel(writer, sheet_name='Sheet2') df3.to_excel(writer, sheet_name='Sheet3')

2. Append Mode¶

python with pd.ExcelWriter('output.xlsx', mode='a') as writer: df_new.to_excel(writer, sheet_name='NewSheet')

3. Engine Selection¶

```python

openpyxl for .xlsx, xlsxwriter for formatting¶

with pd.ExcelWriter('output.xlsx', engine='xlsxwriter') as writer: df.to_excel(writer) ```

to_json¶

Save DataFrame to JSON file.

1. Basic Save¶

python df.to_json('output.json')

2. Orient Options¶

python df.to_json('output.json', orient='records') # List of dicts df.to_json('output.json', orient='columns') # Dict of lists df.to_json('output.json', orient='index') # Dict of dicts

3. Indent for Readability¶

python df.to_json('output.json', orient='records', indent=2)

to_pickle¶

Save DataFrame in pickle format.

1. Save Pickle¶

python df.to_pickle('output.pkl')

2. Load Pickle¶

python df = pd.read_pickle('output.pkl')

3. Compression¶

python df.to_pickle('output.pkl.gz', compression='gzip')

to_parquet¶

Save in Parquet format (efficient columnar storage).

1. Basic Save¶

python df.to_parquet('output.parquet')

2. Compression¶

python df.to_parquet('output.parquet', compression='snappy')

3. Read Back¶

python df = pd.read_parquet('output.parquet')

Financial Example¶

Save stock data workflow.

1. Download and Save¶

```python import yfinance as yf

ticker = 'WMT' df = yf.Ticker(ticker).history(start='2020-01-01', end='2020-12-31')

Save to CSV¶

df.to_csv(f'{ticker}.csv')

Save to Excel¶

df.to_excel(f'{ticker}.xlsx', sheet_name='stocks') ```

2. Multiple Tickers¶

```python tickers = ['AAPL', 'MSFT', 'GOOGL']

with pd.ExcelWriter('portfolio.xlsx') as writer: for ticker in tickers: df = yf.Ticker(ticker).history(period='1y') df.to_excel(writer, sheet_name=ticker) ```

3. Pickle for Speed¶

```python

Faster save/load for large DataFrames¶

df.to_pickle('large_data.pkl') df = pd.read_pickle('large_data.pkl') ```

Format Comparison¶

Choose the right format.

1. CSV¶

Pros: Universal, human-readable
Cons: No type preservation, slow for large data

2. Excel¶

Pros: Business-friendly, multiple sheets
Cons: Slower, file size limits

3. Pickle¶

Pros: Fast, preserves types
Cons: Python-only, security concerns

4. Parquet¶

Pros: Fast, compressed, columnar
Cons: Less universal

Exercises¶

Exercise 1. Create a DataFrame with columns 'ticker', 'date', and 'close' containing three rows of sample stock data. Save it to a CSV file without the index, then read it back and verify the data matches the original.

Solution to Exercise 1

Save with index=False and read back with pd.read_csv.

import pandas as pd

df = pd.DataFrame({
    'ticker': ['AAPL', 'MSFT', 'GOOGL'],
    'date': ['2024-01-01', '2024-01-01', '2024-01-01'],
    'close': [150.0, 350.0, 140.0]
})
df.to_csv('stocks.csv', index=False)
df_loaded = pd.read_csv('stocks.csv')
print(df_loaded)
print(df.equals(df_loaded))  # True

Exercise 2. Given a dictionary of three DataFrames (one per ticker), write them to a single Excel file where each ticker has its own sheet. Use pd.ExcelWriter as a context manager.

Solution to Exercise 2

Use pd.ExcelWriter with each DataFrame written to a named sheet.

import pandas as pd
import numpy as np

dfs = {
    'AAPL': pd.DataFrame({'close': np.random.uniform(140, 160, 5)}),
    'MSFT': pd.DataFrame({'close': np.random.uniform(340, 360, 5)}),
    'GOOGL': pd.DataFrame({'close': np.random.uniform(130, 150, 5)}),
}

with pd.ExcelWriter('portfolio.xlsx') as writer:
    for ticker, df in dfs.items():
        df.to_excel(writer, sheet_name=ticker, index=False)

Exercise 3. Create a DataFrame with 1000 rows and columns 'id' (int), 'value' (float), and 'category' (string). Save it as both CSV and Parquet. Compare the file sizes and explain why Parquet is typically smaller.

Solution to Exercise 3

Create sample data, save in both formats, and compare sizes.

import pandas as pd
import numpy as np
import os

np.random.seed(42)
df = pd.DataFrame({
    'id': range(1000),
    'value': np.random.randn(1000),
    'category': np.random.choice(['A', 'B', 'C'], 1000)
})

df.to_csv('data.csv', index=False)
df.to_parquet('data.parquet')

csv_size = os.path.getsize('data.csv')
parquet_size = os.path.getsize('data.parquet')
print(f"CSV size: {csv_size / 1e3:.1f} KB")
print(f"Parquet size: {parquet_size / 1e3:.1f} KB")
# Parquet is smaller because it uses columnar storage
# with built-in compression and schema encoding.