Skip to content

Saving DataFrames

Export DataFrames to various file formats for storage and sharing.

Mental Model

Every to_* method is the mirror of a read_* function. to_csv serializes to text, to_parquet serializes to efficient binary, to_excel writes spreadsheets. The key decision is whether you need human-readable output (CSV) or fast, type-preserving storage (Parquet). Always check index=False if you do not want the index written as a column.

to_csv

Save DataFrame to CSV file.

1. Basic Save

```python import pandas as pd

df.to_csv('output.csv') ```

2. Without Index

python df.to_csv('output.csv', index=False)

3. Custom Separator

python df.to_csv('output.tsv', sep='\t')

to_csv Keywords

Customize CSV output.

1. Select Columns

python df.to_csv('output.csv', columns=['Name', 'Age'])

2. Handle Missing

python df.to_csv('output.csv', na_rep='NULL')

3. Float Format

python df.to_csv('output.csv', float_format='%.2f')

to_excel

Save DataFrame to Excel file.

1. Basic Save

python df.to_excel('output.xlsx')

2. Specify Sheet Name

python df.to_excel('output.xlsx', sheet_name='Data')

3. Without Index

python df.to_excel('output.xlsx', index=False)

Multiple Sheets

Save multiple DataFrames to one Excel file.

1. ExcelWriter

python with pd.ExcelWriter('output.xlsx') as writer: df1.to_excel(writer, sheet_name='Sheet1') df2.to_excel(writer, sheet_name='Sheet2') df3.to_excel(writer, sheet_name='Sheet3')

2. Append Mode

python with pd.ExcelWriter('output.xlsx', mode='a') as writer: df_new.to_excel(writer, sheet_name='NewSheet')

3. Engine Selection

```python

openpyxl for .xlsx, xlsxwriter for formatting

with pd.ExcelWriter('output.xlsx', engine='xlsxwriter') as writer: df.to_excel(writer) ```

to_json

Save DataFrame to JSON file.

1. Basic Save

python df.to_json('output.json')

2. Orient Options

python df.to_json('output.json', orient='records') # List of dicts df.to_json('output.json', orient='columns') # Dict of lists df.to_json('output.json', orient='index') # Dict of dicts

3. Indent for Readability

python df.to_json('output.json', orient='records', indent=2)

to_pickle

Save DataFrame in pickle format.

1. Save Pickle

python df.to_pickle('output.pkl')

2. Load Pickle

python df = pd.read_pickle('output.pkl')

3. Compression

python df.to_pickle('output.pkl.gz', compression='gzip')

to_parquet

Save in Parquet format (efficient columnar storage).

1. Basic Save

python df.to_parquet('output.parquet')

2. Compression

python df.to_parquet('output.parquet', compression='snappy')

3. Read Back

python df = pd.read_parquet('output.parquet')

Financial Example

Save stock data workflow.

1. Download and Save

```python import yfinance as yf

ticker = 'WMT' df = yf.Ticker(ticker).history(start='2020-01-01', end='2020-12-31')

Save to CSV

df.to_csv(f'{ticker}.csv')

Save to Excel

df.to_excel(f'{ticker}.xlsx', sheet_name='stocks') ```

2. Multiple Tickers

```python tickers = ['AAPL', 'MSFT', 'GOOGL']

with pd.ExcelWriter('portfolio.xlsx') as writer: for ticker in tickers: df = yf.Ticker(ticker).history(period='1y') df.to_excel(writer, sheet_name=ticker) ```

3. Pickle for Speed

```python

Faster save/load for large DataFrames

df.to_pickle('large_data.pkl') df = pd.read_pickle('large_data.pkl') ```

Format Comparison

Choose the right format.

1. CSV

  • Pros: Universal, human-readable
  • Cons: No type preservation, slow for large data

2. Excel

  • Pros: Business-friendly, multiple sheets
  • Cons: Slower, file size limits

3. Pickle

  • Pros: Fast, preserves types
  • Cons: Python-only, security concerns

4. Parquet

  • Pros: Fast, compressed, columnar
  • Cons: Less universal

Exercises

Exercise 1. Create a DataFrame with columns 'ticker', 'date', and 'close' containing three rows of sample stock data. Save it to a CSV file without the index, then read it back and verify the data matches the original.

Solution to Exercise 1

Save with index=False and read back with pd.read_csv.

import pandas as pd

df = pd.DataFrame({
    'ticker': ['AAPL', 'MSFT', 'GOOGL'],
    'date': ['2024-01-01', '2024-01-01', '2024-01-01'],
    'close': [150.0, 350.0, 140.0]
})
df.to_csv('stocks.csv', index=False)
df_loaded = pd.read_csv('stocks.csv')
print(df_loaded)
print(df.equals(df_loaded))  # True

Exercise 2. Given a dictionary of three DataFrames (one per ticker), write them to a single Excel file where each ticker has its own sheet. Use pd.ExcelWriter as a context manager.

Solution to Exercise 2

Use pd.ExcelWriter with each DataFrame written to a named sheet.

import pandas as pd
import numpy as np

dfs = {
    'AAPL': pd.DataFrame({'close': np.random.uniform(140, 160, 5)}),
    'MSFT': pd.DataFrame({'close': np.random.uniform(340, 360, 5)}),
    'GOOGL': pd.DataFrame({'close': np.random.uniform(130, 150, 5)}),
}

with pd.ExcelWriter('portfolio.xlsx') as writer:
    for ticker, df in dfs.items():
        df.to_excel(writer, sheet_name=ticker, index=False)

Exercise 3. Create a DataFrame with 1000 rows and columns 'id' (int), 'value' (float), and 'category' (string). Save it as both CSV and Parquet. Compare the file sizes and explain why Parquet is typically smaller.

Solution to Exercise 3

Create sample data, save in both formats, and compare sizes.

import pandas as pd
import numpy as np
import os

np.random.seed(42)
df = pd.DataFrame({
    'id': range(1000),
    'value': np.random.randn(1000),
    'category': np.random.choice(['A', 'B', 'C'], 1000)
})

df.to_csv('data.csv', index=False)
df.to_parquet('data.parquet')

csv_size = os.path.getsize('data.csv')
parquet_size = os.path.getsize('data.parquet')
print(f"CSV size: {csv_size / 1e3:.1f} KB")
print(f"Parquet size: {parquet_size / 1e3:.1f} KB")
# Parquet is smaller because it uses columnar storage
# with built-in compression and schema encoding.