Loading Excel and JSON

pandas can load data from Excel spreadsheets, JSON files, delimited and fixed-width text files, and HDF5 stores.

pd.read_excel

Load Excel files (.xlsx, .xls). Reading .xlsx files requires the openpyxl package; legacy .xls files require xlrd.

import pandas as pd

# Load the first sheet of a local workbook
df = pd.read_excel('data.xlsx')
print(df.head())

# Load a named sheet from a URL
url = 'https://example.com/data.xlsx?raw=true'
df = pd.read_excel(url, sheet_name='Sheet1')

# By name
df = pd.read_excel('data.xlsx', sheet_name='Sales')

# By index (0-based)
df = pd.read_excel('data.xlsx', sheet_name=0)

Customize Excel loading.

# Load all sheets as dictionary
dfs = pd.read_excel('data.xlsx', sheet_name=None)
# dfs['Sheet1'], dfs['Sheet2'], etc.
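
With sheet_name=None, the result is a dictionary that maps each sheet name to its own DataFrame. A minimal sketch of stacking those sheets into one table follows. The helper name combine_sheets and the sample data are hypothetical, and a hand-built dictionary stands in for the result of pd.read_excel so the example runs without a workbook on disk.

```python
import pandas as pd


def combine_sheets(dfs):
    """Stack a {sheet_name: DataFrame} mapping into a single DataFrame.

    The originating sheet name is kept in a new 'sheet' column.
    (Hypothetical helper, not part of pandas.)
    """
    # Dict keys become the outer index level, which we name 'sheet'
    combined = pd.concat(dfs, names=['sheet', 'row'])
    # Move 'sheet' into a regular column and discard the per-sheet row numbers
    return combined.reset_index(level='sheet').reset_index(drop=True)


# Stand-in for: dfs = pd.read_excel('data.xlsx', sheet_name=None)
dfs = {
    'Sheet1': pd.DataFrame({'item': ['a', 'b'], 'qty': [1, 2]}),
    'Sheet2': pd.DataFrame({'item': ['c'], 'qty': [3]}),
}

combined = combine_sheets(dfs)
print(combined)
```

This assumes every sheet shares the same columns; sheets with different columns would still concatenate, but the missing cells would be filled with NaN.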

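# Skip the first two rows of the sheet
df = pd.read_excel('data.xlsx', skiprows=2)

# Load only columns A through D, given as an Excel letter range
df = pd.read_excel('data.xlsx', usecols='A:D')
# or given as 0-based column positions
df = pd.read_excel('data.xlsx', usecols=[0, 1, 2, 3])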

Load JSON files into DataFrames.

# Load a local JSON file
df = pd.read_json('data.json')
print(df.head())

# Load JSON from a URL
url = 'https://raw.githubusercontent.com/example/data.json'
df = pd.read_json(url)

# The JSON should be an array of objects or an object of arrays, for example:
# [{"a": 1, "b": 2}, {"a": 3, "b": 4}]

Use the orient parameter to tell pandas how the JSON is structured.

# JSON: [{"col1": 1, "col2": 2}, ...]
df = pd.read_json('data.json', orient='records')

# JSON: {"col1": [1, 2], "col2": [3, 4]}
df = pd.read_json('data.json', orient='columns')

# JSON: {"row1": {"col1": 1}, "row2": {"col1": 2}}
df = pd.read_json('data.json', orient='index')
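
To see how the three orient values relate, the sketch below parses the same small table written in each layout. The JSON strings and variable names are made up for illustration, and io.StringIO is used in place of a file so the example is self-contained.

```python
from io import StringIO

import pandas as pd

# The same two-row table, written in three JSON layouts (illustrative data)
records_json = '[{"col1": 1, "col2": 3}, {"col1": 2, "col2": 4}]'
columns_json = '{"col1": [1, 2], "col2": [3, 4]}'
index_json = '{"row1": {"col1": 1, "col2": 3}, "row2": {"col1": 2, "col2": 4}}'

# One object per row
df_records = pd.read_json(StringIO(records_json), orient='records')

# One entry per column
df_columns = pd.read_json(StringIO(columns_json), orient='columns')

# One entry per row label; the keys become the index
df_index = pd.read_json(StringIO(index_json), orient='index')

print(df_records)
print(df_columns)
print(df_index)
```

The first two results use a default integer index, while orient='index' keeps row1 and row2 as index labels.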

Load delimited files with read_table.

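# read_table splits on tabs by default; pass sep for other delimiters
df = pd.read_table('data.txt', sep='|')

# The file has no header row, so supply the column names
names = ['user_id', 'age', 'gender', 'occupation', 'zip_code']
df = pd.read_table(
    'data.txt',
    sep='|',
    header=None,
    names=names
)

# Load only selected columns; usecols by name needs the names defined above
df = pd.read_table(
    'data.txt',
    sep='|',
    header=None,
    names=names,
    usecols=['age', 'gender', 'occupation']
)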
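
A small self-contained sketch of the header=None and names pattern, using made-up pipe-delimited rows in an in-memory buffer rather than data.txt. The dtype argument is an addition not shown above: it keeps zip_code as text so that any leading zeros would survive.

```python
from io import StringIO

import pandas as pd

# Illustrative pipe-delimited data with no header row
raw = "1|24|M|technician|85711\n2|53|F|other|94043\n"

names = ['user_id', 'age', 'gender', 'occupation', 'zip_code']
df = pd.read_table(
    StringIO(raw),
    sep='|',
    header=None,              # first line is data, not column names
    names=names,              # supply the column names ourselves
    dtype={'zip_code': str},  # read ZIP codes as strings, not integers
)

print(df)
```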

Load fixed-width formatted files.

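# Let pandas infer the column boundaries
df = pd.read_fwf('data.txt')
print(df.head())

# Specify each column's width in characters
df = pd.read_fwf('data.txt', widths=[10, 5, 8, 12])

# Specify each column as a half-open (start, end) character range
df = pd.read_fwf('data.txt', colspecs=[(0, 10), (10, 15), (15, 23)])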
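
widths and colspecs are two ways of describing the same layout. The sketch below builds a tiny fixed-width sample in memory (the data, column names, and the fixed_row helper are hypothetical) and reads it both ways.

```python
from io import StringIO

import pandas as pd


def fixed_row(name, age, job):
    """Format one record as fixed-width text: 10 + 5 + 8 characters."""
    return f"{name:<10}{age:<5}{job:<8}"


raw = "\n".join([
    fixed_row('Alice', 30, 'Engineer'),
    fixed_row('Bob', 25, 'Analyst'),
]) + "\n"

names = ['name', 'age', 'job']

# Describe the layout by column widths...
df_widths = pd.read_fwf(StringIO(raw), widths=[10, 5, 8],
                        header=None, names=names)

# ...or by half-open (start, end) character positions
df_colspecs = pd.read_fwf(StringIO(raw),
                          colspecs=[(0, 10), (10, 15), (15, 23)],
                          header=None, names=names)

print(df_widths)
print(df_widths.equals(df_colspecs))
```

Padding spaces are stripped from each field, so the values come back as 'Alice' and 30 rather than padded strings.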

Load HDF5 files for large datasets. HDF5 support in pandas requires the PyTables package.

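# Open the store read-only and list the keys it contains
h5 = pd.HDFStore('data.h5', 'r')
print(h5.keys())

# Read a stored object by key; the leading slash is optional
df = h5['/table_name']
# or
df = h5['table_name']

# Close the store when finished
h5.close()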

# Or use context manager
with pd.HDFStore('data.h5', 'r') as h5:
    df = h5['table_name']
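
For a runnable round trip, the sketch below writes a small DataFrame to a temporary HDF5 file and reads it back, both through HDFStore and through pd.read_hdf, a shortcut that opens and closes the file for you. The file name, key, and data are made up for illustration, and the example assumes PyTables is installed.

```python
import os
import tempfile

import pandas as pd

df_out = pd.DataFrame({'id': [1, 2, 3], 'score': [0.5, 0.7, 0.9]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.h5')

    # Write the DataFrame under the key 'table_name'
    df_out.to_hdf(path, key='table_name', mode='w')

    # Read it back through an HDFStore opened read-only
    with pd.HDFStore(path, mode='r') as h5:
        keys = h5.keys()
        df_store = h5['table_name']

    # Or read a single key directly without managing the store
    df_direct = pd.read_hdf(path, key='table_name')

print(keys)
print(df_direct)
```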

When to use each format.