Loading Excel and JSON¶
pandas supports loading data from Excel spreadsheets, JSON files, and other formats.
pd.read_excel¶
Load Excel files (.xlsx, .xls).
1. Basic Load¶
import pandas as pd
df = pd.read_excel('data.xlsx')
print(df.head())
2. From URL¶
url = 'https://example.com/data.xlsx?raw=true'
df = pd.read_excel(url, sheet_name='Sheet1')
3. Specify Sheet¶
# By name
df = pd.read_excel('data.xlsx', sheet_name='Sales')
# By index (0-based)
df = pd.read_excel('data.xlsx', sheet_name=0)
Excel Keywords¶
Customize Excel loading.
1. Multiple Sheets¶
# Load all sheets as dictionary
dfs = pd.read_excel('data.xlsx', sheet_name=None)
# dfs['Sheet1'], dfs['Sheet2'], etc.
2. Skip Rows¶
df = pd.read_excel('data.xlsx', skiprows=2)
3. Use Columns¶
df = pd.read_excel('data.xlsx', usecols='A:D')
# or
df = pd.read_excel('data.xlsx', usecols=[0, 1, 2, 3])
pd.read_json¶
Load JSON files into DataFrames.
1. Basic Load¶
df = pd.read_json('data.json')
print(df.head())
2. From URL¶
url = 'https://raw.githubusercontent.com/example/data.json'
df = pd.read_json(url)
3. JSON Structure¶
# JSON should be array of objects or object of arrays
# [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
JSON Orientations¶
Different JSON structures.
1. Records Orient¶
# JSON: [{"col1": 1, "col2": 2}, ...]
df = pd.read_json('data.json', orient='records')
2. Columns Orient¶
# JSON: {"col1": [1, 2], "col2": [3, 4]}
df = pd.read_json('data.json', orient='columns')
3. Index Orient¶
# JSON: {"row1": {"col1": 1}, "row2": {"col1": 2}}
df = pd.read_json('data.json', orient='index')
pd.read_table¶
Load delimited files with read_table.
1. Pipe Delimiter¶
df = pd.read_table('data.txt', sep='|')
2. With Header¶
names = ['user_id', 'age', 'gender', 'occupation', 'zip_code']
df = pd.read_table(
'data.txt',
sep='|',
header=None,
names=names
)
3. Select Columns¶
df = pd.read_table(
'data.txt',
sep='|',
usecols=['age', 'gender', 'occupation']
)
pd.read_fwf¶
Load fixed-width formatted files.
1. Basic Load¶
df = pd.read_fwf('data.txt')
print(df.head())
2. Specify Widths¶
df = pd.read_fwf('data.txt', widths=[10, 5, 8, 12])
3. Column Positions¶
df = pd.read_fwf('data.txt', colspecs=[(0, 10), (10, 15), (15, 23)])
pd.HDFStore¶
Load HDF5 files for large datasets.
1. Open Store¶
h5 = pd.HDFStore('data.h5', 'r')
print(h5.keys())
2. Read Data¶
df = h5['/table_name']
# or
df = h5['table_name']
3. Close Store¶
h5.close()
# Or use context manager
with pd.HDFStore('data.h5', 'r') as h5:
df = h5['table_name']
Comparison¶
When to use each format.
1. CSV¶
- Universal compatibility
- Human-readable
- Good for medium data
2. Excel¶
- Preserves formatting
- Multiple sheets
- Good for business users
3. JSON¶
- Web API responses
- Nested structures
- JavaScript integration
4. HDF5¶
- Large datasets
- Fast I/O
- Scientific computing