Skip to content

Loading Excel and JSON

pandas supports loading data from Excel spreadsheets, JSON files, and other formats.

Mental Model

read_excel and read_json follow the same philosophy as read_csv: one function call turns a file into a DataFrame. Excel adds sheet selection and header-row options; JSON adds orientation (records, columns, index) because JSON can nest data in multiple ways. The output is always the same familiar DataFrame.

pd.read_excel

Load Excel files (.xlsx, .xls).

1. Basic Load

```python import pandas as pd

df = pd.read_excel('data.xlsx') print(df.head()) ```

2. From URL

python url = 'https://example.com/data.xlsx?raw=true' df = pd.read_excel(url, sheet_name='Sheet1')

3. Specify Sheet

```python

By name

df = pd.read_excel('data.xlsx', sheet_name='Sales')

By index (0-based)

df = pd.read_excel('data.xlsx', sheet_name=0) ```

Excel Keywords

Customize Excel loading.

1. Multiple Sheets

```python

Load all sheets as dictionary

dfs = pd.read_excel('data.xlsx', sheet_name=None)

dfs['Sheet1'], dfs['Sheet2'], etc.

```

2. Skip Rows

python df = pd.read_excel('data.xlsx', skiprows=2)

3. Use Columns

```python df = pd.read_excel('data.xlsx', usecols='A:D')

or

df = pd.read_excel('data.xlsx', usecols=[0, 1, 2, 3]) ```

pd.read_json

Load JSON files into DataFrames.

1. Basic Load

python df = pd.read_json('data.json') print(df.head())

2. From URL

python url = 'https://raw.githubusercontent.com/example/data.json' df = pd.read_json(url)

3. JSON Structure

```python

JSON should be array of objects or object of arrays

[{"a": 1, "b": 2}, {"a": 3, "b": 4}]

```

JSON Orientations

Different JSON structures.

1. Records Orient

```python

JSON: [{"col1": 1, "col2": 2}, ...]

df = pd.read_json('data.json', orient='records') ```

2. Columns Orient

```python

JSON:

df = pd.read_json('data.json', orient='columns') ```

3. Index Orient

```python

JSON: {"row1": {"col1": 1}, "row2": {"col1": 2}}

df = pd.read_json('data.json', orient='index') ```

pd.read_table

Load delimited files with read_table.

1. Pipe Delimiter

python df = pd.read_table('data.txt', sep='|')

2. With Header

python names = ['user_id', 'age', 'gender', 'occupation', 'zip_code'] df = pd.read_table( 'data.txt', sep='|', header=None, names=names )

3. Select Columns

python df = pd.read_table( 'data.txt', sep='|', usecols=['age', 'gender', 'occupation'] )

pd.read_fwf

Load fixed-width formatted files.

1. Basic Load

python df = pd.read_fwf('data.txt') print(df.head())

2. Specify Widths

python df = pd.read_fwf('data.txt', widths=[10, 5, 8, 12])

3. Column Positions

python df = pd.read_fwf('data.txt', colspecs=[(0, 10), (10, 15), (15, 23)])

pd.HDFStore

Load HDF5 files for large datasets.

1. Open Store

python h5 = pd.HDFStore('data.h5', 'r') print(h5.keys())

2. Read Data

```python df = h5['/table_name']

or

df = h5['table_name'] ```

3. Close Store

```python h5.close()

Or use context manager

with pd.HDFStore('data.h5', 'r') as h5: df = h5['table_name'] ```

Comparison

When to use each format.

1. CSV

  • Universal compatibility
  • Human-readable
  • Good for medium data

2. Excel

  • Preserves formatting
  • Multiple sheets
  • Good for business users

3. JSON

  • Web API responses
  • Nested structures
  • JavaScript integration

4. HDF5

  • Large datasets
  • Fast I/O
  • Scientific computing

Exercises

Exercise 1. Create a JSON string in "records" orientation (a list of dictionaries) with keys 'name', 'age', and 'city' for three people. Write it to a file and read it back using pd.read_json with orient='records'. Print the resulting DataFrame.

Solution to Exercise 1

Write JSON records and read them back.

import pandas as pd
import json

records = [
    {"name": "Alice", "age": 30, "city": "NYC"},
    {"name": "Bob", "age": 25, "city": "LA"},
    {"name": "Carol", "age": 35, "city": "SF"}
]
with open('people.json', 'w') as f:
    json.dump(records, f)

df = pd.read_json('people.json', orient='records')
print(df)

Exercise 2. Create a DataFrame and save it to an Excel file with the sheet name 'Report'. Then read it back specifying the sheet name. Verify the loaded DataFrame matches the original.

Solution to Exercise 2

Save to Excel with a named sheet and read it back.

import pandas as pd

df = pd.DataFrame({
    'product': ['Widget', 'Gadget'],
    'sales': [100, 200]
})
df.to_excel('report.xlsx', sheet_name='Report', index=False)

df_loaded = pd.read_excel('report.xlsx', sheet_name='Report')
print(df_loaded)
print(df.equals(df_loaded))  # True

Exercise 3. Given a pipe-delimited text string with header user_id|age|gender, use pd.read_table (with io.StringIO) to read the data. Add custom column names ['ID', 'Age', 'Gender'] using the names parameter and skip the original header row.

Solution to Exercise 3

Use io.StringIO with pd.read_table and custom column names.

import pandas as pd
import io

text = "user_id|age|gender\n1|25|M\n2|30|F\n3|28|M"
df = pd.read_table(
    io.StringIO(text),
    sep='|',
    header=0,
    names=['ID', 'Age', 'Gender'],
    skiprows=1
)
print(df)