Loading Excel and JSON¶

pandas supports loading data from Excel spreadsheets, JSON files, and other formats.

Mental Model

read_excel and read_json follow the same philosophy as read_csv: one function call turns a file into a DataFrame. Excel adds sheet selection and header-row options; JSON adds orientation (records, columns, index) because JSON can nest data in multiple ways. The output is always the same familiar DataFrame.

pd.read_excel¶

Load Excel files (.xlsx, .xls).

1. Basic Load¶

```python import pandas as pd

df = pd.read_excel('data.xlsx') print(df.head()) ```

2. From URL¶

python url = 'https://example.com/data.xlsx?raw=true' df = pd.read_excel(url, sheet_name='Sheet1')

3. Specify Sheet¶

```python

By name¶

df = pd.read_excel('data.xlsx', sheet_name='Sales')

By index (0-based)¶

df = pd.read_excel('data.xlsx', sheet_name=0) ```

Excel Keywords¶

Customize Excel loading.

1. Multiple Sheets¶

```python

Load all sheets as dictionary¶

dfs = pd.read_excel('data.xlsx', sheet_name=None)

dfs['Sheet1'], dfs['Sheet2'], etc.¶

```

2. Skip Rows¶

python df = pd.read_excel('data.xlsx', skiprows=2)

3. Use Columns¶

```python df = pd.read_excel('data.xlsx', usecols='A:D')

or¶

df = pd.read_excel('data.xlsx', usecols=[0, 1, 2, 3]) ```

pd.read_json¶

Load JSON files into DataFrames.

1. Basic Load¶

python df = pd.read_json('data.json') print(df.head())

2. From URL¶

python url = 'https://raw.githubusercontent.com/example/data.json' df = pd.read_json(url)

3. JSON Structure¶

```python

JSON should be array of objects or object of arrays¶

[{"a": 1, "b": 2}, {"a": 3, "b": 4}]¶

```

JSON Orientations¶

Different JSON structures.

1. Records Orient¶

```python

JSON: [{"col1": 1, "col2": 2}, ...]¶

df = pd.read_json('data.json', orient='records') ```

2. Columns Orient¶

```python

JSON:¶

df = pd.read_json('data.json', orient='columns') ```

3. Index Orient¶

```python

JSON: {"row1": {"col1": 1}, "row2": {"col1": 2}}¶

df = pd.read_json('data.json', orient='index') ```

pd.read_table¶

Load delimited files with read_table.

1. Pipe Delimiter¶

python df = pd.read_table('data.txt', sep='|')

2. With Header¶

python names = ['user_id', 'age', 'gender', 'occupation', 'zip_code'] df = pd.read_table( 'data.txt', sep='|', header=None, names=names )

3. Select Columns¶

python df = pd.read_table( 'data.txt', sep='|', usecols=['age', 'gender', 'occupation'] )

pd.read_fwf¶

Load fixed-width formatted files.

1. Basic Load¶

python df = pd.read_fwf('data.txt') print(df.head())

2. Specify Widths¶

python df = pd.read_fwf('data.txt', widths=[10, 5, 8, 12])

3. Column Positions¶

python df = pd.read_fwf('data.txt', colspecs=[(0, 10), (10, 15), (15, 23)])

pd.HDFStore¶

Load HDF5 files for large datasets.

1. Open Store¶

python h5 = pd.HDFStore('data.h5', 'r') print(h5.keys())

2. Read Data¶

```python df = h5['/table_name']

or¶

df = h5['table_name'] ```

3. Close Store¶

```python h5.close()

Or use context manager¶

with pd.HDFStore('data.h5', 'r') as h5: df = h5['table_name'] ```

Comparison¶

When to use each format.

1. CSV¶

Universal compatibility
Human-readable
Good for medium data

2. Excel¶

Preserves formatting
Multiple sheets
Good for business users

3. JSON¶

Web API responses
Nested structures
JavaScript integration

4. HDF5¶

Large datasets
Fast I/O
Scientific computing

Exercises¶

Exercise 1. Create a JSON string in "records" orientation (a list of dictionaries) with keys 'name', 'age', and 'city' for three people. Write it to a file and read it back using pd.read_json with orient='records'. Print the resulting DataFrame.

Solution to Exercise 1

Write JSON records and read them back.

import pandas as pd
import json

records = [
    {"name": "Alice", "age": 30, "city": "NYC"},
    {"name": "Bob", "age": 25, "city": "LA"},
    {"name": "Carol", "age": 35, "city": "SF"}
]
with open('people.json', 'w') as f:
    json.dump(records, f)

df = pd.read_json('people.json', orient='records')
print(df)

Exercise 2. Create a DataFrame and save it to an Excel file with the sheet name 'Report'. Then read it back specifying the sheet name. Verify the loaded DataFrame matches the original.

Solution to Exercise 2

Save to Excel with a named sheet and read it back.

import pandas as pd

df = pd.DataFrame({
    'product': ['Widget', 'Gadget'],
    'sales': [100, 200]
})
df.to_excel('report.xlsx', sheet_name='Report', index=False)

df_loaded = pd.read_excel('report.xlsx', sheet_name='Report')
print(df_loaded)
print(df.equals(df_loaded))  # True

Exercise 3. Given a pipe-delimited text string with header user_id|age|gender, use pd.read_table (with io.StringIO) to read the data. Add custom column names ['ID', 'Age', 'Gender'] using the names parameter and skip the original header row.

Solution to Exercise 3

Use io.StringIO with pd.read_table and custom column names.

import pandas as pd
import io

text = "user_id|age|gender\n1|25|M\n2|30|F\n3|28|M"
df = pd.read_table(
    io.StringIO(text),
    sep='|',
    header=0,
    names=['ID', 'Age', 'Gender'],
    skiprows=1
)
print(df)