DataFrame Creation¶

DataFrames can be created from various data structures including dictionaries, lists, NumPy arrays, and other DataFrames.

Mental Model

Every DataFrame constructor answers the same question: "How should I interpret this raw data as rows and columns?" A dict of lists reads each key as a column name and each list as that column's values. A list of dicts reads each dict as a row and its keys as column names. A 2D array carries no labels, so you pass column names explicitly (otherwise pandas numbers the columns 0, 1, 2, and so on). Choose the constructor that matches how your source data is already organized.
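As a minimal sketch of this mental model, the three input layouts below hold the same two rows and should produce equal DataFrames. The sample values and variable names are illustrative only, and the array dtype is pinned to `int64` so the comparison does not depend on the platform's default integer size.

```python
import numpy as np
import pandas as pd

# Dict of lists: keys are column names, each list is a full column.
from_columns = pd.DataFrame({'temperature': [32, 35], 'windspeed': [6, 7]})

# List of dicts: each dict is one row, keys are column names.
from_rows = pd.DataFrame([
    {'temperature': 32, 'windspeed': 6},
    {'temperature': 35, 'windspeed': 7},
])

# 2D array: no labels in the data, so column names are passed explicitly.
from_array = pd.DataFrame(
    np.array([[32, 6], [35, 7]], dtype=np.int64),
    columns=['temperature', 'windspeed'],
)

# equals() compares values, labels, and dtypes.
print(from_columns.equals(from_rows))
print(from_columns.equals(from_array))
```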

From Dictionary of Lists¶

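Use this form for column-oriented data: each key is a column name and each list holds that column's values. All lists must have the same length.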

1. Basic Dictionary¶

```python
import pandas as pd

data = {
    'temperature': [32, 35, 28],
    'windspeed': [6, 7, 2],
    'event': ['Rain', 'Sunny', 'Snow']
}

df = pd.DataFrame(data)
print(df)
```

```
   temperature  windspeed  event
0           32          6   Rain
1           35          7  Sunny
2           28          2   Snow
```

2. With Custom Index¶

```python
day = ['1/1/2017', '1/2/2017', '1/3/2017']
df = pd.DataFrame(data, index=day)
print(df)
```

```
          temperature  windspeed  event
1/1/2017           32          6   Rain
1/2/2017           35          7  Sunny
1/3/2017           28          2   Snow
```

3. Access Attributes¶

```python
print(df.index)    # Index(['1/1/2017', '1/2/2017', '1/3/2017'])
print(df.columns)  # Index(['temperature', 'windspeed', 'event'])
```

From Dictionary of Dictionaries¶

Outer keys become column names, and the inner (row) keys become the index automatically.

1. Nested Dictionaries¶

```python
temp = {'1/1/2017': 32, '1/2/2017': 35, '1/3/2017': 28}
wind = {'1/1/2017': 6, '1/2/2017': 7, '1/3/2017': 2}
event = {'1/1/2017': 'Rain', '1/2/2017': 'Sunny', '1/3/2017': 'Snow'}

data = {'temperature': temp, 'windspeed': wind, 'event': event}
df = pd.DataFrame(data)
print(df)
```

2. Automatic Index¶

Keys from inner dictionaries become the DataFrame index.
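The following sketch checks which keys end up where, using a shortened two-day version of the weather data. It also shows `pd.DataFrame.from_dict` with `orient='index'`, an alternative that treats the outer keys as row labels instead; that call is an addition for comparison and is not part of the original example.

```python
import pandas as pd

data = {
    'temperature': {'1/1/2017': 32, '1/2/2017': 35},
    'windspeed': {'1/1/2017': 6, '1/2/2017': 7},
}

df = pd.DataFrame(data)
print(list(df.columns))  # outer keys become columns
print(list(df.index))    # inner keys become the index

# Alternative orientation: outer keys become the row labels.
by_row = pd.DataFrame.from_dict(data, orient='index')
print(list(by_row.index))
print(list(by_row.columns))
```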

3. Handling Missing Keys¶

```python
# If inner dicts have different keys, NaN fills missing values
```
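A small sketch of that behavior, assuming a hypothetical case where the wind reading for the second day was never recorded:

```python
import pandas as pd

data = {
    'temperature': {'1/1/2017': 32, '1/2/2017': 35},
    'windspeed': {'1/1/2017': 6},  # no entry for '1/2/2017'
}

df = pd.DataFrame(data)
print(df)

# The index is the union of all inner keys; the gap shows up as NaN.
print(df['windspeed'].isna().tolist())
```

Because NaN is a floating-point value, a column that would otherwise hold integers is typically stored as floats once it contains a gap.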

From List of Lists¶

Row-oriented data with each inner list as a row.

1. Basic List of Lists¶

```python
data = [
    ['1/1/2017', 32, 6, 'Rain'],
    ['1/2/2017', 35, 7, 'Sunny'],
    ['1/3/2017', 28, 2, 'Snow']
]

columns = ['day', 'temperature', 'windspeed', 'event']
df = pd.DataFrame(data, columns=columns)
print(df)
```

2. Set Column as Index¶

```python
df = df.set_index('day')
```

3. Direct Index Assignment¶

```python
df = pd.DataFrame(data, columns=columns).set_index('day')
```

From List of Dictionaries¶

Each dictionary represents a row.

1. Row Dictionaries¶

```python
data = [
    {'day': '1/1/2017', 'temperature': 32, 'windspeed': 6, 'event': 'Rain'},
    {'day': '1/2/2017', 'temperature': 35, 'windspeed': 7, 'event': 'Sunny'},
    {'day': '1/3/2017', 'temperature': 28, 'windspeed': 2, 'event': 'Snow'}
]

df = pd.DataFrame(data).set_index('day')
```

2. Automatic Column Detection¶

Column names are inferred from dictionary keys.
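To make the inference concrete, the sketch below builds a frame from two row dictionaries and inspects the resulting columns. It also passes an explicit `columns=` list, which selects and orders columns instead of relying on key order. The `rows` data is illustrative only.

```python
import pandas as pd

rows = [
    {'day': '1/1/2017', 'temperature': 32, 'windspeed': 6},
    {'day': '1/2/2017', 'temperature': 35, 'windspeed': 7},
]

# Columns follow the order in which keys first appear.
df = pd.DataFrame(rows)
print(list(df.columns))

# An explicit columns= list picks a subset in the order given.
subset = pd.DataFrame(rows, columns=['windspeed', 'day'])
print(list(subset.columns))
```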

3. Missing Keys¶

```python
# Missing keys in some dicts result in NaN values
```
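A brief sketch of this case, assuming the second row simply lacks a `'windspeed'` key:

```python
import pandas as pd

data = [
    {'day': '1/1/2017', 'temperature': 32, 'windspeed': 6},
    {'day': '1/2/2017', 'temperature': 35},  # no 'windspeed' key
]

df = pd.DataFrame(data).set_index('day')
print(df)

# Count the gaps that the missing key introduced.
print(df['windspeed'].isna().sum())
```

Counting with `isna().sum()` is one quick way to confirm how many rows were affected before deciding whether to fill or drop them.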

From NumPy Array¶

Create a DataFrame from a 2D array.

1. Random Data¶

```python
import numpy as np

np.random.seed(0)
data = np.random.normal(size=(3, 4))

index = ['Jenny', 'Frank', 'Wenfei']
columns = list('ABCD')

df = pd.DataFrame(data, index=index, columns=columns)
print(df)
```

```
               A         B         C         D
Jenny   1.764052  0.400157  0.978738  2.240893
Frank   1.867558 -0.977278  0.950088 -0.151357
Wenfei -0.103219  0.410599  0.144044  1.454274
```

2. Specify dtype¶

```python
df = pd.DataFrame(data, dtype=float)
```
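The random data above is already floating point, so the effect of `dtype` is easier to see with integer input. The sketch below uses a small hypothetical integer array (`int_data`) and compares the column dtypes with and without the argument.

```python
import numpy as np
import pandas as pd

int_data = np.array([[1, 2], [3, 4]])

# Without dtype, the columns keep the array's integer type.
df_default = pd.DataFrame(int_data, columns=['a', 'b'])

# With dtype=float, every column is converted to float64.
df_float = pd.DataFrame(int_data, columns=['a', 'b'], dtype=float)

print(df_default.dtypes)
print(df_float.dtypes)
```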

3. Shape Preservation¶

DataFrame shape matches array shape.
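One way to check this, sketched with a placeholder array of zeros: the frame reports the same `(rows, columns)` tuple as the array, and passing a label list whose length disagrees with the array raises a `ValueError`.

```python
import numpy as np
import pandas as pd

arr = np.zeros((3, 4))
df = pd.DataFrame(arr)

print(df.shape == arr.shape)
print(list(df.columns))  # default integer labels when none are given

# Two labels for four columns do not fit, so pandas refuses.
try:
    pd.DataFrame(arr, columns=['a', 'b'])
except ValueError as exc:
    print(f"ValueError: {exc}")
```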

LeetCode Example¶

Create a DataFrame from a list of student records, where each inner list holds a student ID and an age.

1. Sample Data¶

```python
student_data = [
    [101, 20],
    [102, 22],
    [103, 21]
]
```

2. Create DataFrame¶

```python
df = pd.DataFrame(student_data, columns=['student_id', 'age'])
print(df)
```

```
   student_id  age
0         101   20
1         102   22
2         103   21
```

3. Type Annotation¶

```python
from typing import List

def createDataframe(student_data: List[List[int]]) -> pd.DataFrame:
    return pd.DataFrame(student_data, columns=['student_id', 'age'])
```
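A short usage sketch for the function, repeated here so the block runs on its own. The empty-list call is an added edge case, not part of the original problem statement; it illustrates that the column names are kept even when there are no rows.

```python
from typing import List

import pandas as pd


def createDataframe(student_data: List[List[int]]) -> pd.DataFrame:
    return pd.DataFrame(student_data, columns=['student_id', 'age'])


df = createDataframe([[101, 20], [102, 22], [103, 21]])
print(df.shape)

# With no rows, the result is an empty frame that still has both columns.
empty = createDataframe([])
print(empty.shape)
print(list(empty.columns))
```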

Exercises¶

Exercise 1. Create a DataFrame from a dictionary where keys are 'ticker', 'sector', and 'market_cap' with at least four rows of sample stock data. Set the 'ticker' column as the index after creation using set_index.

Solution to Exercise 1

Create from a dictionary and set the index.

```python
import pandas as pd

df = pd.DataFrame({
    'ticker': ['AAPL', 'MSFT', 'GOOGL', 'AMZN'],
    'sector': ['Tech', 'Tech', 'Tech', 'Consumer'],
    'market_cap': [2800, 2400, 1800, 1500]
})
df = df.set_index('ticker')
print(df)
```

Exercise 2. Create a DataFrame from a list of dictionaries where each dictionary represents a student with keys 'name', 'grade', and 'score'. One of the dictionaries should be missing the 'score' key. Print the DataFrame and observe how pandas handles the missing value.

Solution to Exercise 2

Missing keys in dictionaries produce NaN in the DataFrame.

```python
import pandas as pd

students = [
    {'name': 'Alice', 'grade': 'A', 'score': 95},
    {'name': 'Bob', 'grade': 'B', 'score': 85},
    {'name': 'Carol', 'grade': 'A'},  # Missing 'score'
]
df = pd.DataFrame(students)
print(df)
# Carol's score will be NaN
```

Exercise 3. Create a DataFrame from a 3x4 NumPy random array. Assign custom column names ['Q1', 'Q2', 'Q3', 'Q4'] and custom index labels ['2022', '2023', '2024']. Then verify the shape is (3, 4).

Solution to Exercise 3

Create from a NumPy array with custom labels.

```python
import pandas as pd
import numpy as np

np.random.seed(42)
data = np.random.rand(3, 4)
df = pd.DataFrame(
    data,
    columns=['Q1', 'Q2', 'Q3', 'Q4'],
    index=['2022', '2023', '2024']
)
print(df)
print(f"Shape: {df.shape}")  # (3, 4)
```