DataFrame Creation¶
DataFrames can be created from various data structures including dictionaries, lists, NumPy arrays, and other DataFrames.
From Dictionary of Lists¶
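Column-oriented data: each dictionary key becomes a column name, and each list supplies that column's values. All lists must have the same length.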
1. Basic Dictionary¶
import pandas as pd
data = {
    'temperature': [32, 35, 28],
    'windspeed': [6, 7, 2],
    'event': ['Rain', 'Sunny', 'Snow']
}
df = pd.DataFrame(data)
print(df)
   temperature  windspeed  event
0           32          6   Rain
1           35          7  Sunny
2           28          2   Snow
2. With Custom Index¶
day = ['1/1/2017', '1/2/2017', '1/3/2017']
df = pd.DataFrame(data, index=day)
print(df)
          temperature  windspeed  event
1/1/2017           32          6   Rain
1/2/2017           35          7  Sunny
1/3/2017           28          2   Snow
3. Access Attributes¶
print(df.index) # Index(['1/1/2017', '1/2/2017', '1/3/2017'])
print(df.columns) # Index(['temperature', 'windspeed', 'event'])
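Alongside `index` and `columns`, the standard `shape` and `dtypes` attributes are often worth checking right after construction. The following is a minimal sketch that rebuilds the weather data from above; the variable name `df_attrs` is only illustrative.

```python
import pandas as pd

data = {
    'temperature': [32, 35, 28],
    'windspeed': [6, 7, 2],
    'event': ['Rain', 'Sunny', 'Snow'],
}
df_attrs = pd.DataFrame(data, index=['1/1/2017', '1/2/2017', '1/3/2017'])

print(df_attrs.shape)          # (number of rows, number of columns)
print(df_attrs.dtypes)         # one dtype per column
print(list(df_attrs.columns))  # column labels as a plain Python list
```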
From Dictionary of Dictionaries¶
With a dictionary of dictionaries, the outer keys become the column names and the inner keys become the row index automatically.
1. Nested Dictionaries¶
temp = {'1/1/2017': 32, '1/2/2017': 35, '1/3/2017': 28}
wind = {'1/1/2017': 6, '1/2/2017': 7, '1/3/2017': 2}
event = {'1/1/2017': 'Rain', '1/2/2017': 'Sunny', '1/3/2017': 'Snow'}
data = {'temperature': temp, 'windspeed': wind, 'event': event}
df = pd.DataFrame(data)
print(df)
2. Automatic Index¶
Keys from inner dictionaries become the DataFrame index.
3. Handling Missing Keys¶
# If the inner dicts have different keys, the index is the union of all keys and NaN fills the missing values
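The missing-key behavior can be sketched with two inner dictionaries that only partly overlap. The shortened `temp` and `wind` dictionaries and the name `df_missing` are assumptions made for this illustration.

```python
import pandas as pd

# The two inner dicts share only the key '1/2/2017'
temp = {'1/1/2017': 32, '1/2/2017': 35}
wind = {'1/2/2017': 7, '1/3/2017': 2}

df_missing = pd.DataFrame({'temperature': temp, 'windspeed': wind})
print(df_missing)

# Count the NaN entries per column
print(df_missing.isna().sum())
```

Because NaN is a floating-point value, integer columns that receive a NaN are stored as `float64`.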
From List of Lists¶
Row-oriented data with each inner list as a row.
1. Basic List of Lists¶
data = [
    ['1/1/2017', 32, 6, 'Rain'],
    ['1/2/2017', 35, 7, 'Sunny'],
    ['1/3/2017', 28, 2, 'Snow']
]
columns = ['day', 'temperature', 'windspeed', 'event']
df = pd.DataFrame(data, columns=columns)
print(df)
2. Set Column as Index¶
df = df.set_index('day')
3. Direct Index Assignment¶
# Build the DataFrame and set 'day' as the index in one chained expression
df = pd.DataFrame(data, columns=columns).set_index('day')
From List of Dictionaries¶
Each dictionary represents a row.
1. Row Dictionaries¶
data = [
    {'day': '1/1/2017', 'temperature': 32, 'windspeed': 6, 'event': 'Rain'},
    {'day': '1/2/2017', 'temperature': 35, 'windspeed': 7, 'event': 'Sunny'},
    {'day': '1/3/2017', 'temperature': 28, 'windspeed': 2, 'event': 'Snow'}
]
df = pd.DataFrame(data).set_index('day')
2. Automatic Column Detection¶
Column names are inferred from dictionary keys.
3. Missing Keys¶
# Missing keys in some dicts result in NaN values
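As a small illustration of the missing-key case for row dictionaries, the sketch below drops one key from each row. The trimmed `rows` list and the name `df_rows` are assumptions for this example, not part of the original data.

```python
import pandas as pd

rows = [
    {'day': '1/1/2017', 'temperature': 32, 'windspeed': 6},       # no 'event'
    {'day': '1/2/2017', 'temperature': 35, 'event': 'Sunny'},     # no 'windspeed'
]

df_rows = pd.DataFrame(rows)
print(df_rows)

# True wherever a row's dictionary lacked that column's key
print(df_rows.isna())
```

The resulting columns cover every key seen in any row, so a key that appears in only one dictionary still gets its own column.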
From NumPy Array¶
Create a DataFrame from a 2D array.
1. Random Data¶
import numpy as np
np.random.seed(0)
data = np.random.normal(size=(3, 4))
index = ['Jenny', 'Frank', 'Wenfei']
columns = list('ABCD')
df = pd.DataFrame(data, index=index, columns=columns)
print(df)
               A         B         C         D
Jenny   1.764052  0.400157  0.978738  2.240893
Frank   1.867558 -0.977278  0.950088 -0.151357
Wenfei -0.103219  0.410599  0.144044  1.454274
2. Specify dtype¶
df = pd.DataFrame(data, dtype=float)
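The effect of `dtype` is easiest to see when the input is not already floating point. The following sketch uses a small integer array; `int_array`, `df_default`, and `df_float` are illustrative names.

```python
import numpy as np
import pandas as pd

int_array = np.array([[1, 2], [3, 4]])            # integer input

df_default = pd.DataFrame(int_array)              # keeps the array's integer dtype
df_float = pd.DataFrame(int_array, dtype=float)   # casts every column to float64

print(df_default.dtypes)
print(df_float.dtypes)
```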
3. Shape Preservation¶
The resulting DataFrame has the same shape as the source array: one row per array row and one column per array column.
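A quick way to confirm this is to compare the `shape` attributes directly. This is a minimal sketch with made-up arrays (`arr`, `vec`); it also shows that a 1D array is treated as a single column.

```python
import numpy as np
import pandas as pd

arr = np.arange(12).reshape(3, 4)                  # 3 rows, 4 columns
df_arr = pd.DataFrame(arr, columns=list('ABCD'))
print(arr.shape, df_arr.shape)                     # both shapes agree

vec = np.arange(3)                                 # 1D input
df_vec = pd.DataFrame(vec)
print(df_vec.shape)                                # one column, three rows
```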
LeetCode Example¶
Create a DataFrame from a list of student records, where each inner list holds a student ID and an age.
1. Sample Data¶
student_data = [
    [101, 20],
    [102, 22],
    [103, 21]
]
2. Create DataFrame¶
df = pd.DataFrame(student_data, columns=['student_id', 'age'])
print(df)
   student_id  age
0         101   20
1         102   22
2         103   21
3. Type Annotation¶
from typing import List
def createDataframe(student_data: List[List[int]]) -> pd.DataFrame:
    # Each inner list is one row: [student_id, age]
    return pd.DataFrame(student_data, columns=['student_id', 'age'])
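To check the function, it can be called on the sample data and on an empty list. This is a self-contained sketch that repeats the definition; the names `result` and `empty` are illustrative, and the empty-input call is an added edge case rather than part of the original example.

```python
from typing import List

import pandas as pd


def createDataframe(student_data: List[List[int]]) -> pd.DataFrame:
    # Each inner list is one row: [student_id, age]
    return pd.DataFrame(student_data, columns=['student_id', 'age'])


result = createDataframe([[101, 20], [102, 22], [103, 21]])
print(result)

# An empty input still yields the two named columns, with zero rows
empty = createDataframe([])
print(empty.shape)
```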