Series Creation
This document covers the common ways to create pandas Series objects, from converting a plain list to extracting a column from a DataFrame.
From a List
The simplest way to create a Series is from a Python list.
Basic Creation
import pandas as pd
# Default integer index (0, 1, 2, ...)
s = pd.Series([3, 9, 1])
print(s)
0 3
1 9
2 1
dtype: int64
With Custom Index
s = pd.Series([3, 9, 1], index=['a', 'b', 'c'])
print(s)
a 3
b 9
c 1
dtype: int64
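The index must supply exactly one label per value. Below is a minimal sketch of what happens on a length mismatch (the exact error message may differ between pandas versions). It also shows that labels do not have to be unique:

```python
import pandas as pd

# Three values but only two labels: pandas refuses to guess the alignment
try:
    pd.Series([3, 9, 1], index=['a', 'b'])
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print(f"ValueError: {exc}")

# Duplicate labels are allowed; looking one up returns a Series, not a scalar
dup = pd.Series([3, 9, 1], index=['a', 'a', 'b'])
print(dup['a'])
```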
With Name
s = pd.Series([3, 9, 1], name='values')
print(s)
0 3
1 9
2 1
Name: values, dtype: int64
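The name stays attached to the Series. It becomes the column label when the Series is converted to a DataFrame, and `rename` with a scalar returns a copy under a new name. A short sketch:

```python
import pandas as pd

s = pd.Series([3, 9, 1], name='values')

# to_frame() uses the Series name as the column label
df = s.to_frame()
print(df.columns.tolist())  # ['values']

# rename() with a scalar returns a new Series; s keeps its original name
renamed = s.rename('scores')
print(renamed.name, s.name)  # scores values
```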
With DatetimeIndex
data = [3, 9, 1]
name = "daily_values"
index = pd.date_range(start='2019-09-01', end='2019-09-03')
s = pd.Series(data, name=name, index=index)
print(s)
2019-09-01 3
2019-09-02 9
2019-09-03 1
Freq: D, Name: daily_values, dtype: int64
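`pd.date_range` can also take a start date plus a number of periods instead of an end date. With the default daily frequency, both forms produce the same index. The sketch below checks this and shows that a date string can then be used as a lookup label:

```python
import pandas as pd

by_end = pd.date_range(start='2019-09-01', end='2019-09-03')
by_periods = pd.date_range(start='2019-09-01', periods=3)  # default freq='D'
print(by_end.equals(by_periods))  # True

s = pd.Series([3, 9, 1], name='daily_values', index=by_periods)
print(s.index.freqstr)   # D
print(s['2019-09-02'])   # 9 (the string label is parsed as a date)
```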
Specifying dtype
# Force float type
s = pd.Series([1, 2, 3], dtype='float64')
print(s)
0 1.0
1 2.0
2 3.0
dtype: float64
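Passing `dtype` at construction has the same effect as building the Series first and then converting it with `astype`. The sketch below compares the two approaches and also shows a narrower integer type, which uses less memory per element:

```python
import pandas as pd

at_construction = pd.Series([1, 2, 3], dtype='float64')
converted_later = pd.Series([1, 2, 3]).astype('float64')
print(at_construction.equals(converted_later))  # True

# int8 stores each value in a single byte
small = pd.Series([1, 2, 3], dtype='int8')
print(small.dtype, small.memory_usage(index=False))  # int8 3
```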
From a Dictionary
When creating from a dictionary, the keys become the index labels and the values become the data, in the dictionary's insertion order.
Basic Dictionary Creation
data = {'a': 10, 'b': 20, 'c': 30}
s = pd.Series(data)
print(s)
a 10
b 20
c 30
dtype: int64
With Date String Keys
data_dict = {
'2019-09-01': 3,
'2019-09-02': 9,
'2019-09-03': 1
}
s = pd.Series(data_dict, name="data")
print(s)
2019-09-01 3
2019-09-02 9
2019-09-03 1
Name: data, dtype: int64
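Date-like string keys remain plain strings, so the result has an ordinary `Index` of strings rather than a `DatetimeIndex`. Here is a minimal sketch that converts the labels with `pd.to_datetime` so that date-based features become available:

```python
import pandas as pd

data_dict = {'2019-09-01': 3, '2019-09-02': 9, '2019-09-03': 1}
s = pd.Series(data_dict, name="data")
print(isinstance(s.index, pd.DatetimeIndex))  # False: labels are strings

# Parse the string labels into timestamps
s.index = pd.to_datetime(s.index)
print(isinstance(s.index, pd.DatetimeIndex))  # True
print(s.index[0].day_name())                  # Sunday
```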
Reordering with Index Parameter
data = {'a': 10, 'b': 20, 'c': 30}
# The index picks and orders entries; 'd' is not a key, so it gets NaN (upcasting int64 to float64)
s = pd.Series(data, index=['c', 'b', 'a', 'd'])
print(s)
c 30.0
b 20.0
a 10.0
d NaN
dtype: float64
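The `index` argument also works as a filter: dictionary keys that are not listed are dropped. When every requested label exists, no NaN is introduced and the integer dtype is preserved. A sketch:

```python
import pandas as pd

data = {'a': 10, 'b': 20, 'c': 30}
s = pd.Series(data, index=['c', 'a'])  # 'b' is not requested, so it is dropped

print(list(s.index))  # ['c', 'a']
print(s.tolist())     # [30, 10]
print(s.dtype)        # int64 (no missing labels, so no upcast)
```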
From a DataFrame Column
Extracting a column from a DataFrame returns a Series.
Bracket Notation (Recommended)
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pd.read_csv(url)
# Single brackets return a Series
survived = df["Survived"]
print(type(survived)) # <class 'pandas.core.series.Series'>
print(survived.head())
0 0
1 1
2 1
3 1
4 0
Name: Survived, dtype: int64
Dot Notation (Use Cautiously)
# Works for simple column names
survived = df.Survived
print(type(survived)) # <class 'pandas.core.series.Series'>
Limitations of dot notation:
- Fails if the column name contains spaces
- Fails if the column name starts with a number
- Returns the DataFrame attribute or method instead of the column if the name collides with one (e.g. count)
# These will NOT work with dot notation:
# df.Passenger Id # Syntax error (space)
# df.1st_class # Syntax error (starts with number)
# df.count # Returns method, not column
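Bracket notation avoids all of these problems. The sketch below uses a small hypothetical DataFrame whose column names were chosen specifically to break dot access:

```python
import pandas as pd

# Hypothetical column names chosen to trip up dot notation
df = pd.DataFrame({
    'Passenger Id': [1, 2],
    '1st_class': [True, False],
    'count': [5, 7],
})

print(df['Passenger Id'].tolist())  # [1, 2]
print(df['1st_class'].tolist())     # [True, False]

# df.count is the DataFrame.count method; df['count'] is the column
print(callable(df.count))           # True
print(df['count'].sum())            # 12
```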
Preserving DataFrame Type
# Double brackets return a DataFrame, not a Series
survived_df = df[["Survived"]]
print(type(survived_df)) # <class 'pandas.core.frame.DataFrame'>
print(survived_df.shape) # (891, 1)
# Single brackets return a Series
survived_series = df["Survived"]
print(survived_series.shape) # (891,)
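You can also convert between the two shapes after extraction. `DataFrame.squeeze(axis=1)` turns a one-column DataFrame into a Series, and `Series.to_frame()` does the reverse. The sketch uses a small stand-in DataFrame rather than the downloaded Titanic data:

```python
import pandas as pd

df = pd.DataFrame({'Survived': [0, 1, 1, 1, 0]})

one_col = df[['Survived']]           # DataFrame, shape (5, 1)
as_series = one_col.squeeze(axis=1)  # Series, shape (5,)
back = as_series.to_frame()          # DataFrame again, shape (5, 1)

print(type(as_series).__name__, as_series.shape)  # Series (5,)
print(type(back).__name__, back.shape)            # DataFrame (5, 1)
```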
From a NumPy Array
import numpy as np
arr = np.array([1.5, 2.5, 3.5])
s = pd.Series(arr)
print(s)
0 1.5
1 2.5
2 3.5
dtype: float64
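With Shared Memory
Whether the Series shares memory with the original array depends on the pandas version. Without Copy-on-Write (the default before pandas 3.0), the constructor wraps a NumPy array without copying it, so later changes to the array show up in the Series. With Copy-on-Write enabled, the constructor copies NumPy input by default: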
arr = np.array([1, 2, 3])
s = pd.Series(arr)
arr[0] = 999
print(s[0]) # 999 if s shares arr's memory, 1 if the constructor copied it
# To avoid shared memory, use copy=True
s = pd.Series(arr, copy=True)
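`np.shares_memory` lets you check directly whether a Series still points at the original buffer, regardless of pandas version. In this sketch, the result for the default constructor call depends on the pandas version and Copy-on-Write settings, so only the `copy=True` case has a fixed expected result:

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])

default = pd.Series(arr)            # may or may not copy, version dependent
copied = pd.Series(arr, copy=True)  # always gets its own buffer

print(np.shares_memory(default.to_numpy(), arr))  # True or False
print(np.shares_memory(copied.to_numpy(), arr))   # False

arr[0] = 999
print(copied.iloc[0])  # 1, unaffected by the change to arr
```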
From a Scalar Value
A scalar is broadcast to fill all index positions.
s = pd.Series(5, index=['a', 'b', 'c'])
print(s)
a 5
b 5
c 5
dtype: int64
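A scalar needs an index to know how many positions to fill. Without one, pandas creates a single-element Series. The sketch below shows this, along with how the scalar's type determines the dtype:

```python
import pandas as pd

single = pd.Series(5)  # no index: one element at label 0
print(len(single), single.index.tolist())  # 1 [0]

zeros = pd.Series(0.0, index=['a', 'b', 'c'])  # float scalar -> float64
print(zeros.dtype)  # float64
```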
From a Range
s = pd.Series(range(5))
print(s)
0 0
1 1
2 2
3 3
4 4
dtype: int64
dtype Inference and Upcasting
pandas automatically infers the appropriate dtype and performs upcasting when needed.
String Data → object
s = pd.Series(['Boat', 'Car', 'Bike'])
print(f"{s.dtype = }") # s.dtype = dtype('O') in pandas 2.x; pandas 3.0 infers a dedicated string dtype instead
Integer Data → int64
s = pd.Series([1, 55, 99])
print(f"{s.dtype = }") # s.dtype = dtype('int64')
Float Data → float64
s = pd.Series([1., 55., 99.])
print(f"{s.dtype = }") # s.dtype = dtype('float64')
Mixed int/float → Upcast to float64
s = pd.Series([1., 55, 99]) # Mixed float and int
print(f"{s.dtype = }") # s.dtype = dtype('float64')
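Upcasting only happens within the numeric hierarchy. When values share no common numeric type, as with booleans mixed with integers or numbers mixed with strings, pandas falls back to the generic object dtype. A sketch:

```python
import pandas as pd

print(pd.Series([True, 1]).dtype)    # object (bool and int are not merged)
print(pd.Series([1, 'two']).dtype)   # object
print(pd.Series([1, 2.5, 3]).dtype)  # float64 (numeric upcast)
```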
With Missing Values → float64
s = pd.Series([1, 2, None])
print(f"{s.dtype = }") # s.dtype = dtype('float64')
print(s)
0 1.0
1 2.0
2 NaN
dtype: float64
Nullable Integer Type
# Use nullable integer type to preserve integers with NaN
s = pd.Series([1, 2, None], dtype='Int64')
print(s)
0 1
1 2
2 <NA>
dtype: Int64
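Nullable integers are also useful after the fact. A float Series whose NaNs came from missing data can be converted back with `astype('Int64')`, provided the non-missing values are whole numbers. Reductions skip `<NA>` by default. A sketch:

```python
import numpy as np
import pandas as pd

floats = pd.Series([1.0, 2.0, np.nan])
ints = floats.astype('Int64')

print(ints.dtype)         # Int64
print(ints.isna().sum())  # 1
print(ints.sum())         # 3 (the missing value is skipped)
```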
Financial Examples
Stock Prices
import yfinance as yf
# Download and extract close prices as Series
ticker = 'AAPL'
df = yf.Ticker(ticker).history(start='2024-01-01', end='2024-06-30')
close_prices = df['Close']
print(type(close_prices)) # <class 'pandas.core.series.Series'>
print(close_prices.head())
Portfolio Weights
weights = pd.Series({
'AAPL': 0.30,
'MSFT': 0.25,
'GOOGL': 0.20,
'AMZN': 0.15,
'META': 0.10
}, name='weight')
print(weights)
print(f"Total: {weights.sum()}") # 1.0
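Because `sum()` adds floating-point numbers, comparing the total to exactly 1.0 is fragile in general, and `np.isclose` is a safer check. The ticker labels from the dictionary also let the weights line up with other per-ticker data. In this sketch, `day_returns` contains hypothetical values made up for illustration:

```python
import numpy as np
import pandas as pd

weights = pd.Series({'AAPL': 0.30, 'MSFT': 0.25, 'GOOGL': 0.20,
                     'AMZN': 0.15, 'META': 0.10}, name='weight')
print(np.isclose(weights.sum(), 1.0))  # True

# Hypothetical one-day returns, deliberately listed in a different order
day_returns = pd.Series({'META': 0.01, 'AMZN': -0.02, 'GOOGL': 0.0,
                         'MSFT': 0.01, 'AAPL': 0.02})

# Multiplication aligns on ticker labels, not on position
portfolio_return = (weights * day_returns).sum()
print(round(portfolio_return, 4))  # 0.0065
```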
Daily Returns
# Create returns Series from prices
prices = pd.Series(
[100, 102, 101, 105, 103],
index=pd.date_range('2024-01-01', periods=5),
name='AAPL'
)
returns = prices.pct_change()
print(returns)
2024-01-01 NaN
2024-01-02 0.020000
2024-01-03 -0.009804
2024-01-04 0.039604
2024-01-05 -0.019048
Freq: D, Name: AAPL, dtype: float64
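The first return is NaN because there is no earlier price to compare against. `pct_change()` computes `price / previous_price - 1`. The sketch below reproduces that with `shift`, then compounds the daily returns to recover the total return over the period:

```python
import numpy as np
import pandas as pd

prices = pd.Series(
    [100, 102, 101, 105, 103],
    index=pd.date_range('2024-01-01', periods=5),
    name='AAPL',
)

# Same values as prices.pct_change() for data without gaps
manual_returns = prices / prices.shift(1) - 1

# Compounding the daily returns recovers the overall change: 103/100 - 1
total_return = (1 + manual_returns.dropna()).prod() - 1
print(round(total_return, 4))  # 0.03
```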
Creation Method Summary
| Method | Use Case | Example |
|---|---|---|
| From list | Simple data | pd.Series([1, 2, 3]) |
| From dict | Labeled data | pd.Series({'a': 1, 'b': 2}) |
| From DataFrame | Column extraction | df['column'] |
| From NumPy | Numerical computing | pd.Series(np.array([...])) |
| From scalar | Constant fill | pd.Series(0, index=[...]) |
| From range | Sequential integers | pd.Series(range(10)) |