Series Creation¶

This document covers all methods for creating pandas Series objects, from basic list conversion to extracting columns from DataFrames.

From a List¶

The simplest way to create a Series is from a Python list.

Basic Creation¶

import pandas as pd

# Default integer index (0, 1, 2, ...)
s = pd.Series([3, 9, 1])
print(s)

0    3
1    9
2    1
dtype: int64

With Custom Index¶

s = pd.Series([3, 9, 1], index=['a', 'b', 'c'])
print(s)

a    3
b    9
c    1
dtype: int64

With Name¶

s = pd.Series([3, 9, 1], name='values')
print(s)

0    3
1    9
2    1
Name: values, dtype: int64

With DatetimeIndex¶

data = [3, 9, 1]
name = "daily_values"
index = pd.date_range(start='2019-09-01', end='2019-09-03')

s = pd.Series(data, name=name, index=index)
print(s)

2019-09-01    3
2019-09-02    9
2019-09-03    1
Freq: D, Name: daily_values, dtype: int64

Specifying dtype¶

# Force float type
s = pd.Series([1, 2, 3], dtype='float64')
print(s)

0    1.0
1    2.0
2    3.0
dtype: float64

From a Dictionary¶

When creating from a dictionary, keys become index labels.

Basic Dictionary Creation¶

data = {'a': 10, 'b': 20, 'c': 30}
s = pd.Series(data)
print(s)

a    10
b    20
c    30
dtype: int64

With Date String Keys¶

data_dict = {
    '2019-09-01': 3,
    '2019-09-02': 9,
    '2019-09-03': 1
}
s = pd.Series(data_dict, name="data")
print(s)

2019-09-01    3
2019-09-02    9
2019-09-03    1
Name: data, dtype: int64

Reordering with Index Parameter¶

data = {'a': 10, 'b': 20, 'c': 30}

# Reorder and potentially add NaN for missing keys
s = pd.Series(data, index=['c', 'b', 'a', 'd'])
print(s)

c    30.0
b    20.0
a    10.0
d     NaN
dtype: float64

From a DataFrame Column¶

Extracting a column from a DataFrame returns a Series.

Bracket Notation (Recommended)¶

url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pd.read_csv(url)

# Single brackets return a Series
survived = df["Survived"]
print(type(survived))  # <class 'pandas.core.series.Series'>
print(survived.head())

0    0
1    1
2    1
3    1
4    0
Name: Survived, dtype: int64

Dot Notation (Use Cautiously)¶

# Works for simple column names
survived = df.Survived
print(type(survived))  # <class 'pandas.core.series.Series'>

Limitations of dot notation: - Fails if column name contains spaces - Fails if column name starts with a number - Fails if column name conflicts with DataFrame methods

# These will NOT work with dot notation:
# df.Passenger Id  # Syntax error (space)
# df.1st_class     # Syntax error (starts with number)
# df.count         # Returns method, not column

Preserving DataFrame Type¶

# Double brackets return a DataFrame, not a Series
survived_df = df[["Survived"]]
print(type(survived_df))  # <class 'pandas.core.frame.DataFrame'>
print(survived_df.shape)  # (891, 1)

# Single brackets return a Series
survived_series = df["Survived"]
print(survived_series.shape)  # (891,)

From a NumPy Array¶

import numpy as np

arr = np.array([1.5, 2.5, 3.5])
s = pd.Series(arr)
print(s)

0    1.5
1    2.5
2    3.5
dtype: float64

With Shared Memory¶

By default, the Series may share memory with the original array:

arr = np.array([1, 2, 3])
s = pd.Series(arr)

arr[0] = 999
print(s[0])  # May be 999 (shared memory)

# To avoid shared memory, use copy=True
s = pd.Series(arr, copy=True)

From a Scalar Value¶

A scalar is broadcast to fill all index positions.

s = pd.Series(5, index=['a', 'b', 'c'])
print(s)

a    5
b    5
c    5
dtype: int64

From a Range¶

s = pd.Series(range(5))
print(s)

0    0
1    1
2    2
3    3
4    4
dtype: int64

dtype Inference and Upcasting¶

pandas automatically infers the appropriate dtype and performs upcasting when needed.

String Data → object¶

s = pd.Series(['Boat', 'Car', 'Bike'])
print(f"{s.dtype = }")  # s.dtype = dtype('O')

Integer Data → int64¶

s = pd.Series([1, 55, 99])
print(f"{s.dtype = }")  # s.dtype = dtype('int64')

Float Data → float64¶

s = pd.Series([1., 55., 99.])
print(f"{s.dtype = }")  # s.dtype = dtype('float64')

Mixed int/float → Upcasted to float64¶

s = pd.Series([1., 55, 99])  # Mixed float and int
print(f"{s.dtype = }")  # s.dtype = dtype('float64')

With Missing Values → float64¶

s = pd.Series([1, 2, None])
print(f"{s.dtype = }")  # s.dtype = dtype('float64')
print(s)

0    1.0
1    2.0
2    NaN
dtype: float64

Nullable Integer Type¶

# Use nullable integer type to preserve integers with NaN
s = pd.Series([1, 2, None], dtype='Int64')
print(s)

0       1
1       2
2    <NA>
dtype: Int64

Financial Examples¶

Stock Prices¶

import yfinance as yf

# Download and extract close prices as Series
ticker = 'AAPL'
df = yf.Ticker(ticker).history(start='2024-01-01', end='2024-06-30')

close_prices = df['Close']
print(type(close_prices))  # <class 'pandas.core.series.Series'>
print(close_prices.head())

Portfolio Weights¶

weights = pd.Series({
    'AAPL': 0.30,
    'MSFT': 0.25,
    'GOOGL': 0.20,
    'AMZN': 0.15,
    'META': 0.10
}, name='weight')

print(weights)
print(f"Total: {weights.sum()}")  # 1.0

Daily Returns¶

# Create returns Series from prices
prices = pd.Series(
    [100, 102, 101, 105, 103],
    index=pd.date_range('2024-01-01', periods=5),
    name='AAPL'
)

returns = prices.pct_change()
print(returns)

2024-01-01         NaN
2024-01-02    0.020000
2024-01-03   -0.009804
2024-01-04    0.039604
2024-01-05   -0.019048
Freq: D, Name: AAPL, dtype: float64

Creation Method Summary¶

Method	Use Case	Example
From list	Simple data	`pd.Series([1, 2, 3])`
From dict	Labeled data	`pd.Series({'a': 1, 'b': 2})`
From DataFrame	Column extraction	`df['column']`
From NumPy	Numerical computing	`pd.Series(np.array([...]))`
From scalar	Constant fill	`pd.Series(0, index=[...])`
From range	Sequential integers	`pd.Series(range(10))`