
View vs Copy

One of the most common sources of bugs in pandas is confusion between views and copies. Understanding when pandas returns a view (reference to original data) vs a copy (independent duplicate) is essential for avoiding silent data corruption.

The Problem

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Is this a view or a copy?
subset = df[df['A'] > 1]
subset['B'] = 99  # Does this modify df?

print(df)  # Is df changed?

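In this particular case boolean indexing returns a copy, so df is unchanged and the assignment is silently lost. For other indexing operations the answer depends on pandas internals and can vary between versions. This unpredictability is the problem.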

What is a View?

A view shares memory with the original object. The NumPy arrays that back a DataFrame show this most clearly:

# Arrays can have views
arr = np.array([1, 2, 3, 4, 5])
view = arr[1:4]  # This is a view

view[0] = 999
print(arr)  # [1, 999, 3, 4, 5] - Original changed!

What is a Copy?

A copy is independent - modifying it doesn't affect the original:

arr = np.array([1, 2, 3, 4, 5])
copy = arr[1:4].copy()  # Explicit copy

copy[0] = 999
print(arr)  # [1, 2, 3, 4, 5] - Original unchanged
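
To tell the two cases apart without modifying anything, NumPy can report whether two arrays overlap in memory. The following is a minimal sketch using np.shares_memory and the .base attribute; the variable names are illustrative only.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

view = arr[1:4]          # basic slicing returns a view
copy = arr[1:4].copy()   # .copy() allocates new memory

# np.shares_memory reports whether two arrays overlap in memory
view_shares = np.shares_memory(arr, view)
copy_shares = np.shares_memory(arr, copy)

# .base is the array that owns the memory, or None if the array owns its own data
view_base_is_arr = view.base is arr
copy_base_is_none = copy.base is None

print(view_shares, copy_shares)             # True False
print(view_base_is_arr, copy_base_is_none)  # True True
```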

When Does pandas Return a View?

Likely View (But Not Guaranteed)

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Single column selection - often a view
col = df['A']

# Slice of rows - sometimes a view
rows = df[0:2]

Likely Copy

# Boolean indexing - usually a copy
subset = df[df['A'] > 1]

# Multiple column selection - usually a copy
cols = df[['A', 'B']]

# Chained indexing - definitely problematic
result = df[df['A'] > 1]['B']
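
Because the view cases are not guaranteed, it can help to probe your own pandas version rather than rely on a rule of thumb. The sketch below assumes NumPy-backed integer columns and uses a hypothetical helper, shares_column_memory, that compares the memory behind one column. Only the boolean-mask and .copy() results are stable across versions; the row-slice result is printed without any promise about its value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def shares_column_memory(parent, child, column='A'):
    """Hypothetical helper: does child[column] overlap parent[column] in memory?"""
    return bool(np.shares_memory(parent[column].to_numpy(),
                                 child[column].to_numpy()))

results = {
    'row slice df[0:2]': shares_column_memory(df, df[0:2]),        # version dependent
    'boolean mask': shares_column_memory(df, df[df['A'] > 1]),      # copy
    'explicit .copy()': shares_column_memory(df, df.copy()),        # copy
}

for name, shared in results.items():
    print(f'{name}: shares memory = {shared}')
```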

The Danger: Silent Bugs

Bug Example 1: Modification Doesn't Persist

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# The assignment runs without error, but it lands on a copy
subset = df[df['A'] > 1]
subset['B'] = 0  # Modifying a copy

print(df)  # df is unchanged - the boolean mask returned a copy

Bug Example 2: Unintended Modification

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Get a "view"
col = df['A']

# Modify through the view
col[0] = 999  # This might change df!

print(df)  # df might be changed

The Solution: Be Explicit

Rule 1: Use .copy() When You Want Independence

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Explicit copy - safe to modify
subset = df[df['A'] > 1].copy()
subset['B'] = 0

print(df)  # Definitely unchanged
print(subset)  # Has your changes

Rule 2: Use .loc for Direct Modification

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Direct modification - guaranteed to work
df.loc[df['A'] > 1, 'B'] = 0

print(df)  # Definitely modified
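
To see why .loc matters, it can be contrasted with the chained form on the same data. This is a small sketch; the warning filter is only there to keep the demo output quiet and is not something to copy into real code.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Chained indexing: the first [] builds a temporary copy, the second [] writes to it
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # silence the chained-assignment warning for this demo
    df[df['A'] > 1]['B'] = 0
after_chained = df['B'].tolist()

# .loc performs the row and column selection in a single setitem call on df itself
df.loc[df['A'] > 1, 'B'] = 0
after_loc = df['B'].tolist()

print(after_chained)  # [4, 5, 6]
print(after_loc)      # [4, 0, 0]
```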

SettingWithCopyWarning

pandas tries to warn you about ambiguous situations:

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

subset = df[df['A'] > 1]
subset['B'] = 0  # SettingWithCopyWarning!

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

Never ignore this warning!
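
One way to make sure the warning is not overlooked is to capture it with Python's standard warnings module. The sketch below records whatever pandas emits for the assignment above. Which warning appears (if any) depends on the pandas version and on whether Copy-on-Write is active, so the code prints the category names rather than assuming one. In every case the original df is left unchanged.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Record every warning raised inside the block instead of printing it
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    subset = df[df['A'] > 1]
    subset['B'] = 0

warning_names = [type(w.message).__name__ for w in caught]

print(warning_names)       # version dependent; may be empty under Copy-on-Write
print(df['B'].tolist())    # [4, 5, 6]
```

In a test suite, the same module can turn warnings into exceptions (warnings.simplefilter('error')), so an ambiguous assignment fails loudly instead of scrolling past.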

Safe Patterns

Pattern 1: Filter and Modify (Using .loc)

# WRONG
df[df['A'] > 1]['B'] = 0

# RIGHT
df.loc[df['A'] > 1, 'B'] = 0

Pattern 2: Create Modified Subset

# WRONG
subset = df[df['A'] > 1]
subset['new_col'] = subset['B'] * 2

# RIGHT
subset = df[df['A'] > 1].copy()
subset['new_col'] = subset['B'] * 2

Pattern 3: Process and Return

def process_data_wrong(df):
    # WRONG - result may still be tied to the original; triggers SettingWithCopyWarning
    result = df[df['A'] > 1]
    result['processed'] = True
    return result

def process_data(df):
    # RIGHT - explicit copy
    result = df[df['A'] > 1].copy()
    result['processed'] = True
    return result
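
The same idea applies when a function works on the whole frame rather than a filtered subset: copy on entry, then modify freely. Below is a minimal sketch with a hypothetical helper, add_processed_flag, that checks the caller's DataFrame is left alone.

```python
import pandas as pd

def add_processed_flag(df):
    """Hypothetical helper: return a flagged copy, leaving the caller's frame untouched."""
    out = df.copy()          # work on an independent copy of the input
    out['processed'] = True
    return out

original = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
flagged = add_processed_flag(original)

print(list(original.columns))  # ['A', 'B']
print(list(flagged.columns))   # ['A', 'B', 'processed']
```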

Pattern 4: Chain Operations Safely

# Using method chaining (each method returns a new DataFrame, so df is never modified)
result = (df
    .query('A > 1')
    .assign(B_doubled=lambda x: x['B'] * 2)
    .sort_values('B_doubled')
)
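
As a quick check of that claim, the chain can be run on the small example frame used throughout this page and the original inspected afterwards. This is only a sketch on toy data.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

result = (df
    .query('A > 1')                           # keeps the rows where A is 2 or 3
    .assign(B_doubled=lambda x: x['B'] * 2)   # adds the column on a new DataFrame
    .sort_values('B_doubled')
)

print(list(df.columns))              # ['A', 'B']
print(result['B_doubled'].tolist())  # [10, 12]
```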

Checking If It's a View

# Check if two pandas objects share the same underlying memory
def shares_memory(a, b):
    return np.shares_memory(a.values, b.values)

df = pd.DataFrame({'A': [1, 2, 3]})
col = df['A']

print(shares_memory(df, col))  # Might be True

Copy-on-Write (pandas 2.0+)

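Copy-on-Write (CoW) first appeared as an opt-in mode in pandas 1.5 and became broadly usable in pandas 2.0. It is planned to become the default behavior in pandas 3.0. To enable it: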

pd.options.mode.copy_on_write = True

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
subset = df[df['A'] > 1]

# With CoW, this creates a copy automatically when needed
subset['B'] = 0

print(df)  # Original unchanged
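
CoW also removes the "unintended modification" bug from earlier, where writing through a selected column could change the parent. The sketch below assumes a pandas version that has the mode.copy_on_write option (1.5 or later) and enables it only temporarily with pd.option_context. The warning filter is an assumption in case a newer release reports the option as deprecated.

```python
import warnings
import pandas as pd

with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # assumption: newer releases may warn about this option
    with pd.option_context('mode.copy_on_write', True):
        df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

        col = df['A']        # shares data with df until one of them is written to
        col.iloc[0] = 999    # the write triggers a copy, so df keeps its own data

        df_values = df['A'].tolist()
        col_values = col.tolist()

print(df_values)   # [1, 2, 3]
print(col_values)  # [999, 2, 3]
```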

Summary: Golden Rules

| Situation | Safe Pattern |
|---|---|
| Modify subset | df.loc[condition, col] = value |
| Create independent subset | df[condition].copy() |
| Add column to subset | subset = df[...].copy(); subset['new'] = ... |
| Function that modifies | def f(df): df = df.copy(); ... |
| Chained operations | Method chaining with .assign() |

When in doubt, use .copy()!