Sorting by Index and Values¶
pandas provides methods to sort DataFrames and Series by their index labels or column values.
sort_index Method¶
Sort by row or column labels.
1. Sort Rows by Index¶
import pandas as pd
df = pd.DataFrame({
'A': [3, 1, 2],
'B': [6, 4, 5]
}, index=['c', 'a', 'b'])
df.sort_index()
A B
a 1 4
b 2 5
c 3 6
2. Descending Order¶
df.sort_index(ascending=False)
A B
c 3 6
b 2 5
a 1 4
3. Sort Columns¶
df = pd.DataFrame({
'C': [1, 2],
'A': [3, 4],
'B': [5, 6]
})
df.sort_index(axis=1) # Sort columns alphabetically
sort_values Method¶
Sort by values in specified columns.
1. Single Column¶
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Carol'],
'Age': [30, 25, 35]
})
df.sort_values(by='Age')
Name Age
1 Bob 25
0 Alice 30
2 Carol 35
2. Multiple Columns¶
df = pd.DataFrame({
'Dept': ['A', 'B', 'A', 'B'],
'Name': ['Alice', 'Bob', 'Carol', 'Dave'],
'Salary': [50000, 60000, 55000, 60000]
})
df.sort_values(by=['Dept', 'Salary'])
3. Mixed Order¶
df.sort_values(
by=['Dept', 'Salary'],
ascending=[True, False] # Dept ascending, Salary descending
)
Series Sorting¶
Sort Series by index or values.
1. Sort by Values¶
s = pd.Series([3, 1, 4, 1, 5], index=['a', 'b', 'c', 'd', 'e'])
s.sort_values()
b 1
d 1
a 3
c 4
e 5
dtype: int64
2. Sort by Index¶
s.sort_index()
3. Get Sorted Index¶
s.argsort() # Returns positions for sorting
Handling NaN¶
Control how missing values are sorted.
1. NaN at End (Default)¶
df = pd.DataFrame({
'A': [3, None, 1, 2]
})
df.sort_values(by='A') # NaN at end
2. NaN at Beginning¶
df.sort_values(by='A', na_position='first')
3. Drop NaN Before Sorting¶
df.dropna().sort_values(by='A')
Ranking¶
Assign ranks to values.
1. Basic Ranking¶
df = pd.DataFrame({
'Score': [85, 90, 85, 95]
})
df['Rank'] = df['Score'].rank(ascending=False)
2. Ranking Methods¶
# method='average' (default): average rank for ties
# method='min': lowest rank for ties
# method='max': highest rank for ties
# method='first': ranks by order of appearance
# method='dense': like 'min', but ranks always increase by 1
3. Dense Ranking Example¶
df['DenseRank'] = df['Score'].rank(ascending=False, method='dense')
Practical Examples¶
Common sorting scenarios.
1. Top N Values with nlargest¶
The nlargest() method is more efficient than sort_values().head() for finding top N values.
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Carol', 'David', 'Eve'],
'Salary': [75000, 60000, 90000, 55000, 85000],
'Experience': [5, 3, 8, 2, 6]
})
# Get top 3 highest salaries
top_earners = df.nlargest(3, 'Salary')
print(top_earners)
Name Salary Experience
2 Carol 90000 8
4 Eve 85000 6
0 Alice 75000 5
2. Bottom N Values with nsmallest¶
# Get 3 employees with least experience
newest = df.nsmallest(3, 'Experience')
print(newest)
Name Salary Experience
3 David 55000 2
1 Bob 60000 3
0 Alice 75000 5
3. Multiple Columns in nlargest/nsmallest¶
# Top 3 by salary, then by experience (tie-breaker)
df.nlargest(3, ['Salary', 'Experience'])
# Equivalent to sorting by multiple columns
df.sort_values(['Salary', 'Experience'], ascending=[False, False]).head(3)
4. Performance Comparison¶
import time
import numpy as np
# Large DataFrame
large_df = pd.DataFrame({
'value': np.random.randn(1_000_000)
})
# Method 1: sort_values + head (slower)
start = time.time()
result1 = large_df.sort_values('value', ascending=False).head(10)
sort_time = time.time() - start
# Method 2: nlargest (faster)
start = time.time()
result2 = large_df.nlargest(10, 'value')
nlargest_time = time.time() - start
print(f"sort_values + head: {sort_time:.3f}s")
print(f"nlargest: {nlargest_time:.3f}s")
print(f"Speedup: {sort_time/nlargest_time:.1f}x")
Typical result: nlargest is 5-10x faster for large DataFrames with small N.
5. Series nlargest and nsmallest¶
s = pd.Series([3, 1, 4, 1, 5, 9, 2, 6], index=list('abcdefgh'))
# Top 3 values
print(s.nlargest(3))
f 9
h 6
e 5
dtype: int64
# Bottom 3 values
print(s.nsmallest(3))
b 1
d 1
g 2
dtype: int64
6. Financial Example: Top Performers¶
import yfinance as yf
# Get stock data
tickers = ['AAPL', 'GOOGL', 'MSFT', 'AMZN', 'META', 'NVDA', 'TSLA']
returns = {}
for ticker in tickers:
data = yf.Ticker(ticker).history(period='1y')
returns[ticker] = (data['Close'].iloc[-1] / data['Close'].iloc[0]) - 1
returns_df = pd.DataFrame({
'Ticker': list(returns.keys()),
'Annual_Return': list(returns.values())
})
# Top 3 performers
print("Top 3 Performers:")
print(returns_df.nlargest(3, 'Annual_Return'))
# Bottom 3 performers
print("\nBottom 3 Performers:")
print(returns_df.nsmallest(3, 'Annual_Return'))
7. Keep Parameter (Handling Duplicates)¶
df = pd.DataFrame({
'name': ['A', 'B', 'C', 'D'],
'value': [10, 20, 20, 30] # B and C have same value
})
# 'first' (default): keep first occurrence
print(df.nlargest(3, 'value', keep='first'))
# 'last': keep last occurrence
print(df.nlargest(3, 'value', keep='last'))
# 'all': keep all duplicates (may return more than n rows)
print(df.nlargest(3, 'value', keep='all'))
8. Combining with groupby¶
df = pd.DataFrame({
'department': ['Sales', 'Sales', 'IT', 'IT', 'HR', 'HR'],
'employee': ['A', 'B', 'C', 'D', 'E', 'F'],
'salary': [50000, 55000, 70000, 65000, 45000, 48000]
})
# Top 1 earner per department
top_per_dept = df.groupby('department').apply(
lambda x: x.nlargest(1, 'salary')
).reset_index(drop=True)
print(top_per_dept)
9. Reset Index After Sort¶
df_sorted = df.sort_values('Age').reset_index(drop=True)
Key Parameter¶
Custom sort using a key function.
1. Case-insensitive Sort¶
df.sort_values(by='Name', key=lambda x: x.str.lower())
2. String Length Sort¶
df.sort_values(by='Name', key=lambda x: x.str.len())
3. Custom Transform¶
df.sort_values(by='Value', key=lambda x: abs(x))