Sorting by Index and Values¶

pandas provides methods to sort DataFrames and Series by their index labels or column values.

Mental Model

sort_values reorders rows by the data in one or more columns -- like clicking a column header in a spreadsheet. sort_index reorders rows by their index labels. Both return a new DataFrame by default and support ascending=False for descending order.

sort_index Method¶

Sort by row or column labels.

1. Sort Rows by Index¶

```python import pandas as pd

df = pd.DataFrame({ 'A': [3, 1, 2], 'B': [6, 4, 5] }, index=['c', 'a', 'b'])

df.sort_index() ```

A B a 1 4 b 2 5 c 3 6

2. Descending Order¶

python df.sort_index(ascending=False)

A B c 3 6 b 2 5 a 1 4

3. Sort Columns¶

```python df = pd.DataFrame({ 'C': [1, 2], 'A': [3, 4], 'B': [5, 6] })

df.sort_index(axis=1) # Sort columns alphabetically ```

sort_values Method¶

Sort by values in specified columns.

1. Single Column¶

```python df = pd.DataFrame({ 'Name': ['Alice', 'Bob', 'Carol'], 'Age': [30, 25, 35] })

df.sort_values(by='Age') ```

Name Age 1 Bob 25 0 Alice 30 2 Carol 35

2. Multiple Columns¶

```python df = pd.DataFrame({ 'Dept': ['A', 'B', 'A', 'B'], 'Name': ['Alice', 'Bob', 'Carol', 'Dave'], 'Salary': [50000, 60000, 55000, 60000] })

df.sort_values(by=['Dept', 'Salary']) ```

3. Mixed Order¶

python df.sort_values( by=['Dept', 'Salary'], ascending=[True, False] # Dept ascending, Salary descending )

Series Sorting¶

Sort Series by index or values.

1. Sort by Values¶

python s = pd.Series([3, 1, 4, 1, 5], index=['a', 'b', 'c', 'd', 'e']) s.sort_values()

b 1 d 1 a 3 c 4 e 5 dtype: int64

2. Sort by Index¶

python s.sort_index()

3. Get Sorted Index¶

python s.argsort() # Returns positions for sorting

Handling NaN¶

Control how missing values are sorted.

1. NaN at End (Default)¶

```python df = pd.DataFrame({ 'A': [3, None, 1, 2] })

df.sort_values(by='A') # NaN at end ```

2. NaN at Beginning¶

python df.sort_values(by='A', na_position='first')

3. Drop NaN Before Sorting¶

python df.dropna().sort_values(by='A')

Ranking¶

Assign ranks to values.

1. Basic Ranking¶

```python df = pd.DataFrame({ 'Score': [85, 90, 85, 95] })

df['Rank'] = df['Score'].rank(ascending=False) ```

2. Ranking Methods¶

```python

method='average' (default): average rank for ties¶

method='min': lowest rank for ties¶

method='max': highest rank for ties¶

method='first': ranks by order of appearance¶

method='dense': like 'min', but ranks always increase by 1¶

```

3. Dense Ranking Example¶

python df['DenseRank'] = df['Score'].rank(ascending=False, method='dense')

Practical Examples¶

Common sorting scenarios.

1. Top N Values with nlargest¶

The nlargest() method is more efficient than sort_values().head() for finding top N values.

```python df = pd.DataFrame({ 'Name': ['Alice', 'Bob', 'Carol', 'David', 'Eve'], 'Salary': [75000, 60000, 90000, 55000, 85000], 'Experience': [5, 3, 8, 2, 6] })

Get top 3 highest salaries¶

top_earners = df.nlargest(3, 'Salary') print(top_earners) ```

Name Salary Experience 2 Carol 90000 8 4 Eve 85000 6 0 Alice 75000 5

2. Bottom N Values with nsmallest¶

```python

Get 3 employees with least experience¶

newest = df.nsmallest(3, 'Experience') print(newest) ```

Name Salary Experience 3 David 55000 2 1 Bob 60000 3 0 Alice 75000 5

3. Multiple Columns in nlargest/nsmallest¶

```python

Top 3 by salary, then by experience (tie-breaker)¶

df.nlargest(3, ['Salary', 'Experience'])

Equivalent to sorting by multiple columns¶

df.sort_values(['Salary', 'Experience'], ascending=[False, False]).head(3) ```

4. Performance Comparison¶

```python import time import numpy as np

Large DataFrame¶

large_df = pd.DataFrame({ 'value': np.random.randn(1_000_000) })

Method 1: sort_values + head (slower)¶

start = time.time() result1 = large_df.sort_values('value', ascending=False).head(10) sort_time = time.time() - start

Method 2: nlargest (faster)¶

start = time.time() result2 = large_df.nlargest(10, 'value') nlargest_time = time.time() - start

print(f"sort_values + head: {sort_time:.3f}s") print(f"nlargest: {nlargest_time:.3f}s") print(f"Speedup: {sort_time/nlargest_time:.1f}x") ```

Typical result: nlargest is 5-10x faster for large DataFrames with small N.

5. Series nlargest and nsmallest¶

```python s = pd.Series([3, 1, 4, 1, 5, 9, 2, 6], index=list('abcdefgh'))

Top 3 values¶

print(s.nlargest(3)) ```

f 9 h 6 e 5 dtype: int64

```python

Bottom 3 values¶

print(s.nsmallest(3)) ```

b 1 d 1 g 2 dtype: int64

6. Financial Example: Top Performers¶

```python import yfinance as yf

Get stock data¶

tickers = ['AAPL', 'GOOGL', 'MSFT', 'AMZN', 'META', 'NVDA', 'TSLA'] returns = {}

for ticker in tickers: data = yf.Ticker(ticker).history(period='1y') returns[ticker] = (data['Close'].iloc[-1] / data['Close'].iloc[0]) - 1

returns_df = pd.DataFrame({ 'Ticker': list(returns.keys()), 'Annual_Return': list(returns.values()) })

Top 3 performers¶

print("Top 3 Performers:") print(returns_df.nlargest(3, 'Annual_Return'))

Bottom 3 performers¶

print("\nBottom 3 Performers:") print(returns_df.nsmallest(3, 'Annual_Return')) ```

7. Keep Parameter (Handling Duplicates)¶

```python df = pd.DataFrame({ 'name': ['A', 'B', 'C', 'D'], 'value': [10, 20, 20, 30] # B and C have same value })

'first' (default): keep first occurrence¶

print(df.nlargest(3, 'value', keep='first'))

'last': keep last occurrence¶

print(df.nlargest(3, 'value', keep='last'))

'all': keep all duplicates (may return more than n rows)¶

print(df.nlargest(3, 'value', keep='all')) ```

8. Combining with groupby¶

```python df = pd.DataFrame({ 'department': ['Sales', 'Sales', 'IT', 'IT', 'HR', 'HR'], 'employee': ['A', 'B', 'C', 'D', 'E', 'F'], 'salary': [50000, 55000, 70000, 65000, 45000, 48000] })

Top 1 earner per department¶

top_per_dept = df.groupby('department').apply( lambda x: x.nlargest(1, 'salary') ).reset_index(drop=True) print(top_per_dept) ```

9. Reset Index After Sort¶

python df_sorted = df.sort_values('Age').reset_index(drop=True)

Key Parameter¶

Custom sort using a key function.

1. Case-insensitive Sort¶

python df.sort_values(by='Name', key=lambda x: x.str.lower())

2. String Length Sort¶

python df.sort_values(by='Name', key=lambda x: x.str.len())

3. Custom Transform¶

python df.sort_values(by='Value', key=lambda x: abs(x))

Exercises¶

Exercise 1. Create a DataFrame with columns 'name', 'age', and 'score'. Sort by 'score' in descending order using sort_values(). Then sort by 'age' ascending and 'score' descending simultaneously.

Solution to Exercise 1

Sort by one or multiple columns.

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol', 'Dave'],
    'age': [30, 25, 30, 25],
    'score': [85, 90, 78, 92]
})
print(df.sort_values('score', ascending=False))
print(df.sort_values(['age', 'score'], ascending=[True, False]))

Exercise 2. Create a DataFrame with a non-sequential index (e.g., [5, 2, 8, 1, 3]). Use sort_index() to sort rows by the index. Then sort the columns alphabetically using sort_index(axis=1).

Solution to Exercise 2

Sort by index and by columns.

import pandas as pd

df = pd.DataFrame(
    {'C': [1, 2, 3, 4, 5], 'A': [6, 7, 8, 9, 10], 'B': [11, 12, 13, 14, 15]},
    index=[5, 2, 8, 1, 3]
)
print("Sort by index:")
print(df.sort_index())
print("\nSort columns:")
print(df.sort_index(axis=1))

Exercise 3. Create a DataFrame with some NaN values in a column. Use sort_values() with na_position='first' to place NaN rows at the top, then with na_position='last' to place them at the bottom. Compare the two results.

Solution to Exercise 3

Control NaN position in sorted output.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol', 'Dave'],
    'score': [85, np.nan, 78, np.nan]
})
print("NaN first:")
print(df.sort_values('score', na_position='first'))
print("\nNaN last:")
print(df.sort_values('score', na_position='last'))