Plot Types (kind Parameter)¶
The kind parameter of the pandas plot() method selects the type of visualization and defaults to 'line'. This document covers each available plot type.
Available Plot Types¶
| kind | Plot Type | Use Case |
|---|---|---|
| 'line' | Line plot | Time series, trends |
| 'bar' | Vertical bar | Category comparison |
| 'barh' | Horizontal bar | Category comparison |
| 'hist' | Histogram | Distribution |
| 'box' | Box plot | Distribution summary |
| 'kde'/'density' | Kernel density | Smooth distribution |
| 'area' | Stacked area | Composition over time |
| 'pie' | Pie chart | Proportions |
| 'scatter' | Scatter plot | Relationship between variables |
| 'hexbin' | Hexbin plot | Dense scatter alternative |
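Each kind in the table is also reachable as a method on the plot accessor, so kind='bar' and .plot.bar() draw the same chart. The sketch below uses made-up counts and a non-interactive backend (an assumption so that it runs without a display) to compare the two forms:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up category counts, for illustration only
counts = pd.Series([3, 5, 2], index=["a", "b", "c"])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
counts.plot(kind="bar", ax=ax1)  # keyword form
counts.plot.bar(ax=ax2)          # accessor form, same plot type

# Both axes hold one rectangle per category with matching heights
heights_kind = [p.get_height() for p in ax1.patches]
heights_accessor = [p.get_height() for p in ax2.patches]
plt.close(fig)
```

The keyword form is convenient when the plot type is chosen at runtime; the accessor form gives better tab completion.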
Line Plot (Default)¶
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.DataFrame({
    'A': np.random.randn(50).cumsum(),
    'B': np.random.randn(50).cumsum()
})
df.plot(kind='line') # or just df.plot()
plt.show()
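The call to plot() returns a matplotlib Axes, which can be customized further. A minimal sketch, assuming synthetic random walks with a fixed seed and a headless backend (both assumptions made for reproducibility):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the figure is reproducible
walks = pd.DataFrame({
    "A": rng.standard_normal(50).cumsum(),
    "B": rng.standard_normal(50).cumsum(),
})

fig, ax = plt.subplots(figsize=(8, 4))
walks.plot(kind="line", ax=ax, title="Random walks")  # one line per column
ax.set_xlabel("Step")
ax.set_ylabel("Value")

# Each column becomes a Line2D labeled with the column name
labels = [line.get_label() for line in ax.get_lines()]
plt.close(fig)
```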
Bar Plot¶
Vertical Bar (kind='bar')¶
# Count categories
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv'
df = pd.read_csv(url)
fig, ax = plt.subplots(figsize=(10, 4))
df['continent'].value_counts().plot(kind='bar', ax=ax)
ax.set_title('Countries by Continent')
plt.show()
Horizontal Bar (kind='barh')¶
fig, ax = plt.subplots(figsize=(8, 5))
df['continent'].value_counts().plot(kind='barh', ax=ax)
ax.set_title('Countries by Continent')
plt.show()
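Bar plots also accept a DataFrame: each column gets its own bar within every category, and stacked=True stacks the columns instead. The sketch below uses a small made-up table (the column names and values are not taken from the drinks dataset):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values, for illustration only
servings = pd.DataFrame(
    {"beer": [10, 20, 30], "wine": [5, 15, 25]},
    index=["X", "Y", "Z"],
)

fig, (ax_grouped, ax_stacked) = plt.subplots(1, 2, figsize=(10, 4))
servings.plot(kind="bar", ax=ax_grouped)                # side-by-side bars
servings.plot(kind="bar", stacked=True, ax=ax_stacked)  # columns stacked per category

# Top edge of every rectangle in the stacked chart
stack_tops = [p.get_y() + p.get_height() for p in ax_stacked.patches]
plt.close(fig)
```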
Histogram (kind='hist')¶
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pd.read_csv(url)
fig, ax = plt.subplots(figsize=(8, 4))
df['Age'].plot(kind='hist', bins=20, ax=ax, edgecolor='black')
ax.set_title('Age Distribution')
ax.set_xlabel('Age')
plt.show()
Histogram Keywords¶
df['Age'].plot(
    kind='hist',
    bins=30,            # Number of bins
    density=True,       # Normalize to density
    alpha=0.7,          # Transparency
    edgecolor='black'   # Bar edge color
)
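To make the effect of density=True concrete, the sketch below checks that the bar areas sum to 1. It assumes synthetic, normally distributed ages in place of the Titanic column, so that it runs without a network connection:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
ages = pd.Series(rng.normal(30, 10, 500), name="Age")  # synthetic stand-in data

fig, ax = plt.subplots(figsize=(8, 4))
ages.plot(kind="hist", bins=30, density=True, alpha=0.7,
          edgecolor="black", ax=ax)

# With density=True, height * width summed over all bars is 1
area = sum(p.get_height() * p.get_width() for p in ax.patches)
plt.close(fig)
```

Because the y-axis is then a density rather than a count, a KDE curve can share the same axes.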
Box Plot (kind='box')¶
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pd.read_csv(url)
fig, ax = plt.subplots(figsize=(5, 4))
df['Age'].plot(kind='box', ax=ax)
ax.set_title('Age Distribution')
plt.show()
Horizontal Box Plot¶
fig, ax = plt.subplots(figsize=(8, 3))
df['Age'].plot(kind='box', ax=ax, vert=False)
ax.set_title('Horizontal Boxplot of Passenger Ages')
ax.set_xlabel('Age')
plt.show()
Multiple Box Plots¶
fig, ax = plt.subplots(figsize=(10, 4))
df[['Age', 'Fare']].plot(kind='box', ax=ax)
plt.show()
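The summary a box plot draws can be computed directly. This sketch assumes made-up fare values and matplotlib's default whisker rule of 1.5 times the interquartile range; points beyond the fences are the ones drawn as individual outlier markers:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up fares with one extreme value, for illustration only
fares = pd.Series([7, 8, 8, 10, 13, 15, 26, 30, 52, 80, 512], name="Fare")

q1, median, q3 = fares.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1                  # box height
lower_fence = q1 - 1.5 * iqr   # whiskers stop at the last point inside the fences
upper_fence = q3 + 1.5 * iqr
outliers = fares[(fares < lower_fence) | (fares > upper_fence)]

fig, ax = plt.subplots(figsize=(4, 4))
fares.plot(kind="box", ax=ax)  # box spans q1..q3, line at the median
plt.close(fig)
```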
Density Plot (kind='density' or kind='kde')¶
Kernel density estimation (KDE) draws a smooth estimate of the distribution. pandas relies on SciPy for this plot type, so SciPy must be installed:
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv'
df = pd.read_csv(url)
fig, ax = plt.subplots(figsize=(10, 4))
# Histogram with density overlay
df['beer_servings'].plot(kind='hist', bins=20, density=True, alpha=0.5, ax=ax)
df['beer_servings'].plot(kind='density', ax=ax)
ax.set_xlabel('Beer Servings')
ax.set_title('Distribution of Beer Servings')
plt.show()
Scatter Plot (kind='scatter')¶
Scatter plots are available only on DataFrames and require both the x and y parameters, each naming a column:
df = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/datasets/mtcars.csv')
fig, ax = plt.subplots(figsize=(8, 5))
df.plot(
    kind='scatter',
    x='wt',
    y='mpg',
    ax=ax
)
ax.set_title('Weight vs MPG')
plt.show()
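The x and y requirement can be seen by leaving one of them out. This sketch assumes a tiny made-up frame with mtcars-style column names and captures the resulting errors instead of letting them propagate:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values using mtcars-style column names
cars = pd.DataFrame({"wt": [2.6, 2.9, 3.2, 3.4],
                     "mpg": [21.0, 21.0, 21.4, 18.7]})

fig, ax = plt.subplots()
cars.plot(kind="scatter", x="wt", y="mpg", ax=ax)  # valid: both columns named

try:
    cars.plot(kind="scatter", x="wt")   # y is missing
    missing_y_error = None
except ValueError as exc:
    missing_y_error = str(exc)

try:
    cars["mpg"].plot(kind="scatter")    # a Series has no second column to pair with
    series_error = None
except ValueError as exc:
    series_error = str(exc)

plt.close("all")
```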
Scatter with Size and Color¶
fig, ax = plt.subplots(figsize=(10, 6))
df.plot(
    kind='scatter',
    x='wt',
    y='mpg',
    s=df['hp'],        # Point size by horsepower
    c='disp',          # Color by displacement
    colormap='Blues',
    alpha=0.6,
    ax=ax
)
ax.set_title('Weight vs MPG (size=HP, color=Displacement)')
plt.show()
Area Plot (kind='area')¶
Area plots are stacked by default, which suits composition over time. Stacking requires each column to be either all positive or all negative:
df = pd.DataFrame({
    'A': np.random.rand(10) * 10,
    'B': np.random.rand(10) * 10,
    'C': np.random.rand(10) * 10
}, index=pd.date_range('2024-01-01', periods=10))
fig, ax = plt.subplots(figsize=(10, 5))
df.plot(kind='area', ax=ax, alpha=0.5)
ax.set_title('Stacked Area Plot')
plt.show()
Unstacked Area¶
df.plot(kind='area', stacked=False, alpha=0.4)
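One case where stacked=False is needed rather than optional is data with mixed signs: pandas rejects a stacked area plot when a column contains both positive and negative values. A minimal sketch with made-up numbers:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data where each column changes sign
mixed = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [2.0, 1.0, -1.0]})

fig, (ax_bad, ax_ok) = plt.subplots(1, 2, figsize=(10, 4))

try:
    mixed.plot(kind="area", ax=ax_bad)  # stacked by default, so this is rejected
    stacked_error = None
except ValueError as exc:
    stacked_error = str(exc)

# Unstacked areas are drawn independently from zero, so mixed signs are fine
mixed.plot(kind="area", stacked=False, alpha=0.4, ax=ax_ok)
plt.close(fig)
```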
Pie Chart (kind='pie')¶
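Pie charts work most directly on a Series whose values are the proportions; for a DataFrame, pass y= to pick a column or subplots=True to draw one pie per column: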
data = pd.Series([30, 25, 20, 15, 10],
                 index=['A', 'B', 'C', 'D', 'E'])
fig, ax = plt.subplots(figsize=(6, 6))
data.plot(kind='pie', ax=ax, autopct='%1.1f%%')
ax.set_ylabel('') # Remove default ylabel
ax.set_title('Category Proportions')
plt.show()
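The same chart can be drawn from a DataFrame, either by naming one column with y= or by requesting one pie per column with subplots=True. The table below is made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up shares for two years
sales = pd.DataFrame(
    {"2023": [30, 25, 45], "2024": [20, 40, 40]},
    index=["A", "B", "C"],
)

# One pie from a single column
fig, ax = plt.subplots(figsize=(5, 5))
sales.plot(kind="pie", y="2023", ax=ax, autopct="%1.1f%%", legend=False)
ax.set_ylabel("")  # remove the default ylabel (the column name)

# One pie per column; pandas returns an array of Axes
axes = sales.plot(kind="pie", subplots=True, figsize=(8, 4), legend=False)
plt.close("all")
```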
Hexbin Plot (kind='hexbin')¶
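For large datasets where scatter points overlap, hexbin aggregates the points into hexagonal bins and colors each bin by the number of points it contains: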
n = 10000
df = pd.DataFrame({
    'x': np.random.randn(n),
    'y': np.random.randn(n)
})
fig, ax = plt.subplots(figsize=(8, 6))
df.plot(
    kind='hexbin',
    x='x',
    y='y',
    gridsize=25,
    cmap='YlOrRd',
    ax=ax
)
ax.set_title('Hexbin Density Plot')
plt.show()
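Bins can also summarize a third column instead of counting points, using the C and reduce_C_function parameters. The sketch below assumes synthetic data with a hypothetical column z and colors each bin by the mean of z:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 10000
points = pd.DataFrame({
    "x": rng.standard_normal(n),
    "y": rng.standard_normal(n),
    "z": rng.uniform(0, 1, n),  # hypothetical third variable to aggregate
})

fig, ax = plt.subplots(figsize=(8, 6))
points.plot(
    kind="hexbin",
    x="x",
    y="y",
    C="z",                      # values to aggregate in each bin
    reduce_C_function=np.mean,  # aggregate with the mean
    gridsize=25,
    cmap="YlOrRd",
    ax=ax,
)

# The hexagons are one collection; its array holds the per-bin aggregates
bin_means = ax.collections[0].get_array()
plt.close(fig)
```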
Choosing the Right Plot Type¶
| Data Type | Goal | Recommended kind |
|---|---|---|
| Time series | Show trend | 'line' |
| Categories | Compare counts | 'bar' or 'barh' |
| Single numeric | Show distribution | 'hist' or 'kde' |
| Single numeric | Summary stats | 'box' |
| Two numeric | Show relationship | 'scatter' |
| Two numeric (large n) | Density | 'hexbin' |
| Proportions | Part of whole | 'pie' |
| Multiple series | Composition | 'area' |
Quick Reference¶
# Line (default)
df.plot()
df.plot(kind='line')
# Bar
df['col'].value_counts().plot(kind='bar')
df['col'].value_counts().plot(kind='barh')
# Histogram
df['col'].plot(kind='hist', bins=20)
# Box
df['col'].plot(kind='box')
df[['col1', 'col2']].plot(kind='box')
# Density
df['col'].plot(kind='density')
df['col'].plot(kind='kde')
# Scatter
df.plot(kind='scatter', x='col1', y='col2')
# Area
df.plot(kind='area')
# Pie
series.plot(kind='pie')
# Hexbin
df.plot(kind='hexbin', x='col1', y='col2')