Plot Types (kind Parameter)

The kind parameter in pandas plot() determines the type of visualization. This document covers all available plot types.

Mental Model

The kind parameter selects the chart type: `'line'` for trends over time, `'bar'` for category comparisons, `'hist'` for distributions, `'scatter'` for relationships between two variables, and `'box'` for distribution summaries. Each kind answers a different question about the data, so pick the one that matches your question.

Available Plot Types

| kind | Plot Type | Use Case |
|---|---|---|
| `'line'` | Line plot | Time series, trends |
| `'bar'` | Vertical bar | Category comparison |
| `'barh'` | Horizontal bar | Category comparison |
| `'hist'` | Histogram | Distribution |
| `'box'` | Box plot | Distribution summary |
| `'kde'`/`'density'` | Kernel density | Smooth distribution |
| `'area'` | Stacked area | Composition over time |
| `'pie'` | Pie chart | Proportions |
| `'scatter'` | Scatter plot | Relationship between variables |
| `'hexbin'` | Hexbin plot | Dense scatter alternative |

Line Plot (Default)

```python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df = pd.DataFrame({
    'A': np.random.randn(50).cumsum(),
    'B': np.random.randn(50).cumsum()
})

df.plot(kind='line')  # or just df.plot()
plt.show()
```
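Most examples on this page pass `ax=ax` and then call methods such as `ax.set_title`. The sketch below shows where that object comes from when no `ax` is passed: `plot()` returns a matplotlib Axes, with one line per column. The seeded generator and the title text are illustrative choices, not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.axes import Axes

rng = np.random.default_rng(0)  # seeded so the figure is reproducible
df = pd.DataFrame({
    "A": rng.standard_normal(50).cumsum(),
    "B": rng.standard_normal(50).cumsum(),
})

ax = df.plot()  # kind='line' is the default

# The return value is a regular matplotlib Axes
is_axes = isinstance(ax, Axes)

# Each column becomes one line, labeled with the column name
line_labels = [line.get_label() for line in ax.get_lines()]
print(is_axes, line_labels)

# Because it is an Axes, the usual matplotlib customization applies
ax.set_title("Cumulative random walks")
ax.set_ylabel("Value")
```

Creating the Axes first with `plt.subplots()` and passing it as `ax=ax` gives the same result, with explicit control over the figure size.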

Bar Plot

Vertical Bar (kind='bar')

```python
# Count categories
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv'
df = pd.read_csv(url)

fig, ax = plt.subplots(figsize=(10, 4))
df['continent'].value_counts().plot(kind='bar', ax=ax)
ax.set_title('Countries by Continent')
plt.show()
```

Horizontal Bar (kind='barh')

```python
fig, ax = plt.subplots(figsize=(8, 5))
df['continent'].value_counts().plot(kind='barh', ax=ax)
ax.set_title('Countries by Continent')
plt.show()
```

Histogram (kind='hist')

```python
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pd.read_csv(url)

fig, ax = plt.subplots(figsize=(8, 4))
df['Age'].plot(kind='hist', bins=20, ax=ax, edgecolor='black')
ax.set_title('Age Distribution')
ax.set_xlabel('Age')
plt.show()
```

Histogram Keywords

```python
df['Age'].plot(
    kind='hist',
    bins=30,            # Number of bins
    density=True,       # Normalize to density
    alpha=0.7,          # Transparency
    edgecolor='black'   # Bar edge color
)
```

Box Plot (kind='box')

```python
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pd.read_csv(url)

fig, ax = plt.subplots(figsize=(5, 4))
df['Age'].plot(kind='box', ax=ax)
ax.set_title('Age Distribution')
plt.show()
```

Horizontal Box Plot

```python
fig, ax = plt.subplots(figsize=(8, 3))
df['Age'].plot(kind='box', ax=ax, vert=False)
ax.set_title('Horizontal Boxplot of Passenger Ages')
ax.set_xlabel('Age')
plt.show()
```

Multiple Box Plots

```python
fig, ax = plt.subplots(figsize=(10, 4))
df[['Age', 'Fare']].plot(kind='box', ax=ax)
plt.show()
```

Density Plot (kind='density' or kind='kde')

Kernel density estimation (KDE) draws a smooth estimate of the distribution. pandas computes it with SciPy, so SciPy must be installed:

```python
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv'
df = pd.read_csv(url)

fig, ax = plt.subplots(figsize=(10, 4))

# Histogram with density overlay
df['beer_servings'].plot(kind='hist', bins=20, density=True, alpha=0.5, ax=ax)
df['beer_servings'].plot(kind='density', ax=ax)

ax.set_xlabel('Beer Servings')
ax.set_title('Distribution of Beer Servings')
plt.show()
```

Scatter Plot (kind='scatter')

A scatter plot requires both the x and y parameters, each naming a column of the DataFrame:

```python
df = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/datasets/mtcars.csv')

fig, ax = plt.subplots(figsize=(8, 5))
df.plot(
    kind='scatter',
    x='wt',
    y='mpg',
    ax=ax
)
ax.set_title('Weight vs MPG')
plt.show()
```
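To make the x/y requirement concrete, the following sketch shows what happens when it is not met. It uses a small synthetic frame as a stand-in for mtcars (the values are generated, not real data) so that it runs offline. It assumes a reasonably recent pandas, where both failure cases raise `ValueError`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit for interactive use
import numpy as np
import pandas as pd

# Synthetic stand-in for mtcars: heavier cars get lower mpg, plus noise
rng = np.random.default_rng(0)
cars = pd.DataFrame({"wt": rng.uniform(1.5, 5.5, 30)})
cars["mpg"] = 40 - 5 * cars["wt"] + rng.normal(0, 2, 30)

errors = []

# Case 1: DataFrame scatter without x and y
try:
    cars.plot(kind="scatter")
except ValueError as exc:
    errors.append(str(exc))

# Case 2: scatter on a Series, which has only one column of values
try:
    cars["mpg"].plot(kind="scatter")
except ValueError as exc:
    errors.append(str(exc))

for message in errors:
    print("ValueError:", message)

# Correct call: both columns named explicitly
ax = cars.plot(kind="scatter", x="wt", y="mpg")
n_points = len(ax.collections[0].get_offsets())
```

The same constraint applies to `'hexbin'`, which is also a DataFrame-only kind that takes `x` and `y`.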

Scatter with Size and Color

```python
fig, ax = plt.subplots(figsize=(10, 6))
df.plot(
    kind='scatter',
    x='wt',
    y='mpg',
    s=df['hp'],        # Point size by horsepower
    c='disp',          # Color by displacement
    colormap='Blues',
    alpha=0.6,
    ax=ax
)
ax.set_title('Weight vs MPG (size=HP, color=Displacement)')
plt.show()
```

Area Plot (kind='area')

An area plot is stacked by default, which shows composition over time:

```python
df = pd.DataFrame({
    'A': np.random.rand(10) * 10,
    'B': np.random.rand(10) * 10,
    'C': np.random.rand(10) * 10
}, index=pd.date_range('2024-01-01', periods=10))

fig, ax = plt.subplots(figsize=(10, 5))
df.plot(kind='area', ax=ax, alpha=0.5)
ax.set_title('Stacked Area Plot')
plt.show()
```

Unstacked Area

```python
df.plot(kind='area', stacked=False, alpha=0.4)
```
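One practical reason to reach for `stacked=False` is data that mixes signs. The sketch below uses a tiny hand-made frame (the values are arbitrary) and assumes current pandas behavior: a stacked area plot rejects a column that contains both positive and negative values, while the unstacked form draws it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit for interactive use
import pandas as pd

# Arbitrary illustrative values; each column mixes positive and negative numbers
df_mixed = pd.DataFrame({
    "A": [1.0, -2.0, 3.0],
    "B": [2.0, 1.0, -1.0],
})

# Stacked (the default): expected to fail because column signs are mixed
try:
    df_mixed.plot(kind="area")
    stacked_ok = True
except ValueError as exc:
    stacked_ok = False
    print("ValueError:", exc)

# Unstacked: each column is filled from zero independently, so mixed signs are fine
ax = df_mixed.plot(kind="area", stacked=False, alpha=0.4)
```

With unstacked areas the fills overlap, which is why a low `alpha` is usually paired with `stacked=False`.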

Pie Chart (kind='pie')

A pie chart shows proportions and is most naturally drawn from a Series:

```python
data = pd.Series([30, 25, 20, 15, 10], index=['A', 'B', 'C', 'D', 'E'])

fig, ax = plt.subplots(figsize=(6, 6))
data.plot(kind='pie', ax=ax, autopct='%1.1f%%')
ax.set_ylabel('')  # Remove default ylabel
ax.set_title('Category Proportions')
plt.show()
```
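When the data is a DataFrame rather than a Series, pandas needs to be told which column to draw. The sketch below uses a made-up two-column frame (the names `sales` and `returns` are hypothetical) to show the two usual options: pass `y=` for a single pie, or `subplots=True` for one pie per column.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit for interactive use
import pandas as pd

# Hypothetical data for illustration only
df_pie = pd.DataFrame(
    {"sales": [30, 25, 20], "returns": [3, 5, 2]},
    index=["A", "B", "C"],
)

# Without y or subplots=True, pandas cannot tell which column to draw
try:
    df_pie.plot(kind="pie")
    needs_column = False
except ValueError as exc:
    needs_column = True
    print("ValueError:", exc)

# Option 1: name one column; the index supplies the wedge labels
ax = df_pie.plot(kind="pie", y="sales", autopct="%1.1f%%")
ax.set_ylabel("")

# Option 2: one pie per column, returned as an array of Axes
axes = df_pie.plot(kind="pie", subplots=True, figsize=(8, 4), legend=False)
n_pies = len(axes)
```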

Hexbin Plot (kind='hexbin')

For large datasets where scatter points overlap heavily, hexbin aggregates the points into hexagonal bins and colors each bin by its count:

```python
n = 10000
df = pd.DataFrame({
    'x': np.random.randn(n),
    'y': np.random.randn(n)
})

fig, ax = plt.subplots(figsize=(8, 6))
df.plot(
    kind='hexbin',
    x='x',
    y='y',
    gridsize=25,
    cmap='YlOrRd',
    ax=ax
)
ax.set_title('Hexbin Density Plot')
plt.show()
```

Choosing the Right Plot Type

| Data Type | Goal | Recommended kind |
|---|---|---|
| Time series | Show trend | `'line'` |
| Categories | Compare counts | `'bar'` or `'barh'` |
| Single numeric | Show distribution | `'hist'` or `'kde'` |
| Single numeric | Summary stats | `'box'` |
| Two numeric | Show relationship | `'scatter'` |
| Two numeric (large n) | Density | `'hexbin'` |
| Proportions | Part of whole | `'pie'` |
| Multiple series | Composition | `'area'` |

Quick Reference

```python
# Line (default)
df.plot()
df.plot(kind='line')

# Bar
df['col'].value_counts().plot(kind='bar')
df['col'].value_counts().plot(kind='barh')

# Histogram
df['col'].plot(kind='hist', bins=20)

# Box
df['col'].plot(kind='box')
df[['col1', 'col2']].plot(kind='box')

# Density
df['col'].plot(kind='density')
df['col'].plot(kind='kde')

# Scatter
df.plot(kind='scatter', x='col1', y='col2')

# Area
df.plot(kind='area')

# Pie
series.plot(kind='pie')

# Hexbin
df.plot(kind='hexbin', x='col1', y='col2')
```
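The exercises use the accessor spelling (`df.plot.area()`, `s.plot.pie()`) rather than `kind=`. Each kind string has a matching method on `.plot`, and the two spellings draw the same chart. A minimal sketch, reusing the Series values from the pie chart example:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit for interactive use
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([30, 25, 20, 15, 10], index=["A", "B", "C", "D", "E"])

fig, (ax_kw, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

# Keyword form: the chart type is a string argument
s.plot(kind="barh", ax=ax_kw, title="s.plot(kind='barh')")

# Accessor form: the chart type is the method name
s.plot.barh(ax=ax_acc, title="s.plot.barh()")

# Both Axes hold the same bars
widths_kw = [patch.get_width() for patch in ax_kw.patches]
widths_acc = [patch.get_width() for patch in ax_acc.patches]
print(widths_kw == widths_acc)
```

The keyword form is convenient when the kind is chosen at runtime (for example from a variable), while the accessor form works better with editor autocompletion and per-method docstrings.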

Exercises

Exercise 1. Write code that creates a line plot, bar plot, and scatter plot from the same DataFrame using df.plot(kind=...).

Solution to Exercise 1

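```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
df = pd.DataFrame({'A': np.random.randn(10), 'B': np.random.randn(10)})

# Same DataFrame, three different kinds
df.plot(kind='line', title='Line')
df.plot(kind='bar', title='Bar')
df.plot(kind='scatter', x='A', y='B', title='Scatter')
plt.show()
```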

Exercise 2. List all available plot kinds in df.plot() and describe when each is most appropriate.

Solution to Exercise 2

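The kinds accepted by df.plot() are listed in the Available Plot Types table. In short: `'line'` suits time series and trends; `'bar'` and `'barh'` compare categories; `'hist'` and `'kde'` (alias `'density'`) show the distribution of a single numeric variable; `'box'` summarizes a distribution; `'area'` shows composition over time; `'pie'` shows parts of a whole; `'scatter'` shows the relationship between two numeric variables; and `'hexbin'` replaces a scatter plot when the points are too dense to read.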

Exercise 3. Write code that creates an area plot using df.plot.area() to show cumulative values over time.

Solution to Exercise 3

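```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
# Non-negative increments keep the running totals positive,
# which a stacked area plot requires within each column
df = pd.DataFrame(
    {'A': np.random.rand(20), 'B': np.random.rand(20)},
    index=pd.date_range('2024-01-01', periods=20)
)
df.cumsum().plot.area(alpha=0.5, title='Cumulative Values Over Time')
plt.show()
```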

Exercise 4. Create a pie chart from a Series using s.plot.pie() with autopct='%1.1f%%'.

Solution to Exercise 4

```python
import pandas as pd
import matplotlib.pyplot as plt

s = pd.Series([30, 25, 20, 15, 10], index=['A', 'B', 'C', 'D', 'E'])
ax = s.plot.pie(autopct='%1.1f%%')
ax.set_ylabel('')  # Remove default ylabel
plt.show()
```