
Bar Plots

Bar plots visualize categorical data by displaying rectangular bars with heights proportional to the values they represent. Pandas provides multiple ways to create bar plots.

Mental Model

A bar plot maps categories to bar heights. Use kind='bar' for vertical bars; prefer kind='barh' (horizontal bars) when category labels are long. Stacked bars show composition within each category; grouped bars show side-by-side comparison. The input is typically the result of value_counts() or a groupby aggregation.
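To make the two typical inputs concrete, here is a minimal sketch. The pets DataFrame and its column names are made up for illustration; they are not part of the datasets used later on this page.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data, invented for this sketch
pets = pd.DataFrame({
    'animal': ['cat', 'dog', 'cat', 'bird', 'dog', 'cat'],
    'weight_kg': [4.0, 20.0, 5.0, 0.5, 30.0, 3.0],
})

# value_counts(): one bar per category, height = number of rows
counts = pets['animal'].value_counts()

# groupby(): one bar per category, height = aggregated value
mean_weight = pets.groupby('animal')['weight_kg'].mean()

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
counts.plot(kind='bar', ax=axes[0], title='Rows per animal')
mean_weight.plot(kind='barh', ax=axes[1], title='Mean weight (kg)')
plt.tight_layout()
plt.show()
```

In both cases the Series index supplies the category labels and the values supply the bar lengths.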

Basic Bar Plot

Using plot(kind='bar')

```python
import pandas as pd
import matplotlib.pyplot as plt

# Count categories
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv'
df = pd.read_csv(url)

fig, ax = plt.subplots(figsize=(10, 4))
df['continent'].value_counts().plot(kind='bar', ax=ax)
ax.set_title('Countries by Continent')
ax.set_ylabel('Count')
plt.tight_layout()
plt.show()
```

Horizontal Bar (kind='barh')

```python
fig, ax = plt.subplots(figsize=(8, 5))
df['continent'].value_counts().plot(kind='barh', ax=ax)
ax.set_title('Countries by Continent')
ax.set_xlabel('Count')
plt.tight_layout()
plt.show()
```

Single Bar Plot with Matplotlib

For more control, use matplotlib's ax.bar():

```python
import matplotlib.pyplot as plt
import pandas as pd

def load_teachers_data():
    data = {
        'Courses': ('Language', 'History', 'Geometry', 'Chemistry', 'Physics'),
        'Number of Teachers': (7, 3, 9, 1, 2)
    }
    return pd.DataFrame(data).set_index('Courses')

df = load_teachers_data()

fig, ax = plt.subplots(figsize=(10, 4))
teacher_counts = df['Number of Teachers']

ax.bar(
    x=range(len(teacher_counts)),
    height=teacher_counts,
    tick_label=df.index,
    width=0.5
)

ax.set_xlabel('Courses')
ax.set_ylabel('Number of Teachers')
ax.set_title('Favorite Courses of Teachers')
ax.spines[['right', 'top']].set_visible(False)

plt.tight_layout()
plt.show()
```

Grouped Bar Plot

Compare multiple metrics across categories:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def load_student_scores():
    data = {
        'Student': ['Brandon', 'Vanessa', 'Daniel', 'Kevin', 'William'],
        'Midterm': [85, 60, 60, 65, 100],
        'Final': [90, 90, 65, 80, 95]
    }
    return pd.DataFrame(data).set_index('Student')

df = load_student_scores()

# Set up positions
positions = np.arange(len(df))
width = 0.35

fig, ax = plt.subplots(figsize=(10, 5))

# Plot bars side by side
ax.bar(positions - width/2, df['Midterm'], width, label='Midterm')
ax.bar(positions + width/2, df['Final'], width, label='Final')

# Customize
ax.set_xticks(positions)
ax.set_xticklabels(df.index)
ax.set_xlabel('Student')
ax.set_ylabel('Score')
ax.set_title('Midterm and Final Scores')
ax.legend()
ax.spines[['right', 'top']].set_visible(False)

plt.tight_layout()
plt.show()
```

Using pandas plot() for Grouped Bars

```python
fig, ax = plt.subplots(figsize=(10, 5))
df.plot(kind='bar', ax=ax)
ax.set_title('Student Scores')
ax.set_ylabel('Score')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()
```

Stacked Bar Plot

Show composition within categories:

```python
# Using pandas
fig, ax = plt.subplots(figsize=(10, 5))
df.plot(kind='bar', stacked=True, ax=ax)
ax.set_title('Student Scores (Stacked)')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()
```

Segmented Bar Plot (100% Stacked)

Show proportions rather than absolute values:

```python
import matplotlib.pyplot as plt
import numpy as np

# Data: Has Antibodies?
labels = ('Yes', 'No')
antibody_pcts = (
    np.array([95, 90, 40]),  # Yes percentages
    np.array([5, 10, 60])    # No percentages
)
age_groups = ('Adults', 'Children', 'Infants')

fig, ax = plt.subplots(figsize=(8, 5))

# Initialize bottom for stacking
bottom = np.zeros(3)

# Stack bars
for label, pct in zip(labels, antibody_pcts):
    ax.bar(
        x=np.arange(3),
        height=pct,
        width=0.5,
        bottom=bottom,
        label=label
    )
    bottom += pct

ax.set_xticks(np.arange(3))
ax.set_xticklabels(age_groups)
ax.set_ylabel('Percentage')
ax.set_title('Has Antibodies?')
ax.spines[['top', 'right']].set_visible(False)
ax.legend(title='Response', loc='center left', bbox_to_anchor=(1.0, 0.5))

plt.tight_layout()
plt.show()
```
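When the data starts as raw counts rather than percentages, one possible alternative is to normalize each row in pandas and reuse stacked=True. The sketch below assumes hypothetical counts, chosen only so that they reproduce the percentages used above; they are not real measurements.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical raw counts per age group (illustrative numbers only)
counts = pd.DataFrame(
    {'Yes': [190, 90, 20], 'No': [10, 10, 30]},
    index=['Adults', 'Children', 'Infants'],
)

# Divide each row by its total so every stacked bar sums to 100
pcts = counts.div(counts.sum(axis=1), axis=0) * 100

fig, ax = plt.subplots(figsize=(8, 5))
pcts.plot(kind='bar', stacked=True, width=0.5, rot=0, ax=ax)
ax.set_ylabel('Percentage')
ax.set_title('Has Antibodies?')
ax.legend(title='Response', loc='center left', bbox_to_anchor=(1.0, 0.5))
plt.tight_layout()
plt.show()
```

The division step is what turns an ordinary stacked bar into a 100% stacked one; the plotting call itself is unchanged.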

Bar Plot from Value Counts

Common pattern for categorical data:

```python
import pandas as pd
import matplotlib.pyplot as plt

url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pd.read_csv(url)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Survival counts (sort_index() fixes the order as 0, 1 so the labels below line up)
df['Survived'].value_counts().sort_index().plot(kind='bar', ax=axes[0])
axes[0].set_title('Survival')
axes[0].set_xticklabels(['Died', 'Survived'], rotation=0)

# Passenger class
df['Pclass'].value_counts().sort_index().plot(kind='bar', ax=axes[1])
axes[1].set_title('Passenger Class')
axes[1].set_xticklabels(['1st', '2nd', '3rd'], rotation=0)

# Embarkation port
df['Embarked'].value_counts().plot(kind='bar', ax=axes[2])
axes[2].set_title('Embarkation Port')

plt.tight_layout()
plt.show()
```

Customization Options

Colors

```python
# df here is the drinks DataFrame loaded in the first example

# Single color
df['continent'].value_counts().plot(kind='bar', color='steelblue')

# Multiple colors
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b']
df['continent'].value_counts().plot(kind='bar', color=colors)
```

Edge Color

```python
df['continent'].value_counts().plot(kind='bar', edgecolor='black')
```

Bar Width

```python
df['continent'].value_counts().plot(kind='bar', width=0.8)  # 0-1 range
```

Rotation

```python
# Rotate x-tick labels
df.plot(kind='bar', rot=45)
```

Grid

```python
df.plot(kind='bar', grid=True)
```
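grid=True draws grid lines along both axes. For bar plots it is often enough to show only the lines that run along the value axis and to keep them behind the bars. A minimal sketch using matplotlib's ax.grid() and ax.set_axisbelow(), with a small made-up scores table:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Small illustrative table (two students, two exams)
scores = pd.DataFrame(
    {'Midterm': [85, 60], 'Final': [90, 90]},
    index=['Brandon', 'Vanessa'],
)

fig, ax = plt.subplots(figsize=(6, 4))
scores.plot(kind='bar', rot=0, ax=ax)

# Horizontal grid lines only, drawn behind the bars
ax.grid(axis='y', linestyle='--', alpha=0.5)
ax.set_axisbelow(True)

plt.tight_layout()
plt.show()
```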

Sorting Bars

```python
# Sort by value (descending - default for value_counts)
df['continent'].value_counts().plot(kind='bar')

# Sort by value (ascending)
df['continent'].value_counts().sort_values().plot(kind='bar')

# Sort alphabetically
df['continent'].value_counts().sort_index().plot(kind='bar')
```

Adding Value Labels

```python
fig, ax = plt.subplots(figsize=(10, 5))
counts = df['continent'].value_counts()
bars = ax.bar(range(len(counts)), counts.values)

# Add labels on bars
for bar, count in zip(bars, counts.values):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5,
            str(count), ha='center', va='bottom')

ax.set_xticks(range(len(counts)))
ax.set_xticklabels(counts.index)
ax.set_title('Countries by Continent')
plt.tight_layout()
plt.show()
```
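On matplotlib 3.4 or newer, ax.bar_label() can place the same labels without a manual loop. The sketch below is an alternative under that version assumption; the counts are made up to stand in for df['continent'].value_counts().

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up counts standing in for df['continent'].value_counts()
counts = pd.Series({'Africa': 53, 'Europe': 45, 'Asia': 44})

fig, ax = plt.subplots(figsize=(8, 4))
counts.plot(kind='bar', rot=0, ax=ax)

# ax.containers[0] holds the bars that pandas just drew
labels = ax.bar_label(ax.containers[0], padding=3)

ax.set_title('Countries by Continent')
plt.tight_layout()
plt.show()
```

The padding argument is the gap in points between the bar and its label, which replaces the hand-tuned + 0.5 offset in the loop version.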

Summary of Bar Plot Types

| Type | Code | Use Case |
|------|------|----------|
| Vertical bar | plot(kind='bar') | Category comparison |
| Horizontal bar | plot(kind='barh') | Long category names |
| Grouped bar | Multiple ax.bar() calls | Compare metrics |
| Stacked bar | plot(kind='bar', stacked=True) | Show composition |
| 100% stacked | Manual with percentages | Show proportions |

Quick Reference

```python
# Basic bar from value counts
series.value_counts().plot(kind='bar')

# Horizontal
series.value_counts().plot(kind='barh')

# Multiple columns grouped
df.plot(kind='bar')

# Stacked
df.plot(kind='bar', stacked=True)

# Customized
series.value_counts().plot(
    kind='bar',
    color='steelblue',
    edgecolor='black',
    width=0.7,
    rot=45
)
```


Exercises

Exercise 1. Write code that creates a bar plot from a DataFrame using df.plot.bar(). Add a title and axis labels.

Solution to Exercise 1

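```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
df = pd.DataFrame({'A': np.random.randn(10), 'B': np.random.randn(10)})

# One group of bars per row, one bar per column
ax = df.plot.bar(rot=0)
ax.set_title('Random Values by Row')
ax.set_xlabel('Row index')
ax.set_ylabel('Value')
plt.tight_layout()
plt.show()
```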


Exercise 2. Create a grouped bar plot from a DataFrame with multiple numeric columns using df.plot.bar().

Solution to Exercise 2

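Calling df.plot.bar() on a DataFrame with several numeric columns draws one bar per column for each index label and places them side by side. See "Using pandas plot() for Grouped Bars" in the main content for the same pattern applied to the student scores.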
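A minimal sketch of one possible solution, using made-up quarterly figures (the column names and values are purely illustrative):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up figures for illustration
df = pd.DataFrame(
    {'Online': [120, 150, 170, 160], 'In-store': [200, 180, 175, 190]},
    index=['Q1', 'Q2', 'Q3', 'Q4'],
)

# Each numeric column becomes one bar within every index group
ax = df.plot.bar(rot=0, figsize=(8, 4))
ax.set_title('Sales by Quarter')
ax.set_xlabel('Quarter')
ax.set_ylabel('Sales')
plt.tight_layout()
plt.show()
```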


Exercise 3. Write code that creates a horizontal bar plot using df.plot.barh() and sorts the bars by value.

Solution to Exercise 3

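```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
df = pd.DataFrame({'A': np.random.randn(20), 'B': np.random.randn(20)})

# Sort rows by column A so the bars run from smallest to largest
ax = df.sort_values('A').plot.barh(y='A', legend=False, figsize=(6, 6))
ax.set_xlabel('A')
plt.tight_layout()
plt.show()
```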


Exercise 4. Create a stacked bar plot using df.plot.bar(stacked=True) to show the composition of categories.

Solution to Exercise 4

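```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
df = pd.DataFrame({'group': np.random.choice(['X', 'Y'], 50), 'kind': np.random.choice(['a', 'b'], 50)})

# Count rows per (group, kind) pair; each column becomes one stack segment
counts = pd.crosstab(df['group'], df['kind'])
ax = counts.plot.bar(stacked=True, rot=0)
ax.set_ylabel('Count')
plt.tight_layout()
plt.show()
```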