hist() Method¶

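The hist() method draws a histogram for each numeric column in a DataFrame, one subplot per column. Unlike plot(kind='hist'), which by default overlays every column on a single set of axes, hist() arranges the subplots in a grid and supports grouping with by and arranging with layout directly.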

Basic Usage¶

DataFrame.hist()¶

Creates histograms for all numeric columns:

import pandas as pd
import matplotlib.pyplot as plt

url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pd.read_csv(url)

# Histogram for all numeric columns
df.hist(figsize=(12, 8))
plt.tight_layout()
plt.show()

Series.hist()¶

# Histogram for a single column (missing values are dropped before binning)
df['Age'].hist(bins=20)
plt.show()

Key Parameters¶

bins - Number of Bins¶

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

df['Age'].hist(bins=10, ax=axes[0])
axes[0].set_title('10 bins')

df['Age'].hist(bins=30, ax=axes[1])
axes[1].set_title('30 bins')

df['Age'].hist(bins=50, ax=axes[2])
axes[2].set_title('50 bins')

plt.tight_layout()
plt.show()
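
bins also accepts a sequence of bin edges instead of a count, which is useful when the boundaries carry meaning. A minimal sketch, assuming synthetic ages in place of the Titanic column and age bands chosen only for illustration:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic ages (an assumption, standing in for df['Age'])
rng = np.random.default_rng(0)
ages = pd.Series(rng.integers(0, 80, size=500), name='Age')

# Hypothetical age bands: child, young adult, adult, senior
edges = [0, 18, 40, 65, 80]

fig, ax = plt.subplots(figsize=(6, 3))
ages.hist(bins=edges, ax=ax, edgecolor='black')
ax.set_title('Custom bin edges')

# One bar is drawn per interval between consecutive edges
n_bars = len(ax.patches)
# Call plt.show() to display the figure
```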

column - Specific Column¶

df.hist(column='Age', bins=20)
plt.show()

by - Group by Category¶

Create separate histograms for each category:

url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv'
df = pd.read_csv(url)

# Histogram of beer servings by continent
df.hist(column='beer_servings', by='continent', figsize=(12, 8), sharex=True, sharey=True)
plt.tight_layout()
plt.show()
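
by draws one panel per distinct value of the grouping column, and rows whose group key is missing are left out. This matters for continent codes because read_csv parses the literal string NA as a missing value by default, so a continent coded NA would lose its panel. A small sketch with made-up inline rows (not the real drinks file), using keep_default_na=False to keep the code as text; note that this option also stops other default markers from being treated as missing:

```python
import io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows for illustration only
csv_text = """continent,beer_servings
EU,200
EU,250
NA,150
NA,180
AS,40
AS,60
"""

default = pd.read_csv(io.StringIO(csv_text))
kept = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

def panel_titles(frame):
    """Return the titles of the panels drawn by a grouped histogram."""
    axes = frame.hist(column='beer_servings', by='continent')
    return sorted(ax.get_title() for ax in np.ravel(axes) if ax.get_title())

titles_default = panel_titles(default)  # 'NA' was read as missing
titles_kept = panel_titles(kept)        # 'NA' kept as a category
```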

layout - Subplot Arrangement¶

# Arrange histograms in 2 rows, 3 columns
df.hist(column='beer_servings', by='continent', layout=(2, 3), figsize=(12, 6))
plt.tight_layout()
plt.show()
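
One dimension of layout can be given as -1, in which case pandas infers it from the number of panels. A sketch with synthetic groups (the group labels and values are assumptions for illustration):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
demo = pd.DataFrame({
    'group': np.repeat(list('ABCDE'), 40),  # five made-up categories
    'value': rng.normal(size=200),
})

# Fix 3 columns and let pandas work out the number of rows
axes = demo.hist(column='value', by='group', layout=(-1, 3), figsize=(9, 5))

grid_shape = axes.shape
# Unused cells of the grid are hidden rather than removed
visible = sum(ax.get_visible() for ax in axes.ravel())
```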

sharex, sharey - Shared Axes¶

df.hist(
    column='beer_servings',
    by='continent',
    sharex=True,  # Same x-axis scale
    sharey=True   # Same y-axis scale
)
plt.tight_layout()
plt.show()

figsize - Figure Size¶

df.hist(figsize=(15, 10))

ax - Specify Axes¶

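# The number of axes passed must equal the number of columns plotted.
# The drinks data has 4 numeric columns, so use a 2x2 grid.
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
df.hist(ax=axes)
plt.tight_layout()
plt.show()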
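
When an array of axes is passed through ax, pandas expects exactly one axes per plotted column and raises a ValueError otherwise. A sketch that sizes the grid from the data, using a synthetic frame whose column names are placeholders:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
demo = pd.DataFrame(rng.normal(size=(100, 3)), columns=['a', 'b', 'c'])

# Size the grid from the number of numeric columns
n_cols = demo.select_dtypes('number').shape[1]
fig, axes = plt.subplots(1, n_cols, figsize=(4 * n_cols, 3))
demo.hist(ax=axes)
titles = sorted(ax.get_title() for ax in axes)

# A grid with a different number of axes is rejected
fig_bad, axes_bad = plt.subplots(2, 2)
try:
    demo.hist(ax=axes_bad)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
```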

Practical Example: Housing Data¶

import os
import tarfile
import urllib.request
import pandas as pd
import matplotlib.pyplot as plt

# Download and load housing data
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data():
    os.makedirs(HOUSING_PATH, exist_ok=True)  # No-op if the directory already exists
    tgz_path = os.path.join(HOUSING_PATH, "housing.tgz")
    urllib.request.urlretrieve(HOUSING_URL, tgz_path)
    with tarfile.open(tgz_path) as tgz:
        tgz.extractall(path=HOUSING_PATH)

def load_housing_data():
    return pd.read_csv(os.path.join(HOUSING_PATH, "housing.csv"))

fetch_housing_data()
df = load_housing_data()

# Create histogram grid: the 9 numeric columns fill a 3x3 grid exactly
fig, axes = plt.subplots(3, 3, figsize=(12, 9))
df.hist(bins=50, ax=axes)

# Clean up appearance
for ax in axes.flatten():
    ax.grid(False)
    ax.spines[['top', 'right']].set_visible(False)

plt.tight_layout()
plt.show()

Practical Example: Titanic Data¶

url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pd.read_csv(url)

# Create histograms for specific columns
fig, axes = plt.subplots(1, 5, figsize=(15, 3))
columns = ['Survived', 'Pclass', 'Age', 'SibSp', 'Fare']

for ax, col in zip(axes, columns):
    df[col].hist(ax=ax, density=True, edgecolor='black', alpha=0.7)
    ax.set_title(col)
    ax.set_ylabel('Density')

plt.tight_layout()
plt.show()

Customization Options¶

density - Normalize to Probability¶

df['Age'].hist(density=True)  # Y-axis shows probability density (bar areas sum to 1)
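
With density=True the bar heights are scaled so that the total bar area equals 1, so an individual bar can be taller than 1 when bins are narrow. A quick check on synthetic data (an assumption, standing in for the Age column):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
ages = pd.Series(rng.normal(loc=30, scale=12, size=1000))  # synthetic

fig, ax = plt.subplots()
ages.hist(density=True, bins=25, ax=ax)

# Each bar contributes height * width to the total area
total_area = sum(p.get_height() * p.get_width() for p in ax.patches)
```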

edgecolor - Bar Borders¶

df['Age'].hist(edgecolor='black')

alpha - Transparency¶

df['Age'].hist(alpha=0.7)

color - Bar Color¶

df['Age'].hist(color='steelblue')

Comparing Distributions¶

Overlapping Histograms¶

fig, ax = plt.subplots(figsize=(10, 5))

# Survived vs not survived
df[df['Survived'] == 1]['Age'].hist(alpha=0.5, label='Survived', ax=ax)
df[df['Survived'] == 0]['Age'].hist(alpha=0.5, label='Did not survive', ax=ax)

ax.legend()
ax.set_xlabel('Age')
ax.set_title('Age Distribution by Survival')
plt.show()
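
In the overlay above, each call picks its own bin edges from its own subset, so the two sets of bars need not line up. One way to align them, sketched here on synthetic data that only mimics the Titanic column names, is to compute the edges once with numpy.histogram_bin_edges and pass them to both calls:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
demo = pd.DataFrame({
    'Age': rng.normal(30, 12, size=400).clip(0, 80),  # synthetic ages
    'Survived': rng.integers(0, 2, size=400),         # synthetic 0/1 flag
})

# Shared edges computed from the full column
edges = np.histogram_bin_edges(demo['Age'], bins=20)

fig, ax = plt.subplots(figsize=(10, 5))
for value, label in [(1, 'Survived'), (0, 'Did not survive')]:
    subset = demo.loc[demo['Survived'] == value, 'Age']
    subset.hist(bins=edges, alpha=0.5, label=label, ax=ax)
ax.legend()

# Left edge of every bar: first 20 from one group, last 20 from the other
lefts = [p.get_x() for p in ax.patches]
```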

Side-by-Side with by Parameter¶

df.hist(column='Age', by='Survived', figsize=(10, 4), sharex=True)
plt.tight_layout()
plt.show()

hist() vs plot(kind='hist')¶

Feature	df.hist()	df.plot(kind='hist')
Multiple columns	One subplot per column	Overlaid on one axes (subplots=True to split)
by parameter	✅ Supported	✅ Supported (pandas 1.4+)
layout control	✅ Built-in	With subplots=True
Single column	Use Series	Use Series

# Both produce similar output for a single column:
df['Age'].hist(bins=20)
df['Age'].plot(kind='hist', bins=20)
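
The two methods also differ in what they return for a DataFrame, which affects how follow-up customization is written. A sketch on a synthetic two-column frame (the column names are placeholders): hist() hands back an array of axes, one per column, while plot(kind='hist') draws every column on a single axes.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

rng = np.random.default_rng(5)
demo = pd.DataFrame({'x': rng.normal(0, 1, 300), 'y': rng.normal(3, 1, 300)})

# One subplot per column, returned as a NumPy array of Axes
grid = demo.hist(bins=20)
n_panels = sum(ax.get_visible() for ax in np.ravel(grid))

# A single Axes with both columns overlaid
single = demo.plot(kind='hist', bins=20, alpha=0.5)
is_single_axes = isinstance(single, Axes)
```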

Method Signature¶

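DataFrame.hist(
    column=None,       # Column(s) to plot
    by=None,           # Group by this column
    grid=True,         # Show grid
    xlabelsize=None,   # X tick label font size
    xrot=None,         # X tick label rotation
    ylabelsize=None,   # Y tick label font size
    yrot=None,         # Y tick label rotation
    ax=None,           # Matplotlib axes
    sharex=False,      # Share x-axis
    sharey=False,      # Share y-axis
    figsize=None,      # Figure size in inches (width, height)
    layout=None,       # Subplot layout (rows, cols)
    bins=10,           # Number of bins, or a sequence of bin edges
    backend=None,      # Plotting backend to use
    legend=False,      # Show legend
    **kwargs           # Passed on to matplotlib's hist
)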
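
Alongside xlabelsize and ylabelsize, hist() accepts xrot and yrot to rotate tick labels, which helps when a grid of small panels gets crowded. A sketch on synthetic data, with the values 8 and 45 picked only for illustration:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
fares = pd.Series(rng.exponential(scale=30, size=500))  # synthetic values

fig, ax = plt.subplots(figsize=(5, 3))
fares.hist(bins=20, ax=ax, xlabelsize=8, xrot=45)

# Collect the rotation and font size applied to the x tick labels
rotations = {round(lbl.get_rotation()) for lbl in ax.get_xticklabels()}
sizes = {round(lbl.get_fontsize()) for lbl in ax.get_xticklabels()}
```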

Summary¶

# All numeric columns
df.hist()

# Specific column
df.hist(column='Age')
df['Age'].hist()

# Grouped by category
df.hist(column='Age', by='Survived')

# Customized
df.hist(bins=30, figsize=(12, 8), edgecolor='black', alpha=0.7)

# With layout (drinks data)
df.hist(column='beer_servings', by='continent', layout=(2, 3), sharex=True)