Skip to content

hist() Method

The hist() method creates histograms for numeric columns in a DataFrame. Unlike plot(kind='hist'), this method is specifically designed for histogram creation with additional features.

Mental Model

A histogram chops a continuous variable into bins and counts how many values fall in each bin. The bins parameter controls granularity: too few bins hide patterns, too many create noise. Use density=True to normalize to a probability distribution, and the by parameter to compare distributions across groups.

Basic Usage

DataFrame.hist()

Creates histograms for all numeric columns:

```python import pandas as pd import matplotlib.pyplot as plt

url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv" df = pd.read_csv(url)

Histogram for all numeric columns

df.hist(figsize=(12, 8)) plt.tight_layout() plt.show() ```

Series.hist()

python df['Age'].hist(bins=20) plt.show()

Key Parameters

bins - Number of Bins

```python fig, axes = plt.subplots(1, 3, figsize=(12, 3))

df['Age'].hist(bins=10, ax=axes[0]) axes[0].set_title('10 bins')

df['Age'].hist(bins=30, ax=axes[1]) axes[1].set_title('30 bins')

df['Age'].hist(bins=50, ax=axes[2]) axes[2].set_title('50 bins')

plt.tight_layout() plt.show() ```

column - Specific Column

python df.hist(column='Age', bins=20) plt.show()

by - Group by Category

Create separate histograms for each category:

```python url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv' df = pd.read_csv(url)

Histogram of beer servings by continent

df.hist(column='beer_servings', by='continent', figsize=(12, 8), sharex=True, sharey=True) plt.tight_layout() plt.show() ```

layout - Subplot Arrangement

```python

Arrange histograms in 2 rows, 3 columns

df.hist(column='beer_servings', by='continent', layout=(2, 3), figsize=(12, 6)) plt.tight_layout() plt.show() ```

sharex and sharey - Share Axes

python df.hist( column='beer_servings', by='continent', sharex=True, # Same x-axis scale sharey=True # Same y-axis scale )

figsize - Figure Size

python df.hist(figsize=(15, 10))

ax - Specify Axes

python fig, axes = plt.subplots(3, 3, figsize=(12, 9)) df.hist(ax=axes) plt.tight_layout() plt.show()

Practical Example: Housing Data

```python import os import tarfile import urllib.request import pandas as pd import matplotlib.pyplot as plt

Download and load housing data

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/" HOUSING_PATH = os.path.join("datasets", "housing") HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data(): if not os.path.isdir(HOUSING_PATH): os.makedirs(HOUSING_PATH) tgz_path = os.path.join(HOUSING_PATH, "housing.tgz") urllib.request.urlretrieve(HOUSING_URL, tgz_path) with tarfile.open(tgz_path) as tgz: tgz.extractall(path=HOUSING_PATH)

def load_housing_data(): return pd.read_csv(os.path.join(HOUSING_PATH, "housing.csv"))

fetch_housing_data() df = load_housing_data()

Create histogram grid

fig, axes = plt.subplots(3, 3, figsize=(12, 9)) df.hist(bins=50, ax=axes)

Clean up appearance

for ax in axes.flatten(): ax.grid(False) ax.spines[['top', 'right']].set_visible(False)

plt.tight_layout() plt.show() ```

Practical Example: Titanic Data

```python url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv" df = pd.read_csv(url)

Create histograms for specific columns

fig, axes = plt.subplots(1, 5, figsize=(15, 3)) columns = ['Survived', 'Pclass', 'Age', 'SibSp', 'Fare']

for ax, col in zip(axes, columns): df[col].hist(ax=ax, density=True, edgecolor='black', alpha=0.7) ax.set_title(col) ax.set_ylabel('Density')

plt.tight_layout() plt.show() ```

Customization Options

density - Normalize to Probability

python df['Age'].hist(density=True) # Y-axis shows probability density

edgecolor - Bar Borders

python df['Age'].hist(edgecolor='black')

alpha - Transparency

python df['Age'].hist(alpha=0.7)

color - Bar Color

python df['Age'].hist(color='steelblue')

Comparing Distributions

Overlapping Histograms

```python fig, ax = plt.subplots(figsize=(10, 5))

Survived vs not survived

df[df['Survived'] == 1]['Age'].hist(alpha=0.5, label='Survived', ax=ax) df[df['Survived'] == 0]['Age'].hist(alpha=0.5, label='Did not survive', ax=ax)

ax.legend() ax.set_xlabel('Age') ax.set_title('Age Distribution by Survival') plt.show() ```

Side-by-Side with by Parameter

python df.hist(column='Age', by='Survived', figsize=(10, 4), sharex=True) plt.tight_layout() plt.show()

hist() vs plot(kind='hist')

Feature df.hist() df.plot(kind='hist')
Multiple columns Automatic grid Manual iteration
by parameter ✅ Supported ❌ Not supported
layout control ✅ Built-in Manual
Single column Use Series Use Series

```python

Both produce similar output for single column:

df['Age'].hist(bins=20) df['Age'].plot(kind='hist', bins=20) ```

Method Signature

python DataFrame.hist( column=None, # Column(s) to plot by=None, # Group by this column grid=True, # Show grid xlabelsize=None, # X label font size ylabelsize=None, # Y label font size ax=None, # Matplotlib axes sharex=False, # Share x-axis sharey=False, # Share y-axis figsize=None, # Figure size layout=None, # Subplot layout (rows, cols) bins=10, # Number of bins **kwargs # Additional hist kwargs )

Summary

```python

All numeric columns

df.hist()

Specific column

df.hist(column='Age') df['Age'].hist()

Grouped by category

df.hist(column='Age', by='Survived')

Customized

df.hist(bins=30, figsize=(12, 8), edgecolor='black', alpha=0.7)

With layout

df.hist(column='beer', by='continent', layout=(2, 3), sharex=True) ```


Exercises

Exercise 1. Write code that creates a histogram from a DataFrame column using df['col'].plot.hist(bins=30).

Solution to Exercise 1

```python import pandas as pd import numpy as np

Solution for the specific exercise

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(10), 'B': np.random.randn(10)}) print(df.head()) ```


Exercise 2. Create overlaid histograms for two columns using df[['a', 'b']].plot.hist(alpha=0.5).

Solution to Exercise 2

See the main content for the detailed explanation. The key concept involves understanding the Pandas API and its behavior for this specific operation.


Exercise 3. Write code that uses df.plot.hist(subplots=True) to create a separate histogram for each numeric column.

Solution to Exercise 3

```python import pandas as pd import numpy as np

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(20), 'B': np.random.randn(20)}) result = df.describe() print(result) ```


Exercise 4. Create a histogram with density=True and overlay a KDE curve using df['col'].plot.kde() on the same axes.

Solution to Exercise 4

```python import pandas as pd import numpy as np

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(50), 'group': np.random.choice(['X', 'Y'], 50)}) result = df.groupby('group').mean() print(result) ```