Skip to content

Violin Plot

Violin plots combine box plot statistics with kernel density estimation, showing the full distribution shape of data.

Mental Model

A violin plot is a box plot that traded its box for a mirrored density curve. The wider the violin at a given value, the more data points cluster there. This reveals multimodality and skewness that a box plot hides, while still showing median and quartile markers inside the shape.

Box vs Violin — the tradeoff:

  • Box plot → compact and comparable (easy to line up 10+ groups), but hides shape (a bimodal distribution looks the same as a unimodal one).
  • Violin plot → detailed and expressive (shows full density), but takes more space and can be harder to read for quick comparison.

Use box plots when the audience needs fast comparison across many groups. Use violin plots when the shape of the distribution matters (bimodality, skewness).

Violin plots are slower to read than box plots — the density curve requires more visual processing. Choose them when detail matters, not speed. Also be cautious with small samples: KDE can suggest smooth shapes that the sparse data does not actually support. Use violins when the shape of the distribution matters (e.g., detecting bimodality or heavy tails).

Basic Violin Plot

Simple Example

```python import matplotlib.pyplot as plt import numpy as np

np.random.seed(42) data = [np.random.normal(0, std, 100) for std in range(1, 5)]

fig, ax = plt.subplots() ax.violinplot(data) ax.set_xlabel('Dataset') ax.set_ylabel('Value') ax.set_title('Basic Violin Plot') plt.show() ```

With Custom Positions

python fig, ax = plt.subplots() positions = [1, 2, 4, 5] # Custom x positions ax.violinplot(data, positions=positions) ax.set_xticks(positions) ax.set_xticklabels(['A', 'B', 'C', 'D']) plt.show()

Anatomy of a Violin Plot

___ / \ ← Kernel density estimate (distribution shape) | | | ─ | ← Median (if showmedians=True) | │ | ← Interquartile range | ─ | ← Q1 and Q3 markers \___/ │ ← Extrema lines (min/max)

Key Parameters

Parameter Description Default
dataset Data to plot Required
positions X positions [1, 2, ...]
widths Width of violins 0.5
showmeans Show mean line False
showmedians Show median line False
showextrema Show min/max lines True
quantiles Quantile lines None
vert Vertical orientation True

Displaying Statistics

Show Median and Mean

python fig, ax = plt.subplots() ax.violinplot(data, showmeans=True, showmedians=True) plt.show()

Show Quantiles

```python fig, ax = plt.subplots()

Show 25th, 50th, and 75th percentiles

ax.violinplot(data, quantiles=[[0.25, 0.5, 0.75]] * len(data)) plt.show() ```

Hide Extrema

python fig, ax = plt.subplots() ax.violinplot(data, showextrema=False, showmedians=True) plt.show()

Styling Violin Plots

Accessing Violin Components

```python fig, ax = plt.subplots() parts = ax.violinplot(data, showmedians=True)

'parts' is a dictionary with keys:

'bodies': list of PolyCollection (violin shapes)

'cmeans': LineCollection (mean lines, if shown)

'cmedians': LineCollection (median lines, if shown)

'cbars': LineCollection (center bars)

'cmins': LineCollection (min lines)

'cmaxes': LineCollection (max lines)

```

Custom Colors

```python fig, ax = plt.subplots() parts = ax.violinplot(data, showmedians=True)

Color the violin bodies

colors = ['lightblue', 'lightgreen', 'lightyellow', 'lightcoral'] for i, body in enumerate(parts['bodies']): body.set_facecolor(colors[i]) body.set_edgecolor('black') body.set_alpha(0.7)

Color the median lines

parts['cmedians'].set_color('red') parts['cmedians'].set_linewidth(2)

plt.show() ```

Consistent Styling Function

```python def style_violins(parts, body_color='lightblue', edge_color='black', median_color='red', alpha=0.7): """Apply consistent styling to violin plot parts.""" for body in parts['bodies']: body.set_facecolor(body_color) body.set_edgecolor(edge_color) body.set_alpha(alpha)

if 'cmedians' in parts:
    parts['cmedians'].set_color(median_color)
    parts['cmedians'].set_linewidth(2)

for key in ['cbars', 'cmins', 'cmaxes']:
    if key in parts:
        parts[key].set_color(edge_color)

```

Horizontal Violin Plots

python fig, ax = plt.subplots() ax.violinplot(data, vert=False, showmedians=True) ax.set_ylabel('Dataset') ax.set_xlabel('Value') plt.show()

Practical Examples

1. Comparing Distributions

```python import matplotlib.pyplot as plt import numpy as np

np.random.seed(42)

Different distributions

normal = np.random.normal(0, 1, 200) uniform = np.random.uniform(-2, 2, 200) bimodal = np.concatenate([np.random.normal(-1, 0.5, 100), np.random.normal(1, 0.5, 100)]) skewed = np.random.exponential(1, 200) - 1

data = [normal, uniform, bimodal, skewed] labels = ['Normal', 'Uniform', 'Bimodal', 'Skewed']

fig, ax = plt.subplots(figsize=(10, 6)) parts = ax.violinplot(data, showmedians=True)

Style

for body in parts['bodies']: body.set_facecolor('steelblue') body.set_alpha(0.6)

ax.set_xticks([1, 2, 3, 4]) ax.set_xticklabels(labels) ax.set_ylabel('Value') ax.set_title('Comparing Different Distributions') plt.show() ```

2. Split Violin (Comparison)

```python import matplotlib.pyplot as plt import numpy as np

def half_violin(ax, data1, data2, positions, colors=['lightblue', 'lightcoral']): """Create split violin plot for comparison."""

# Left half
parts1 = ax.violinplot(data1, positions=positions, showmedians=True)
for body in parts1['bodies']:
    # Get the paths and modify to show only left half
    m = np.mean(body.get_paths()[0].vertices[:, 0])
    body.get_paths()[0].vertices[:, 0] = np.clip(
        body.get_paths()[0].vertices[:, 0], -np.inf, m)
    body.set_facecolor(colors[0])
    body.set_alpha(0.7)

# Right half
parts2 = ax.violinplot(data2, positions=positions, showmedians=True)
for body in parts2['bodies']:
    m = np.mean(body.get_paths()[0].vertices[:, 0])
    body.get_paths()[0].vertices[:, 0] = np.clip(
        body.get_paths()[0].vertices[:, 0], m, np.inf)
    body.set_facecolor(colors[1])
    body.set_alpha(0.7)

return parts1, parts2

Example usage

np.random.seed(42) data1 = [np.random.normal(0, std, 100) for std in [1, 1.5, 2]] data2 = [np.random.normal(0.5, std, 100) for std in [1, 1.5, 2]]

fig, ax = plt.subplots() half_violin(ax, data1, data2, [1, 2, 3]) ax.legend(['Group A', 'Group B']) ax.set_xticks([1, 2, 3]) ax.set_xticklabels(['Low', 'Medium', 'High']) plt.show() ```

3. Violin with Box Plot Overlay

```python import matplotlib.pyplot as plt import numpy as np

np.random.seed(42) data = [np.random.normal(0, std, 200) for std in range(1, 5)]

fig, ax = plt.subplots()

Violin plot

parts = ax.violinplot(data, showextrema=False) for body in parts['bodies']: body.set_facecolor('lightblue') body.set_alpha(0.5)

Box plot overlay

bp = ax.boxplot(data, widths=0.15, patch_artist=True, boxprops=dict(facecolor='white', edgecolor='black'), medianprops=dict(color='red', linewidth=2), whiskerprops=dict(color='black'), capprops=dict(color='black'), flierprops=dict(marker='o', markersize=4))

ax.set_title('Violin Plot with Box Plot Overlay') plt.show() ```

4. Grouped Violin Plots

```python import matplotlib.pyplot as plt import numpy as np

np.random.seed(42)

Generate data for two groups across three categories

group1 = [np.random.normal(10, 2, 100), np.random.normal(15, 3, 100), np.random.normal(12, 2.5, 100)]

group2 = [np.random.normal(12, 2, 100), np.random.normal(14, 2.5, 100), np.random.normal(16, 3, 100)]

fig, ax = plt.subplots(figsize=(10, 6))

Positions for each group

pos1 = [1, 3, 5] pos2 = [1.6, 3.6, 5.6]

Plot both groups

parts1 = ax.violinplot(group1, positions=pos1, widths=0.5, showmedians=True) parts2 = ax.violinplot(group2, positions=pos2, widths=0.5, showmedians=True)

Style group 1

for body in parts1['bodies']: body.set_facecolor('steelblue') body.set_alpha(0.7)

Style group 2

for body in parts2['bodies']: body.set_facecolor('coral') body.set_alpha(0.7)

ax.set_xticks([1.3, 3.3, 5.3]) ax.set_xticklabels(['Category A', 'Category B', 'Category C'])

Custom legend

from matplotlib.patches import Patch legend_elements = [Patch(facecolor='steelblue', alpha=0.7, label='Group 1'), Patch(facecolor='coral', alpha=0.7, label='Group 2')] ax.legend(handles=legend_elements)

ax.set_ylabel('Value') ax.set_title('Grouped Violin Plots') plt.show() ```

5. Financial Data Distribution

```python import matplotlib.pyplot as plt import numpy as np

Simulated daily returns for different assets

np.random.seed(42) stocks = np.random.normal(0.0005, 0.02, 252) # ~12.5% annual return bonds = np.random.normal(0.0002, 0.005, 252) # ~5% annual return commodities = np.random.normal(0, 0.03, 252) # High volatility

data = [stocks * 100, bonds * 100, commodities * 100] # Convert to percentage

fig, ax = plt.subplots(figsize=(8, 6)) parts = ax.violinplot(data, showmedians=True, showmeans=True)

colors = ['green', 'blue', 'orange'] for i, body in enumerate(parts['bodies']): body.set_facecolor(colors[i]) body.set_alpha(0.6)

ax.axhline(y=0, color='black', linestyle='--', alpha=0.5) ax.set_xticks([1, 2, 3]) ax.set_xticklabels(['Stocks', 'Bonds', 'Commodities']) ax.set_ylabel('Daily Return (%)') ax.set_title('Distribution of Daily Returns') plt.show() ```

Violin Plot vs Box Plot

Feature Box Plot Violin Plot
Shows quartiles ❌ (without overlay)
Shows distribution shape
Shows multimodality
Compact display
Shows outliers
Sample size indication Via width

Width and Scaling

Fixed Widths

python ax.violinplot(data, widths=0.8) # All same width

Variable Widths

python widths = [0.5, 0.7, 0.9, 1.0] # Different widths ax.violinplot(data, widths=widths)

Scale by Sample Size

```python

Width proportional to sqrt of sample size

sample_sizes = [len(d) for d in data] widths = [0.5 * np.sqrt(n) / np.sqrt(max(sample_sizes)) for n in sample_sizes] ax.violinplot(data, widths=widths) ```

Common Pitfalls

1. Small Sample Sizes

```python

Violin plots need sufficient data for meaningful KDE

Minimum recommended: 30-50 points per group

small_data = np.random.randn(10) # Too few points

Consider using box plot or jittered strip plot instead

```

2. Missing Labels

```python

violinplot doesn't automatically set x-tick labels

fig, ax = plt.subplots() ax.violinplot(data)

Must set labels manually

ax.set_xticks([1, 2, 3, 4]) ax.set_xticklabels(['A', 'B', 'C', 'D']) ```

3. Comparing with Different Scales

```python

Use same axis limits for fair comparison

ax.set_ylim(-5, 5) # Consistent scale across subplots ```


Exercises

Exercise 1. Create violin plots for three datasets: a normal distribution, a bimodal distribution (mix of two normals), and a uniform distribution. Use 500 samples each and set showmedians=True and showextrema=True. Add custom labels below each violin.

Solution to Exercise 1
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
normal = np.random.randn(500)
bimodal = np.concatenate([np.random.normal(-2, 0.5, 250),
                           np.random.normal(2, 0.5, 250)])
uniform = np.random.uniform(-3, 3, 500)

fig, ax = plt.subplots(figsize=(8, 5))
vp = ax.violinplot([normal, bimodal, uniform],
                    showmedians=True, showextrema=True)
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(['Normal', 'Bimodal', 'Uniform'])
ax.set_title('Violin Plot Comparison')
plt.show()

Exercise 2. Create a split violin plot comparing two groups. Generate "before" data from N(5, 1) and "after" data from N(7, 1.5) (200 samples each). Plot them as a single violin with the left half showing "before" and the right half showing "after" using ax.violinplot with custom polygon manipulation or by plotting two half-violins.

Solution to Exercise 2
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
before = np.random.normal(5, 1, 200)
after = np.random.normal(7, 1.5, 200)

fig, ax = plt.subplots(figsize=(6, 6))

vp1 = ax.violinplot([before], positions=[1], showmedians=True)
for body in vp1['bodies']:
    m = np.mean(body.get_paths()[0].vertices[:, 0])
    body.get_paths()[0].vertices[:, 0] = np.clip(
        body.get_paths()[0].vertices[:, 0], -np.inf, m)
    body.set_color('steelblue')

vp2 = ax.violinplot([after], positions=[1], showmedians=True)
for body in vp2['bodies']:
    m = np.mean(body.get_paths()[0].vertices[:, 0])
    body.get_paths()[0].vertices[:, 0] = np.clip(
        body.get_paths()[0].vertices[:, 0], m, np.inf)
    body.set_color('coral')

ax.set_xticks([1])
ax.set_xticklabels(['Before / After'])
ax.set_title('Split Violin Plot')
ax.legend(['Before', 'After'])
plt.show()

Exercise 3. Create a combined violin and box plot where the violin shows the full distribution shape and a thin box plot is overlaid inside. Use ax.violinplot with showextrema=False, then overlay ax.boxplot with widths=0.1 and patch_artist=True on the same axes. Use 4 groups of 300 samples.

Solution to Exercise 3
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
data = [np.random.randn(300) + i * 2 for i in range(4)]

fig, ax = plt.subplots(figsize=(8, 5))

vp = ax.violinplot(data, showextrema=False, showmedians=False)
for body in vp['bodies']:
    body.set_alpha(0.3)
    body.set_color('steelblue')

ax.boxplot(data, widths=0.1, patch_artist=True,
            boxprops=dict(facecolor='orange', alpha=0.8),
            medianprops=dict(color='red', linewidth=2),
            showfliers=False)

ax.set_xticklabels(['A', 'B', 'C', 'D'])
ax.set_title('Violin + Box Plot Overlay')
plt.show()