Basic Scatter Plot

Scatter plots display individual data points using markers, revealing relationships, clusters, and patterns between two variables.

Simple Scatter Plot

Create a basic scatter plot with ax.scatter().

1. Import and Setup

import matplotlib.pyplot as plt
import numpy as np

2. Generate Data

np.random.seed(42)
x = np.random.rand(50)
y = np.random.rand(50)

3. Create Scatter Plot

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_title('Basic Scatter Plot')
plt.show()

Correlated Data

Scatter plots reveal the direction and strength of the relationship between two variables.

1. Positive Correlation

np.random.seed(42)
x = np.random.rand(100)
y = x + np.random.normal(0, 0.1, 100)

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_title('Positive Correlation')
plt.show()

2. Negative Correlation

x = np.random.rand(100)
y = 1 - x + np.random.normal(0, 0.1, 100)

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_title('Negative Correlation')
plt.show()

3. No Correlation

x = np.random.rand(100)
y = np.random.rand(100)

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_title('No Correlation')
plt.show()
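
The visual impression from the three plots above can also be checked numerically. The following is a minimal sketch, assuming Pearson's correlation coefficient (computed with np.corrcoef) is an acceptable summary of the relationship. The names y_pos, y_neg, and y_none are introduced here for illustration and do not appear in the examples above.

```python
import numpy as np

np.random.seed(42)

x = np.random.rand(100)
noise = np.random.normal(0, 0.1, 100)

y_pos = x + noise               # same construction as the positive example
y_neg = 1 - x + noise           # same construction as the negative example
y_none = np.random.rand(100)    # independent of x

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r
r_pos = np.corrcoef(x, y_pos)[0, 1]
r_neg = np.corrcoef(x, y_neg)[0, 1]
r_none = np.corrcoef(x, y_none)[0, 1]

print(f"positive: r = {r_pos:.2f}")
print(f"negative: r = {r_neg:.2f}")
print(f"none:     r = {r_none:.2f}")
```

Values of r near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little or no linear relationship.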

Multiple Groups

Plot multiple data groups on the same axes.

1. Sequential Plotting

np.random.seed(42)
x1 = np.random.normal(2, 0.5, 50)
y1 = np.random.normal(2, 0.5, 50)
x2 = np.random.normal(4, 0.5, 50)
y2 = np.random.normal(4, 0.5, 50)

fig, ax = plt.subplots()
ax.scatter(x1, y1, label='Group A')
ax.scatter(x2, y2, label='Group B')
ax.legend()
plt.show()

2. Different Colors

fig, ax = plt.subplots()
ax.scatter(x1, y1, color='blue', label='Group A')
ax.scatter(x2, y2, color='red', label='Group B')
ax.legend()
plt.show()

3. Different Markers

fig, ax = plt.subplots()
ax.scatter(x1, y1, marker='o', label='Group A')
ax.scatter(x2, y2, marker='^', label='Group B')
ax.legend()
plt.show()
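
When there are more than two groups, repeating the ax.scatter() call by hand becomes tedious. The following is one possible sketch that loops over a dictionary of groups. The group_specs dictionary, the third group, and the choice of centers are hypothetical and exist only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)

# Hypothetical group specification: label -> (center, marker)
group_specs = {
    'Group A': (2, 'o'),
    'Group B': (4, '^'),
    'Group C': (6, 's'),
}

fig, ax = plt.subplots()
for label, (center, marker) in group_specs.items():
    gx = np.random.normal(center, 0.5, 50)
    gy = np.random.normal(center, 0.5, 50)
    # Each scatter call adds one collection and one legend entry;
    # colors advance automatically through the default color cycle
    ax.scatter(gx, gy, marker=marker, label=label)

ax.legend()
plt.show()
```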

Data Input Types

ax.scatter() accepts any array-like input for x and y, including lists, NumPy arrays, and pandas Series.

1. Lists

x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]

fig, ax = plt.subplots()
ax.scatter(x, y)

2. NumPy Arrays

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 1, 5, 3])

fig, ax = plt.subplots()
ax.scatter(x, y)

3. Pandas Series

import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [2, 4, 1, 5, 3]})

fig, ax = plt.subplots()
ax.scatter(df['x'], df['y'])  # each selected column is a Series
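
As an alternative to selecting the columns yourself, ax.scatter() also accepts a data keyword, in which case x and y can be given as column names. The following is a short sketch of that form, reusing the same small DataFrame. The variable name points is introduced here only to hold the returned collection.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [2, 4, 1, 5, 3]})

fig, ax = plt.subplots()
# With data=df, the strings 'x' and 'y' are looked up as column names
points = ax.scatter('x', 'y', data=df)
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()
```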

scatter vs plot

Both ax.plot() and ax.scatter() can draw markers; which one to use depends on whether all points share a single style.

1. plot with Markers

fig, ax = plt.subplots()
ax.plot(x, y, 'o')  # Circle markers, no line
plt.show()

2. scatter Advantages

# scatter supports:
# - Individual point sizes (s parameter)
# - Individual point colors (c parameter)
# - Colormaps for continuous color mapping (cmap parameter)
# - Alpha per point (array-valued alpha, in recent Matplotlib releases)
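
The per-point properties listed above might be combined as in the following sketch. The sizes and values arrays are hypothetical data made up for illustration; values stands in for any third variable you want to map to color.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
n = 50
x = np.random.rand(n)
y = np.random.rand(n)
sizes = 20 + 180 * np.random.rand(n)  # marker area in points^2, one per point
values = x + y                        # hypothetical third variable mapped to color

fig, ax = plt.subplots()
# s and c take one entry per point; cmap maps the values in c to colors
points = ax.scatter(x, y, s=sizes, c=values, cmap='viridis', alpha=0.7)
fig.colorbar(points, ax=ax, label='x + y')
plt.show()
```

ax.plot() cannot express this directly, because its marker size and color apply to the whole line.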

3. Performance Comparison

# plot draws every marker with one shared style, so it is usually faster
# for large datasets with uniform styling
# scatter is preferred when points need individual sizes or colors
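
The difference can be measured rather than assumed. The following is a rough timing sketch, not a rigorous benchmark: the helper time_draw and the dataset size are hypothetical choices made for illustration, and the results depend on the backend, the Matplotlib version, and the machine.

```python
import time

import matplotlib.pyplot as plt
import numpy as np


def time_draw(draw_fn):
    """Return the seconds taken to add the artists and render the figure once."""
    fig, ax = plt.subplots()
    start = time.perf_counter()
    draw_fn(ax)
    fig.canvas.draw()  # force a full render so drawing cost is included
    elapsed = time.perf_counter() - start
    plt.close(fig)
    return elapsed


np.random.seed(42)
n_points = 50_000  # arbitrary size chosen for illustration
x = np.random.rand(n_points)
y = np.random.rand(n_points)

# markersize is a diameter in points; s is an area in points^2,
# so markersize=3 and s=9 give markers of comparable size
t_plot = time_draw(lambda ax: ax.plot(x, y, 'o', markersize=3))
t_scatter = time_draw(lambda ax: ax.scatter(x, y, s=9))

print(f"plot:    {t_plot:.3f} s")
print(f"scatter: {t_scatter:.3f} s")
```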

Adding Trend Lines

Overlay regression lines on scatter plots.

1. Linear Fit

np.random.seed(42)
x = np.random.rand(50) * 10
y = 2 * x + 1 + np.random.normal(0, 2, 50)

coeffs = np.polyfit(x, y, 1)
trend = np.poly1d(coeffs)

fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.7)
ax.plot(x, trend(x), color='red', linewidth=2, label=f'y = {coeffs[0]:.2f}x + {coeffs[1]:.2f}')
ax.legend()
plt.show()

2. Polynomial Fit

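coeffs = np.polyfit(x, y, 2)  # degree-2 fit to the same x and y
trend = np.poly1d(coeffs)

# Evaluate on an evenly spaced grid so the curve is smooth
x_line = np.linspace(x.min(), x.max(), 100)
fig, ax = plt.subplots()  # new figure; the previous one was already shown
ax.scatter(x, y, alpha=0.7)
ax.plot(x_line, trend(x_line), color='red')
plt.show()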

3. Sorted Line Data

# ax.plot connects points in the order given, so sort x first;
# otherwise a curved fit is drawn as a zigzag
sort_idx = np.argsort(x)
ax.plot(x[sort_idx], trend(x[sort_idx]), color='red')
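
To see why sorting matters, the following sketch draws the same quadratic fit twice, once with x in its original order and once sorted. The quadratic data and the axes names ax_raw and ax_sorted are hypothetical and used only for this illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
x = np.random.rand(50) * 10
y = 0.5 * x**2 - 2 * x + np.random.normal(0, 2, 50)  # hypothetical quadratic data

trend = np.poly1d(np.polyfit(x, y, 2))
sort_idx = np.argsort(x)

fig, (ax_raw, ax_sorted) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

ax_raw.scatter(x, y, alpha=0.7)
ax_raw.plot(x, trend(x), color='red')  # segments jump back and forth
ax_raw.set_title('Unsorted x')

ax_sorted.scatter(x, y, alpha=0.7)
ax_sorted.plot(x[sort_idx], trend(x[sort_idx]), color='red')  # one smooth curve
ax_sorted.set_title('Sorted x')

plt.tight_layout()
plt.show()
```

For a straight-line fit the unsorted version happens to look correct, because every segment lies on the same line. The problem only becomes visible with curved fits.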

Practical Example

Create a complete scatter plot with annotations.

1. Generate Sample Data

np.random.seed(42)
n = 30
x = np.random.rand(n) * 100
y = 0.5 * x + np.random.normal(0, 10, n)
labels = [f'P{i}' for i in range(n)]

2. Create Visualization

fig, ax = plt.subplots(figsize=(10, 6))

ax.scatter(x, y, s=100, alpha=0.7, edgecolors='black')

ax.set_xlabel('Feature X', fontsize=12)
ax.set_ylabel('Feature Y', fontsize=12)
ax.set_title('Scatter Plot with Labels', fontsize=14)
ax.grid(True, alpha=0.3)

3. Add Point Labels

for i, label in enumerate(labels):
    ax.annotate(label, (x[i], y[i]), textcoords='offset points',
                xytext=(5, 5), fontsize=8)

plt.tight_layout()
plt.show()