Basic Scatter Plot
Scatter plots display individual data points using markers, revealing relationships, clusters, and patterns between two variables.
Simple Scatter Plot
Create a basic scatter plot with ax.scatter().
1. Import and Setup
import matplotlib.pyplot as plt
import numpy as np
2. Generate Data
np.random.seed(42)
x = np.random.rand(50)
y = np.random.rand(50)
3. Create Scatter Plot
fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_title('Basic Scatter Plot')
plt.show()
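ax.scatter() also returns a PathCollection that holds every marker, which is handy for inspecting or restyling points after plotting. A minimal sketch; note that it swaps in NumPy's Generator API (np.random.default_rng) for the legacy np.random.seed, so the random values differ from the example above:

```python
import matplotlib.pyplot as plt
import numpy as np

# Reproducible random points in the unit square (Generator API, not np.random.seed)
rng = np.random.default_rng(42)
x = rng.random(50)
y = rng.random(50)

fig, ax = plt.subplots()
# scatter returns a PathCollection containing all 50 markers
points = ax.scatter(x, y)

# The (x, y) positions are stored as an (N, 2) array of offsets
offsets = points.get_offsets()
print(type(points).__name__, offsets.shape)
```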
Correlated Data
Scatter plots show the direction and strength of a relationship between two variables.
1. Positive Correlation
np.random.seed(42)
x = np.random.rand(100)
y = x + np.random.normal(0, 0.1, 100)
fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_title('Positive Correlation')
plt.show()
2. Negative Correlation
x = np.random.rand(100)
y = 1 - x + np.random.normal(0, 0.1, 100)
fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_title('Negative Correlation')
plt.show()
3. No Correlation
x = np.random.rand(100)
y = np.random.rand(100)
fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_title('No Correlation')
plt.show()
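To put a number on what the three plots show, you can compute Pearson's r with np.corrcoef and place it in each title. A sketch under the same data-generating assumptions as above, drawn side by side (the one-row subplot layout and default_rng are choices made here, not part of the original):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x = rng.random(100)
noise = rng.normal(0, 0.1, 100)

# Same three relationships as above, sharing one x array
datasets = {
    'Positive': x + noise,
    'Negative': 1 - x + noise,
    'None': rng.random(100),
}

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
r_values = {}
for ax, (name, y) in zip(axes, datasets.items()):
    # Pearson r is the off-diagonal entry of the 2x2 correlation matrix
    r = np.corrcoef(x, y)[0, 1]
    r_values[name] = r
    ax.scatter(x, y, s=15)
    ax.set_title(f'{name}: r = {r:.2f}')
fig.tight_layout()
```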
Multiple Groups
Call ax.scatter() once per group to overlay several groups on the same axes; giving each call a label lets ax.legend() identify the groups.
1. Sequential Plotting
np.random.seed(42)
x1 = np.random.normal(2, 0.5, 50)
y1 = np.random.normal(2, 0.5, 50)
x2 = np.random.normal(4, 0.5, 50)
y2 = np.random.normal(4, 0.5, 50)
fig, ax = plt.subplots()
ax.scatter(x1, y1, label='Group A')
ax.scatter(x2, y2, label='Group B')
ax.legend()
plt.show()
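When no color is given, each scatter call takes the next color from the Axes property cycle, which is why the two groups come out in different colors. A small sketch that confirms this by reading back each collection's face color:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
fig, ax = plt.subplots()

# No color argument: each call advances the Axes color cycle (C0, C1, ...)
group_a = ax.scatter(rng.normal(2, 0.5, 50), rng.normal(2, 0.5, 50), label='Group A')
group_b = ax.scatter(rng.normal(4, 0.5, 50), rng.normal(4, 0.5, 50), label='Group B')
ax.legend()

# A single-color collection reports one RGBA row
color_a = group_a.get_facecolor()[0]
color_b = group_b.get_facecolor()[0]
print(color_a, color_b)
```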
2. Different Colors
fig, ax = plt.subplots()
ax.scatter(x1, y1, color='blue', label='Group A')
ax.scatter(x2, y2, color='red', label='Group B')
ax.legend()
plt.show()
3. Different Markers
fig, ax = plt.subplots()
ax.scatter(x1, y1, marker='o', label='Group A')
ax.scatter(x2, y2, marker='^', label='Group B')
ax.legend()
plt.show()
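With more than a couple of groups, a loop over a style table avoids repeating the scatter call. A sketch assuming a hypothetical groups dictionary that maps each label to a center, marker, and color:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical group definitions: cluster center plus marker/color style
groups = {
    'Group A': {'center': 2, 'marker': 'o', 'color': 'tab:blue'},
    'Group B': {'center': 4, 'marker': '^', 'color': 'tab:red'},
    'Group C': {'center': 6, 'marker': 's', 'color': 'tab:green'},
}

fig, ax = plt.subplots()
for name, style in groups.items():
    xs = rng.normal(style['center'], 0.5, 50)
    ys = rng.normal(style['center'], 0.5, 50)
    # One scatter call per group, each with its own marker, color, and label
    ax.scatter(xs, ys, marker=style['marker'], color=style['color'], label=name)
ax.legend(title='Groups')
```

Adding a group then only means adding a dictionary entry.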
Data Input Types
ax.scatter() accepts any array-like sequence of numbers for x and y.
1. Lists
fig, ax = plt.subplots()
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]
ax.scatter(x, y)
2. NumPy Arrays
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 1, 5, 3])
ax.scatter(x, y)
3. Pandas Series
import pandas as pd
df = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [2, 4, 1, 5, 3]})
ax.scatter(df['x'], df['y'])
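ax.scatter() also accepts a data= keyword: string arguments are then looked up as keys in that object, so a dict or a pandas DataFrame can supply columns by name. A sketch with a plain dict (the table name and its size column are illustrative):

```python
import matplotlib.pyplot as plt
import numpy as np

# Any object indexable by string works for data=; a pandas DataFrame
# with the same column names behaves the same way as this dict.
table = {'x': [1, 2, 3, 4, 5], 'y': [2, 4, 1, 5, 3], 'size': [20, 40, 60, 80, 100]}

fig, ax = plt.subplots()
# 'x', 'y', and 'size' are resolved as keys of table
pts = ax.scatter('x', 'y', s='size', data=table)
ax.set_xlabel('x')
ax.set_ylabel('y')
```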
scatter vs plot
Both ax.plot() and ax.scatter() can draw unconnected markers; they differ in how much per-point control they offer and in speed.
1. plot with Markers
fig, ax = plt.subplots()
ax.plot(x, y, 'o') # Circle markers, no line
plt.show()
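ax.plot() returns a single Line2D for the whole series, so every marker shares one size and color. The keyword form below is an equivalent, more explicit spelling of the 'o' format string; the size and color values are arbitrary choices:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 1, 5, 3])

fig, ax = plt.subplots()
# Circle markers with no connecting line; plot returns a list of Line2D objects
(line,) = ax.plot(x, y, marker='o', linestyle='none', markersize=8, color='tab:purple')

print(line.get_linestyle(), line.get_marker())
```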
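2. scatter Advantages
# scatter supports per-point properties:
# - Individual point sizes (s parameter)
# - Individual point colors (c parameter)
# - Colormaps for mapping continuous values to colors (cmap parameter)
# - Per-point transparency (for example, an array of RGBA colors passed as c)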
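A sketch that exercises several per-point properties at once: random sizes via s, one scalar per point mapped through the viridis colormap via c, and a colorbar to read the mapping. The data are random placeholders:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.random(n)
y = rng.random(n)
sizes = rng.uniform(20, 300, n)   # one marker area (points^2) per point
values = rng.random(n)            # one scalar per point, mapped through the colormap

fig, ax = plt.subplots()
sc = ax.scatter(x, y, s=sizes, c=values, cmap='viridis', alpha=0.7, edgecolors='black')
# The colorbar shows which color corresponds to which value
fig.colorbar(sc, ax=ax, label='value')
```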
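3. Performance Comparison
# plot draws every marker as part of one Line2D with a single style, which is usually faster for large datasets with uniform styling
# scatter builds a PathCollection that stores per-point sizes and colors; prefer it when points need individual properties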
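Relative speed depends on the Matplotlib version, backend, and point count, so it is worth measuring rather than assuming. A rough timing sketch that renders the same points with each method on an off-screen Agg canvas; the draw_time helper and the 20,000-point default are hypothetical choices, and no particular winner is guaranteed:

```python
import time

import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def draw_time(method, n=20_000, seed=0):
    """Render n points with 'plot' or 'scatter' and return seconds spent drawing."""
    rng = np.random.default_rng(seed)
    x, y = rng.random(n), rng.random(n)
    fig = Figure()
    canvas = FigureCanvasAgg(fig)  # off-screen raster canvas, no window needed
    ax = fig.add_subplot()
    if method == 'plot':
        ax.plot(x, y, 'o', markersize=2)
    else:
        ax.scatter(x, y, s=4)
    start = time.perf_counter()
    canvas.draw()  # time only the rendering step
    return time.perf_counter() - start


timings = {m: draw_time(m) for m in ('plot', 'scatter')}
print(timings)
```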
Adding Trend Lines
Overlay regression lines on scatter plots.
1. Linear Fit
np.random.seed(42)
x = np.random.rand(50) * 10
y = 2 * x + 1 + np.random.normal(0, 2, 50)
coeffs = np.polyfit(x, y, 1)
trend = np.poly1d(coeffs)
fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.7)
ax.plot(x, trend(x), color='red', linewidth=2, label=f'y = {coeffs[0]:.2f}x + {coeffs[1]:.2f}')
ax.legend()
plt.show()
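NumPy's numpy.polynomial.Polynomial.fit is an alternative to np.polyfit for the same least-squares fit; after convert(), its coefficients come back lowest degree first. A sketch that also reports R², the share of the variance in y explained by the line (it uses default_rng, so the numbers differ from the example above):

```python
import matplotlib.pyplot as plt
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)
x = rng.random(50) * 10
y = 2 * x + 1 + rng.normal(0, 2, 50)

# Polynomial.fit works in a scaled domain; convert() maps the
# coefficients back to ordinary x, ordered [intercept, slope]
fit = Polynomial.fit(x, y, deg=1)
intercept, slope = fit.convert().coef

# Coefficient of determination: 1 - residual sum of squares / total sum of squares
residuals = y - fit(x)
r2 = 1 - np.sum(residuals**2) / np.sum((y - y.mean())**2)

fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.7)
x_line = np.linspace(x.min(), x.max(), 100)
ax.plot(x_line, fit(x_line), color='red', linewidth=2,
        label=f'y = {slope:.2f}x + {intercept:.2f} (R² = {r2:.2f})')
ax.legend()
```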
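2. Polynomial Fit
# Fit a degree-2 polynomial to the same x, y data
coeffs = np.polyfit(x, y, 2)
trend = np.poly1d(coeffs)
# Evaluate on an evenly spaced grid so the curve is drawn smoothly
x_line = np.linspace(x.min(), x.max(), 100)
fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.7)
ax.plot(x_line, trend(x_line), color='red')
plt.show()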
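A self-contained version of the quadratic fit on data that is actually curved. The true curve, y = 0.5x² - 3x + 4 plus noise, is a hypothetical example chosen so the recovered coefficients can be compared against known values:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(60) * 10
# Hypothetical curved data: y = 0.5x^2 - 3x + 4 plus Gaussian noise
y = 0.5 * x**2 - 3 * x + 4 + rng.normal(0, 1, 60)

# np.polyfit returns coefficients highest degree first: [a, b, c]
coeffs = np.polyfit(x, y, 2)
trend = np.poly1d(coeffs)

fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.7)
# Evaluate the curve on an evenly spaced grid so it is drawn smoothly
x_line = np.linspace(x.min(), x.max(), 200)
ax.plot(x_line, trend(x_line), color='red', label='quadratic fit')
ax.legend()
```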
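3. Sorted Line Data
# ax.plot connects points in array order, so sort x before drawing a curved trend line
sort_idx = np.argsort(x)
fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.7)
ax.plot(x[sort_idx], trend(x[sort_idx]), color='red')
plt.show()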
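The difference is easiest to see side by side. A sketch on hypothetical noisy sine data with a cubic trend (both chosen only so the curve bends): without sorting, line segments join points in their random array order and zigzag across the axes.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
x = rng.random(40) * 10
# Hypothetical curved data and a cubic trend, chosen only so the curve bends
y = np.sin(x) + rng.normal(0, 0.2, 40)
trend = np.poly1d(np.polyfit(x, y, 3))

fig, (ax_raw, ax_sorted) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax in (ax_raw, ax_sorted):
    ax.scatter(x, y, alpha=0.6)

# Unsorted: segments connect points in random order and zigzag
ax_raw.plot(x, trend(x), color='red')
ax_raw.set_title('Unsorted x')

# Sorted: argsort gives the indices that put x in ascending order
order = np.argsort(x)
(curve,) = ax_sorted.plot(x[order], trend(x[order]), color='red')
ax_sorted.set_title('Sorted x')
```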
Practical Example
Create a complete scatter plot with annotations.
1. Generate Sample Data
np.random.seed(42)
n = 30
x = np.random.rand(n) * 100
y = 0.5 * x + np.random.normal(0, 10, n)
labels = [f'P{i}' for i in range(n)]
2. Create Visualization
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(x, y, s=100, alpha=0.7, edgecolors='black')
ax.set_xlabel('Feature X', fontsize=12)
ax.set_ylabel('Feature Y', fontsize=12)
ax.set_title('Scatter Plot with Labels', fontsize=14)
ax.grid(True, alpha=0.3)
3. Add Point Labels
for i, label in enumerate(labels):
    ax.annotate(label, (x[i], y[i]), textcoords='offset points',
                xytext=(5, 5), fontsize=8)
plt.tight_layout()
plt.show()
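Labeling every point gets crowded as n grows. One option, sketched here as an assumption rather than part of the original example, is to label only the points farthest from a fitted line and connect each label to its point with an arrow:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
n = 30
x = rng.random(n) * 100
y = 0.5 * x + rng.normal(0, 10, n)
labels = [f'P{i}' for i in range(n)]

# Distance from a least-squares line decides which points get a label
slope, intercept = np.polyfit(x, y, 1)
residuals = np.abs(y - (slope * x + intercept))
top = np.argsort(residuals)[-3:]  # indices of the 3 largest residuals

fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(x, y, s=100, alpha=0.7, edgecolors='black')
for i in top:
    # Offset the text by 15 points and draw an arrow back to the marker
    ax.annotate(labels[i], xy=(x[i], y[i]), xytext=(15, 15),
                textcoords='offset points', fontsize=9,
                arrowprops=dict(arrowstyle='->', color='gray'))
ax.grid(True, alpha=0.3)
fig.tight_layout()
```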