Optimization Methods¶
SciPy's minimize() function supports numerous optimization algorithms. Each has different strengths, weaknesses, and computational costs. Understanding these differences is crucial for solving optimization problems efficiently.
Gradient-Free Methods¶
These methods do not require gradient information and often work well on noisy or non-smooth functions.
Nelder-Mead Simplex Method¶
The Nelder-Mead method maintains a simplex (a geometric shape in \(n\)-dimensional space) and iteratively moves away from the worst point.
Characteristics: - No gradient required - Works reasonably well in low dimensions (< 10) - Slower convergence than gradient-based methods - Robust to noise and non-smoothness - Default method in older SciPy versions
from scipy import optimize
import numpy as np
def noisy_function(x):
"""Function with some noise."""
return (x[0] - 2)**2 + (x[1] - 3)**2 + 0.01 * np.random.randn()
# Nelder-Mead
result = optimize.minimize(
noisy_function,
x0=[0, 0],
method='Nelder-Mead',
options={'maxiter': 5000}
)
print(f"Nelder-Mead result: {result.x}")
print(f"Iterations: {result.nit}")
print(f"Function evaluations: {result.nfev}")
When to use: - Noisy objective functions - Non-smooth functions - Low-dimensional problems - When you have no gradient information
Powell's Method¶
Powell's method uses conjugate direction methods and performs line searches along successive dimensions.
Characteristics: - No gradient required - Good for smooth functions - Often faster than Nelder-Mead - Works well for dimension < 100 - Can be sensitive to poor starting points
from scipy import optimize
def rosenbrock(x):
return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
# Powell's method
result = optimize.minimize(
rosenbrock,
x0=[0, 0],
method='Powell'
)
print(f"Powell result: {result.x}")
print(f"Function value: {result.fun:.2e}")
print(f"Iterations: {result.nit}")
print(f"Function evaluations: {result.nfev}")
When to use: - Smooth but gradient-free optimization - Medium dimensions - Image registration and alignment - When you need good speed without gradient computation
Gradient-Based Methods¶
These methods use the gradient (first derivative) to guide the search. They're usually faster but require either analytical gradients or numerical gradient estimation.
Gradient Descent (Steepest Descent)¶
The simplest gradient-based method: move in the direction opposite to the gradient.
Characteristics: - Slow convergence - Requires gradient - Simple to understand - Not typically recommended (BFGS is usually better)
from scipy.optimize import minimize
import numpy as np
def quadratic(x):
return (x[0] - 2)**2 + (x[1] - 3)**2
def quadratic_grad(x):
return np.array([2*(x[0] - 2), 2*(x[1] - 3)])
# Gradient descent
result = minimize(
quadratic,
x0=[0, 0],
method='BFGS', # Default is BFGS, not pure gradient descent
jac=quadratic_grad
)
print(f"Result: {result.x}")
BFGS (Broyden-Fletcher-Goldfarb-Shanno)¶
BFGS is a quasi-Newton method that approximates the inverse Hessian (second derivatives).
Characteristics: - Uses first-order derivatives (gradients) - Excellent convergence for smooth problems - Efficient: requires only gradients, not Hessians - Default method for unconstrained optimization - Robust and widely used - Requires storing an \(n \times n\) matrix (memory for large \(n\))
from scipy import optimize
import numpy as np
def quadratic(x):
return (x[0] - 2)**2 + 3*(x[1] - 3)**2
def quadratic_grad(x):
return np.array([2*(x[0] - 2), 6*(x[1] - 3)])
# With gradient
result = optimize.minimize(
quadratic,
x0=[0, 0],
method='BFGS',
jac=quadratic_grad
)
print(f"BFGS with gradient: {result.x}")
print(f"Iterations: {result.nit}")
print(f"Function evaluations: {result.nfev}")
# Without gradient (finite differences)
result2 = optimize.minimize(
quadratic,
x0=[0, 0],
method='BFGS'
)
print(f"BFGS with finite differences: {result2.x}")
print(f"Function evaluations: {result2.nfev}")
When to use: - Smooth, continuous objective functions - When gradient is available or easily computed - Medium to large-scale problems - Generally the first choice for unconstrained optimization
L-BFGS-B (Limited Memory BFGS with Bounds)¶
L-BFGS-B is BFGS with limited memory (useful for large problems) and built-in bounds support.
Characteristics: - Limited memory version of BFGS (uses \(\mathcal{O}(m \times n)\) memory instead of \(\mathcal{O}(n^2)\)) - Handles bounds naturally - Good for large-scale problems - Can handle inequality constraints via bounds - Typically requires about 5-10 previous iterations to approximate Hessian
from scipy import optimize
import numpy as np
def f(x):
return np.sum((x - np.arange(1, len(x) + 1))**2)
def grad_f(x):
return 2 * (x - np.arange(1, len(x) + 1))
# Large problem
x0 = np.zeros(1000)
# L-BFGS-B is much more efficient than BFGS for this
result = optimize.minimize(
f,
x0,
method='L-BFGS-B',
jac=grad_f,
bounds=[(0, None)] * 1000 # All parameters >= 0
)
print(f"Optimal value: {result.fun:.2e}")
print(f"Iterations: {result.nit}")
When to use: - Large-scale problems (thousands of variables) - When you need to constrain parameters to ranges - Limited memory available - Problems where BFGS runs out of memory
Conjugate Gradient (CG)¶
CG uses conjugate directions to achieve faster convergence than steepest descent.
Characteristics: - Requires gradient - Good for large sparse problems - Converges in \(n\) iterations for quadratic functions - Sensitive to numerical accuracy - Memory efficient
from scipy import optimize
import numpy as np
def quadratic(x):
A = np.array([[10, 1], [1, 10]])
return 0.5 * x @ A @ x
def quadratic_grad(x):
A = np.array([[10, 1], [1, 10]])
return A @ x
result = optimize.minimize(
quadratic,
x0=[1, 1],
method='CG',
jac=quadratic_grad
)
print(f"CG result: {result.x}")
print(f"Iterations: {result.nit}")
When to use: - Very large-scale smooth problems - When memory is limited - Problems with sparse structure - Quadratic or nearly quadratic functions
Comparison of Methods¶
Performance on Rosenbrock Function¶
import numpy as np
from scipy import optimize
import time
def rosenbrock(x):
return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
def rosenbrock_grad(x):
dfdx0 = 2*(x[0] - 1) + 100*2*(x[1] - x[0]**2)*(-2*x[0])
dfdx1 = 100*2*(x[1] - x[0]**2)
return np.array([dfdx0, dfdx1])
methods = ['Nelder-Mead', 'Powell', 'BFGS', 'L-BFGS-B', 'CG']
results_dict = {}
for method in methods:
if method in ['BFGS', 'CG']:
result = optimize.minimize(
rosenbrock,
[0, 0],
method=method,
jac=rosenbrock_grad
)
else:
result = optimize.minimize(
rosenbrock,
[0, 0],
method=method
)
results_dict[method] = {
'x': result.x,
'fun': result.fun,
'nit': result.nit,
'nfev': result.nfev
}
print("Method Comparison on Rosenbrock Function")
print("=" * 70)
print(f"{'Method':<15} {'Function Value':<20} {'Iterations':<15} {'F-evals':<10}")
print("-" * 70)
for method, res in results_dict.items():
print(f"{method:<15} {res['fun']:.2e} {res['nit']:<15} {res['nfev']:<10}")
Summary Table¶
| Method | Gradient | Bounds | Memory | Speed | Robustness |
|---|---|---|---|---|---|
| Nelder-Mead | No | No | Low | Slow | High |
| Powell | No | No | Low | Medium | High |
| BFGS | Yes | No | \(O(n^2)\) | Fast | High |
| L-BFGS-B | Yes | Yes | \(O(mn)\) | Fast | High |
| CG | Yes | No | Low | Fast | Medium |
Choosing a Method¶
Decision Tree¶
Do you have gradient information?
├─ NO
│ ├─ Low dimensional (< 10)?
│ │ └─ Use: Nelder-Mead or Powell
│ └─ Higher dimensional?
│ └─ Use: Powell or differential_evolution
│
├─ YES
│ ├─ Need bounds?
│ │ └─ Use: L-BFGS-B
│ │
│ ├─ Very large problem (n > 10,000)?
│ │ └─ Use: CG or L-BFGS-B
│ │
│ └─ Medium problem, want best speed?
│ └─ Use: BFGS or L-BFGS-B
Practical Recommendations¶
from scipy import optimize
import numpy as np
# Example: Your optimization problem
def objective(x):
return (x[0] - 2)**2 + (x[1] - 3)**2
# Decision 1: Do you have a gradient function?
has_gradient = False
# Decision 2: Do you need bounds on parameters?
needs_bounds = True
bounds = [(0, 5), (0, 5)]
# Decision 3: Dimension and problem size?
dimension = 2
large_scale = False
# Choice of method
if needs_bounds:
method = 'L-BFGS-B'
elif large_scale:
method = 'CG'
elif has_gradient:
method = 'BFGS'
else:
method = 'Powell'
result = optimize.minimize(
objective,
x0=[0, 0],
method=method,
bounds=bounds if needs_bounds else None
)
print(f"Using method: {method}")
print(f"Result: {result.x}")
Providing Gradients¶
Providing analytical gradients can significantly speed up optimization:
from scipy import optimize
import numpy as np
import time
def f(x):
return np.sum(np.sin(x) * np.exp(-x**2))
def grad_f(x):
return (np.cos(x) * np.exp(-x**2) -
2*x * np.sin(x) * np.exp(-x**2))
x0 = np.array([1.0, 2.0, 3.0])
# With gradient
t1 = time.time()
result1 = optimize.minimize(f, x0, method='BFGS', jac=grad_f)
time1 = time.time() - t1
# Without gradient (finite differences)
t2 = time.time()
result2 = optimize.minimize(f, x0, method='BFGS')
time2 = time.time() - t2
print(f"With analytical gradient:")
print(f" Time: {time1:.4f}s")
print(f" Function evals: {result1.nfev}")
print(f"\nWith finite differences:")
print(f" Time: {time2:.4f}s")
print(f" Function evals: {result2.nfev}")
print(f"\nSpeedup: {time2/time1:.1f}x")
Advanced: Hessian Information¶
Some methods can use second-order derivatives (Hessian matrix):
from scipy import optimize
import numpy as np
def f(x):
return x[0]**2 + 2*x[1]**2
def grad_f(x):
return np.array([2*x[0], 4*x[1]])
def hess_f(x):
"""Hessian matrix."""
return np.array([[2, 0],
[0, 4]])
# Newton-CG method uses Hessian
result = optimize.minimize(
f,
x0=[1, 1],
method='Newton-CG',
jac=grad_f,
hess=hess_f
)
print(f"Result: {result.x}")
print(f"Iterations: {result.nit}")
Summary¶
Key Points:
- Gradient-free methods (Nelder-Mead, Powell) work on any function but are slower
- Gradient-based methods (BFGS, L-BFGS-B) are faster but require smooth functions
- L-BFGS-B is usually the best choice for constrained problems
- CG is memory-efficient for very large problems
- Providing analytical gradients can significantly improve speed
- Choose based on: gradient availability, problem scale, constraints, and desired accuracy
The next section addresses the critical issue of local minima and strategies to find global optima.
Runnable Example: optimization_comparison.py¶
"""
Comparing SciPy Optimization Methods - A Comprehensive Tutorial
Exploring scipy.optimize.minimize with multiple algorithms, callback functions,
Jacobians, Hessians, and convergence analysis. See optimization in action!
Run this file to compare different optimization strategies!
"""
import numpy as np
from scipy import optimize
import sys
if __name__ == "__main__":
print("=" * 70)
print("COMPARING SCIPY OPTIMIZATION METHODS")
print("=" * 70)
# ============================================================================
# EXAMPLE 1: The Rosenbrock Function - A Classic Optimization Benchmark
# ============================================================================
print("\n1. THE ROSENBROCK FUNCTION: A CHALLENGING TEST CASE")
print("-" * 70)
def rosenbrock_f(a, b):
"""
Return the Rosenbrock function, Jacobian, and Hessian.
The Rosenbrock function is a classic test case for optimization:
f(x, y) = (a - x)^2 + b*(y - x^2)^2
It has a minimum value of 0 at (a, a^2).
The function is easy to evaluate but has a narrow curved valley,
making it challenging for gradient-free methods.
Parameters
----------
a, b : float
Parameters defining the surface. Typical values are a=1, b=100.
Higher b makes the valley narrower and harder to navigate.
"""
def f(x, y):
return (a - x)**2 + b * (y - x**2) ** 2
def J(x, y):
# Jacobian (gradient) of the Rosenbrock function
# df/dx = -2(a - x) - 4*b*x*(y - x^2)
# df/dy = 2*b*(y - x^2)
return np.array([
-2 * (a - x) - 4 * b * x * (y - x**2),
2 * b * (y - x ** 2)
])
def H(x, y):
# Hessian (matrix of second derivatives)
# Used by Newton-CG and trust region methods
# [d²f/dx² d²f/dxdy]
# [d²f/dydx d²f/dy²]
return np.array([
[2 + 8*b*x**2 - 4*b*(y - x**2), -4*b*x],
[-4*b*x, 2*b]
])
return f, J, H
# Set up Rosenbrock with typical parameters
a, b = 1.0, 100.0
rosenbrock, rosenbrock_J, rosenbrock_H = rosenbrock_f(a=a, b=b)
# Test evaluation
x0_test, y0_test = -0.5, 2.5
f_val = rosenbrock(x0_test, y0_test)
print(f"Rosenbrock function at ({x0_test}, {y0_test}): {f_val:.4f}")
print(f"True minimum is at ({a}, {a**2}) with value 0.0")
print(f"Starting point is {f_val:.0f} units away from optimum")
# Evaluate gradient and Hessian
grad = rosenbrock_J(x0_test, y0_test)
hess = rosenbrock_H(x0_test, y0_test)
print(f"\nGradient at start: {grad}")
print(f"Hessian at start:\n{hess}")
# ============================================================================
# EXAMPLE 2: Nelder-Mead Method - Gradient-Free Optimization
# ============================================================================
print("\n2. NELDER-MEAD METHOD: NO GRADIENTS NEEDED")
print("-" * 70)
print("Nelder-Mead is a simplex-based method requiring only function values.")
print("Advantages: robust, no gradients needed, handles non-smooth functions")
print("Disadvantages: slower convergence, more function evaluations\n")
x0 = np.array([-0.5, 2.5])
result_nm = optimize.minimize(
lambda p: rosenbrock(*p),
x0=x0,
method='Nelder-Mead',
options={'disp': False}
)
print("Nelder-Mead Results:")
print(f" Optimal point: ({result_nm.x[0]:.6f}, {result_nm.x[1]:.6f})")
print(f" Function value: {result_nm.fun:.2e}")
print(f" Iterations: {result_nm.nit}")
print(f" Function evaluations: {result_nm.nfev}")
print(f" Success: {result_nm.success}")
# ============================================================================
# EXAMPLE 3: BFGS Method - Quasi-Newton with Gradient
# ============================================================================
print("\n3. BFGS METHOD: QUASI-NEWTON WITH GRADIENT")
print("-" * 70)
print("BFGS is a quasi-Newton method that approximates the Hessian.")
print("Advantages: faster than gradient-free, uses only gradients (no Hessian)")
print("Disadvantages: needs accurate gradients, may fail on non-smooth functions\n")
result_bfgs = optimize.minimize(
lambda p: rosenbrock(*p),
x0=x0,
jac=lambda p: rosenbrock_J(*p),
method='BFGS',
options={'disp': False}
)
print("BFGS Results:")
print(f" Optimal point: ({result_bfgs.x[0]:.6f}, {result_bfgs.x[1]:.6f})")
print(f" Function value: {result_bfgs.fun:.2e}")
print(f" Iterations: {result_bfgs.nit}")
print(f" Function evaluations: {result_bfgs.nfev}")
print(f" Gradient evaluations: {result_bfgs.njev}")
# ============================================================================
# EXAMPLE 4: Newton-CG Method - Using Exact Hessian
# ============================================================================
print("\n4. NEWTON-CG METHOD: USING EXACT HESSIAN")
print("-" * 70)
print("Newton-CG uses exact Hessian information for rapid convergence.")
print("Advantages: fewest iterations, most accurate near minimum")
print("Disadvantages: must compute and provide Hessian matrix\n")
result_ncg = optimize.minimize(
lambda p: rosenbrock(*p),
x0=x0,
jac=lambda p: rosenbrock_J(*p),
hess=lambda p: rosenbrock_H(*p),
method='Newton-CG',
options={'disp': False}
)
print("Newton-CG Results:")
print(f" Optimal point: ({result_ncg.x[0]:.6f}, {result_ncg.x[1]:.6f})")
print(f" Function value: {result_ncg.fun:.2e}")
print(f" Iterations: {result_ncg.nit}")
print(f" Function evaluations: {result_ncg.nfev}")
# ============================================================================
# EXAMPLE 5: L-BFGS-B Method - Bounded Optimization
# ============================================================================
print("\n5. L-BFGS-B METHOD: HANDLE BOUNDS ON VARIABLES")
print("-" * 70)
print("L-BFGS-B handles box constraints (lower and upper bounds per variable).")
print("Advantages: efficient for bounded problems, limited memory use")
print("Disadvantages: requires bounds, doesn't use Hessian\n")
# Add box constraints: -0.8 <= x <= 0.8, 0 <= y <= 3.0
bounds = [(-0.8, 0.8), (0, 3.0)]
result_lbfgs = optimize.minimize(
lambda p: rosenbrock(*p),
x0=x0,
jac=lambda p: rosenbrock_J(*p),
method='L-BFGS-B',
bounds=bounds,
options={'disp': False}
)
print("L-BFGS-B Results (with bounds: x in [-0.8, 0.8], y in [0, 3]):")
print(f" Optimal point: ({result_lbfgs.x[0]:.6f}, {result_lbfgs.x[1]:.6f})")
print(f" Function value: {result_lbfgs.fun:.2e}")
print(f" Iterations: {result_lbfgs.nit}")
print(f" Note: optimum would be (1, 1) but y=1 is outside x-bound")
# ============================================================================
# EXAMPLE 6: Callback Functions - Tracking Optimization Path
# ============================================================================
print("\n6. TRACKING OPTIMIZATION PROGRESS WITH CALLBACKS")
print("-" * 70)
print("Callback functions execute during optimization to monitor progress.")
print("Track iteration count, function values, or record the optimization path.\n")
path = [x0.copy()] # Record starting point
iteration_count = [0] # Use list to modify in nested function
def callback_function(xk):
"""
Called after each iteration.
xk: current parameter values
"""
iteration_count[0] += 1
path.append(xk.copy())
result_with_callback = optimize.minimize(
lambda p: rosenbrock(*p),
x0=x0,
jac=lambda p: rosenbrock_J(*p),
method='BFGS',
callback=callback_function,
options={'disp': False}
)
path = np.array(path)
print("Optimization path (first 5 steps):")
for i in range(min(5, len(path))):
f_val = rosenbrock(path[i, 0], path[i, 1])
print(f" Step {i}: x={path[i, 0]:7.4f}, y={path[i, 1]:7.4f}, f={f_val:10.4f}")
print(f"\n ... (total {len(path)-1} steps)")
print(f" Final: x={path[-1, 0]:7.4f}, y={path[-1, 1]:7.4f}, f={result_with_callback.fun:.2e}")
# ============================================================================
# EXAMPLE 7: Convergence Comparison - Method Efficiency
# ============================================================================
print("\n7. COMPARING CONVERGENCE RATES AND EFFICIENCY")
print("-" * 70)
# Compare different methods
methods_to_compare = [
('Nelder-Mead', {'method': 'Nelder-Mead', 'options': {'disp': False}}),
('BFGS', {'method': 'BFGS',
'jac': lambda p: rosenbrock_J(*p), 'options': {'disp': False}}),
('Newton-CG', {'method': 'Newton-CG',
'jac': lambda p: rosenbrock_J(*p),
'hess': lambda p: rosenbrock_H(*p),
'options': {'disp': False}}),
('L-BFGS-B', {'method': 'L-BFGS-B',
'jac': lambda p: rosenbrock_J(*p),
'options': {'disp': False}}),
('CG', {'method': 'CG',
'jac': lambda p: rosenbrock_J(*p),
'options': {'disp': False}}),
]
print("\nMethod Comparison (starting from [-0.5, 2.5]):")
print("-" * 70)
print(f"{'Method':<15} {'Iterations':<12} {'Func Evals':<12} {'Final Value':<15} {'Success':<10}")
print("-" * 70)
results_comparison = {}
for method_name, options in methods_to_compare:
result = optimize.minimize(lambda p: rosenbrock(*p), x0=x0, **options)
results_comparison[method_name] = result
print(f"{method_name:<15} {result.nit:<12} {result.nfev:<12} {result.fun:<15.2e} {str(result.success):<10}")
print("\nInterpretation:")
print(" Iterations: fewer iterations = faster algorithm")
print(" Func Evals: evaluations of objective function")
print(" Final Value: lower is better (should be ~0)")
print(" Success: did optimization converge?")
# ============================================================================
# EXAMPLE 8: Effect of Starting Point on Convergence
# ============================================================================
print("\n8. STARTING POINT SENSITIVITY")
print("-" * 70)
print("Optimization can be sensitive to initial conditions.\n")
starting_points = [
(-0.5, 2.5),
(0.0, 0.0),
(2.0, 2.0),
(-1.0, 3.0),
]
print(f"{'Starting Point':<20} {'BFGS Iters':<15} {'Newton-CG Iters':<15}")
print("-" * 50)
for x0_point in starting_points:
x0_array = np.array(x0_point)
result_bfgs_var = optimize.minimize(
lambda p: rosenbrock(*p),
x0=x0_array,
jac=lambda p: rosenbrock_J(*p),
method='BFGS',
options={'disp': False}
)
result_ncg_var = optimize.minimize(
lambda p: rosenbrock(*p),
x0=x0_array,
jac=lambda p: rosenbrock_J(*p),
hess=lambda p: rosenbrock_H(*p),
method='Newton-CG',
options={'disp': False}
)
print(f"{str(x0_point):<20} {result_bfgs_var.nit:<15} {result_ncg_var.nit:<15}")
# ============================================================================
# EXAMPLE 9: Visualization of Convergence Patterns
# ============================================================================
print("\n9. DETAILED CONVERGENCE ANALYSIS")
print("-" * 70)
print("Track function values throughout optimization process:\n")
# Track function values with BFGS
function_values = [rosenbrock(*x0)]
iteration_num = [0]
def callback_tracking(xk):
function_values.append(rosenbrock(*xk))
iteration_num.append(iteration_num[-1] + 1)
result_track = optimize.minimize(
lambda p: rosenbrock(*p),
x0=x0,
jac=lambda p: rosenbrock_J(*p),
method='BFGS',
callback=callback_tracking,
options={'disp': False}
)
print("Function value reduction in BFGS method:")
print(f"{'Iteration':<12} {'Function Value':<20} {'Reduction':<20}")
print("-" * 52)
for i in range(min(10, len(function_values))):
fval = function_values[i]
if i == 0:
reduction = "-"
else:
reduction = f"{function_values[i-1] - fval:.4e}"
print(f"{i:<12} {fval:<20.6e} {str(reduction):<20}")
if len(function_values) > 10:
print(f"... ({len(function_values)-10} more steps)")
print(f"{len(function_values)-1:<12} {function_values[-1]:<20.6e}")
# ============================================================================
# EXAMPLE 10: Gradient Accuracy and Optimization Quality
# ============================================================================
print("\n10. IMPORTANCE OF ACCURATE GRADIENTS")
print("-" * 70)
# Compare with numerical gradient approximation
def numerical_gradient(func, x, eps=1e-8):
"""Approximate gradient using finite differences"""
grad = np.zeros_like(x)
for i in range(len(x)):
x_plus = x.copy()
x_minus = x.copy()
x_plus[i] += eps
x_minus[i] -= eps
grad[i] = (func(x_plus) - func(x_minus)) / (2 * eps)
return grad
print("Comparing analytical vs numerical gradients:\n")
x_test = np.array([0.5, 0.5])
analytical_grad = rosenbrock_J(*x_test)
numerical_grad = numerical_gradient(lambda p: rosenbrock(*p), x_test)
print(f"Analytical gradient: {analytical_grad}")
print(f"Numerical gradient: {numerical_grad}")
print(f"Difference: {np.linalg.norm(analytical_grad - numerical_grad):.2e}")
# Optimize with analytical gradient
result_analytical = optimize.minimize(
lambda p: rosenbrock(*p),
x0=x0,
jac=lambda p: rosenbrock_J(*p),
method='BFGS',
options={'disp': False}
)
print(f"\nWith analytical gradient: {result_analytical.nit} iterations")
print(f"Final value: {result_analytical.fun:.2e}")
print("\n" + "=" * 70)
print("Key takeaways:")
print("- Nelder-Mead: robust, no gradients, but slower")
print("- BFGS: good balance, needs gradient, no Hessian")
print("- Newton-CG: fastest, needs exact Hessian")
print("- L-BFGS-B: for bounded optimization problems")
print("- Callbacks: track optimization progress in real-time")
print("- Accurate gradients/Hessians = faster convergence")
print("- Starting point can affect convergence speed")
print("=" * 70)