Financial Returns Analysis¶
Financial asset prices change over time, and the statistical properties of their returns determine risk, inform portfolio construction, and underpin derivative pricing models. This section applies scipy.stats tools to analyze return distributions, test for normality, and fit parametric models to return data.
Return Definitions¶
Let \(P_t\) denote the price of an asset at time \(t\). Two common return measures are:
Simple return over one period:
Log return (continuously compounded return):
Log returns have the advantage of being additive over time: the multi-period log return is \(r_{t_1, t_2} = \sum_{s=t_1+1}^{t_2} r_s\).
import numpy as np
# Simulated daily prices
rng = np.random.default_rng(42)
prices = 100 * np.exp(np.cumsum(rng.normal(0.0005, 0.02, 252)))
# Compute returns
simple_returns = np.diff(prices) / prices[:-1]
log_returns = np.diff(np.log(prices))
print(f"Mean daily simple return: {simple_returns.mean():.6f}")
print(f"Mean daily log return: {log_returns.mean():.6f}")
print(f"Annualized volatility: {log_returns.std() * np.sqrt(252):.4f}")
Stylized Facts of Return Distributions¶
Empirical return distributions consistently exhibit several properties that deviate from normality:
-
Fat tails (leptokurtosis): Extreme returns occur more frequently than a normal distribution predicts. The excess kurtosis \(\kappa - 3\) is typically positive.
-
Slight negative skewness: Large negative returns tend to be more extreme than large positive returns, especially for equity indices.
-
Volatility clustering: Periods of high volatility tend to be followed by high volatility, and similarly for calm periods. This manifests as autocorrelation in squared returns.
-
Approximate symmetry at short horizons: Daily log returns are roughly symmetric around zero.
Descriptive Statistics¶
The first four moments characterize the shape of the return distribution. For a sample \(r_1, \ldots, r_n\), the sample skewness and kurtosis are
where \(\bar{r}\) is the sample mean and \(s\) is the sample standard deviation.
from scipy import stats
import numpy as np
rng = np.random.default_rng(42)
log_returns = rng.standard_t(df=5, size=1000) * 0.01
print(f"Mean: {np.mean(log_returns):.6f}")
print(f"Std dev: {np.std(log_returns, ddof=1):.6f}")
print(f"Skewness: {stats.skew(log_returns):.4f}")
print(f"Kurtosis: {stats.kurtosis(log_returns):.4f}") # excess kurtosis
Normality Testing¶
The assumption that returns follow a normal distribution is central to many financial models. Several tests evaluate this assumption.
Jarque-Bera Test¶
The Jarque-Bera test combines skewness and kurtosis into a single test statistic:
Under \(H_0\) (normality), \(JB \sim \chi^2(2)\) asymptotically.
from scipy import stats
jb_stat, jb_p = stats.jarque_bera(log_returns)
print(f"Jarque-Bera statistic: {jb_stat:.2f}")
print(f"p-value: {jb_p:.4e}")
Shapiro-Wilk and Anderson-Darling Tests¶
from scipy import stats
# Shapiro-Wilk (best for n < 5000)
sw_stat, sw_p = stats.shapiro(log_returns[:500])
print(f"Shapiro-Wilk p-value: {sw_p:.4e}")
# Anderson-Darling
ad_result = stats.anderson(log_returns, dist='norm')
print(f"Anderson-Darling statistic: {ad_result.statistic:.4f}")
print(f"Critical values (5%): {ad_result.critical_values[2]:.4f}")
Distribution Fitting¶
When returns deviate from normality, alternative distributions provide better fits. Common choices include the Student's \(t\)-distribution (captures fat tails) and the skew-normal distribution.
Fitting a Student's t-Distribution¶
The Student's \(t\)-distribution with \(\nu\) degrees of freedom has probability density
Smaller \(\nu\) produces heavier tails. As \(\nu \to \infty\), the distribution converges to the normal.
from scipy import stats
import numpy as np
rng = np.random.default_rng(42)
log_returns = rng.standard_t(df=5, size=1000) * 0.01
# Fit t-distribution
df_fit, loc_fit, scale_fit = stats.t.fit(log_returns)
print(f"Fitted degrees of freedom: {df_fit:.2f}")
print(f"Fitted location: {loc_fit:.6f}")
print(f"Fitted scale: {scale_fit:.6f}")
# Goodness of fit: KS test
ks_stat, ks_p = stats.kstest(log_returns, 't', args=(df_fit, loc_fit, scale_fit))
print(f"KS test p-value (t-fit): {ks_p:.4f}")
# Compare with normal fit
loc_n, scale_n = stats.norm.fit(log_returns)
ks_stat_n, ks_p_n = stats.kstest(log_returns, 'norm', args=(loc_n, scale_n))
print(f"KS test p-value (normal-fit): {ks_p_n:.4f}")
Value at Risk¶
Value at Risk (VaR) estimates the maximum expected loss at a given confidence level over a specified time horizon. At confidence level \(1 - \alpha\), the parametric VaR under a fitted distribution \(F\) is
where \(F^{-1}\) is the quantile function (percent point function).
from scipy import stats
import numpy as np
# Using the fitted t-distribution
alpha = 0.05
var_t = -stats.t.ppf(alpha, df=df_fit, loc=loc_fit, scale=scale_fit)
var_norm = -stats.norm.ppf(alpha, loc=loc_n, scale=scale_n)
print(f"VaR (5%, t-distribution): {var_t:.6f}")
print(f"VaR (5%, normal): {var_norm:.6f}")
Parametric VaR Assumptions
Parametric VaR relies on the chosen distribution accurately modeling the tail behavior. If the fitted distribution underestimates tail risk, VaR will understate potential losses. Historical simulation and Monte Carlo methods provide non-parametric alternatives.
Summary¶
Financial returns analysis applies descriptive statistics, normality tests, and distribution fitting from scipy.stats to characterize asset return behavior. The key findings across most asset classes are fat tails, slight negative skewness, and rejection of normality. The Student's \(t\)-distribution provides a better parametric fit than the normal, and quantile-based risk measures like VaR directly leverage fitted distribution objects.