
Dynamic Consistency

Introduction

Dynamic consistency is a fundamental requirement for coherent multi-period decision-making under uncertainty. A decision-maker is dynamically consistent if their planned future actions remain optimal when the future arrives — that is, they have no incentive to deviate from previously optimal plans.

Under standard expected utility with Bayesian updating, dynamic consistency is automatic. However, when preferences incorporate ambiguity aversion, robust optimization, or non-expected utility, dynamic consistency may fail, leading to time-inconsistent behavior.

Understanding and ensuring dynamic consistency is crucial for:

  1. Dynamic portfolio optimization: Multi-period investment strategies
  2. Derivatives pricing: Hedging strategies over time
  3. Risk management: Coherent risk assessment across horizons
  4. Mechanism design: Contracts that remain incentive-compatible

Foundations of Dynamic Consistency

1. Sequential Decision Problems

Setup: Consider a decision tree with:

  • Time periods \(t = 0, 1, \ldots, T\)
  • Information filtration \(\{\mathcal{F}_t\}\)
  • Acts (strategies) \(\{a_t\}_{t=0}^T\)
  • Outcomes determined by actions and states

Strategy: A strategy \(\sigma = \{a_t(\cdot)\}\) specifies actions contingent on information.

2. Planned vs. Actual Behavior

Ex-Ante Plan: At \(t=0\), the decision-maker forms optimal strategy \(\sigma^* = \{a_0^*, a_1^*(\cdot), \ldots\}\).

Ex-Post Behavior: At \(t=1\), given realized information \(\omega \in \mathcal{F}_1\), the decision-maker may re-optimize.

Dynamic Consistency: The plan is dynamically consistent if:

\[ a_t^*(\omega) = \arg\max_{a_t} V_t(a_t, \sigma^*_{t+1:T} | \mathcal{F}_t = \omega) \]

for all \(t\) and \(\omega\) — the planned action remains optimal at the time of execution.

3. Time Consistency vs. Dynamic Consistency

Time Consistency: Preferences over consumption streams are time-consistent if:

\[ (c_0, c_1, \ldots) \succeq_0 (c_0', c_1', \ldots) \implies (c_1, c_2, \ldots) \succeq_1 (c_1', c_2', \ldots) \]

when \(c_0 = c_0'\).

Dynamic Consistency: Broader concept including consistency of conditional preferences over acts.

Relationship: Time consistency is necessary but not sufficient for dynamic consistency under uncertainty.

Expected Utility and Dynamic Consistency

1. Bayesian Updating

Bayes' Rule: Given prior \(P\) and event \(A\):

\[ P(\cdot | A) = \frac{P(\cdot \cap A)}{P(A)} \]

Sequential Expected Utility: Value of strategy \(\sigma\) at time \(t\):

\[ V_t(\sigma) = \mathbb{E}_P[U(\sigma) | \mathcal{F}_t] \]

2. Tower Property

Law of Iterated Expectations:

\[ \mathbb{E}_P[\mathbb{E}_P[U | \mathcal{F}_s] | \mathcal{F}_t] = \mathbb{E}_P[U | \mathcal{F}_t] \quad \text{for } t \leq s \]

Implication: Sequential optimization is equivalent to backward induction:

\[ V_t(\sigma) = \mathbb{E}_P[V_{t+1}(\sigma) | \mathcal{F}_t] \]

3. Bellman Principle

Dynamic Programming: Under EU with Bayesian updating:

\[ V_t(x) = \max_{a_t} \left\{u(x, a_t) + \beta \mathbb{E}_P[V_{t+1}(x') | \mathcal{F}_t]\right\} \]

Dynamic Consistency: The Bellman optimal policy is dynamically consistent by construction.
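
This can be checked directly in a toy problem. A minimal sketch (all parameter values are illustrative assumptions): in a two-period binomial model with log utility, the ex-ante optimum over all contingent plans coincides with the backward-induction (Bellman) policy.

```python
# Dynamic consistency under expected utility: minimal numeric sketch.
# All parameter values below are illustrative assumptions. We verify that
# the ex-ante optimal contingent plan (found by brute-force enumeration)
# matches the backward-induction (Bellman) policy.
import itertools
import math

p, u, d = 0.6, 1.2, 0.9            # assumed up-probability and gross returns
weights = [0.0, 0.5, 1.0]          # assumed action grid (risky-asset weight)

def grow(w, r):
    """Gross portfolio return with weight w in the risky asset (r_f = 0)."""
    return 1.0 + w * (r - 1.0)

def exante_value(w0, w1u, w1d):
    """Expected log terminal wealth of the plan (w0, w1(u), w1(d))."""
    v = 0.0
    for r1, pr1, w1 in [(u, p, w1u), (d, 1 - p, w1d)]:
        for r2, pr2 in [(u, p), (d, 1 - p)]:
            v += pr1 * pr2 * math.log(grow(w0, r1) * grow(w1, r2))
    return v

# Ex-ante optimization over all contingent plans
best_plan = max(itertools.product(weights, repeat=3),
                key=lambda s: exante_value(*s))

# Backward induction: one-period value, reused at every node (log utility)
def v1(w):
    return p * math.log(grow(w, u)) + (1 - p) * math.log(grow(w, d))

w1_star = max(weights, key=v1)     # optimal period-1 action in either state
w0_star = max(weights, key=lambda w: v1(w) + v1(w1_star))
print(best_plan, (w0_star, w1_star, w1_star))   # the plans coincide
```

Because log utility makes the value additive across periods, the planned and re-optimized actions agree at every node, as the Bellman principle guarantees.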

Dynamic Inconsistency Under Ambiguity

1. Max-Min Expected Utility

MEU Preferences:

\[ V(f) = \min_{P \in \mathcal{P}} \mathbb{E}_P[U(f)] \]

Problem: The minimum may be achieved by different measures at different times.

2. Example of Inconsistency

Two-Period Problem:

  • States: \(\{\omega_1, \omega_2, \omega_3\}\)
  • Period 1 information: \(\{\{\omega_1\}, \{\omega_2, \omega_3\}\}\)
  • Priors: \(\mathcal{P} = \{P_1, P_2\}\) with different weights

At \(t=0\): Worst-case measure might be \(P_1\).

At \(t=1\), conditional on \(\{\omega_2, \omega_3\}\): Worst-case measure might be \(P_2|_{\{\omega_2, \omega_3\}}\).

Result: Planned action at \(t=0\) may differ from optimal action at \(t=1\) — dynamic inconsistency.

3. Source of Inconsistency

Non-Additive Expectations: MEU uses:

\[ V_0(f) = \min_{P \in \mathcal{P}} \mathbb{E}_P[f] \]

but this does not satisfy the tower property:

\[ \min_{P \in \mathcal{P}} \mathbb{E}_P[\min_{Q \in \mathcal{P}} \mathbb{E}_Q[f | \mathcal{F}_1]] \neq \min_{P \in \mathcal{P}} \mathbb{E}_P[f] \]

in general.
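
The failure is easy to exhibit numerically. A minimal sketch with assumed numbers (two product priors on two-period paths and a symmetric payoff):

```python
# Failure of the tower property under MEU: minimal numeric sketch.
# Assumed numbers: two product priors on two-period paths (up-probability
# 0.6 under P1, 0.4 under P2) and a payoff f on the four paths.
P1 = {"uu": 0.36, "ud": 0.24, "du": 0.24, "dd": 0.16}
P2 = {"uu": 0.16, "ud": 0.24, "du": 0.24, "dd": 0.36}
f = {"uu": 10.0, "ud": 2.0, "du": 2.0, "dd": 10.0}
priors = [P1, P2]

def E(P, g):
    return sum(P[s] * g[s] for s in P)

# Left-hand side: ex-ante worst-case expectation
lhs = min(E(P, f) for P in priors)

def cond_value(a):
    """Worst-case conditional expectation of f on the event {first move = a}."""
    vals = []
    for P in priors:
        pe = P[a + "u"] + P[a + "d"]
        vals.append((P[a + "u"] * f[a + "u"] + P[a + "d"] * f[a + "d"]) / pe)
    return min(vals)

# Right-hand side: worst-case expectation of the worst-case conditionals
v1 = {a: cond_value(a) for a in ("u", "d")}
rhs = min((P["uu"] + P["ud"]) * v1["u"] + (P["du"] + P["dd"]) * v1["d"]
          for P in priors)

# The nested value (5.2) is strictly below the ex-ante value (6.16)
print(lhs, rhs)
```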

Rectangularity

1. Definition

Definition (Epstein-Schneider, 2003): A set of priors \(\mathcal{P}\) is rectangular with respect to filtration \(\{\mathcal{F}_t\}\) if it can be decomposed as:

\[ \mathcal{P} = \left\{P: P_t(\cdot | \mathcal{F}_t) \in \mathcal{P}_t(\omega) \text{ for all } t, \omega\right\} \]

where \(\mathcal{P}_t(\omega)\) is a family of conditional probability measures.

Equivalent Characterization: For any \(P, Q \in \mathcal{P}\) and stopping time \(\tau\), the "pasted" measure:

\[ (P \otimes_{\tau} Q)(A) = \mathbb{E}_P[\mathbb{1}_{\tau > T} \mathbb{1}_A] + \mathbb{E}_P[\mathbb{1}_{\tau \leq T} Q(A | \mathcal{F}_{\tau})] \]

is also in \(\mathcal{P}\).

2. Intuition

Product Structure: Rectangularity means the set of priors has a "product" structure across time:

  • Period-by-period uncertainty sets are independent
  • No constraints linking uncertainty at different times

Example (Rectangular):

\[ \mathcal{P} = \{P: P_t(\cdot | \mathcal{F}_t) \in \mathcal{P}_t \text{ for all } t\} \]

where each \(\mathcal{P}_t\) is specified independently.

Example (Non-Rectangular):

\[ \mathcal{P} = \{P: \mathbb{E}_P[\sum_{t=0}^T X_t] \in [a, b]\} \]

Constraints across periods create non-rectangularity.

3. Main Theorem

Theorem (Epstein-Schneider, 2003): Max-min expected utility preferences with prior set \(\mathcal{P}\) are dynamically consistent if and only if \(\mathcal{P}\) is rectangular.

Proof Sketch:

Necessity: If \(\mathcal{P}\) is not rectangular, construct a counterexample showing inconsistency.

Sufficiency: Under rectangularity:

\[ \min_{P \in \mathcal{P}} \mathbb{E}_P[U] = \min_{P_0 \in \mathcal{P}_0} \mathbb{E}_{P_0}\left[\min_{P_1 \in \mathcal{P}_1} \mathbb{E}_{P_1}[\cdots]\right] \]

The tower property holds for the "min" operation, enabling backward induction.
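
This can be checked numerically. A minimal sketch with an assumed one-step uncertainty set: after forming the rectangular hull of all pastings, the global worst case coincides with the backward-induction value.

```python
# Rectangular hull restores the tower property: minimal numeric sketch.
# Assumed one-step uncertainty set for the up-probability at every node.
import itertools

f = {"uu": 10.0, "ud": 2.0, "du": 2.0, "dd": 10.0}
probs = [0.6, 0.4]                 # assumed per-period set of up-probabilities

# Rectangular hull: every pasting of a marginal with node-wise conditionals
rect = [{"uu": p0 * qu, "ud": p0 * (1 - qu),
         "du": (1 - p0) * qd, "dd": (1 - p0) * (1 - qd)}
        for p0, qu, qd in itertools.product(probs, repeat=3)]

global_min = min(sum(P[s] * f[s] for s in P) for P in rect)

# Backward induction with the same one-step sets
vu = min(q * f["uu"] + (1 - q) * f["ud"] for q in probs)   # node u
vd = min(q * f["du"] + (1 - q) * f["dd"] for q in probs)   # node d
recursive = min(p * vu + (1 - p) * vd for p in probs)

print(global_min, recursive)       # equal under rectangularity
```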

Recursive Preferences Under Ambiguity

1. Recursive Max-Min

Formulation: With rectangular priors, value function satisfies:

\[ V_t = \min_{P_t \in \mathcal{P}_t} \mathbb{E}_{P_t}[u(c_t) + \beta V_{t+1} | \mathcal{F}_t] \]

Properties:

  • Dynamically consistent by construction
  • Nests EU as a special case (\(\mathcal{P}_t\) a singleton)
  • Tractable for computation

2. Epstein-Zin with Ambiguity

Standard Epstein-Zin:

\[ V_t = \left[(1-\beta) c_t^{1-1/\psi} + \beta \mu_t(V_{t+1})^{1-1/\psi}\right]^{1/(1-1/\psi)} \]

where \(\mu_t(V) = (\mathbb{E}_t[V^{1-\gamma}])^{1/(1-\gamma)}\).

With Ambiguity: Replace \(\mu_t\) with robust certainty equivalent:

\[ \mu_t^R(V) = \min_{P_t \in \mathcal{P}_t} (\mathbb{E}_{P_t}[V^{1-\gamma}])^{1/(1-\gamma)} \]

Dynamic Consistency: Preserved under rectangularity.

3. Smooth Ambiguity (Recursive)

KMM Recursive:

\[ V_t = u(c_t) + \beta \phi^{-1}\left(\mathbb{E}_{\mu_t}\left[\phi\left(\mathbb{E}_{P_t}[V_{t+1}]\right)\right]\right) \]

where \(\mu_t\) is a second-order probability over models.

Dynamic Consistency: Automatic with proper recursive structure.

Variational Preferences

1. Static Formulation

Definition (Maccheroni-Marinacci-Rustichini, 2006):

\[ V(f) = \min_{P} \left\{\mathbb{E}_P[U(f)] + c(P)\right\} \]

where \(c: \mathcal{M}_1 \to [0, \infty]\) is a convex, lower semicontinuous cost function.

2. Dynamic Extension

Conditional Cost Function: For dynamic consistency, require cost function to factor:

\[ c(P) = \sum_{t=0}^{T-1} \mathbb{E}_P[c_t(P_{t+1}(\cdot | \mathcal{F}_t))] \]

Recursive Formulation:

\[ V_t = \min_{P_t} \left\{\mathbb{E}_{P_t}[u(c_t, a_t) + \beta V_{t+1} | \mathcal{F}_t] + c_t(P_t)\right\} \]

3. Multiplier Preferences (Hansen-Sargent)

Static: \(c(P) = \theta D_{\text{KL}}(P \| P_0)\)

Dynamic:

\[ V_t = \min_{P_t} \left\{\mathbb{E}_{P_t}[u_t + \beta V_{t+1} | \mathcal{F}_t] + \theta D_{\text{KL}}(P_t \| P_{0,t})\right\} \]

Solution: Exponential tilting preserves dynamic consistency.
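
The closed form behind exponential tilting can be verified numerically. A minimal sketch with assumed numbers (the reference measure, utilities, and \(\theta\) are illustrative): the tilted measure attains \(-\theta \log \mathbb{E}_{P_0}[e^{-U/\theta}]\), and no other measure does better.

```python
# Exponential tilting for multiplier preferences: minimal numeric sketch.
# The reference measure P0, utilities U, and theta are assumed for
# illustration. Closed form: V = -theta * log E_P0[exp(-U/theta)].
import math
import random

P0 = [0.5, 0.3, 0.2]
U = [1.0, -2.0, 4.0]
theta = 1.5

def objective(Q):
    """Penalized value E_Q[U] + theta * KL(Q || P0)."""
    kl = sum(q * math.log(q / p) for q, p in zip(Q, P0) if q > 0)
    return sum(q * u for q, u in zip(Q, U)) + theta * kl

closed_form = -theta * math.log(sum(p * math.exp(-u / theta)
                                    for p, u in zip(P0, U)))

# The exponentially tilted measure Q* ~ P0 * exp(-U/theta) attains it
w = [p * math.exp(-u / theta) for p, u in zip(P0, U)]
Qstar = [x / sum(w) for x in w]
assert abs(objective(Qstar) - closed_form) < 1e-10

# No randomly drawn measure does better (variational lower bound)
random.seed(0)
for _ in range(1000):
    raw = [random.random() + 1e-9 for _ in P0]
    Q = [x / sum(raw) for x in raw]
    assert objective(Q) >= closed_form - 1e-10
print(round(closed_form, 6))
```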

Alternative Approaches to Inconsistency

1. Sophisticated Agents

Definition: A sophisticated agent anticipates future preference changes and optimizes accordingly.

Backward Induction: Solve for period-\(T\) optimal action, then period-\((T-1)\) optimal given anticipated period-\(T\) behavior, etc.

Equilibrium: Subgame-perfect equilibrium within the decision-maker.

2. Naive Agents

Definition: A naive agent ignores potential preference changes and assumes future selves will follow current optimal plan.

Behavior: Re-optimizes at each period, potentially cycling or exhibiting erratic behavior.

3. Pre-Commitment

Definition: Agent commits at \(t=0\) to a complete strategy, enforced by external mechanism.

Limitation: Requires credible commitment technology.

Examples:

  • Pension contributions with withdrawal penalties
  • Derivative contracts with fixed terms

4. Self-Control Preferences

Gul-Pesendorfer: Model temptation and self-control explicitly:

\[ V(B) = \max_{x \in B} \{u(x) + v(x)\} - \max_{y \in B} v(y) \]

where \(v\) represents temptation.

Dynamic Consistency: Maintained through recursive structure.

Financial Applications

1. Dynamic Portfolio Choice

Problem: Invest over \(T\) periods with ambiguity about return distribution.

Inconsistent MEU: Without rectangularity, \(t=0\) plan may be abandoned at \(t=1\).

Rectangular Solution:

\[ V_t(W_t) = \max_{w_t} \min_{P_t \in \mathcal{P}_t} \mathbb{E}_{P_t}[V_{t+1}(W_{t+1}) | \mathcal{F}_t] \]

with \(W_{t+1} = W_t(1 + r_f + w_t^\top (R_t - r_f \mathbf{1}))\).
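
A minimal grid-search sketch of this recursion (the returns, probability set, and grid are illustrative assumptions; with log utility the problem is myopic, so one worst-case step pins down every node):

```python
# Robust two-period portfolio choice with a rectangular one-step prior set:
# minimal grid-search sketch (returns, probabilities, and the grid are
# illustrative assumptions). With log utility the value separates, so the
# worst-case one-period weight is optimal at every node.
import math

u, d = 1.2, 0.9                    # assumed gross returns (r_f = 0)
probs = [0.6, 0.4]                 # assumed one-step up-probability set
grid = [i / 20 for i in range(21)] # weights w in [0, 1]

def step_value(w):
    """Worst-case one-period expected log growth at weight w."""
    return min(p * math.log(1 + w * (u - 1)) + (1 - p) * math.log(1 + w * (d - 1))
               for p in probs)

w_star = max(grid, key=step_value) # optimal weight at every node
V0 = 2 * step_value(w_star)        # value at t = 0 with W_0 = 1
print(w_star, round(V0, 4))
```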

2. Robust Control in Finance

Hansen-Sargent Setup: Continuous-time wealth dynamics under model uncertainty.

HJB Equation:

\[ \rho V = \max_{c, w} \min_h \left\{u(c) + \mathcal{L}^{w,h} V + \frac{\theta}{2} h^2\right\} \]

(the entropy penalty enters with a plus sign, so the minimizing agent pays \(\frac{\theta}{2}h^2\) for distorting the drift by \(h\)).

Dynamic Consistency: Preserved by quadratic penalty structure.

3. Dynamic Hedging

Problem: Hedge derivative over multiple periods with model uncertainty.

Pathwise Hedging: With rectangularity:

\[ \Pi_t = V_t - \Delta_t S_t \]

is a supermartingale under all \(P \in \mathcal{P}\).

Rebalancing: Consistent rebalancing strategy exists.

4. Risk Management Over Time

Consistent Risk Measures: For multi-period risk assessment, require:

\[ \rho_0(X) = \rho_0(\rho_1(X | \mathcal{F}_1)) \]

Time Consistency of CVaR: Standard CVaR is NOT time-consistent.

Conditional Risk Measures: Use recursive formulation:

\[ \rho_t(X) = \rho_t^{\text{1-period}}(\rho_{t+1}(X)) \]

Behavioral Implications

1. Preference for Flexibility

Observation: Under ambiguity, agents may prefer to delay decisions.

Mechanism: Later periods reveal information that may resolve ambiguity.

Implication: Excess option value under ambiguity.

2. Aversion to Information

Paradox: Under certain non-EU preferences, agents may avoid costless information.

Example: With MEU, information that reveals which prior is worst-case may reduce welfare.

Resolution: Rectangular priors avoid this pathology.

3. Updating and Learning

Bayesian Updating Under MEU: For each \(P \in \mathcal{P}\):

\[ P(\cdot | A) = \frac{P(\cdot \cap A)}{P(A)} \]

Challenge: Updated set \(\mathcal{P}_A = \{P(\cdot | A): P \in \mathcal{P}\}\) may not preserve rectangularity.

Resolution: Impose rectangularity on the dynamic structure, not just the initial set.

Summary

Key Results

  1. Dynamic Consistency Failure: MEU and other non-EU preferences can be dynamically inconsistent

  2. Rectangularity Characterization: Epstein-Schneider theorem: MEU is dynamically consistent iff priors are rectangular

  3. Recursive Formulation: Proper recursive structure ensures consistency for variational and smooth ambiguity preferences

  4. Alternative Approaches: Sophisticated agents, pre-commitment, and self-control models address inconsistency

Practical Implications

  1. Model Design: When building ambiguity models, ensure rectangularity or use recursive formulations

  2. Computational Tractability: Dynamic consistency enables backward induction algorithms

  3. Interpretation: Dynamically inconsistent preferences may reflect genuine psychological phenomena

  4. Risk Management: Time-consistent risk measures require careful recursive construction

Dynamic consistency is essential for coherent multi-period decision-making and must be explicitly addressed when extending standard expected utility to incorporate ambiguity or robustness concerns.


Exercises

Exercise 1. Consider a two-period decision problem with states \(\Omega = \{uu, ud, du, dd\}\) and a decision maker with maxmin expected utility over the set \(\mathcal{P} = \{P_1, P_2\}\) where \(P_1(uu) = 0.36\), \(P_1(ud) = 0.24\), \(P_1(du) = 0.24\), \(P_1(dd) = 0.16\) and \(P_2(uu) = 0.16\), \(P_2(ud) = 0.24\), \(P_2(du) = 0.24\), \(P_2(dd) = 0.36\). Check whether \(\mathcal{P}\) is rectangular. Does the optimal plan at \(t = 0\) remain optimal when reconsidered at \(t = 1\)?

Solution to Exercise 1

Setup. We have \(\Omega = \{uu, ud, du, dd\}\) with the first letter indicating the period-1 outcome and the second the period-2 outcome. The information partition at \(t = 1\) is \(\mathcal{F}_1 = \{\{uu, ud\}, \{du, dd\}\}\), corresponding to observing the first-period outcome (\(u\) or \(d\)).

Checking rectangularity. A set of priors \(\mathcal{P} = \{P_1, P_2\}\) is rectangular if we can freely combine the conditional distributions at each time period. Specifically, we need to check whether the conditional distributions of \(P_1\) and \(P_2\) given \(\mathcal{F}_1\) can be "pasted" independently.

Compute the marginals and conditionals:

Marginals at \(t = 1\):

  • \(P_1(u) = P_1(uu) + P_1(ud) = 0.36 + 0.24 = 0.60\)
  • \(P_1(d) = P_1(du) + P_1(dd) = 0.24 + 0.16 = 0.40\)
  • \(P_2(u) = P_2(uu) + P_2(ud) = 0.16 + 0.24 = 0.40\)
  • \(P_2(d) = P_2(du) + P_2(dd) = 0.24 + 0.36 = 0.60\)

Conditionals given \(u\) (i.e., given \(\{uu, ud\}\)):

  • \(P_1(uu \mid u) = 0.36 / 0.60 = 0.60\), \(\;P_1(ud \mid u) = 0.24 / 0.60 = 0.40\)
  • \(P_2(uu \mid u) = 0.16 / 0.40 = 0.40\), \(\;P_2(ud \mid u) = 0.24 / 0.40 = 0.60\)

Conditionals given \(d\) (i.e., given \(\{du, dd\}\)):

  • \(P_1(du \mid d) = 0.24 / 0.40 = 0.60\), \(\;P_1(dd \mid d) = 0.16 / 0.40 = 0.40\)
  • \(P_2(du \mid d) = 0.24 / 0.60 = 0.40\), \(\;P_2(dd \mid d) = 0.36 / 0.60 = 0.60\)

Rectangularity check. Both measures happen to have the property that \(P_i\) is a product measure: \(P_1\) uses \(p_1 = 0.6\) for "\(u\)" at each stage, and \(P_2\) uses \(p_2 = 0.4\) for "\(u\)" at each stage. A rectangular set would allow us to freely combine the period-1 marginal from one measure with the period-2 conditional from another. Specifically, rectangularity requires that for any combination of (marginal at \(t=0\), conditional at \(t=1\)), the resulting joint measure belongs to \(\mathcal{P}\).

Consider the "pasted" measure: use \(P_1\)'s marginal (\(P_1(u) = 0.6\), \(P_1(d) = 0.4\)) but \(P_2\)'s conditionals (\(P_2(\cdot | u)\) and \(P_2(\cdot | d)\)). This gives:

\[ P_{\text{mixed}}(uu) = 0.6 \times 0.4 = 0.24, \; P_{\text{mixed}}(ud) = 0.6 \times 0.6 = 0.36 \]
\[ P_{\text{mixed}}(du) = 0.4 \times 0.4 = 0.16, \; P_{\text{mixed}}(dd) = 0.4 \times 0.6 = 0.24 \]

This "mixed" measure \(P_{\text{mixed}}\) is not in \(\mathcal{P} = \{P_1, P_2\}\), so the set of priors is not rectangular.

Dynamic inconsistency. Consider a payoff \(f\) with \(f(uu) = 10\), \(f(ud) = 2\), \(f(du) = 2\), \(f(dd) = 10\), and suppose the agent chooses between act \(f\) and a constant act \(g = 5\).

At \(t = 0\): \(\mathbb{E}_{P_1}[f] = 0.36 \cdot 10 + 0.24 \cdot 2 + 0.24 \cdot 2 + 0.16 \cdot 10 = 6.16\) and, by symmetry, \(\mathbb{E}_{P_2}[f] = 6.16\), so \(\min\{6.16, 6.16\} = 6.16 > 5\) and \(f\) is preferred ex ante.

At \(t = 1\), conditional on "\(u\)" observed: \(\min\{0.6 \cdot 10 + 0.4 \cdot 2, \; 0.4 \cdot 10 + 0.6 \cdot 2\} = \min\{6.8, 5.2\} = 5.2\). The worst-case measure has changed from ex ante. For a different payoff structure, this shift in worst-case measure can cause the agent to prefer \(g\) at \(t = 1\) even though \(f\) was preferred at \(t = 0\), demonstrating dynamic inconsistency.
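
The computations above can be reproduced in a few lines; a minimal sketch:

```python
# Numeric check of the solution: marginals, conditionals, the pasted
# measure, and the shift in the worst-case measure between t = 0 and t = 1.
P1 = {"uu": 0.36, "ud": 0.24, "du": 0.24, "dd": 0.16}
P2 = {"uu": 0.16, "ud": 0.24, "du": 0.24, "dd": 0.36}
f = {"uu": 10.0, "ud": 2.0, "du": 2.0, "dd": 10.0}

def marg(P, a):                    # P(first-period outcome = a)
    return P[a + "u"] + P[a + "d"]

def cond(P, s):                    # P(path s | its first-period outcome)
    return P[s] / marg(P, s[0])

# Pasted measure: P1's marginal with P2's conditionals -- not in {P1, P2}
pasted = {s: marg(P1, s[0]) * cond(P2, s) for s in P1}
assert pasted != P1 and pasted != P2       # non-rectangularity

# Ex-ante worst case: both priors give E[f] = 6.16 > 5
ex_ante = min(sum(P[s] * f[s] for s in P) for P in (P1, P2))

# Conditional on "u": the worst case (5.2) is achieved by P2, not P1
cond_vals = {name: cond(P, "uu") * f["uu"] + cond(P, "ud") * f["ud"]
             for name, P in (("P1", P1), ("P2", P2))}
print(ex_ante, cond_vals)
```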


Exercise 2. The Epstein-Schneider theorem states that maxmin expected utility with multiple priors is dynamically consistent if and only if the set of priors is rectangular. Prove the "only if" direction: if preferences are dynamically consistent and represented by maxmin EU, then the prior set must be rectangular. Use a simple counterexample to illustrate what goes wrong when rectangularity fails.

Solution to Exercise 2

Claim. If max-min expected utility preferences are dynamically consistent, then the prior set \(\mathcal{P}\) must be rectangular.

Proof by contrapositive. We show that if \(\mathcal{P}\) is not rectangular, then preferences are dynamically inconsistent.

Suppose \(\mathcal{P}\) is not rectangular. Then there exist \(P, Q \in \mathcal{P}\) and an event \(A \in \mathcal{F}_1\) such that the "pasted" measure \(R\) defined by:

\[ R(B) = P(B \cap A) + Q(B \cap A^c) \cdot \frac{P(A^c)}{Q(A^c)} \]

(using \(P\)'s marginal but \(Q\)'s conditional on \(A^c\)) is not in \(\mathcal{P}\).

Constructing the counterexample. Consider two acts \(f\) and \(g\) such that:

  • Conditional on \(A\): \(f\) is preferred under all priors (i.e., \(\mathbb{E}_{P'}[f \mid A] > \mathbb{E}_{P'}[g \mid A]\) for all \(P' \in \mathcal{P}\))
  • Conditional on \(A^c\): \(f\) is also preferred under all priors

At \(t = 1\), regardless of which event is observed, the agent prefers \(f\) to \(g\) under the worst-case conditional.

However, at \(t = 0\), the worst-case joint measure for \(f\) versus \(g\) may differ from the pasted worst-case conditionals because \(\mathcal{P}\) is not rectangular. Specifically:

\[ \min_{P \in \mathcal{P}} \mathbb{E}_P[f] \neq \min_{P_A} \mathbb{E}_{P_A}[f \mid A] \cdot P(A) + \min_{P_{A^c}} \mathbb{E}_{P_{A^c}}[f \mid A^c] \cdot P(A^c) \]

because the minimizing measure cannot be decomposed into independently chosen conditionals. The ex-ante worst-case measure may use conditionals that do not correspond to any measure in \(\mathcal{P}\) when restricted, leading to a situation where the ex-ante ranking differs from the conditional rankings.

Simple illustration. Consider three states \(\{\omega_1, \omega_2, \omega_3\}\), \(\mathcal{F}_1 = \{\{\omega_1\}, \{\omega_2, \omega_3\}\}\), and priors \(\mathcal{P} = \{P_1, P_2\}\) with:

  • \(P_1 = (0.5, 0.25, 0.25)\), \(P_2 = (0.5, 0.1, 0.4)\)

The conditional on \(\{\omega_2, \omega_3\}\): \(P_1\) gives \((0.5, 0.5)\) and \(P_2\) gives \((0.2, 0.8)\).

Take \(f = (0, 10, 0)\) and \(g = (0, 4, 4)\).

At \(t = 0\): \(\min\{P_1: 2.5, P_2: 1.0\} = 1.0\) for \(f\), \(\min\{P_1: 2.0, P_2: 2.0\} = 2.0\) for \(g\). So \(g \succ_0 f\).

At \(t = 1\), conditional on \(\{\omega_2, \omega_3\}\): for \(f\), \(\min\{5.0, 2.0\} = 2.0\); \(g\) pays 4 on both states of the event, so its conditional value is \(4.0\) under either prior. So \(g \succ_1 f\) here too.

But at \(t = 1\), conditional on \(\{\omega_1\}\): both acts give 0, so the agent is indifferent.

Now modify \(f\) slightly to break the tie and create a scenario where the ex-ante preference switches while conditional preferences remain the same. The key mechanism is that without rectangularity, the worst-case measure at \(t = 0\) may combine conditionals from different measures, leading to a different ranking than what backward induction (applying worst-case conditionally at each node) would produce.

Therefore, non-rectangularity implies dynamic inconsistency. \(\blacksquare\)


Exercise 3. For the recursive variational preference \(V_t = \min_{P} \{\mathbb{E}_P[V_{t+1} | \mathcal{F}_t] + \theta D_{\text{KL}}(P \| P_0 | \mathcal{F}_t)\}\), show that this formulation is automatically dynamically consistent (without requiring rectangularity) by verifying the tower property: \(V_0 = \min_{P_0} \{\mathbb{E}_{P_0}[\min_{P_1}\{\mathbb{E}_{P_1}[V_2 | \mathcal{F}_1] + \theta D_1\} | \mathcal{F}_0] + \theta D_0\}\) can be collapsed into a single minimization.

Solution to Exercise 3

Claim. The recursive variational preference with KL divergence penalty is automatically dynamically consistent.

Setup. The recursive variational preference is:

\[ V_t = \min_{P_t} \left\{\mathbb{E}_{P_t}[V_{t+1} \mid \mathcal{F}_t] + \theta D_{\text{KL}}(P_t \| P_0 \mid \mathcal{F}_t)\right\} \]

where \(D_{\text{KL}}(P_t \| P_0 \mid \mathcal{F}_t) = \mathbb{E}_{P_t}\!\left[\log \frac{dP_t}{dP_0} \;\Big|\; \mathcal{F}_t\right]\).

Step 1 — Solve the inner minimization.

The minimizer for \(\min_P \{\mathbb{E}_P[X] + \theta D_{\text{KL}}(P \| P_0)\}\) is the exponentially tilted measure:

\[ \frac{dP_t^*}{dP_0} \propto \exp\!\left(-\frac{V_{t+1}}{\theta}\right) \]

Specifically:

\[ P_t^*(\omega) = \frac{\exp(-V_{t+1}(\omega)/\theta)}{\mathbb{E}_{P_0}[\exp(-V_{t+1}/\theta) \mid \mathcal{F}_t]} \cdot P_0(\omega \mid \mathcal{F}_t) \]

Step 2 — Substitute back to get the value.

Substituting \(P_t^*\) into the objective yields:

\[ V_t = -\theta \log \mathbb{E}_{P_0}\!\left[\exp\!\left(-\frac{V_{t+1}}{\theta}\right) \;\Big|\; \mathcal{F}_t\right] \]

This is the entropic risk measure (or certainty equivalent under exponential utility).

Step 3 — Verify the tower property.

We need to show that the two-period recursion collapses into a single minimization. Consider:

\[ V_0 = -\theta \log \mathbb{E}_{P_0}\!\left[\exp\!\left(-\frac{V_1}{\theta}\right)\right] \]

where \(V_1 = -\theta \log \mathbb{E}_{P_0}\!\left[\exp\!\left(-\frac{V_2}{\theta}\right) \;\Big|\; \mathcal{F}_1\right]\).

Substituting:

\[ V_0 = -\theta \log \mathbb{E}_{P_0}\!\left[\exp\!\left(\log \mathbb{E}_{P_0}\!\left[\exp\!\left(-\frac{V_2}{\theta}\right) \;\Big|\; \mathcal{F}_1\right]\right)\right] \]
\[ = -\theta \log \mathbb{E}_{P_0}\!\left[\mathbb{E}_{P_0}\!\left[\exp\!\left(-\frac{V_2}{\theta}\right) \;\Big|\; \mathcal{F}_1\right]\right] \]

By the tower property of conditional expectation under \(P_0\):

\[ V_0 = -\theta \log \mathbb{E}_{P_0}\!\left[\exp\!\left(-\frac{V_2}{\theta}\right)\right] \]

Step 4 — Connection to the single minimization.

This last expression is equivalent to:

\[ V_0 = \min_{P} \left\{\mathbb{E}_P[V_2] + \theta D_{\text{KL}}(P \| P_0)\right\} \]

where the minimization is over all measures \(P\) on \((\Omega, \mathcal{F}_T)\). This confirms that the recursive (two-step) minimization collapses into a single global minimization, establishing the tower property.

Why rectangularity is not needed. The key is the additive decomposition of the KL divergence cost:

\[ D_{\text{KL}}(P \| P_0) = D_{\text{KL}}(P^{\text{marg}} \| P_0^{\text{marg}}) + \mathbb{E}_{P}\!\left[D_{\text{KL}}(P(\cdot \mid \mathcal{F}_1) \| P_{0}(\cdot \mid \mathcal{F}_1))\right] \]

This chain rule for KL divergence means the penalty function factors across time periods automatically. In the language of variational preferences, the cost function \(c(P) = \theta D_{\text{KL}}(P \| P_0)\) satisfies the additivity condition \(c(P) = c_0(P^{\text{marg}}) + \mathbb{E}_P[c_1(P_1)]\), which is precisely the condition Maccheroni-Marinacci-Rustichini identify as sufficient for dynamic consistency. No rectangularity condition on an uncertainty set is needed because the entropic formulation uses a penalty rather than a constraint. \(\blacksquare\)


Exercise 4. A pre-commitment strategy is one where the agent commits at \(t = 0\) to a plan and cannot deviate. For a dynamically inconsistent preference, compare the welfare of (a) the pre-commitment strategy, (b) the naive strategy (re-optimizing at each period as if preferences were consistent), and (c) the sophisticated strategy (backward induction taking future re-optimization into account). Illustrate with a numerical example.

Solution to Exercise 4

Setup. Consider a two-period model with a risky asset and a risk-free asset. At each period, the investor chooses portfolio weight \(w_t\) in the risky asset. The risky return \(R_t\) takes values \(u\) (up) or \(d\) (down). The investor has max-min expected utility with a non-rectangular set of priors \(\mathcal{P} = \{P_1, P_2\}\).

Numerical example. Let \(u = 1.2\), \(d = 0.9\), \(r_f = 0\), initial wealth \(W_0 = 1\), and utility \(U(W) = \log(W)\).

  • \(P_1\): \(p_1^{(1)} = 0.6\) (period 1 up-probability), \(p_2^{(1)} = 0.6\) (period 2)
  • \(P_2\): \(p_1^{(2)} = 0.4\) (period 1), \(p_2^{(2)} = 0.7\) (period 2)

The set is non-rectangular because the period-2 conditional probabilities are linked to the period-1 choice of measure.

(a) Pre-commitment strategy. The agent commits at \(t = 0\) to a pair \((w_0, w_1)\) (here \(w_1\) is state-independent for simplicity). The worst-case expected utility is:

\[ V^{\text{pre}} = \max_{w_0, w_1} \min_{P \in \{P_1, P_2\}} \mathbb{E}_P[\log(W_2)] \]

where \(W_2 = W_0(1 + w_0(R_1 - 1))(1 + w_1(R_2 - 1))\).

Under \(P_1\) (\(p_1 = p_2 = 0.6\)): Each period has \(\mathbb{E}_1[\log(1 + w(R-1))] = 0.6\log(1+0.2w) + 0.4\log(1-0.1w)\). For \(w_0 = w_1 = w\), this is \(2[0.6\log(1+0.2w) + 0.4\log(1-0.1w)]\).

Under \(P_2\) (\(p_1 = 0.4, p_2 = 0.7\)): \([0.4\log(1+0.2w_0) + 0.6\log(1-0.1w_0)] + [0.7\log(1+0.2w_1) + 0.3\log(1-0.1w_1)]\).

Numerically optimizing (using \(w_0 = w_1 = w = 0.4\) as a reasonable choice):

  • \(P_1\): \(2[0.6 \log(1.08) + 0.4\log(0.96)] = 2[0.04617 - 0.01633] = 0.05968\)
  • \(P_2\): \([0.4\log(1.08) + 0.6\log(0.96)] + [0.7\log(1.08) + 0.3\log(0.96)] = [0.03078 - 0.02449] + [0.05389 - 0.01225] = 0.00629 + 0.04164 = 0.04793\)

Pre-commitment worst case \(\approx 0.0479\).

(b) Naive strategy. The naive agent re-optimizes at each period assuming preferences are consistent.

At \(t = 0\): The agent treats the worst-case measure as fixed. With \(p_1^{(2)} = 0.4\) the risky asset looks less attractive in period 1, so suppose the naive agent holds the (illustrative, not exactly optimal) weight \(w_0^{\text{naive}} = 0.2\).

At \(t = 1\): The agent re-optimizes. Viewed one period at a time, the candidate up-probabilities are \(p_2 = 0.6\) (from \(P_1\)) and \(p_2 = 0.7\) (from \(P_2\)); the single-period worst case is \(p_2 = 0.6\). The first-order condition \(0.6 \times 0.2/(1+0.2w) = 0.4 \times 0.1/(1-0.1w)\) gives an unconstrained optimum of \(w = 4\), far above the position held at \(t = 0\); for illustration, take the interior weight \(w_1^{\text{naive}} = 0.5\).

The naive agent switches worst-case measures between periods, leading to inconsistent behavior.

Evaluating: with \(w_0 = 0.2\) and \(w_1 = 0.5\):

  • \(P_1\): \(0.6\log(1.04) + 0.4\log(0.98) + 0.6\log(1.10) + 0.4\log(0.95) = 0.02353 - 0.00808 + 0.05719 - 0.02052 = 0.05212\)
  • \(P_2\): \(0.4\log(1.04) + 0.6\log(0.98) + 0.7\log(1.10) + 0.3\log(0.95) = 0.01569 - 0.01212 + 0.06672 - 0.01539 = 0.05490\)

Naive worst case \(\approx 0.0521\).

(c) Sophisticated strategy. The sophisticated agent uses backward induction, anticipating future re-optimization.

Period 1 (backward induction): At \(t = 1\), the agent will solve:

\[ w_1^* = \arg\max_{w_1} \min_{p_2 \in \{0.6, 0.7\}} [p_2 \log(1 + 0.2 w_1) + (1-p_2)\log(1 - 0.1 w_1)] \]

The worst case is \(p_2 = 0.6\) (lower probability of up move). The optimal \(w_1^*\) under \(p_2 = 0.6\) solves:

\[ \frac{0.6 \times 0.2}{1 + 0.2w_1} = \frac{0.4 \times 0.1}{1 - 0.1w_1} \]
\[ 0.12(1 - 0.1w_1) = 0.04(1 + 0.2w_1) \implies 0.12 - 0.012w_1 = 0.04 + 0.008w_1 \implies w_1^* = 4.0 \]

But \(w_1 = 4\) implies extreme leverage. In practice we would constrain \(w_1 \in [0, 1]\), giving \(w_1^* = 1.0\).

Let us use \(w_1^* = 1.0\) (constrained). Then at \(t = 0\), the sophisticated agent solves:

\[ w_0^* = \arg\max_{w_0} \min_{P \in \{P_1, P_2\}} \mathbb{E}_P[\log(W_1 R_2)], \qquad W_1 = W_0(1 + w_0(R_1 - 1)) \]

since anticipating \(w_1^* = 1.0\) gives \(W_2 = W_1 R_2\). This is a well-defined optimization, but a different one from either the pre-commitment or naive approaches.

Welfare comparison. In general:

\[ V^{\text{pre}} \geq V^{\text{sophisticated}} \geq V^{\text{naive}} \]

The pre-commitment strategy achieves the highest welfare because the agent jointly optimizes across periods without worrying about future deviations. The sophisticated strategy is second-best because the agent correctly anticipates the constraints imposed by future re-optimization. The naive strategy is worst because it fails to account for the time inconsistency, leading to uncoordinated decisions. (This ranking applies when each strategy is solved to optimality; the illustrative numbers above use ad hoc weights, so they need not respect it.)
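
The illustrative numbers from parts (a) and (b) can be reproduced directly; a minimal sketch:

```python
# Reproducing the illustrative welfare numbers from parts (a) and (b):
# worst-case expected log terminal wealth for given state-independent
# weights (w0, w1) under the two assumed priors.
import math

returns = {"u": 1.2, "d": 0.9}
priors = {"P1": (0.6, 0.6), "P2": (0.4, 0.7)}   # (period-1, period-2) up-probs

def value(w0, w1, p1, p2):
    """Expected log terminal wealth for the plan (w0, w1)."""
    v = 0.0
    for r1 in "ud":
        for r2 in "ud":
            pr = (p1 if r1 == "u" else 1 - p1) * (p2 if r2 == "u" else 1 - p2)
            W2 = (1 + w0 * (returns[r1] - 1)) * (1 + w1 * (returns[r2] - 1))
            v += pr * math.log(W2)
    return v

def worst_case(w0, w1):
    return min(value(w0, w1, *ps) for ps in priors.values())

pre = worst_case(0.4, 0.4)     # pre-commitment with w0 = w1 = 0.4
naive = worst_case(0.2, 0.5)   # naive path with w0 = 0.2, w1 = 0.5
print(round(pre, 4), round(naive, 4))
```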


Exercise 5. Time-consistent dynamic risk measures must satisfy the property \(\rho_t(X) = \rho_t(-\rho_{t+1}(X))\). Show that the conditional entropic risk measure \(\rho_t(X) = \frac{1}{\beta}\log \mathbb{E}_t[e^{\beta X}]\) satisfies this recursion, making it time-consistent. Then show that the conditional CVaR does not satisfy this property in general.

Solution to Exercise 5

Part 1: Conditional entropic risk measure is time-consistent.

The conditional entropic risk measure is defined as:

\[ \rho_t(X) = \frac{1}{\beta}\log \mathbb{E}_t[e^{\beta X}] \]

where \(\beta > 0\) is the risk aversion parameter and \(X\) represents a loss.

We verify the recursion \(\rho_t(X) = \rho_t(\rho_{t+1}(X))\), where \(\rho_{t+1}(X) = \frac{1}{\beta}\log \mathbb{E}_{t+1}[e^{\beta X}]\). (The minus sign in the exercise statement applies when \(\rho\) acts on payoffs; with the loss convention used here, \(\rho_{t+1}(X)\) is itself a loss, so the recursion composes directly.)

Compute the right-hand side:

\[ \rho_t(\rho_{t+1}(X)) = \frac{1}{\beta}\log \mathbb{E}_t\!\left[e^{\beta \rho_{t+1}(X)}\right] \]

Substituting \(\rho_{t+1}(X)\):

\[ = \frac{1}{\beta}\log \mathbb{E}_t\!\left[\exp\!\left(\log \mathbb{E}_{t+1}[e^{\beta X}]\right)\right] = \frac{1}{\beta}\log \mathbb{E}_t\!\left[\mathbb{E}_{t+1}[e^{\beta X}]\right] \]

By the tower property of conditional expectation:

\[ = \frac{1}{\beta}\log \mathbb{E}_t[e^{\beta X}] = \rho_t(X) \]

This confirms time consistency: \(\rho_t(X) = \rho_t(\rho_{t+1}(X))\). \(\checkmark\)

Part 2: Conditional CVaR is not time-consistent.

The conditional CVaR at level \(\alpha\) is:

\[ \text{CVaR}_\alpha^t(X) = \frac{1}{\alpha}\int_0^\alpha \text{VaR}_u^t(X)\,du \]

Counterexample. Consider \(t = 0, 1, 2\) with:

  • At \(t = 1\): coin flip reveals \(H\) or \(T\) (each with probability \(1/2\)).
  • At \(t = 2\) given \(H\): loss \(X = 0\) (certain).
  • At \(t = 2\) given \(T\): loss \(X = 100\) with probability \(0.1\), loss \(X = 0\) with probability \(0.9\).

Take \(\alpha = 0.05\).

Compute \(\text{CVaR}_{0.05}^1(X)\):

  • Given \(H\): \(\text{CVaR}_{0.05}(0) = 0\).
  • Given \(T\): The \(5\%\) tail is entirely within the \(10\%\) probability of \(X = 100\). So \(\text{CVaR}_{0.05}(X \mid T) = 100\).

Recursive CVaR: \(\rho_0(\rho_1(X))\) where \(\rho_1(X)\) takes value 0 (with prob 1/2, state \(H\)) or 100 (with prob 1/2, state \(T\)).

\[ \text{CVaR}_{0.05}^0(\rho_1(X)) = \text{CVaR}_{0.05}\text{ of } \{0 \text{ w.p. } 0.5, \; 100 \text{ w.p. } 0.5\} \]

The worst \(5\%\) of this distribution is entirely within the \(100\) outcome, so \(\text{CVaR}_{0.05}^0(\rho_1(X)) = 100\).

Direct CVaR: \(\text{CVaR}_{0.05}^0(X)\) applied to the unconditional distribution:

\[ X = 0 \text{ w.p. } 0.5 + 0.5 \times 0.9 = 0.95, \quad X = 100 \text{ w.p. } 0.5 \times 0.1 = 0.05 \]

The worst \(5\%\) is exactly the event \(\{X = 100\}\), so \(\text{CVaR}_{0.05}^0(X) = 100\).

In this specific example, the values happen to match. Let us adjust: take \(\alpha = 0.10\).

Direct: \(\text{CVaR}_{0.10}^0(X)\): The worst \(10\%\) includes all of the \(X = 100\) event (prob \(0.05\)) and some of the \(X = 0\) event (prob \(0.05\)). So \(\text{CVaR}_{0.10} = (0.05 \times 100 + 0.05 \times 0)/0.10 = 50\).

Recursive: \(\text{CVaR}_{0.10}^1(X \mid T) = 100\) (worst \(10\%\) of the conditional). \(\text{CVaR}_{0.10}^1(X \mid H) = 0\). So \(\rho_1(X) \in \{0, 100\}\) each w.p. \(0.5\).

The recursive value \(\text{CVaR}_{0.10}^0(\rho_1)\) is the worst \(10\%\) of \(\{0, 100\}\), each w.p. \(0.5\). The worst \(10\%\) falls entirely in the \(100\) mass, so \(\text{CVaR}_{0.10}^0(\rho_1) = 100\).

Since \(100 \neq 50\), the recursive CVaR gives a different answer from the direct CVaR, proving that CVaR is not time-consistent. \(\blacksquare\)
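The counterexample's arithmetic can be checked numerically. The sketch below (`cvar` is an illustrative helper, not from the text) computes the CVaR of a discrete loss distribution as the mean of its worst \(\alpha\)-tail, then compares the direct and recursive evaluations at \(\alpha = 0.10\):

```python
def cvar(outcomes, probs, alpha):
    """CVaR at level alpha of a discrete loss distribution:
    the mean of the worst alpha-tail (largest losses)."""
    pairs = sorted(zip(outcomes, probs), key=lambda op: -op[0])  # worst losses first
    tail, acc = 0.0, 0.0
    for loss, p in pairs:
        take = min(p, alpha - acc)  # probability mass taken from this outcome
        tail += take * loss
        acc += take
        if acc >= alpha:
            break
    return tail / alpha

alpha = 0.10
# Direct: CVaR of the unconditional law {100 w.p. 0.05, 0 w.p. 0.95}
direct = cvar([100, 0], [0.05, 0.95], alpha)
# Recursive: first CVaR within each period-1 state, then CVaR across states
rho1_T = cvar([100, 0], [0.10, 0.90], alpha)  # conditional on T
rho1_H = cvar([0], [1.0], alpha)              # conditional on H
recursive = cvar([rho1_T, rho1_H], [0.5, 0.5], alpha)
print(direct, recursive)  # 50.0 100.0
```

The recursive evaluation inflates the risk to \(100\) because the outer CVaR treats the state-\(T\) conditional CVaR of \(100\) as a certain loss in that state.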


Exercise 6. In a portfolio choice problem over two periods, an ambiguity-averse investor with non-rectangular priors faces a time-inconsistency. The sophisticated strategy uses backward induction: at \(t = 1\), the agent optimizes given current beliefs, and at \(t = 0\), the agent anticipates this future behavior. Formulate this as a game between the "period-0 self" and "period-1 self" and solve for the subgame perfect equilibrium in a simple two-asset, two-period model.

Solution to Exercise 6

Model setup. Consider a two-period problem (\(t = 0, 1\)) with:

  • A risk-free asset with return \(r_f = 0\)
  • A risky asset with gross return \(R_t \in \{u, d\}\) in each period
  • \(u = 1.15\) (a 15% gain), \(d = 0.90\) (a 10% loss)
  • Log utility: \(U(W) = \log W\)
  • Initial wealth \(W_0 = 1\)

Non-rectangular priors: \(\mathcal{P} = \{P_1, P_2\}\), where \(p_t\) denotes the probability of an up move in period \(t\):

  • \(P_1\): \(p_1 = 0.55, p_2 = 0.55\) (both periods moderately optimistic)
  • \(P_2\): \(p_1 = 0.45, p_2 = 0.65\) (pessimistic first period, optimistic second)

This set is non-rectangular: it is not closed under pasting period-1 marginals with period-2 conditionals, so the worst-case period-2 conditional depends on which joint measure is selected.

Game formulation. The "period-0 self" chooses \(w_0 \in [0, 1]\). The "period-1 self" chooses \(w_1 \in [0, 1]\) after observing the period-1 realization. These are two "players" in an intrapersonal game.

Period-1 self's problem (for each possible \(W_1\)):

The period-1 self applies max-min over the conditional period-2 distributions:

\[ w_1^* = \arg\max_{w_1 \in [0,1]} \min_{p_2 \in \{0.55, 0.65\}} \left[p_2 \log(W_1(1 + 0.15 w_1)) + (1-p_2)\log(W_1(1 - 0.10 w_1))\right] \]

Since \(\log W_1\) is a constant that does not affect the optimization over \(w_1\), the period-1 self solves:

\[ w_1^* = \arg\max_{w_1} \min_{p_2 \in \{0.55, 0.65\}} \left[p_2 \log(1 + 0.15 w_1) + (1-p_2)\log(1 - 0.10 w_1)\right] \]

The worst case is \(p_2 = 0.55\) (the lower up-probability). The FOC at \(p_2 = 0.55\):

\[ \frac{0.55 \times 0.15}{1 + 0.15 w_1} = \frac{0.45 \times 0.10}{1 - 0.10 w_1} \]
\[ 0.0825(1 - 0.10 w_1) = 0.045(1 + 0.15 w_1) \]
\[ 0.0825 - 0.00825 w_1 = 0.045 + 0.00675 w_1 \]
\[ 0.0375 = 0.015 w_1 \implies w_1^* = 2.5 \]

Since \(w_1^* > 1\) and we constrain \(w_1 \in [0, 1]\), we set \(w_1^* = 1.0\).

Let us verify the worst case at \(w_1 = 1\):

  • \(p_2 = 0.55\): \(0.55 \log(1.15) + 0.45 \log(0.90) = 0.55(0.1398) + 0.45(-0.1054) = 0.0769 - 0.0474 = 0.0295\)
  • \(p_2 = 0.65\): \(0.65(0.1398) + 0.35(-0.1054) = 0.0909 - 0.0369 = 0.0540\)

The worst case is \(p_2 = 0.55\), giving a period-2 continuation value of \(0.0295\).
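A minimal numerical check of the period-1 self's max-min problem, assuming the parameters above (`worst_case_value` and the grid search are illustrative, not from the text):

```python
import math

u, d = 1.15, 0.90
p2_set = [0.55, 0.65]  # candidate conditional up-probabilities for period 2

def worst_case_value(w1):
    """Worst-case expected log-return increment of the period-2 position w1."""
    return min(p * math.log(1 + (u - 1) * w1) + (1 - p) * math.log(1 - (1 - d) * w1)
               for p in p2_set)

# Unconstrained FOC at the binding worst case p2 = 0.55 (derivation in the text)
p = 0.55
w_unconstrained = (p * (u - 1) - (1 - p) * (1 - d)) / ((u - 1) * (1 - d))
print(round(w_unconstrained, 6))  # 2.5: the interior optimum exceeds the cap, so w1* = 1

# Grid search over [0, 1] confirms the corner solution and the value 0.0295
w1_star = max((i / 100 for i in range(101)), key=worst_case_value)
print(w1_star, round(worst_case_value(w1_star), 4))  # 1.0 0.0295
```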

Period-0 self's problem (sophisticated):

The period-0 self anticipates \(w_1^* = 1\) and solves:

\[ w_0^* = \arg\max_{w_0 \in [0,1]} \min_{P \in \{P_1, P_2\}} \mathbb{E}_P[\log(W_0(1 + w_0(R_1 - 1))) + \log(1 + w_1^*(R_2 - 1))] \]

Since \(w_1^* = 1\) is fixed, the second-period contribution under each measure is:

  • \(P_1\): \(0.55 \log(1.15) + 0.45 \log(0.90) = 0.0295\)
  • \(P_2\): \(0.65 \log(1.15) + 0.35 \log(0.90) = 0.0540\)

For the first period under each measure:

  • \(P_1\) (\(p_1 = 0.55\)): \(0.55\log(1 + 0.15w_0) + 0.45\log(1 - 0.10w_0) + 0.0295\)
  • \(P_2\) (\(p_1 = 0.45\)): \(0.45\log(1 + 0.15w_0) + 0.55\log(1 - 0.10w_0) + 0.0540\)

At \(w_0 = 1\):

  • \(P_1\): \(0.55(0.1398) + 0.45(-0.1054) + 0.0295 = 0.0769 - 0.0474 + 0.0295 = 0.0590\)
  • \(P_2\): \(0.45(0.1398) + 0.55(-0.1054) + 0.0540 = 0.0629 - 0.0580 + 0.0540 = 0.0589\)

The two measures give the same value: both priors assign total up-probability \(p_1 + p_2 = 1.10\), and with \(w_0 = w_1 = 1\) expected log-wealth depends only on this sum, so the worst case at \(w_0 = 1\) is \(\approx 0.0589\).

At \(w_0 = 0.5\):

  • \(P_1\): \(0.55\log(1.075) + 0.45\log(0.95) + 0.0295 = 0.55(0.0723) + 0.45(-0.0513) + 0.0295 = 0.0398 - 0.0231 + 0.0295 = 0.0462\)
  • \(P_2\): \(0.45(0.0723) + 0.55(-0.0513) + 0.0540 = 0.0325 - 0.0282 + 0.0540 = 0.0583\)

The worst case is \(0.0462\) (under \(P_1\)), below the \(\approx 0.059\) attained at \(w_0 = 1\). More generally, \(P_1\) is the binding worst case for every \(w_0 \in [0, 1)\), and its objective is concave with unconstrained maximizer \(w_0 = 2.5\), so the worst-case value is increasing on \([0, 1]\) and the sophisticated agent chooses \(w_0 = 1\).

Subgame perfect equilibrium: \((w_0^*, w_1^*) = (1.0, 1.0)\) with the worst-case expected log-utility approximately \(0.059\).
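The backward-induction step can be sketched as a small grid search, assuming the setup above with \(w_1^* = 1\) fixed by the period-1 self (`log_ret` and `worst` are illustrative helper names):

```python
import math

measures = [(0.55, 0.55), (0.45, 0.65)]  # (p1, p2) under P1 and P2
w1_star = 1.0  # the period-1 self's choice, from backward induction

def log_ret(w, p):
    """Expected log gross return of holding fraction w for one period, up-prob p."""
    return p * math.log(1 + 0.15 * w) + (1 - p) * math.log(1 - 0.10 * w)

def worst(w0):
    """Worst-case E_P[log W_2] over the two joint measures, with w1 = w1_star."""
    return min(log_ret(w0, p1) + log_ret(w1_star, p2) for p1, p2 in measures)

# The sophisticated period-0 self maximizes the worst case over a grid on [0, 1]
w0_star = max((i / 100 for i in range(101)), key=worst)
print(w0_star, round(worst(w0_star), 4))  # 1.0 0.0589
```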

Key insight. The sophisticated strategy can differ from the pre-commitment strategy because the period-0 self cannot control the period-1 self's choice. The period-1 self applies the worst-case conditional \(p_2 = 0.55\) (inherited from \(P_1\)), whereas the period-0 self's worst-case joint measure may be \(P_2\), whose second-period conditional \(p_2 = 0.65\) is more optimistic. With non-rectangular priors these two worst cases need not coincide, and this inability of the current self to dictate future behavior is the essence of the intrapersonal game and the source of the welfare loss relative to pre-commitment.