A two-stage actor-critic RL algorithm learns deterministic equilibrium policies for general time-inconsistent control problems by combining deterministic policy gradient (DPG) on an auxiliary time-consistent problem with fixed-point iteration on auxiliary functions.
Exploratory optimal stopping: A singular control formulation
4 Pith papers cite this work.
citing papers explorer
- Equilibrium under Time-Inconsistency: A New Existence Theory by Vanishing Entropy Regularization
  Solutions to the regularized exploratory equilibrium HJB equation converge in suitable norms to a strong solution of the original EHJB as the entropy parameter vanishes, yielding existence of equilibria without conventional stringent regularity assumptions.
- Randomized Optimal Switching Problem and Related Mirror Descent Flow
  Proves regularized value solves elliptic HJB system with Gibbs policy, approximates classical optimum with O(λ log 1/λ) error, and shows mirror descent flow converges at O(1/(e^{λs}-1) + λ log 1/λ) or O(log s / sqrt(s)).
- A Two-fold Randomization Framework for Impulse Control Problems