Continuous-Time Robust Dynamic Programming
This paper presents a new theory, known as robust dynamic programming, for a class of continuous-time dynamical systems. Different from traditional dynamic programming (DP) methods, this new theory serves as a fundamental tool to analyze the robustness of DP algorithms, and in particular, to develop novel adaptive optimal control and reinforcement learning methods. In order to demonstrate the potential of this new framework, four illustrative applications in the fields of stochastic optimal control and adaptive DP are presented. Three numerical examples arising from both finance and engineering industries are also given, along with several possible extensions of the proposed framework.
Forward citations
Cited by 1 Pith paper
- DVPO: Distributional Value Modeling-based Policy Optimization for LLM Post-Training
DVPO learns token-level value distributions and uses asymmetric risk regularization to contract lower tails while expanding upper tails, outperforming PPO and GRPO under noisy supervision in dialogue, math, and QA tasks.