arXiv preprint arXiv:2501.10523
2 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
- Bellman-Taylor Score Decoding for Markov Decision Processes with State-Dependent Feasible Action Sets
Bellman-Taylor score decoding framework for MDPs with implicit state-dependent action constraints, enabling standard DRL optimization with a decomposed optimality gap guarantee.
- Inpatient Overflow Management with Proximal Policy Optimization