Strong duality holds for weakly communicating average-reward CMDPs, enabling a primal-dual clipped value iteration algorithm with improved regret and constraint violation bounds of order T^{2/3}.
By Lemma 5, we have (II) ≤ 0. Moreover, since (III) is the sum of a martingale difference sequence, it can be bounded using the Azuma-Hoeffding inequality.
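The Azuma-Hoeffding step can be made concrete with a small sketch. It assumes the martingale differences satisfy |X_t| ≤ c for every t. Under that assumption, the two-sided inequality P(|X_1 + ... + X_T| ≥ ε) ≤ 2 exp(−ε² / (2Tc²)) gives |X_1 + ... + X_T| ≤ c·sqrt(2T ln(2/δ)) with probability at least 1 − δ.

This is a generic illustration, not the paper's proof. The names `azuma_hoeffding_bound` and `empirical_violation_rate`, along with the values of c, T, and δ, are hypothetical. The simulated sequence is a simple choice of independent ±c steps.

```python
import math
import random


def azuma_hoeffding_bound(T: int, c: float, delta: float) -> float:
    """Return eps such that P(|S_T| >= eps) <= delta, where S_T is the sum of a
    martingale difference sequence X_1..X_T with |X_t| <= c (two-sided bound).

    Derived by setting 2 * exp(-eps^2 / (2 * T * c^2)) = delta and solving for eps.
    """
    if T <= 0 or c < 0 or not (0.0 < delta < 1.0):
        raise ValueError("need T > 0, c >= 0 and 0 < delta < 1")
    return c * math.sqrt(2.0 * T * math.log(2.0 / delta))


def empirical_violation_rate(T: int, c: float, delta: float,
                             trials: int = 1000, seed: int = 0) -> float:
    """Fraction of simulated runs in which |S_T| reaches the Azuma-Hoeffding bound.

    Each X_t is +c or -c with probability 1/2, independently of the past. This is
    a zero-mean bounded sequence and hence a martingale difference sequence
    (an illustrative choice only).
    """
    rng = random.Random(seed)
    eps = azuma_hoeffding_bound(T, c, delta)
    violations = 0
    for _ in range(trials):
        s = sum(c if rng.random() < 0.5 else -c for _ in range(T))
        if abs(s) >= eps:
            violations += 1
    return violations / trials


if __name__ == "__main__":
    T, c, delta = 1000, 1.0, 0.05  # hypothetical horizon, step bound, confidence
    print(f"bound: {azuma_hoeffding_bound(T, c, delta):.2f}")
    print(f"empirical violation rate: {empirical_violation_rate(T, c, delta):.4f}")
```

The bound grows as sqrt(T), so a term like (III) contributes only a lower-order O(sqrt(T log(1/δ))) amount to a regret or violation bound of order T^{2/3}. The simulated violation rate should stay below δ, since Azuma-Hoeffding is a worst-case tail bound.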
1 Pith paper cites this work (field: cs.LG; year: 2026; verdict: unverdicted):
Learning Weakly Communicating Average-Reward CMDPs: Strong Duality and Improved Regret