Strong duality holds for weakly communicating average-reward CMDPs, enabling a primal-dual clipped value iteration algorithm with improved regret and constraint violation bounds of order T^{2/3}.
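This leads to
$$\lim_{n\to\infty} \frac{1}{n}\sum_{t=1}^{n} \bar{\mu}_t(s) = \lim_{n\to\infty}\left( \frac{1}{n}\sum_{t=1}^{\tau-1} \bar{\mu}_t(s) + \frac{n-\tau+1}{n}\cdot\frac{1}{n-\tau+1}\sum_{t=\tau}^{n} \bar{\mu}_t(s) \right) = d^{\pi}(s) \quad \text{a.s.}$$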
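The decomposition above splits the running average at a time τ: the prefix term is a bounded sum divided by n and so vanishes, while the tail weight (n−τ+1)/n tends to 1. A minimal numerical sketch of that argument follows. The sequence, the value `limit` standing in for d^π(s), and the function names are all hypothetical choices for illustration; in particular the tail is taken to be constant here, which is a simplification of the source's setting.

```python
def cesaro_mean(values):
    """Plain running average (1/n) * sum_{t=1}^{n} values[t]."""
    return sum(values) / len(values)


def split_mean(values, tau):
    """Same average, written as prefix term plus weighted tail average.

    tau is 1-indexed and must satisfy 1 <= tau <= len(values).
    """
    n = len(values)
    prefix_term = sum(values[: tau - 1]) / n      # (1/n) * sum_{t<tau}
    tail_len = n - tau + 1
    tail_avg = sum(values[tau - 1:]) / tail_len   # average over t >= tau
    return prefix_term + (tail_len / n) * tail_avg


TAU = 50      # hypothetical split time
LIMIT = 0.3   # hypothetical stand-in for the limiting value d^pi(s)


def sequence(n):
    """Arbitrary bounded values before TAU, then the limiting value."""
    return [1.0] * min(n, TAU - 1) + [LIMIT] * max(0, n - TAU + 1)


if __name__ == "__main__":
    for n in (100, 1_000, 100_000):
        seq = sequence(n)
        print(n, cesaro_mean(seq), split_mean(seq, TAU))
```

The two functions agree up to floating-point rounding for every n ≥ τ, and the printed averages approach `LIMIT` as n grows, because the contribution of the first τ−1 terms shrinks like 1/n.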
Learning Weakly Communicating Average-Reward CMDPs: Strong Duality and Improved Regret