Strong duality holds for weakly communicating average-reward CMDPs, enabling a primal-dual clipped value iteration algorithm with improved regret and constraint violation bounds of order T^{2/3}.
Jiahui Zhu, Kihyun Yu, Dabeen Lee, Xin Liu, and Honghao Wei
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
- Learning Weakly Communicating Average-Reward CMDPs: Strong Duality and Improved Regret