Natural Policy Gradient as Doubly Smoothed Policy Iteration: A Bellman-Operator Framework

Natural policy gradient is a special case of doubly smoothed policy iteration that achieves distribution-free global geometric convergence to an $\epsilon$-optimal policy in $O\big((1-\gamma)^{-1} \log((1-\gamma)^{-1} \epsilon^{-1})\big)$ iterations.
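As a back-of-the-envelope check (not taken from the paper): if the optimality gap contracts by a factor of $\gamma$ per iteration, as exact policy iteration guarantees, and values are bounded by $(1-\gamma)^{-1}$, the stated iteration count follows:

$$
\|V^* - V^{\pi_T}\|_\infty \;\le\; \gamma^{T}\,\|V^* - V^{\pi_0}\|_\infty \;\le\; \frac{\gamma^{T}}{1-\gamma} \;\le\; \epsilon
\quad\Longleftarrow\quad
T \;\ge\; \frac{1}{1-\gamma}\,\log\frac{1}{(1-\gamma)\,\epsilon},
$$

using $\log(1/\gamma) \ge 1-\gamma$. This matches the $O\big((1-\gamma)^{-1}\log((1-\gamma)^{-1}\epsilon^{-1})\big)$ rate in the summary above.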
Our next result gives a bound on the certificate corresponding to the policies generated by the DPI algorithm.
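To make the certificate notion concrete, here is a minimal sketch, with several assumptions: a random tabular MDP, exact policy evaluation, the standard softmax (multiplicative-weights) form of natural policy gradient, $\pi_{t+1}(a|s) \propto \pi_t(a|s)\exp(\eta\, Q^{\pi_t}(s,a))$, and a generic Bellman-residual certificate $V^* - V^{\pi} \le \max_{s,a} A^{\pi}(s,a)/(1-\gamma)$. This is not the paper's DPI algorithm or its certificate, only an illustration of tracking a computable upper bound on the optimality gap alongside the true gap.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, eta = 20, 4, 0.9, 10.0  # assumed sizes and step size, for illustration

# Random MDP: P[s, a] is a distribution over next states, rewards r[s, a] in [0, 1].
P = rng.dirichlet(np.ones(S), size=(S, A))        # shape (S, A, S)
r = rng.uniform(size=(S, A))

def q_values(pi):
    """Exact Q^pi: solve (I - gamma * P_pi) V = r_pi, then one Bellman backup."""
    r_pi = (pi * r).sum(axis=1)                   # (S,)
    P_pi = np.einsum("sa,sat->st", pi, P)         # (S, S)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * P @ V                      # (S, A); matmul contracts the last axis

# Value iteration gives V* as a reference for the true optimality gap.
V_star = np.zeros(S)
for _ in range(2000):
    V_star = (r + gamma * P @ V_star).max(axis=1)

pi = np.full((S, A), 1.0 / A)                     # uniform initial policy
for t in range(30):
    Q = q_values(pi)
    V = (pi * Q).sum(axis=1)
    gap = np.max(V_star - V)
    # Certificate: T V^pi - V^pi equals max_a A^pi(s, a) pointwise, so
    # V* - V^pi <= max_{s,a} A^pi(s, a) / (1 - gamma), computable without V*.
    cert = np.max(Q - V[:, None]) / (1.0 - gamma)
    print(f"iter {t:2d}  true gap {gap:.3e}  certificate {cert:.3e}")
    # NPG / multiplicative-weights step under softmax parameterization;
    # centering Q by V (i.e., using advantages) improves numerical stability.
    pi = pi * np.exp(eta * (Q - V[:, None]))
    pi /= pi.sum(axis=1, keepdims=True)
```

The certificate always upper-bounds the true gap, so watching it shrink certifies near-optimality without knowing $V^*$; with a large step size the update behaves like policy iteration, and both quantities decay rapidly in this sketch.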