Q-value iteration enters an invariant tube around Q* plus the all-ones vector in finite time, with distance decaying at rate given by the joint spectral radius of the transverse projected switching family, which can be strictly faster than the discount factor.
2 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 2
representative citing papers
Unified ODE convergence analysis for smooth Q-learning variants via p-norm Lyapunov functions, valid even when the Bellman operator is not a contraction.
citing papers explorer
- Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration
Q-value iteration enters an invariant tube around Q* plus the all-ones vector in finite time, with distance decaying at rate given by the joint spectral radius of the transverse projected switching family, which can be strictly faster than the discount factor.
- Toward a Unified Lyapunov-Certified ODE Convergence Analysis of Smooth Q-Learning with p-Norms
Unified ODE convergence analysis for smooth Q-learning variants via p-norm Lyapunov functions, valid even when the Bellman operator is not a contraction.