Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration

2026 · math.OC · arXiv 2604.17457

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Q-value iteration (Q-VI) is usually analyzed through the \(\gamma\)-contraction of the Bellman operator. This argument proves convergence to \(Q^*\), but it gives only a coarse account of when the induced greedy policy becomes optimal. We study discounted Q-VI as a switching system and focus on the practically optimal solution set (POSS), the set of \(Q\)-functions whose tie-broken greedy policies are optimal. The main result shows that Q-VI reaches the optimal action class in finite time by entering an invariant tube around \(\mathcal X_1=Q^*+\operatorname{span}(\mathbf 1)\), which is contained in the POSS. For every \(\varepsilon>0\), the distance to \(\mathcal X_1\) satisfies an exponential bound with rate \((\bar\rho+\varepsilon)^k\), where \(\bar\rho\) is the joint spectral radius of the projected switching family restricted to directions transverse to \(\mathcal X_1\). When \(\bar\rho<\gamma\), this transverse convergence is faster than the classical contraction rate. The analysis separates fast policy identification from the subsequent convergence to \(Q^*\), which may still be governed by the all-ones mode. We also give spectral and graph-theoretic conditions under which the strict inequality \(\bar\rho<\gamma\) holds or fails.
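The separation described in the abstract can be observed numerically. The following is a minimal sketch, not the paper's method: it assumes a small random MDP (the sizes, the seed, and the helper names `bellman` and `run_qvi` are illustrative choices), runs Q-VI from \(Q_0=0\), and tracks two quantities: the sup-norm error \(\|Q_k-Q^*\|_\infty\) and the sup-norm distance from \(Q_k\) to \(\mathcal X_1=Q^*+\operatorname{span}(\mathbf 1)\), which equals half the spread of \(Q_k-Q^*\). It also records the first iteration from which the tie-broken greedy policy stays optimal. The joint spectral radius \(\bar\rho\) is not computed here.

```python
import numpy as np

def bellman(Q, P, R, gamma):
    # (TQ)(s, a) = R(s, a) + gamma * sum_{s'} P(s, a, s') * max_{a'} Q(s', a')
    return R + gamma * (P @ Q.max(axis=1))

def run_qvi(P, R, gamma, num_iters):
    # Returns the iterates Q_0, Q_1, ..., Q_{num_iters}, starting from Q_0 = 0.
    Q = np.zeros_like(R)
    iterates = [Q]
    for _ in range(num_iters):
        Q = bellman(Q, P, R, gamma)
        iterates.append(Q)
    return iterates

# Hypothetical random MDP; sizes and seed are arbitrary illustration choices.
rng = np.random.default_rng(0)
S, A, gamma = 6, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)   # each P[s, a, :] is a probability vector
R = rng.random((S, A))

# Numerical stand-in for Q*: iterate until the contraction error is negligible.
Q_star = run_qvi(P, R, gamma, 3000)[-1]
pi_star = Q_star.argmax(axis=1)     # argmax breaks ties by lowest action index

iterates = run_qvi(P, R, gamma, 300)
err_sup, dist_tube, optimal = [], [], []
for Q in iterates:
    e = Q - Q_star
    err_sup.append(np.abs(e).max())              # ||Q_k - Q*||_inf
    dist_tube.append((e.max() - e.min()) / 2.0)  # sup-norm distance to Q* + span(1)
    optimal.append(np.array_equal(Q.argmax(axis=1), pi_star))

# First index k such that the greedy policy is optimal for every later iterate.
first_opt = next((k for k in range(len(optimal)) if all(optimal[k:])), None)

print("greedy policy optimal from iteration:", first_opt)
for k in (0, 5, 10, 20, 40):
    print(k, err_sup[k], dist_tube[k])
```

The distance to \(\mathcal X_1\) can never exceed the sup-norm error, because \(Q^*\) itself lies in \(\mathcal X_1\). How much faster it decays depends on the MDP; the printed columns only give an empirical comparison for this one instance.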

representative citing papers

Switching-Geometry Analysis of Deflated Q-Value Iteration

math.OC · 2026-05-11 · unverdicted · novelty 7.0

Deflated Q-VI is algebraically equivalent to recentering standard Q-VI, yet its error dynamics are governed by the joint spectral radius of a projected switching system that can be strictly smaller than the discount factor γ.
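The recentering idea in this summary rests on a standard identity: for any constant \(c\), the Bellman operator satisfies \(T(Q+c\mathbf 1)=TQ+\gamma c\mathbf 1\), and adding a multiple of \(\mathbf 1\) does not change the greedy policy. The sketch below illustrates only that identity. It assumes mean subtraction as the recentering step and a small random MDP; both are illustrative choices, and the exact deflation scheme of the citing paper is not reproduced here.

```python
import numpy as np

def bellman(Q, P, R, gamma):
    # (TQ)(s, a) = R(s, a) + gamma * sum_{s'} P(s, a, s') * max_{a'} Q(s', a')
    return R + gamma * (P @ Q.max(axis=1))

def recenter(Q):
    # Assumed recentering: subtract the mean entry, i.e. shift along span(1).
    return Q - Q.mean()

# Hypothetical random MDP; sizes and seed are arbitrary illustration choices.
rng = np.random.default_rng(1)
S, A, gamma = 5, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((S, A))

Q_std = np.zeros((S, A))   # standard Q-VI iterate
Q_rec = np.zeros((S, A))   # iterate that is recentered after every step
spread, same_policy = [], []
for _ in range(50):
    Q_std = bellman(Q_std, P, R, gamma)
    Q_rec = recenter(bellman(Q_rec, P, R, gamma))
    diff = Q_std - Q_rec
    # If diff is a multiple of the all-ones matrix, its spread is zero.
    spread.append(diff.max() - diff.min())
    same_policy.append(
        np.array_equal(Q_std.argmax(axis=1), Q_rec.argmax(axis=1))
    )

print("largest spread of Q_std - Q_rec:", max(spread))
print("greedy policies agree at every step:", all(same_policy))
```

By induction the two iterates differ by a multiple of \(\mathbf 1\) at every step, so the spread stays at floating-point rounding level and the greedy policies coincide.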

