Recognition: 2 theorem links
· Lean TheoremSwitching-Geometry Analysis of Deflated Q-Value Iteration
Pith reviewed 2026-05-12 03:14 UTC · model grok-4.3
The pith
Deflated Q-value iteration obtains a convergence rate from a projected joint spectral radius that can be strictly smaller than the discount factor.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The standard Q-VI switching system model has JSR exactly γ since all admissible subsystems share the all-ones vector as an invariant direction. By passing to the quotient space that removes this direction, we obtain a projected switching system model whose JSR governs the relevant error dynamics and may be strictly smaller than γ. The correction is equivalent to a scalar recentering of standard Q-VI. Hence, the projected trajectory, and therefore the greedy-policy sequence, is unchanged relative to standard Q-VI initialized from the same point. The benefit of deflation is not a change in the induced decision-making problem, but a more precise JSR-based description of the convergence geometry
What carries the argument
The projected switching system model obtained by quotienting out the all-ones invariant direction from the standard Q-VI switching model, whose joint spectral radius governs the deflated error dynamics.
If this is right
- The convergence rate of deflated Q-VI is governed by the JSR of the projected system rather than by γ.
- This projected JSR can be strictly smaller than γ for some MDPs, yielding a sharper rate certificate.
- The sequence of greedy policies generated by deflated Q-VI is identical to that of standard Q-VI from the same initialization.
- Deflation functions purely as a geometric recentering that does not alter the underlying decision-making problem.
Where Pith is reading between the lines
- The same quotient-space reduction could be applied to other iterative schemes in reinforcement learning that possess redundant invariant directions.
- Explicit computation of the projected JSR for given MDPs might serve as a diagnostic for convergence speed in practice.
- The framework suggests that analogous geometric projections could tighten rate analyses for undiscounted or average-reward value iteration.
Load-bearing premise
Every admissible subsystem in the standard Q-VI switching model shares the all-ones vector as an invariant direction.
What would settle it
A concrete MDP together with its admissible policies for which at least one subsystem matrix fails to map the all-ones vector to itself; the projection would then be invalid and the claimed possibility of a strictly smaller JSR would not hold.
Figures
read the original abstract
This paper develops a joint spectral radius (JSR) framework for analyzing rank-one deflated Q-value iteration (Q-VI) in discounted Markov decision process control. Focusing on an all-ones residual correction, we interpret the resulting algorithm through the geometry of switching systems and, to the best of our knowledge, give the first JSR-based convergence analysis of deflated Q-VI for policy optimization problems. Our analysis reveals that the standard Q-VI switching system model has JSR exactly the discount factor $\gamma\in (0,1)$, since all admissible subsystems share the all-ones vector as an invariant direction. By passing to the quotient space that removes this direction, we obtain a projected switching system model whose JSR governs the relevant error dynamics and may be strictly smaller than $\gamma$. Therefore, the deflated Q-VI admits a potentially sharper convergence-rate characterization than the ambient-space $\gamma$-bound. Finally, we prove that the correction is equivalent to a scalar recentering of standard Q-VI. Hence, the projected trajectory, and therefore the greedy-policy sequence, is unchanged relative to standard Q-VI initialized from the same point. The benefit of deflation is not a change in the induced decision-making problem, but a more precise JSR-based description of the convergence geometry after the redundant all-ones component is removed.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a joint spectral radius (JSR) framework for rank-one deflated Q-value iteration in discounted MDPs. It shows that every admissible subsystem in the standard Q-VI switching model shares the all-ones vector as an invariant direction (by row-stochasticity of the embedded kernels), yielding ambient JSR exactly equal to the discount factor γ. Passing to the quotient space orthogonal to this direction produces a projected switching system whose JSR governs the relevant error dynamics and may be strictly smaller than γ. The deflation step is proved algebraically equivalent to a scalar recentering of standard Q-VI, so that the projected trajectory and the induced greedy-policy sequence remain identical to those of standard Q-VI started from the same initial point. The claimed benefit is therefore a more precise geometric description of convergence rather than any change in the decision-making problem.
Significance. If the central claims hold, the work supplies the first JSR-based convergence analysis of deflated Q-VI and a clean geometric explanation for why removing the redundant all-ones component can tighten the rate bound. The algebraic equivalence result is a strength, as is the observation that the invariance property holds by construction for any row-stochastic transition kernel. The contribution is primarily conceptual and analytic rather than algorithmic; it refines the understanding of existing methods without altering their output.
major comments (2)
- [main theorem / projected-system section] The claim that the projected JSR 'may be strictly smaller than γ' (abstract and main theorem) is load-bearing for the asserted sharper characterization, yet the manuscript provides neither an explicit construction of the projected switching matrices nor a concrete numerical example in which the reduction is computed and verified. Without such evidence the potential improvement remains unquantified.
- [switching-system model definition] The derivation that the ambient JSR equals γ rests on the joint spectral radius of the family of matrices that all fix the all-ones vector; the manuscript should supply the precise definition of the admissible set of matrices (or policies) and the explicit matrix representation of the switching system before the quotient projection is taken, so that the invariance and the subsequent reduction can be checked directly.
minor comments (2)
- [quotient-space paragraph] Notation for the quotient-space projection operator and the reduced state space should be introduced once and used consistently; currently the transition between ambient and projected quantities is described only informally.
- [equivalence proof] The statement that 'the projected trajectory, and therefore the greedy-policy sequence, is unchanged' would benefit from an explicit one-line reference to the algebraic identity that maps the deflated update onto the recentered standard update.
Simulated Author's Rebuttal
We appreciate the referee's thorough review and valuable feedback on our manuscript. The suggestions to enhance the clarity of the switching system model and to provide concrete evidence for the potential improvement in the JSR bound are well-taken. We will incorporate revisions to address these points directly. Our point-by-point responses follow.
read point-by-point responses
-
Referee: [main theorem / projected-system section] The claim that the projected JSR 'may be strictly smaller than γ' (abstract and main theorem) is load-bearing for the asserted sharper characterization, yet the manuscript provides neither an explicit construction of the projected switching matrices nor a concrete numerical example in which the reduction is computed and verified. Without such evidence the potential improvement remains unquantified.
Authors: We concur that an explicit construction and numerical verification would strengthen the presentation of the main result. Accordingly, we will add to the revised manuscript an explicit description of the projected switching matrices obtained by quotienting out the all-ones direction. Furthermore, we will include a concrete numerical example based on a small MDP instance, where we explicitly compute the family of projected matrices, determine their joint spectral radius, and confirm that it is strictly less than the discount factor γ. This example will illustrate the potential for a sharper convergence characterization. revision: yes
-
Referee: [switching-system model definition] The derivation that the ambient JSR equals γ rests on the joint spectral radius of the family of matrices that all fix the all-ones vector; the manuscript should supply the precise definition of the admissible set of matrices (or policies) and the explicit matrix representation of the switching system before the quotient projection is taken, so that the invariance and the subsequent reduction can be checked directly.
Authors: We will expand the section on the switching-system model to provide the precise definition of the admissible set of matrices. Specifically, we will define the set as the collection of all possible Q-Bellman operators corresponding to deterministic stationary policies, expressed in matrix form. We will also present the explicit matrix representation of these operators prior to the projection step, emphasizing the property that each matrix is row-stochastic (up to the discount factor) and thus leaves the all-ones vector invariant. This addition will enable readers to directly verify the ambient JSR equaling γ and the validity of the subsequent reduction to the quotient space. revision: yes
Circularity Check
No significant circularity identified
full rationale
The paper's core derivation begins from the standard MDP fact that each fixed-policy operator A_π satisfies A_π 1 = γ 1 because the transition kernel is row-stochastic; this is an external modeling assumption, not an internal definition. The claim that the ambient switching system's JSR equals γ follows immediately from the definition of joint spectral radius once all subsystems share the same eigenvector with eigenvalue γ. The subsequent quotient-space projection, the possibility that the projected JSR is strictly smaller than γ, and the algebraic equivalence between deflation and scalar recentering are obtained by direct linear-algebraic manipulation of the error recursion without parameter fitting, without renaming known results, and without load-bearing self-citations. All steps remain self-contained against ordinary MDP theory and the definition of JSR.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Q-value iteration updates can be represented as a finite set of linear subsystems whose selection depends on the current greedy policy.
- domain assumption Every admissible subsystem leaves the all-ones vector invariant.
Reference graph
Works this paper leans on
- [1]
-
[2]
Dynamic programming and optimal control 4th edition, volume ii.Athena Scientific, 2015
Dimitri P Bertsekas. Dynamic programming and optimal control 4th edition, volume ii.Athena Scientific, 2015
work page 2015
-
[3]
Dimitri P. Bertsekas and John N. Tsitsiklis.Neuro-dynamic programming. Athena Scientific Belmont, MA, 1996
work page 1996
-
[4]
Vincent D Blondel and Yurii Nesterov. Computationally efficient approximations of the joint spectral radius.SIAM Journal on Matrix Analysis and Applications, 27(1):256–272, 2005
work page 2005
- [5]
-
[6]
V. Goyal and J. Grand-Clément. A first-order approach to accelerated value iteration.Operations Research, 71(2):517–535, 2023
work page 2023
-
[7]
J. P. Hespanha and A. S. Morse. Stability of switched systems with average dwell-time. In Proceedings of the 38th IEEE Conference on Decision and Control, pages 2655–2660, 1999
work page 1999
-
[8]
Springer Science & Business Media, 2009
Raphaël Jungers.The joint spectral radius: Theory and applications, volume 385. Springer Science & Business Media, 2009
work page 2009
-
[9]
A. S. Kolarijani, T. Ok, P. Mohajerin Esfahani, and M. A. S. Kolarijani. Rank-one modified value iteration. InProceedings of the 42nd International Conference on Machine Learning, volume 267 ofProceedings of Machine Learning Research, pages 31182–31201, 2025
work page 2025
-
[10]
Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration
Donghwan Lee. Beyond the bellman fixed point: geometry and fast policy identification in value iteration.arXiv preprint arXiv:2604.17457v4, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[11]
Lyapunov-Certified Direct Switching Theory for Q-Learning
Donghwan Lee. Lyapunov-certified direct switching theory for q-learning.arXiv preprint arXiv:2604.19569v2, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[12]
J. Lee, A. Rakhsha, E. K. Ryu, and A.-m. Farahmand. Deflated dynamics value iteration. Transactions on Machine Learning Research, 2025
work page 2025
-
[13]
Springer Science & Business Media, 2003
Daniel Liberzon.Switching in systems and control. Springer Science & Business Media, 2003
work page 2003
-
[14]
Hai Lin and Panos J Antsaklis. Stability and stabilizability of switched linear systems: A survey of recent results.IEEE Transactions on Automatic Control, 54(2):308–322, 2009. 35
work page 2009
-
[15]
Meyer.Matrix Analysis and Applied Linear Algebra
Carl D. Meyer.Matrix Analysis and Applied Linear Algebra. SIAM, 2000
work page 2000
-
[16]
Puterman.Markov decision processes: Discrete stochastic dynamic programming
Martin L. Puterman.Markov decision processes: Discrete stochastic dynamic programming. John Wiley & Sons, 2014
work page 2014
-
[17]
A note on the joint spectral radius.Indag
Gian-Carlo Rota and Gilbert Strang. A note on the joint spectral radius.Indag. Math, 22(4): 379–381, 1960
work page 1960
-
[18]
Stability criteria for switched and hybrid systems.SIAM Review, 49(4):545–592, 2007
Robert Shorten, Fabian Wirth, Oliver Mason, Kai Wulff, and Christopher King. Stability criteria for switched and hybrid systems.SIAM Review, 49(4):545–592, 2007
work page 2007
-
[19]
John N Tsitsiklis and Vincent D Blondel. The lyapunov exponent and joint spectral radius of pairs of matrices are hard—when not impossible—to compute and to approximate.Mathematics of Control, Signals and Systems, 10(1):31–40, 1997. 36
work page 1997
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.