pith. machine review for the scientific record. sign in

arxiv: 2605.10811 · v1 · submitted 2026-05-11 · 🧮 math.OC · cs.AI

Recognition: 2 theorem links

· Lean Theorem

Switching-Geometry Analysis of Deflated Q-Value Iteration

Donghwan Lee

Pith reviewed 2026-05-12 03:14 UTC · model grok-4.3

classification 🧮 math.OC cs.AI
keywords joint spectral radiusdeflated Q-value iterationMarkov decision processesswitching systemsconvergence analysisquotient spacepolicy optimization
0
0 comments X

The pith

Deflated Q-value iteration obtains a convergence rate from a projected joint spectral radius that can be strictly smaller than the discount factor.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper interprets rank-one deflated Q-value iteration as a switching system whose error dynamics are analyzed via the joint spectral radius. It first shows that the standard Q-VI model always has JSR exactly equal to the discount factor γ because every admissible subsystem leaves the all-ones vector invariant. Passing to the quotient space that removes this direction produces a projected switching model whose JSR governs the relevant contraction and may be smaller than γ. The deflation step is shown to be equivalent to a scalar recentering of the standard iterates, so the sequence of greedy policies is identical to that of ordinary Q-VI started from the same point. A reader would care because the tighter geometric description certifies faster convergence without changing the decision-making behavior of the algorithm.

Core claim

The standard Q-VI switching system model has JSR exactly γ since all admissible subsystems share the all-ones vector as an invariant direction. By passing to the quotient space that removes this direction, we obtain a projected switching system model whose JSR governs the relevant error dynamics and may be strictly smaller than γ. The correction is equivalent to a scalar recentering of standard Q-VI. Hence, the projected trajectory, and therefore the greedy-policy sequence, is unchanged relative to standard Q-VI initialized from the same point. The benefit of deflation is not a change in the induced decision-making problem, but a more precise JSR-based description of the convergence geometry

What carries the argument

The projected switching system model obtained by quotienting out the all-ones invariant direction from the standard Q-VI switching model, whose joint spectral radius governs the deflated error dynamics.

If this is right

  • The convergence rate of deflated Q-VI is governed by the JSR of the projected system rather than by γ.
  • This projected JSR can be strictly smaller than γ for some MDPs, yielding a sharper rate certificate.
  • The sequence of greedy policies generated by deflated Q-VI is identical to that of standard Q-VI from the same initialization.
  • Deflation functions purely as a geometric recentering that does not alter the underlying decision-making problem.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same quotient-space reduction could be applied to other iterative schemes in reinforcement learning that possess redundant invariant directions.
  • Explicit computation of the projected JSR for given MDPs might serve as a diagnostic for convergence speed in practice.
  • The framework suggests that analogous geometric projections could tighten rate analyses for undiscounted or average-reward value iteration.

Load-bearing premise

Every admissible subsystem in the standard Q-VI switching model shares the all-ones vector as an invariant direction.

What would settle it

A concrete MDP together with its admissible policies for which at least one subsystem matrix fails to map the all-ones vector to itself; the projection would then be invalid and the claimed possibility of a strictly smaller JSR would not hold.

Figures

Figures reproduced from arXiv: 2605.10811 by Donghwan Lee.

Figure 1
Figure 1. Figure 1: Geometric mechanism of the deflated Q-VI analysis, following the projected Q-VI geometry [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Full Q-function error for standard Q-VI and deflated Q-VI with uniform and nonuniform [PITH_FULL_IMAGE:figures/full_fig_p018_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Convergence of the full Q-function error [PITH_FULL_IMAGE:figures/full_fig_p019_3.png] view at source ↗
read the original abstract

This paper develops a joint spectral radius (JSR) framework for analyzing rank-one deflated Q-value iteration (Q-VI) in discounted Markov decision process control. Focusing on an all-ones residual correction, we interpret the resulting algorithm through the geometry of switching systems and, to the best of our knowledge, give the first JSR-based convergence analysis of deflated Q-VI for policy optimization problems. Our analysis reveals that the standard Q-VI switching system model has JSR exactly the discount factor $\gamma\in (0,1)$, since all admissible subsystems share the all-ones vector as an invariant direction. By passing to the quotient space that removes this direction, we obtain a projected switching system model whose JSR governs the relevant error dynamics and may be strictly smaller than $\gamma$. Therefore, the deflated Q-VI admits a potentially sharper convergence-rate characterization than the ambient-space $\gamma$-bound. Finally, we prove that the correction is equivalent to a scalar recentering of standard Q-VI. Hence, the projected trajectory, and therefore the greedy-policy sequence, is unchanged relative to standard Q-VI initialized from the same point. The benefit of deflation is not a change in the induced decision-making problem, but a more precise JSR-based description of the convergence geometry after the redundant all-ones component is removed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript develops a joint spectral radius (JSR) framework for rank-one deflated Q-value iteration in discounted MDPs. It shows that every admissible subsystem in the standard Q-VI switching model shares the all-ones vector as an invariant direction (by row-stochasticity of the embedded kernels), yielding ambient JSR exactly equal to the discount factor γ. Passing to the quotient space orthogonal to this direction produces a projected switching system whose JSR governs the relevant error dynamics and may be strictly smaller than γ. The deflation step is proved algebraically equivalent to a scalar recentering of standard Q-VI, so that the projected trajectory and the induced greedy-policy sequence remain identical to those of standard Q-VI started from the same initial point. The claimed benefit is therefore a more precise geometric description of convergence rather than any change in the decision-making problem.

Significance. If the central claims hold, the work supplies the first JSR-based convergence analysis of deflated Q-VI and a clean geometric explanation for why removing the redundant all-ones component can tighten the rate bound. The algebraic equivalence result is a strength, as is the observation that the invariance property holds by construction for any row-stochastic transition kernel. The contribution is primarily conceptual and analytic rather than algorithmic; it refines the understanding of existing methods without altering their output.

major comments (2)
  1. [main theorem / projected-system section] The claim that the projected JSR 'may be strictly smaller than γ' (abstract and main theorem) is load-bearing for the asserted sharper characterization, yet the manuscript provides neither an explicit construction of the projected switching matrices nor a concrete numerical example in which the reduction is computed and verified. Without such evidence the potential improvement remains unquantified.
  2. [switching-system model definition] The derivation that the ambient JSR equals γ rests on the joint spectral radius of the family of matrices that all fix the all-ones vector; the manuscript should supply the precise definition of the admissible set of matrices (or policies) and the explicit matrix representation of the switching system before the quotient projection is taken, so that the invariance and the subsequent reduction can be checked directly.
minor comments (2)
  1. [quotient-space paragraph] Notation for the quotient-space projection operator and the reduced state space should be introduced once and used consistently; currently the transition between ambient and projected quantities is described only informally.
  2. [equivalence proof] The statement that 'the projected trajectory, and therefore the greedy-policy sequence, is unchanged' would benefit from an explicit one-line reference to the algebraic identity that maps the deflated update onto the recentered standard update.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's thorough review and valuable feedback on our manuscript. The suggestions to enhance the clarity of the switching system model and to provide concrete evidence for the potential improvement in the JSR bound are well-taken. We will incorporate revisions to address these points directly. Our point-by-point responses follow.

read point-by-point responses
  1. Referee: [main theorem / projected-system section] The claim that the projected JSR 'may be strictly smaller than γ' (abstract and main theorem) is load-bearing for the asserted sharper characterization, yet the manuscript provides neither an explicit construction of the projected switching matrices nor a concrete numerical example in which the reduction is computed and verified. Without such evidence the potential improvement remains unquantified.

    Authors: We concur that an explicit construction and numerical verification would strengthen the presentation of the main result. Accordingly, we will add to the revised manuscript an explicit description of the projected switching matrices obtained by quotienting out the all-ones direction. Furthermore, we will include a concrete numerical example based on a small MDP instance, where we explicitly compute the family of projected matrices, determine their joint spectral radius, and confirm that it is strictly less than the discount factor γ. This example will illustrate the potential for a sharper convergence characterization. revision: yes

  2. Referee: [switching-system model definition] The derivation that the ambient JSR equals γ rests on the joint spectral radius of the family of matrices that all fix the all-ones vector; the manuscript should supply the precise definition of the admissible set of matrices (or policies) and the explicit matrix representation of the switching system before the quotient projection is taken, so that the invariance and the subsequent reduction can be checked directly.

    Authors: We will expand the section on the switching-system model to provide the precise definition of the admissible set of matrices. Specifically, we will define the set as the collection of all possible Q-Bellman operators corresponding to deterministic stationary policies, expressed in matrix form. We will also present the explicit matrix representation of these operators prior to the projection step, emphasizing the property that each matrix is row-stochastic (up to the discount factor) and thus leaves the all-ones vector invariant. This addition will enable readers to directly verify the ambient JSR equaling γ and the validity of the subsequent reduction to the quotient space. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper's core derivation begins from the standard MDP fact that each fixed-policy operator A_π satisfies A_π 1 = γ 1 because the transition kernel is row-stochastic; this is an external modeling assumption, not an internal definition. The claim that the ambient switching system's JSR equals γ follows immediately from the definition of joint spectral radius once all subsystems share the same eigenvector with eigenvalue γ. The subsequent quotient-space projection, the possibility that the projected JSR is strictly smaller than γ, and the algebraic equivalence between deflation and scalar recentering are obtained by direct linear-algebraic manipulation of the error recursion without parameter fitting, without renaming known results, and without load-bearing self-citations. All steps remain self-contained against ordinary MDP theory and the definition of JSR.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis rests on standard MDP assumptions and the modeling of Q-VI as a switching system; no free parameters are introduced and no new entities are postulated.

axioms (2)
  • domain assumption Q-value iteration updates can be represented as a finite set of linear subsystems whose selection depends on the current greedy policy.
    This modeling choice enables the application of joint spectral radius theory.
  • domain assumption Every admissible subsystem leaves the all-ones vector invariant.
    Invoked to conclude that the ambient JSR equals γ.

pith-pipeline@v0.9.0 · 5529 in / 1450 out tokens · 56493 ms · 2026-05-12T03:14:08.604390+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · 2 internal anchors

  1. [1]

    Bertsekas

    Dimitri P. Bertsekas. Generic rank-one corrections for value iteration in markovian decision problems.Operations Research Letters, 17(3):111–119, 1995

  2. [2]

    Dynamic programming and optimal control 4th edition, volume ii.Athena Scientific, 2015

    Dimitri P Bertsekas. Dynamic programming and optimal control 4th edition, volume ii.Athena Scientific, 2015

  3. [3]

    Bertsekas and John N

    Dimitri P. Bertsekas and John N. Tsitsiklis.Neuro-dynamic programming. Athena Scientific Belmont, MA, 1996

  4. [4]

    Computationally efficient approximations of the joint spectral radius.SIAM Journal on Matrix Analysis and Applications, 27(1):256–272, 2005

    Vincent D Blondel and Yurii Nesterov. Computationally efficient approximations of the joint spectral radius.SIAM Journal on Matrix Analysis and Applications, 27(1):256–272, 2005

  5. [5]

    A. Cicone. A note on the joint spectral radius.arXiv preprint arXiv:1502.01506, 2015

  6. [6]

    Goyal and J

    V. Goyal and J. Grand-Clément. A first-order approach to accelerated value iteration.Operations Research, 71(2):517–535, 2023

  7. [7]

    J. P. Hespanha and A. S. Morse. Stability of switched systems with average dwell-time. In Proceedings of the 38th IEEE Conference on Decision and Control, pages 2655–2660, 1999

  8. [8]

    Springer Science & Business Media, 2009

    Raphaël Jungers.The joint spectral radius: Theory and applications, volume 385. Springer Science & Business Media, 2009

  9. [9]

    A. S. Kolarijani, T. Ok, P. Mohajerin Esfahani, and M. A. S. Kolarijani. Rank-one modified value iteration. InProceedings of the 42nd International Conference on Machine Learning, volume 267 ofProceedings of Machine Learning Research, pages 31182–31201, 2025

  10. [10]

    Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration

    Donghwan Lee. Beyond the bellman fixed point: geometry and fast policy identification in value iteration.arXiv preprint arXiv:2604.17457v4, 2026

  11. [11]

    Lyapunov-Certified Direct Switching Theory for Q-Learning

    Donghwan Lee. Lyapunov-certified direct switching theory for q-learning.arXiv preprint arXiv:2604.19569v2, 2026

  12. [12]

    J. Lee, A. Rakhsha, E. K. Ryu, and A.-m. Farahmand. Deflated dynamics value iteration. Transactions on Machine Learning Research, 2025

  13. [13]

    Springer Science & Business Media, 2003

    Daniel Liberzon.Switching in systems and control. Springer Science & Business Media, 2003

  14. [14]

    Stability and stabilizability of switched linear systems: A survey of recent results.IEEE Transactions on Automatic Control, 54(2):308–322, 2009

    Hai Lin and Panos J Antsaklis. Stability and stabilizability of switched linear systems: A survey of recent results.IEEE Transactions on Automatic Control, 54(2):308–322, 2009. 35

  15. [15]

    Meyer.Matrix Analysis and Applied Linear Algebra

    Carl D. Meyer.Matrix Analysis and Applied Linear Algebra. SIAM, 2000

  16. [16]

    Puterman.Markov decision processes: Discrete stochastic dynamic programming

    Martin L. Puterman.Markov decision processes: Discrete stochastic dynamic programming. John Wiley & Sons, 2014

  17. [17]

    A note on the joint spectral radius.Indag

    Gian-Carlo Rota and Gilbert Strang. A note on the joint spectral radius.Indag. Math, 22(4): 379–381, 1960

  18. [18]

    Stability criteria for switched and hybrid systems.SIAM Review, 49(4):545–592, 2007

    Robert Shorten, Fabian Wirth, Oliver Mason, Kai Wulff, and Christopher King. Stability criteria for switched and hybrid systems.SIAM Review, 49(4):545–592, 2007

  19. [19]

    John N Tsitsiklis and Vincent D Blondel. The lyapunov exponent and joint spectral radius of pairs of matrices are hard—when not impossible—to compute and to approximate.Mathematics of Control, Signals and Systems, 10(1):31–40, 1997. 36