pith. machine review for the scientific record.

Lyapunov-Certified Direct Switching Theory for Q-Learning

2 Pith papers cite this work.

abstract

Q-learning is a fundamental algorithmic primitive in reinforcement learning. This paper develops a new framework for analyzing Q-learning from a switching-system viewpoint. In particular, we derive a direct stochastic switching-system representation of the Q-learning error. The key observation is that the Bellman maximization error can be expressed exactly as an average of action-wise Q-errors under a suitable stochastic policy. The resulting recursion has a switched linear conditional-mean drift and martingale-difference noise. To the best of our knowledge, this is the first convergence-rate analysis of standard Q-learning whose leading exponential rate is expressed through the joint spectral radius (JSR) of a direct switching family. Since the JSR is the exact worst-case exponential rate of the associated switched linear drift, the resulting rate is among the tightest drift-based rates that can be certified for this Q-learning representation. Building on this representation, we prove finite-time bounds based on a product-defined JSR-induced Lyapunov function and also give an optional common quadratic Lyapunov certificate. The quadratic certificate is only a sufficient condition and hence applies only to instances for which the certificate is feasible, whereas the JSR-induced Lyapunov construction applies to the full direct switching family whenever its JSR is below one. When feasible, the quadratic certificate replaces product-based verification by a computable matrix inequality and gives a simpler stochastic bound. We further extend the framework to Markovian observation models.
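The key observation in the abstract, that the Bellman maximization error equals an average of action-wise Q-errors under some stochastic policy, can be checked numerically for a single state. The sketch below is one possible construction and not necessarily the one used in the paper: it mixes the greedy action of the current estimate with the greedy action of the optimal Q-function. The names `averaging_policy`, `q`, and `q_star` are illustrative assumptions, as are the toy values at the bottom.

```python
import numpy as np

def averaging_policy(q, q_star):
    """Return a probability vector w over actions (for one fixed state) with
    max(q) - max(q_star) == w @ (q - q_star).

    Illustrative construction, assumed here rather than taken from the paper.
    """
    q = np.asarray(q, dtype=float)
    q_star = np.asarray(q_star, dtype=float)
    e = q - q_star                   # action-wise Q-errors
    gap = q.max() - q_star.max()     # Bellman maximization error at this state
    a_hi = int(np.argmax(q))         # greedy for q:      e[a_hi] >= gap
    a_lo = int(np.argmax(q_star))    # greedy for q_star: e[a_lo] <= gap
    w = np.zeros_like(e)
    if np.isclose(e[a_hi], e[a_lo]):
        # Both errors (nearly) coincide with the gap; a point mass suffices.
        w[a_hi] = 1.0
        return w
    # The gap lies between e[a_lo] and e[a_hi], so interpolate between them.
    lam = np.clip((gap - e[a_lo]) / (e[a_hi] - e[a_lo]), 0.0, 1.0)
    w[a_hi] += lam
    w[a_lo] += 1.0 - lam
    return w

# Toy values (hypothetical) for a state with three actions.
q = np.array([1.0, 2.5, 0.3])
q_star = np.array([2.0, 1.0, 1.5])
w = averaging_policy(q, q_star)
lhs = q.max() - q_star.max()
rhs = float(w @ (q - q_star))
```

The interpolation works because the error at the greedy action of `q` is at least the maximization gap, while the error at the greedy action of `q_star` is at most the gap, so the gap is a convex combination of the two. Since the weights depend on the current estimate, the policy changes along the iteration, which is what gives the error recursion its switched form.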

years: 2026 (2)

verdicts: unverdicted (2)

representative citing papers

Switching-Geometry Analysis of Deflated Q-Value Iteration

math.OC · 2026-05-11 · unverdicted · novelty 7.0

Deflated Q-VI is algebraically equivalent to recentering standard Q-VI, yet its error dynamics are governed by the joint spectral radius of a projected switching system that can be strictly smaller than the discount factor γ.

citing papers explorer

  • Switching-Geometry Analysis of Deflated Q-Value Iteration math.OC · 2026-05-11 · unverdicted · none · ref 11 · internal anchor

    Deflated Q-VI is algebraically equivalent to recentering standard Q-VI, yet its error dynamics are governed by the joint spectral radius of a projected switching system that can be strictly smaller than the discount factor γ.

  • A Switching System Theory of Q-Learning with Linear Function Approximation cs.LG · 2026-05-10 · unverdicted · none · ref 15 · internal anchor

    Q-learning with linear function approximation is recast as a switched linear system whose mean dynamics converge precisely when the joint spectral radius of the switching matrices is less than one.
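The summaries above state convergence in terms of whether a joint spectral radius (JSR) is below one. As a rough illustration of how such a condition might be checked for a small matrix family, the sketch below brute-forces the standard finite-product bounds: over all products of length k, the largest spectral radius gives a lower bound on the JSR and the largest spectral norm gives an upper bound, each raised to the power 1/k. The function name `jsr_bounds` and the toy matrices are assumptions for illustration and are not taken from any of the listed papers; the enumeration grows exponentially in k, so this is only practical for tiny examples.

```python
import itertools
from functools import reduce

import numpy as np

def jsr_bounds(mats, k):
    """Lower and upper bounds on the joint spectral radius of `mats`,
    obtained by enumerating all products of length k."""
    lower, upper = 0.0, 0.0
    for word in itertools.product(mats, repeat=k):
        prod = reduce(np.matmul, word)
        rho = np.max(np.abs(np.linalg.eigvals(prod)))   # spectral radius
        nrm = np.linalg.norm(prod, 2)                   # spectral norm
        lower = max(lower, rho ** (1.0 / k))
        upper = max(upper, nrm ** (1.0 / k))
    return lower, upper

# Hypothetical two-matrix switching family.
A1 = np.array([[0.5, 0.2], [0.0, 0.3]])
A2 = np.array([[0.3, 0.0], [0.2, 0.5]])
lo, hi = jsr_bounds([A1, A2], k=6)
certified_stable = hi < 1.0   # an upper bound below one certifies JSR < 1
```

If the upper bound falls below one, the switched linear drift is exponentially stable under arbitrary switching; if the lower bound is at least one, it is not. When the two bounds straddle one, a larger k or a different norm is needed.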