Lyapunov-Certified Direct Switching Theory for Q-Learning

2026 · cs.LG · arXiv 2604.19569

6 Pith papers cite this work. Polarity classification is still indexing.

abstract

Q-learning is a fundamental algorithmic primitive in reinforcement learning. This paper develops a new framework for analyzing Q-learning from a switching linear system (SLS) viewpoint. In particular, we derive a stochastic SLS representation of the Q-learning error, and a finite-time error analysis through the joint spectral radius (JSR) of the corresponding SLS model, where the JSR is the exact worst-case exponential rate of the associated SLS. To the best of our knowledge, this is the first convergence rate analysis of standard Q-learning whose leading exponential rate is expressed through the JSR. The resulting rate is tied to the intrinsic worst-case exponential rate of the direct SLS representation and can be sharper than row-sum upper bounds when those bounds are conservative.
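To make the comparison between a JSR-based rate and a row-sum bound concrete, the following is a minimal sketch that brackets the JSR of a small matrix set by brute force. It is not the paper's method: the two 2x2 matrices are hypothetical and are not derived from any MDP or from the paper's SLS model, and the helper names (`jsr_bounds`, `inf_norm`, `spectral_radius`) are illustrative. It relies only on two standard facts: for any product of length k, the k-th root of its spectral radius is a lower bound on the JSR, and the k-th root of the largest induced norm over all length-k products is an upper bound.

```python
import itertools
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inf_norm(m):
    """Induced infinity norm, i.e. the maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in m)


def spectral_radius(m):
    """Spectral radius of a real 2x2 matrix from its trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = tr * tr - 4.0 * det
    if disc >= 0.0:
        root = math.sqrt(disc)
        return max(abs(tr + root), abs(tr - root)) / 2.0
    # Complex-conjugate pair: |lambda|^2 equals the determinant.
    return math.sqrt(det)


def jsr_bounds(mats, max_len):
    """Brute-force lower and upper bounds on the JSR of `mats`.

    Enumerates every product of length 1..max_len, so the cost grows
    exponentially in max_len; this is only meant for tiny examples.
    """
    lower, upper = 0.0, math.inf
    for k in range(1, max_len + 1):
        worst_norm = 0.0
        for word in itertools.product(mats, repeat=k):
            prod = word[0]
            for m in word[1:]:
                prod = matmul(prod, m)
            lower = max(lower, spectral_radius(prod) ** (1.0 / k))
            worst_norm = max(worst_norm, inf_norm(prod))
        upper = min(upper, worst_norm ** (1.0 / k))
    return lower, upper


# Hypothetical subsystem matrices, chosen only for illustration.
A1 = [[0.5, 0.4], [0.0, 0.5]]
A2 = [[0.5, 0.0], [0.4, 0.5]]

row_sum_bound = max(inf_norm(A1), inf_norm(A2))
lower, upper = jsr_bounds([A1, A2], max_len=6)
print("one-step row-sum bound:", row_sum_bound)
print("JSR lower bound:", lower)
print("JSR upper bound:", upper)
```

For this particular pair, the bracket on the JSR sits strictly below the one-step row-sum bound, which is the kind of gap the abstract refers to when it says the row-sum bounds can be conservative.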

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

Heavy-Ball Q-Learning with Residual Weighting Correction

cs.LG · 2026-06-25 · unverdicted · novelty 7.0

Corrected heavy-ball Q-learning with convergence and acceleration guarantees is derived via switched linear system and joint spectral radius analysis, extended to linear function approximation.

Target Updates May Stabilize Linear Q-Learning: Periodic and Soft Dynamics

stat.ML · 2026-05-31 · unverdicted · novelty 7.0

Periodic and soft target updates guarantee convergence in linear Q-learning to the exact projected Q-Bellman solution under spectral and step-size conditions via joint spectral radius analysis of switched linear systems.

Sign-Separated Finite-Time Error Analysis of Q-Learning

cs.AI · 2026-05-15 · unverdicted · novelty 7.0

Sign-separated analysis decomposes Q-learning errors into negative parts dominated by an optimal-policy LTI system and positive parts controlled by a switching system, yielding finite-time bounds for deterministic and stochastic cases.

Switching-Geometry Analysis of Deflated Q-Value Iteration

math.OC · 2026-05-11 · unverdicted · novelty 7.0 · 2 refs

Deflated Q-value iteration admits a projected switching-system model whose joint spectral radius can be strictly smaller than the discount factor, yielding a sharper convergence characterization while leaving the greedy policy sequence unchanged.

A Switching System Theory of Q-Learning with Linear Function Approximation

cs.LG · 2026-05-10 · unverdicted · novelty 7.0 · 2 refs

Derives an exact linear switched model for the mean dynamics of Q-learning with linear function approximation and relates convergence to joint spectral radius stability of the switched system, extending the view to stochastic and regularized cases.

Geometrically Averaged Hard Target Updates for Linear Q-Learning

cs.LG · 2026-06-09 · unverdicted · novelty 6.0

Introduces and analyzes the λ-target update for linear Q-learning via geometric averaging of periodic target maps, studied with a switching-system model in the deterministic case.

citing papers explorer


Heavy-Ball Q-Learning with Residual Weighting Correction · cs.LG · 2026-06-25 · unverdicted · none · ref 22 · internal anchor
Target Updates May Stabilize Linear Q-Learning: Periodic and Soft Dynamics · stat.ML · 2026-05-31 · unverdicted · none · ref 13 · internal anchor
Sign-Separated Finite-Time Error Analysis of Q-Learning · cs.AI · 2026-05-15 · unverdicted · none · ref 12 · internal anchor
Switching-Geometry Analysis of Deflated Q-Value Iteration · math.OC · 2026-05-11 · unverdicted · none · ref 11 · 2 links · internal anchor
A Switching System Theory of Q-Learning with Linear Function Approximation · cs.LG · 2026-05-10 · unverdicted · none · ref 15 · 2 links · internal anchor
Geometrically Averaged Hard Target Updates for Linear Q-Learning · cs.LG · 2026-06-09 · unverdicted · none · ref 12 · internal anchor
