Develops quotient-categorical representations that render the average-reward distributional Bellman operator well-defined, non-expansive, and convergent under i.i.d. and Markovian sampling.
Stochastic fixed-point iterations for nonexpansive maps: Convergence and error bounds.SIAM Journal on Control and Optimization, 62(1):191–219
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
method 1
citation-polarity summary
fields
cs.LG 2years
2026 2verdicts
UNVERDICTED 2roles
method 1polarities
use method 1representative citing papers
A new two-trajectory sampling algorithm for average-reward TD learning guarantees convergence with quadratic sample complexity and no explicit dimension dependence in both tabular and linear approximation settings.
citing papers explorer
-
Quotient-Categorical Representations for Bellman-Compatible Average-Reward Distributional Reinforcement Learning
Develops quotient-categorical representations that render the average-reward distributional Bellman operator well-defined, non-expansive, and convergent under i.i.d. and Markovian sampling.
-
Bridging the Gap Between Average and Discounted TD Learning
A new two-trajectory sampling algorithm for average-reward TD learning guarantees convergence with quadratic sample complexity and no explicit dimension dependence in both tabular and linear approximation settings.