Approximations and Learning for Continuous State and Action MDPs under Average Cost Criteria

Ali Devran Kara; Serdar Yuksel

arxiv: 2308.07591 · v4 · pith:QNK36WKLnew · submitted 2023-08-15 · 🧮 math.OC · cs.SY · eess.SY



keywords continuous · convergence · mdps · spaces · state · under · algorithms · approximation


In this paper, for Markov Decision Processes (MDPs) with standard Borel spaces, (i) we first provide a discretization-based approximation method for MDPs with continuous spaces under average cost criteria, and provide error bounds for the approximations under certain ergodicity assumptions when the dynamics are only weakly continuous (yielding asymptotic convergence of the errors as the grid sizes vanish) or Wasserstein continuous (yielding a rate of convergence as the grid sizes vanish). In particular, we relax the total variation condition given in prior work to weak continuity or Wasserstein continuity. (ii) We provide synchronous and asynchronous (quantized) Q-learning algorithms for continuous spaces via quantization (where the quantized state is treated as the actual state in the corresponding Q-learning algorithms presented in the paper), and establish their convergence. (iii) We finally show that the convergence is to the optimal Q-values of a finite approximate model constructed via quantization, which implies near optimality of the resulting solution.
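To make the idea in (ii) concrete, here is a minimal sketch of asynchronous Q-learning run on quantized states. The abstract does not state the update rule, so everything specific below is an assumption made for illustration: the relative-value (RVI-style) average-cost update with reference value `min(q[0])`, the uniform grid on [0, 1], the toy dynamics and cost in `step`, the uniform exploration policy, and the names `quantize`, `step`, and `quantized_rvi_q_learning`. It is not the authors' algorithm.

```python
import random


def quantize(x, n_bins, low=0.0, high=1.0):
    """Map a continuous state x in [low, high] to a bin index in 0..n_bins-1."""
    if x <= low:
        return 0
    if x >= high:
        return n_bins - 1
    return int((x - low) / (high - low) * n_bins)


def step(x, a, rng):
    """Hypothetical toy dynamics and stage cost (an assumption, not from the paper)."""
    noise = rng.uniform(-0.1, 0.1)
    x_next = min(1.0, max(0.0, 0.5 * x + 0.4 * a + noise))
    cost = (x - 0.5) ** 2 + 0.1 * a
    return x_next, cost


def quantized_rvi_q_learning(n_bins=10, actions=(0.0, 0.5, 1.0),
                             n_steps=20000, seed=0):
    """Asynchronous relative Q-learning where the bin index is used as the state."""
    rng = random.Random(seed)
    q = [[0.0] * len(actions) for _ in range(n_bins)]
    visits = [[0] * len(actions) for _ in range(n_bins)]
    x = rng.random()
    for _ in range(n_steps):
        s = quantize(x, n_bins)              # learner only sees the quantized state
        a_idx = rng.randrange(len(actions))  # uniform exploration over actions
        x_next, cost = step(x, actions[a_idx], rng)
        s_next = quantize(x_next, n_bins)
        visits[s][a_idx] += 1
        alpha = 1.0 / visits[s][a_idx]       # per-pair decreasing step size
        # Average-cost target: subtract a reference value so iterates stay bounded.
        target = cost + min(q[s_next]) - min(q[0])
        q[s][a_idx] += alpha * (target - q[s][a_idx])
        x = x_next                           # the true state evolves continuously
    return q


def greedy_policy(q):
    """Cost-minimizing action index for each bin."""
    return [min(range(len(row)), key=row.__getitem__) for row in q]
```

Calling `greedy_policy(quantized_rvi_q_learning(n_bins=10))` returns one action index per bin. Refining the grid (larger `n_bins`) corresponds to the vanishing grid size discussed in (i), at the cost of more state-action pairs to visit.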


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

From Set Convergence to Pointwise Convergence: Finite-Time Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes
cs.LG 2025-04 unverdicted novelty 7.0

Establishes Õ(1/k) mean-square last-iterate convergence for asynchronous average-reward Q-learning with adaptive stepsizes and proves adaptivity is necessary.

A note on convergence of Wasserstein policy optimization
cs.LG 2026-05 unverdicted novelty 4.0

The note claims linear convergence of WPO in entropy-regularized MDPs by combining mean-field gradient flow analysis with a local log-Sobolev inequality under a regularity assumption.