Near-Optimal Sample Complexities of Divergence-based S-rectangular Distributionally Robust Reinforcement Learning

Nian Si, Shengbo Wang, Zhenghao Li

Authors on Pith no claims yet

classification 💻 cs.LG stat.ML

keywords mathcals-rectangularmodelsrobustdivergence-baseddr-rllearningsample

read the original abstract

Distributionally robust reinforcement learning (DR-RL) has recently gained significant attention as a principled approach that addresses discrepancies between training and testing environments. To balance robustness, conservatism, and computational traceability, the literature has introduced DR-RL models with SA-rectangular and S-rectangular adversaries. While most existing statistical analyses focus on SA-rectangular models, owing to their algorithmic simplicity and the optimality of deterministic policies, S-rectangular models more accurately capture distributional discrepancies in many real-world applications and often yield more effective robust randomized policies. In this paper, we study the empirical value iteration algorithm for divergence-based S-rectangular DR-RL and establish near-optimal sample complexity bounds of $\widetilde{O}(|\mathcal{S}||\mathcal{A}|(1-\gamma)^{-4}\varepsilon^{-2})$, where $\varepsilon$ is the target accuracy, $|\mathcal{S}|$ and $|\mathcal{A}|$ denote the cardinalities of the state and action spaces, and $\gamma$ is the discount factor. To the best of our knowledge, these are the first sample complexity results for divergence-based S-rectangular models that achieve optimal dependence on $|\mathcal{S}|$, $|\mathcal{A}|$, and $\varepsilon$ simultaneously. We further validate this theoretical dependence through numerical experiments on a robust inventory control problem and a theoretical worst-case example, demonstrating the fast learning performance of our proposed algorithm.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Central Limit Theorem for Two-Time-Scale Approximate Distributionally Robust RL
cs.LG 2026-05 unverdicted novelty 7.0

A two-time-scale stochastic approximation algorithm for approximate distributionally robust RL satisfies a central limit theorem at rate n^{-1/2} with characterized covariances.
MG-Former: A Transformer-Based Framework for Music-Driven 3D Conducting Gesture Generation
cs.SD 2026-05 unverdicted novelty 5.0

TransConductor generates 3D conducting gestures from music via a Trans-Temporal Music Encoder and Gesture Decoder, outperforming baselines on retrieval-based alignment metrics with a new ConductorMotion dataset.