A new primal-dual algorithm for adversarial linear CMDPs achieves the first sublinear regret and constraint violation bounds, of order K^{3/4}, using weighted LogSumExp softmax policies with periodic mixing and regularized dual updates.
International Conference on Machine Learning
5 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (5)
years: 2026 (5)
roles: method (1)
polarities: use method (1)
representative citing papers
The paper establishes the first Õ(ε^{-1}) upper bounds and matching lower bounds for forward-KL-regularized offline contextual bandits under single-policy concentrability in both tabular and general function approximation settings.
Synthetic data augmentation helps channel-mixing time series models but degrades channel-independent ones, with reliable gains only from seasonal-trend generators and gradual schedules in low-resource settings.
Replaces determinant growth with generalized Rayleigh quotient for rare switching in private linear bandits to control worst-direction volume despite non-monotonic design matrices from noise.
POOL is a new RL algorithm that adds privacy protection in continuous spaces with one-sided feedback and achieves sample complexity matching known non-private lower bounds.
citing papers explorer
- Primal-Dual Policy Optimization for Linear CMDPs with Adversarial Losses
  A new primal-dual algorithm for adversarial linear CMDPs achieves the first sublinear regret and constraint violation bounds, of order K^{3/4}, using weighted LogSumExp softmax policies with periodic mixing and regularized dual updates.
- Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability
  The paper establishes the first Õ(ε^{-1}) upper bounds and matching lower bounds for forward-KL-regularized offline contextual bandits under single-policy concentrability in both tabular and general function approximation settings.
- Does Synthetic Data Help? Empirical Evidence from Deep Learning Time Series Forecasters
  Synthetic data augmentation helps channel-mixing time series models but degrades channel-independent ones, with reliable gains only from seasonal-trend generators and gradual schedules in low-resource settings.
- When Determinants Are Not Enough: Private Rare Switching
  Replaces determinant growth with generalized Rayleigh quotient for rare switching in private linear bandits to control worst-direction volume despite non-monotonic design matrices from noise.
- Privacy Preserving Reinforcement Learning with One-Sided Feedback
  POOL is a new RL algorithm that adds privacy protection in continuous spaces with one-sided feedback and achieves sample complexity matching known non-private lower bounds.