DOSER detects OOD actions via diffusion-model denoising error and applies selective regularization based on predicted transitions, proving gamma-contraction with performance bounds and outperforming priors on offline RL benchmarks.
International Conference on Machine Learning , pages=
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.LG 4years
2026 4verdicts
UNVERDICTED 4roles
background 1polarities
background 1representative citing papers
Ada-Diffuser is a causal diffusion model that jointly learns observed interaction structure and underlying latent dynamics from minimal observations for adaptive planning and policy learning.
DRIFT enables stable offline-to-online fine-tuning of CTMC policies in discrete RL via advantage-weighted discrete flow matching, path-space regularization, and candidate-set approximation.
PCBF learns return distributions via source-consistent Bellman-coupled paths with shared noise and λ-parameterized control variates, reporting improved fidelity and stability on MRPs, OGBench, and D4RL.
citing papers explorer
-
Beyond Penalization: Diffusion-based Out-of-Distribution Detection and Selective Regularization in Offline Reinforcement Learning
DOSER detects OOD actions via diffusion-model denoising error and applies selective regularization based on predicted transitions, proving gamma-contraction with performance bounds and outperforming priors on offline RL benchmarks.
-
Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making
Ada-Diffuser is a causal diffusion model that jointly learns observed interaction structure and underlying latent dynamics from minimal observations for adaptive planning and policy learning.
-
Discrete Flow Matching for Offline-to-Online Reinforcement Learning
DRIFT enables stable offline-to-online fine-tuning of CTMC policies in discrete RL via advantage-weighted discrete flow matching, path-space regularization, and candidate-set approximation.
-
Path-Coupled Bellman Flows for Distributional Reinforcement Learning
PCBF learns return distributions via source-consistent Bellman-coupled paths with shared noise and λ-parameterized control variates, reporting improved fidelity and stability on MRPs, OGBench, and D4RL.