Decision Transformer casts RL as autoregressive sequence modeling conditioned on desired returns, past states and actions, matching or exceeding offline RL baselines on Atari, Gym and Key-to-Door tasks.
hub
arXiv preprint arXiv:1912.06088 , year=
10 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.
Ms.PR applies multi-scale predictive supervision to enforce goal-directed alignment in latent spaces for offline GCRL, yielding improved representation quality and performance on vision and state-based tasks.
RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.
RCD steers compositional diffusion sampling toward high-density coherent plans by combining reconstruction-error guidance with overlap consistency, outperforming prior methods on locomotion, manipulation, and pixel-based long-horizon tasks.
The paper presents stable-worldmodel (swm), a platform with high-performance data layer, modern world model baselines, planning solvers, and extended environments for reproducible research and generalization evaluation.
GCImOpt trains compact goal-conditioned neural policies by imitating efficiently generated optimal trajectories, achieving high success rates and near-optimal performance on cart-pole, quadcopter, and robot arm tasks while running thousands of times faster than optimization solvers.
CGCL progressively trains LLMs to generate Toulmin-structured clinical diagnostic arguments across three curriculum stages, achieving accuracy and reasoning quality comparable to RL methods with improved stability and efficiency.
Proposes mean flow policies and LeJEPA loss to overcome Gaussian policy limits and weak subgoal generation in hierarchical offline GCRL, reporting strong results on OGBench state and pixel tasks.
citing papers explorer
-
Multi-scale Predictive Representations for Goal-conditioned Reinforcement Learning
Ms.PR applies multi-scale predictive supervision to enforce goal-directed alignment in latent spaces for offline GCRL, yielding improved representation quality and performance on vision and state-based tasks.