Reinforcement learning as one big sequence modeling problem
6 Pith papers cite this work.
Citation roles: background (1). Citation polarities: background (1).
Citing papers
-
Decision Transformer: Reinforcement Learning via Sequence Modeling
Decision Transformer casts RL as autoregressive sequence modeling conditioned on desired returns, past states and actions, matching or exceeding offline RL baselines on Atari, Gym and Key-to-Door tasks.
-
Grounded Iterative Language Planning: How Parameterized World Models Reduce Hallucination Propagation in LLM Agents
GILP combines a small parameterized world model with LLM agent reasoning via a consistency gate, reducing hallucinated-state rate from 0.176 to 0.035 and raising success from 0.668 to 0.838 on graph planning benchmarks.
-
Neural CDEs as Correctors for Learned Time Series Models
Neural CDEs serve as correctors that reduce error accumulation in multi-step forecasts from learned time-series models across synthetic, physics, and real-world data.
-
DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions
DAWM introduces a modular diffusion world model with an inverse dynamics model to produce complete synthetic transitions that improve conservative offline RL algorithms like TD3BC and IQL on D4RL tasks.
-
What Matters in Learning from Offline Human Demonstrations for Robot Manipulation
A comprehensive benchmark study of offline imitation learning methods on multi-stage robot manipulation tasks identifies key sensitivities to algorithm design, data quality, and stopping criteria while releasing all datasets and code.
-
Built Environment Reasoning from Remote Sensing Imagery Using Large Vision-Language Models
Large vision-language models applied to multi-scale remote sensing imagery can generate recommendations on built environment design, constructability, land use, and risks for smart city decision-making.
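The Decision Transformer entry above describes conditioning a sequence model on desired returns, past states and actions. The snippet below is a minimal sketch of how such a return-conditioned input sequence can be built from one trajectory. It assumes undiscounted returns-to-go (the sum of rewards from step t to the end of the episode). The helper names `returns_to_go` and `interleave_tokens` are hypothetical. It does not cover the transformer itself, including its embeddings, timestep encodings and causal masking.

```python
from typing import List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted return-to-go: R_t = r_t + r_{t+1} + ... + r_T.

    Hypothetical helper; computed right to left in a single pass.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def interleave_tokens(
    rewards: Sequence[float],
    states: Sequence[object],
    actions: Sequence[object],
) -> List[Tuple[str, object]]:
    """Flatten a trajectory into (return-to-go, state, action) triples in time order.

    Hypothetical helper; each element is tagged with its modality so a model
    could embed the three token types separately.
    """
    if not (len(rewards) == len(states) == len(actions)):
        raise ValueError("rewards, states and actions must have equal length")
    seq: List[Tuple[str, object]] = []
    for rtg, s, a in zip(returns_to_go(rewards), states, actions):
        seq.extend([("rtg", rtg), ("state", s), ("action", a)])
    return seq


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0]
    print(returns_to_go(rewards))  # [3.0, 2.0, 2.0]
    print(interleave_tokens(rewards, ["s0", "s1", "s2"], ["a0", "a1", "a2"])[:3])
```

At training time the model predicts each action token from the tokens before it. At evaluation time, a common pattern is to start from a chosen target return and subtract each observed reward to obtain the next return-to-go token.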