super hub Canonical reference

Dream to Control: Learning Behaviors by Latent Imagination

Danijar Hafner, Jimmy Ba, Mohammad Norouzi, Timothy Lillicrap · 2019 · cs.LG · arXiv 1912.01603

Canonical reference. 95% of citing Pith papers cite this work as background.

107 Pith papers citing it

Background 95% of classified citations

open full Pith review browse 107 citing papers more from Danijar Hafner arXiv PDF

abstract

Learned world models summarize an agent's experience to facilitate learning complex behaviors. While learning world models from high-dimensional sensory inputs is becoming feasible through deep learning, there are many potential ways for deriving behaviors from them. We present Dreamer, a reinforcement learning agent that solves long-horizon tasks from images purely by latent imagination. We efficiently learn behaviors by propagating analytic gradients of learned state values back through trajectories imagined in the compact state space of a learned world model. On 20 challenging visual control tasks, Dreamer exceeds existing approaches in data-efficiency, computation time, and final performance.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 20 method 1

citation-polarity summary

background 20 use method 1

claims ledger

abstract Learned world models summarize an agent's experience to facilitate learning complex behaviors. While learning world models from high-dimensional sensory inputs is becoming feasible through deep learning, there are many potential ways for deriving behaviors from them. We present Dreamer, a reinforcement learning agent that solves long-horizon tasks from images purely by latent imagination. We efficiently learn behaviors by propagating analytic gradients of learned state values back through trajectories imagined in the compact state space of a learned world model. On 20 challenging visual contro
background Langugae-Conditoned MoCoGAN [29], U-Net [30], Latte [ 31], Wan [32], Sora 2 [ 33]. . . Embodied World Model SWIM [34], DreamDojo [ 35], RoboDreamer [36], RoboScape [37]. . . WM for VLA Imitation Learning Ctrl-World [38], RoboScape [37], DREMA [ 39] Reinforcement Learning Dreamer to Control [ 40] DreamerV2 [ 41], Dreamer 4 [ 42], RISE [ 43] DreamerV3 [44], DayDreamer [45], World-Env [46], RoboScape-R [47] WMPO [48], WoVR [49], VLA-RFT [50], RWML [51], MoDem-V2 [52] World-Gymnast [53], RWM-U [54],

authors

Danijar Hafner Jimmy Ba Mohammad Norouzi Timothy Lillicrap

co-cited works

representative citing papers

Textual Belief States for World Models: Identifiable Representation Learning Under Strict Mediation

cs.LG · 2026-06-26 · unverdicted · novelty 8.0

Introduces textual belief states and factorized GRPO to enforce strict latent state mediation in text-based world models, yielding preserved prediction accuracy with large gains in representation quality and rollout performance on TextWorld and ScienceWorld.

A Model-Free Universal AI

cs.AI · 2026-02-26 · unverdicted · novelty 8.0

AIQI is the first model-free universal AI agent proven asymptotically ε-optimal in general RL by inducing over distributional Q-functions instead of policies or environments.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

World Model Self-Distillation: Training World Models to Solve General Tasks

cs.CV · 2026-06-10 · unverdicted · novelty 7.0

Self-distillation from a caption-conditioned video diffusion model to an image-and-prompt-conditioned executor, enhanced by RL from VLM feedback, enables task solving in world models.

Do Language Models Need Sleep? Offline Recurrence for Improved Online Inference

cs.CL · 2026-05-25 · unverdicted · novelty 7.0

A sleep mechanism with N offline recurrent passes consolidates context into fast weights, improving performance on reasoning tasks where standard transformers fail.

UWM-JEPA: Predictive World Models That Imagine in Belief Space

cs.LG · 2026-05-25 · unverdicted · novelty 7.0

UWM-JEPA uses a density-matrix latent and unitary predictor in JEPA to preserve joint-state spectrum during blind rollouts, achieving 0.77 accuracy on a five-step hidden-velocity task versus 0.53 for an LSTM baseline.

Micro-Swarm Locomotion Optimization in Dynamic Flow using Multi-Objective Multi-Agent Reinforcement Learning

cs.RO · 2026-05-24 · unverdicted · novelty 7.0

Hybrid CFD-MOMARL framework with PCGrad enables micro-swarm navigation in pulsatile flow, achieving progress 6.5-7.0, energy 0.63-0.65, smoothness 0.97-0.99 with emergent behaviors.

World Models as Group Actions

cs.CV · 2026-05-23 · unverdicted · novelty 7.0

Formalizes video world models as group actions on states and uses latent regularization with synthesized supervision to enforce consistency, introducing GAC and GAR metrics that improve structural correctness in SOTA models.

JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.

Support-Safe Variational Hybrid Filtering for Contact-Mode and Sparse-Law Recovery

cs.RO · 2026-05-12 · unverdicted · novelty 7.0

VHYDRO is a support-safe variational hybrid filter that jointly recovers continuous latent states, discrete contact modes, and sparse port-Hamiltonian laws per regime while preventing loss of feasible transitions.

One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy

cs.CV · 2026-05-08 · conditional · novelty 7.0 · 3 refs

Reducing visual input to one token per frame in VLA world models maintains or improves long-horizon performance on MetaWorld, LIBERO, and real-robot tasks.

Operator-Guided Invariance Learning for Continuous Reinforcement Learning

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

VPSD-RL discovers exact and approximate value-preserving Lie-group operators in continuous RL to stabilize learning via transition augmentation and consistency regularization.

Learning to Theorize the World from Observation

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

NEO is a probabilistic neural model that induces compositional programs as a learned Language of Thought from non-textual observations and executes them via a shared transition model to enable explanation-driven generalization.

Latent State Design for World Models under Sufficiency Constraints

cs.AI · 2026-05-03 · unverdicted · novelty 7.0

World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.

RopeDreamer: A Kinematic Recurrent State Space Model for Dynamics of Flexible Deformable Linear Objects

cs.RO · 2026-04-30 · unverdicted · novelty 7.0

RopeDreamer uses quaternionic kinematic chains in a recurrent state space model with a dual decoder to cut open-loop prediction error by 40.52% over 50 steps on simulated DLO trajectories while preserving physical constraints.

Mask World Model: Predicting What Matters for Robust Robot Policy Learning

cs.RO · 2026-04-21 · unverdicted · novelty 7.0

Mask World Model predicts semantic mask dynamics with video diffusion and integrates it with a diffusion policy head, outperforming RGB world models on LIBERO and RLBench while showing better real-world generalization and texture robustness.

Beyond Static Forecasting: Unleashing the Power of World Models for Mobile Traffic Extrapolation

cs.NI · 2026-04-09 · unverdicted · novelty 7.0

MobiWM is a multimodal world model for mobile networks that learns state-action dynamics to enable unlimited-horizon counterfactual traffic simulations and optimization.

MoRight: Motion Control Done Right

cs.CV · 2026-04-08 · unverdicted · novelty 7.0

MoRight disentangles object and camera motion via canonical-view specification and temporal cross-view attention, while decomposing motion into active user-driven and passive consequence components to learn and apply causality in video generation.

Space-Time Forecasting of Dynamic Scenes with Motion-aware Gaussian Grouping

cs.CV · 2026-02-25 · unverdicted · novelty 7.0

MoGaF groups Gaussians by motion in 4D splatting representations to enable stable long-term forecasting of dynamic scenes.

Training Agents Inside of Scalable World Models

cs.AI · 2025-09-29 · conditional · novelty 7.0

Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.

Diffusion Models Are Real-Time Game Engines

cs.LG · 2024-08-27 · conditional · novelty 7.0

A diffusion model trained on DOOM play sessions generates stable real-time interactive game frames at 20 FPS with quality near lossy JPEG.

Massive Activations in Large Language Models

cs.CL · 2024-02-27 · unverdicted · novelty 7.0

Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.

Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models

cs.RO · 2023-10-16 · conditional · novelty 7.0

SuSIE uses a finetuned InstructPix2Pix diffusion model to propose subgoal images that guide a low-level goal-conditioned policy, achieving SOTA zero-shot performance on CALVIN and real-world manipulation.

Learning Interactive Real-World Simulators

cs.AI · 2023-10-09 · conditional · novelty 7.0

UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.

citing papers explorer

Showing 50 of 107 citing papers.

Textual Belief States for World Models: Identifiable Representation Learning Under Strict Mediation cs.LG · 2026-06-26 · unverdicted · none · ref 6 · internal anchor
Introduces textual belief states and factorized GRPO to enforce strict latent state mediation in text-based world models, yielding preserved prediction accuracy with large gains in representation quality and rollout performance on TextWorld and ScienceWorld.
A Model-Free Universal AI cs.AI · 2026-02-26 · unverdicted · none · ref 3 · internal anchor
AIQI is the first model-free universal AI agent proven asymptotically ε-optimal in general RL by inducing over distributional Q-functions instead of policies or environments.
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution cs.CL · 2023-09-28 · unverdicted · none · ref 132 · internal anchor
Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.
World Model Self-Distillation: Training World Models to Solve General Tasks cs.CV · 2026-06-10 · unverdicted · none · ref 20 · internal anchor
Self-distillation from a caption-conditioned video diffusion model to an image-and-prompt-conditioned executor, enhanced by RL from VLM feedback, enables task solving in world models.
Do Language Models Need Sleep? Offline Recurrence for Improved Online Inference cs.CL · 2026-05-25 · unverdicted · none · ref 28 · internal anchor
A sleep mechanism with N offline recurrent passes consolidates context into fast weights, improving performance on reasoning tasks where standard transformers fail.
UWM-JEPA: Predictive World Models That Imagine in Belief Space cs.LG · 2026-05-25 · unverdicted · none · ref 11 · internal anchor
UWM-JEPA uses a density-matrix latent and unitary predictor in JEPA to preserve joint-state spectrum during blind rollouts, achieving 0.77 accuracy on a five-step hidden-velocity task versus 0.53 for an LSTM baseline.
Micro-Swarm Locomotion Optimization in Dynamic Flow using Multi-Objective Multi-Agent Reinforcement Learning cs.RO · 2026-05-24 · unverdicted · none · ref 27 · internal anchor
Hybrid CFD-MOMARL framework with PCGrad enables micro-swarm navigation in pulsatile flow, achieving progress 6.5-7.0, energy 0.63-0.65, smoothness 0.97-0.99 with emergent behaviors.
World Models as Group Actions cs.CV · 2026-05-23 · unverdicted · none · ref 41 · internal anchor
Formalizes video world models as group actions on states and uses latent regularization with synthesized supervision to enforce consistency, introducing GAC and GAR metrics that improve structural correctness in SOTA models.
JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning cs.LG · 2026-05-13 · unverdicted · none · ref 3 · internal anchor
JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.
Support-Safe Variational Hybrid Filtering for Contact-Mode and Sparse-Law Recovery cs.RO · 2026-05-12 · unverdicted · none · ref 44 · internal anchor
VHYDRO is a support-safe variational hybrid filter that jointly recovers continuous latent states, discrete contact modes, and sparse port-Hamiltonian laws per regime while preventing loss of feasible transitions.
One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy cs.CV · 2026-05-08 · conditional · none · ref 18 · 3 links · internal anchor
Reducing visual input to one token per frame in VLA world models maintains or improves long-horizon performance on MetaWorld, LIBERO, and real-robot tasks.
Operator-Guided Invariance Learning for Continuous Reinforcement Learning cs.LG · 2026-05-07 · unverdicted · none · ref 8 · internal anchor
VPSD-RL discovers exact and approximate value-preserving Lie-group operators in continuous RL to stabilize learning via transition augmentation and consistency regularization.
Learning to Theorize the World from Observation cs.LG · 2026-05-05 · unverdicted · none · ref 147 · internal anchor
NEO is a probabilistic neural model that induces compositional programs as a learned Language of Thought from non-textual observations and executes them via a shared transition model to enable explanation-driven generalization.
Latent State Design for World Models under Sufficiency Constraints cs.AI · 2026-05-03 · unverdicted · none · ref 26 · internal anchor
World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.
RopeDreamer: A Kinematic Recurrent State Space Model for Dynamics of Flexible Deformable Linear Objects cs.RO · 2026-04-30 · unverdicted · none · ref 10 · internal anchor
RopeDreamer uses quaternionic kinematic chains in a recurrent state space model with a dual decoder to cut open-loop prediction error by 40.52% over 50 steps on simulated DLO trajectories while preserving physical constraints.
Mask World Model: Predicting What Matters for Robust Robot Policy Learning cs.RO · 2026-04-21 · unverdicted · none · ref 12 · internal anchor
Mask World Model predicts semantic mask dynamics with video diffusion and integrates it with a diffusion policy head, outperforming RGB world models on LIBERO and RLBench while showing better real-world generalization and texture robustness.
Beyond Static Forecasting: Unleashing the Power of World Models for Mobile Traffic Extrapolation cs.NI · 2026-04-09 · unverdicted · none · ref 13 · internal anchor
MobiWM is a multimodal world model for mobile networks that learns state-action dynamics to enable unlimited-horizon counterfactual traffic simulations and optimization.
MoRight: Motion Control Done Right cs.CV · 2026-04-08 · unverdicted · none · ref 27 · internal anchor
MoRight disentangles object and camera motion via canonical-view specification and temporal cross-view attention, while decomposing motion into active user-driven and passive consequence components to learn and apply causality in video generation.
Space-Time Forecasting of Dynamic Scenes with Motion-aware Gaussian Grouping cs.CV · 2026-02-25 · unverdicted · none · ref 13 · internal anchor
MoGaF groups Gaussians by motion in 4D splatting representations to enable stable long-term forecasting of dynamic scenes.
Training Agents Inside of Scalable World Models cs.AI · 2025-09-29 · conditional · none · ref 65 · internal anchor
Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.
Diffusion Models Are Real-Time Game Engines cs.LG · 2024-08-27 · conditional · none · ref 66 · internal anchor
A diffusion model trained on DOOM play sessions generates stable real-time interactive game frames at 20 FPS with quality near lossy JPEG.
Massive Activations in Large Language Models cs.CL · 2024-02-27 · unverdicted · none · ref 36 · internal anchor
Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.
Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models cs.RO · 2023-10-16 · conditional · none · ref 21 · internal anchor
SuSIE uses a finetuned InstructPix2Pix diffusion model to propose subgoal images that guide a low-level goal-conditioned policy, achieving SOTA zero-shot performance on CALVIN and real-world manipulation.
Learning Interactive Real-World Simulators cs.AI · 2023-10-09 · conditional · none · ref 247 · internal anchor
UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.
Mastering Diverse Domains through World Models cs.AI · 2023-01-10 · unverdicted · none · ref 21 · internal anchor
DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.
Mastering Atari with Discrete World Models cs.LG · 2020-10-05 · accept · none · ref 24 · internal anchor
DreamerV2 reaches human-level performance on 55 Atari games by learning behaviors inside a separately trained discrete-latent world model.
Flow Matching in Feature Space for Stochastic World Modeling cs.CV · 2026-06-27 · unverdicted · none · ref 51 · internal anchor
FlowWM applies flow matching directly in pretrained feature space with a one-step projection mechanism, improving perception accuracy, mode coverage, and horizon robustness on synthetic and real-world benchmarks.
Unified Motion-Action Modeling for Heterogeneous Robot Learning cs.RO · 2026-06-15 · unverdicted · none · ref 33 · internal anchor
UMA treats object motion and robot actions as co-evolving variables under a masked generative objective with hindsight relabeling and contrastive disentanglement to support multi-task pretraining and deployment across heterogeneous robot data.
iMaC: Translating Actions into Motion and Contact Images for Embodied World Models cs.RO · 2026-06-08 · unverdicted · none · ref 1 · internal anchor
iMaC introduces image-based action tokens in a dual-branch architecture to improve future state prediction and control in embodied world models over vector-based baselines.
$\omega$-EVA: Envision, Verify, and Act with Latent Interactive World Models cs.RO · 2026-06-08 · unverdicted · none · ref 12 · internal anchor
ω-EVA is a three-stage latent world model framework that trains action-conditioned dynamics, a language-conditioned flow policy, and a tri-branch refiner to improve embodied action generation in simulation.
Dream-Tac: A Unified Tactile World Action Model for Contact-Rich Robot Manipulation cs.RO · 2026-06-07 · unverdicted · none · ref 22 · internal anchor
Dream-Tac unifies visual and tactile signals in a world action model using contact-gated fusion and attention bias, reporting 31.7% average action accuracy gains on six manipulation tasks.
DisCo: World Models with Discrete Camera Motion Control cs.CV · 2026-06-06 · unverdicted · none · ref 13 · internal anchor
DisCo uses discrete action primitives for camera control in video world models to achieve more reliable action following than continuous trajectories.
The Invisible Hand of Physics: When Video Diffusion Models Know More Than They Show cs.GR · 2026-06-03 · unverdicted · none · ref 11 · internal anchor
Physical plausibility is linearly decodable from diffusion transformer states in video models at 81.27% accuracy on IntPhys and InfLevel, absent from VAE latents and outperforming V-JEPA and VideoMAE.
See, Infer, Intervene: Proactive World Modeling for Goal-Oriented Social Intelligence cs.CL · 2026-06-02 · unverdicted · none · ref 34 · internal anchor
Introduces SII framework and PIWM using AIDA and BDI models to predict intent transitions and select from five intervention classes, reporting 0.641 macro F1 with ground-truth state on a new benchmark.
Intercepting the Future: Latent-Space Predictive World Model for Dynamic VLA Manipulation cs.RO · 2026-06-01 · unverdicted · none · ref 16 · internal anchor
AHEAD augments frozen VLAs with a 4.9M-parameter latent world model that forecasts future visual features using optical-flow motion cues, achieving 79-97% success on dynamic simulation tasks and high real-robot success rates where baselines score near zero.
IDOL: Inverse-Dynamics-Guided Future Prediction for End-to-End Autonomous Driving cs.RO · 2026-05-29 · unverdicted · none · ref 15 · internal anchor
IDOL uses inverse dynamics on adjacent predicted latent futures to extract planning-relevant motion deltas, then optimizes trajectories with a closed-loop refinement step, reporting SOTA results on NAVSIM v1 and v2.
LLM-Guided Future Hypotheses for Horizon-Aware Exploration in Multi-Step Robot Manipulation cs.RO · 2026-05-28 · unverdicted · none · ref 7 · internal anchor
FEC conditions policies on LLM-guided short-horizon future videos via a three-stage pipeline, yielding performance gains for BC+RL over no-future baselines on RoboCasa and CALVIN while mismatched futures degrade results.
Parallel Differentiable Reachability for Learning and Planning with Certified Neural Dynamics and Controllers cs.RO · 2026-05-25 · unverdicted · none · ref 21 · internal anchor
A JAX-based differentiable reachability primitive for continuous- and discrete-time NN dynamics and controllers that supports certified training and sampling-based MPC with gradient refinement.
Efficient Agentic Reasoning Through Self-Regulated Simulative Planning cs.AI · 2026-05-21 · unverdicted · none · ref 31 · internal anchor
SR²AM achieves competitive Pass@1 accuracy on diverse tasks with 25.8-95.3% fewer reasoning tokens than much larger models by using self-regulated simulative planning trained via supervised learning and RL.
Spectral Souping: A Unified Framework for Online Preference Alignment cs.LG · 2026-05-19 · unverdicted · none · ref 7 · internal anchor
Spectral Souping learns offline specialized policies for fine-grained preferences and merges them online using a discovered universal spectral representation for efficient LLM alignment.
DiLA: Disentangled Latent Action World Models cs.CV · 2026-05-15 · unverdicted · none · ref 12 · internal anchor
DiLA uses content-structure disentanglement driven by predictive bottlenecks to create semantically structured latent actions for high-fidelity video world models.
Feedback World Model Enables Precise Guidance of Diffusion Policy cs.RO · 2026-05-15 · unverdicted · none · ref 3 · internal anchor
Feedback world model closes the prediction-observation loop at inference time to correct errors and improve diffusion policy performance under distribution shift in robotics.
Latent Video Prediction Learns Better World Models cs.CV · 2026-05-15 · unverdicted · none · ref 12 · internal anchor
Latent prediction video models exhibit a distinct robustness profile across corruption, occlusion, fine-grained discrimination, and temporal sensitivity compared to other self-supervised video models when used as world models.
PanoWorld: Geometry-Consistent Panoramic Video World Modeling cs.CV · 2026-05-14 · unverdicted · none · ref 10 · internal anchor
PanoWorld adds depth consistency and trajectory consistency losses plus spherical adaptations to a pre-trained video model, plus a new PanoGeo dataset, to produce geometry-consistent 360 video.
ReactiveGWM: Steering NPC in Reactive Game World Models cs.CV · 2026-05-14 · unverdicted · none · ref 13 · internal anchor
ReactiveGWM introduces a decoupled diffusion architecture for player-NPC interactions that learns game-agnostic response logic for zero-shot strategy transfer across games.
Zero-Shot Sim-to-Real Robot Learning: A Dexterous Manipulation Study on Reactive Catching cs.RO · 2026-05-10 · unverdicted · none · ref 18 · internal anchor
DRIS improves zero-shot sim-to-real transfer for reactive catching by maintaining and acting on sets of randomized dynamics instances instead of single instances per episode.
LaWM: Least Action World Models for Long-Horizon Physical Consistency from Visual Observations cs.LG · 2026-05-08 · unverdicted · none · ref 8 · internal anchor
LaWM induces latent transitions from a learned discrete variational principle rather than an unconstrained neural predictor, yielding improved physical consistency on synthetic dynamics and robot benchmarks.
Predictive but Not Plannable: RC-aux for Latent World Models cs.LG · 2026-05-08 · unverdicted · none · ref 16 · internal anchor
RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.
Neural Control: Adjoint Learning Through Equilibrium Constraints cs.RO · 2026-05-05 · unverdicted · none · ref 5 · 2 links · internal anchor
Neural Control uses adjoint differentiation of equilibrium conditions to compute trajectory-dependent proxy gradients for history-dependent implicit models in deformable object manipulation.
TRAP: Tail-aware Ranking Attack for World-Model Planning cs.LG · 2026-05-03 · unverdicted · none · ref 19 · internal anchor
TRAP is a tail-aware ranking attack that plants a backdoor in world models so that a trigger causes the model to reorder a few critical imagined trajectories and redirect planning while preserving normal behavior on clean inputs.

Dream to Control: Learning Behaviors by Latent Imagination

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer