hub

TD-MPC2: Scalable, Robust World Models for Continuous Control

Nicklas Hansen, Hao Su, Xiaolong Wang · 2023 · cs.LG · arXiv 2310.16828

21 Pith papers cite this work. Polarity classification is still indexing.

21 Pith papers citing it

open full Pith review browse 21 citing papers arXiv PDF

abstract

TD-MPC is a model-based reinforcement learning (RL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results with a single set of hyperparameters. We further show that agent capabilities increase with model and data size, and successfully train a single 317M parameter agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of lessons, opportunities, and risks associated with large TD-MPC2 agents. Explore videos, models, data, code, and more at https://tdmpc2.com

hub tools

JSON dossier citing papers JSON arXiv source

representative citing papers

JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.

OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation

cs.RO · 2026-05-07 · unverdicted · novelty 7.0

OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.

Latent State Design for World Models under Sufficiency Constraints

cs.AI · 2026-05-03 · unverdicted · novelty 7.0

World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.

Agent-Centric Observation Adaptation for Robust Visual Control under Dynamic Perturbations

cs.RO · 2026-04-27 · unverdicted · novelty 7.0 · 2 refs

ACO-MoE recovers 95.3% of clean-input performance in visual control tasks under Markov-switching corruptions by routing restoration experts and anchoring representations to clean foreground masks.

Multi-scale Predictive Representations for Goal-conditioned Reinforcement Learning

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

Ms.PR applies multi-scale predictive supervision to enforce goal-directed alignment in latent spaces for offline GCRL, yielding improved representation quality and performance on vision and state-based tasks.

MolWorld: Molecule World Models for Actionable Molecular Optimization

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

MolWorld expands a molecule-transfer graph using a world model to discover high-property molecules that maintain strong structural connectivity to known compounds for actionable optimization.

Predictive but Not Plannable: RC-aux for Latent World Models

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.

TRAP: Tail-aware Ranking Attack for World-Model Planning

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

TRAP is a tail-aware ranking attack that plants a backdoor in world models so that a trigger causes the model to reorder a few critical imagined trajectories and redirect planning while preserving normal behavior on clean inputs.

RAY-TOLD: Ray-Based Latent Dynamics for Dense Dynamic Obstacle Avoidance with TDMPC

cs.RO · 2026-04-30 · unverdicted · novelty 6.0

RAY-TOLD combines ray-based latent dynamics from LiDAR with MPPI control and a learned policy prior via mixture sampling to lower collision rates in high-density dynamic obstacle environments compared to standard MPPI.

Toward Safe Autonomous Robotic Endovascular Interventions using World Models

cs.RO · 2026-04-22 · unverdicted · novelty 6.0

TD-MPC2 world models achieve 58% mean success in simulated endovascular navigation versus 36% for SAC, with comparable in-vitro rates but better path efficiency.

Human Cognition in Machines: A Unified Perspective of World Models

cs.RO · 2026-04-17 · unverdicted · novelty 6.0

The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.

GIRL: Generative Imagination Reinforcement Learning via Information-Theoretic Hallucination Control

cs.LG · 2026-04-08 · unverdicted · novelty 6.0

GIRL reduces latent rollout drift by 38-61% versus DreamerV3 in MBRL by grounding transitions with DINOv2 embeddings and using an information-theoretic adaptive bottleneck, yielding better long-horizon returns on control benchmarks.

Neural Operators for Multi-Task Control and Adaptation

cs.LG · 2026-04-03 · unverdicted · novelty 6.0

Neural operators approximate the solution operator for multi-task optimal control, generalizing to new tasks and enabling efficient adaptation via branch-trunk structure and meta-training.

Hierarchical Planning with Latent World Models

cs.LG · 2026-04-03 · unverdicted · novelty 6.0

Hierarchical planning over multi-scale latent world models enables 70% success on real robotic pick-and-place with goal-only input where flat models achieve 0%, while cutting planning compute up to 4x in simulations.

Learning Task-Invariant Properties via Dreamer: Enabling Efficient Policy Transfer for Quadruped Robots

cs.RO · 2026-04-03 · unverdicted · novelty 6.0

DreamTIP adds LLM-identified task-invariant properties as auxiliary targets in Dreamer's world model plus a mixed-replay adaptation step, delivering 28.1% average simulated transfer gains and 100% real-world climb success versus 10% for baselines.

Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

cs.AI · 2026-01-22 · conditional · novelty 6.0

Single-stage fine-tuning of a video model to generate actions as latent frames plus future states and values yields state-of-the-art robot policy performance on LIBERO, RoboCasa, and bimanual tasks.

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

cs.AI · 2025-06-11 · unverdicted · novelty 6.0

V-JEPA 2 pre-trained on massive unlabeled video achieves strong results on motion understanding and action anticipation, SOTA video QA at 8B scale, and enables zero-shot robotic planning on Franka arms using only 62 hours of unlabeled robot video.

TOPPO: Rethinking PPO for Multi-Task Reinforcement Learning with Critic Balancing

cs.AI · 2026-05-12 · unverdicted · novelty 5.0

TOPPO reformulates PPO with critic balancing to address gradient ill-conditioning in multi-task RL and reports stronger mean and tail performance than SAC baselines on Meta-World+ using fewer parameters and steps.

Detecting is Easy, Adapting is Hard: Local Expert Growth for Visual Model-Based Reinforcement Learning under Distribution Shift

cs.LG · 2026-04-30 · unverdicted · novelty 5.0

JEPA-Indexed Local Expert Growth adds local action corrections for detected shift clusters and yields statistically significant OOD gains on four shift conditions while keeping in-distribution performance intact.

World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems

cs.RO · 2026-04-16 · unverdicted · novelty 5.0

The World-Value-Action model enables implicit planning for VLA systems by performing inference over a learned latent representation of high-value future trajectories instead of direct action prediction.

Active Inference: A method for Phenotyping Agency in AI systems?

cs.AI · 2026-04-25 · unverdicted · novelty 4.0

Active inference offers a variational way to phenotype agency in AI systems by measuring empowerment in generative models via a T-maze paradigm.

citing papers explorer

Showing 7 of 7 citing papers after filters.

OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation cs.RO · 2026-05-07 · unverdicted · none · ref 25 · internal anchor
OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.
Agent-Centric Observation Adaptation for Robust Visual Control under Dynamic Perturbations cs.RO · 2026-04-27 · unverdicted · none · ref 17 · 2 links · internal anchor
ACO-MoE recovers 95.3% of clean-input performance in visual control tasks under Markov-switching corruptions by routing restoration experts and anchoring representations to clean foreground masks.
RAY-TOLD: Ray-Based Latent Dynamics for Dense Dynamic Obstacle Avoidance with TDMPC cs.RO · 2026-04-30 · unverdicted · none · ref 10 · internal anchor
RAY-TOLD combines ray-based latent dynamics from LiDAR with MPPI control and a learned policy prior via mixture sampling to lower collision rates in high-density dynamic obstacle environments compared to standard MPPI.
Toward Safe Autonomous Robotic Endovascular Interventions using World Models cs.RO · 2026-04-22 · unverdicted · none · ref 16 · internal anchor
TD-MPC2 world models achieve 58% mean success in simulated endovascular navigation versus 36% for SAC, with comparable in-vitro rates but better path efficiency.
Human Cognition in Machines: A Unified Perspective of World Models cs.RO · 2026-04-17 · unverdicted · none · ref 63 · internal anchor
The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.
Learning Task-Invariant Properties via Dreamer: Enabling Efficient Policy Transfer for Quadruped Robots cs.RO · 2026-04-03 · unverdicted · none · ref 24 · internal anchor
DreamTIP adds LLM-identified task-invariant properties as auxiliary targets in Dreamer's world model plus a mixed-replay adaptation step, delivering 28.1% average simulated transfer gains and 100% real-world climb success versus 10% for baselines.
World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems cs.RO · 2026-04-16 · unverdicted · none · ref 10 · internal anchor
The World-Value-Action model enables implicit planning for VLA systems by performing inference over a learned latent representation of high-value future trajectories instead of direct action prediction.

TD-MPC2: Scalable, Robust World Models for Continuous Control

hub tools

fields

years

verdicts

representative citing papers

citing papers explorer