Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, 588(7839):604–609
5 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (5)
verdicts: UNVERDICTED (5)
roles: background (1)
polarities: background (1)
citing papers explorer
- JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning
  JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.
- Predictive but Not Plannable: RC-aux for Latent World Models
  RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.
- Extracting Search Trees from LLM Reasoning Traces Reveals Myopic Planning
  LLM planning in four-in-a-row is myopic: move choices match a shallow model that ignores deep nodes expanded in reasoning traces.
- stable-worldmodel: A Platform for Reproducible World Modeling Research and Evaluation
  The paper presents stable-worldmodel (swm), a platform with high-performance data layer, modern world model baselines, planning solvers, and extended environments for reproducible research and generalization evaluation.
- HaM-World: Soft-Hamiltonian World Models with Selective Memory for Planning
  HaM-World integrates soft-Hamiltonian dynamics with selective state-space memory to reduce long-horizon rollout error by 55% and achieve top returns under 12 OOD perturbations on DeepMind Control Suite tasks.