super hub Canonical reference

World Models

David Ha · 2018 · cs.LG · arXiv 1803.10122

Canonical reference. 88% of citing Pith papers cite this work as background.

261 Pith papers citing it

Background 88% of classified citations

open full Pith review browse 261 citing papers more from David Ha arXiv PDF

abstract

We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment. By using features extracted from the world model as inputs to an agent, we can train a very compact and simple policy that can solve the required task. We can even train our agent entirely inside of its own hallucinated dream generated by its world model, and transfer this policy back into the actual environment. An interactive version of this paper is available at https://worldmodels.github.io/

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 36 method 3 other 1

citation-polarity summary

background 35 use method 3 unclear 2

claims ledger

abstract We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment. By using features extracted from the world model as inputs to an agent, we can train a very compact and simple policy that can solve the required task. We can even train our agent entirely inside of its own hallucinated dream generated by its world model, and transfer this policy back into the actual environment. An interactive version of this paper is

authors

David Ha J\"urgen Schmidhuber

co-cited works

representative citing papers

Textual Belief States for World Models: Identifiable Representation Learning Under Strict Mediation

cs.LG · 2026-06-26 · unverdicted · novelty 8.0

Introduces textual belief states and factorized GRPO to enforce strict latent state mediation in text-based world models, yielding preserved prediction accuracy with large gains in representation quality and rollout performance on TextWorld and ScienceWorld.

When Does LeJEPA Learn a World Model?

stat.ML · 2026-05-25 · unverdicted · novelty 8.0

LeJEPA achieves linear identifiability of latent variables uniquely when the latents are Gaussian in worlds with stationary additive-noise transitions.

From Generalist to Specialist Representation

cs.LG · 2026-05-12 · unverdicted · novelty 8.0

Task structure is identifiable across time steps and task-relevant representations are identifiable within steps in a nonparametric setting under sparsity regularization.

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.

A Model-Free Universal AI

cs.AI · 2026-02-26 · unverdicted · novelty 8.0

AIQI is the first model-free universal AI agent proven asymptotically ε-optimal in general RL by inducing over distributional Q-functions instead of policies or environments.

Pondering the Way: Spatial-perceiving World Action Model for Embodied Navigation

cs.RO · 2026-06-29 · unverdicted · novelty 7.0

SWAM jointly generates intermediate RGB-D sequences and action trajectories from monocular RGB start/goal observations for embodied navigation.

Grounded Iterative Language Planning: How Parameterized World Models Reduce Hallucination Propagation in LLM Agents

cs.AI · 2026-06-26 · unverdicted · novelty 7.0 · 2 refs

GILP trains a parameterized backbone for valid actions and state predictions, then uses a consistency gate with LLM drafts to reduce hallucinated-state rate from 0.176 to 0.035 on GPT-4o-mini while raising success from 0.668 to 0.838.

MemoBench: Benchmarking World Modeling in Dynamically Changing Environments

cs.CV · 2026-06-25 · unverdicted · novelty 7.0 · 4 refs

MemoBench is a new diagnostic benchmark with automated and VQA metrics that evaluates memory consistency in video models under disappear-and-reappear in dynamic environments.

Equilibrium World Models

econ.GN · 2026-06-22 · unverdicted · novelty 7.0

Equilibrium World Models are a deep-learning solver that enforces exact equilibrium conditions on broad model-generated state distributions to globally solve dynamic stochastic models featuring rare disasters, binding constraints, and counterfactual states.

Distilling a Modular Reservoir Through a Genomic Bottleneck

cs.NE · 2026-06-20 · unverdicted · novelty 7.0

Hypernetworks distill modular reservoir connectivity via a genomic bottleneck to generate sparse recurrent networks solving difficult temporal tasks with minimal training and maintained robustness.

Beyond the Next Step: Variable-Length Latent World Models for Long-Horizon Planning

cs.LG · 2026-06-19 · unverdicted · novelty 7.0

VLWMs learn variable-length action-conditioned dynamics in latent space with curriculum training, yielding 13% average gains over prior latent world models on long-horizon tasks.

PreAct: Computer-Using Agents that Get Faster on Repeated Tasks

cs.AI · 2026-06-16 · unverdicted · novelty 7.0

PreAct compiles successful agent executions into verifiable state-machine programs for 8.5-13x faster replay on repeated tasks, with an independent evaluator check before storing each program.

VISA: VLM-Guided Instance Semantic Auditing for 3D Occupancy World Models

cs.CV · 2026-06-11 · unverdicted · novelty 7.0

VISA improves closed-set 3D occupancy mIoU on nuScenes by using VLM instance audits as reliability-weighted semantic supervisors during training of existing world models.

World Model Self-Distillation: Training World Models to Solve General Tasks

cs.CV · 2026-06-10 · unverdicted · novelty 7.0

Self-distillation from a caption-conditioned video diffusion model to an image-and-prompt-conditioned executor, enhanced by RL from VLM feedback, enables task solving in world models.

Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

cs.CV · 2026-06-03 · unverdicted · novelty 7.0

Dream.exe evaluates 8 video generation models on 101 manipulation tasks by converting generated videos into executable robot trajectories in a simulator, finding measurable success rates that visual metrics do not predict.

SVI-Bench: A Dynamic Microworld for Strategic Video Intelligence

cs.CV · 2026-05-29 · unverdicted · novelty 7.0 · 2 refs

SVI-Bench provides 35K hours of sports video with 9 tasks across four cognitive levels, revealing models drop from ~74% on action QA to 5% on agentic evidence integration.

YoCausal: How Far is Video Generation from World Model? A Causality Perspective

cs.CV · 2026-05-28 · unverdicted · novelty 7.0

YoCausal benchmark shows video diffusion models detect the arrow of time but lack genuine causal understanding relative to humans.

Benchmarking Single-Factor Physical Video-to-Audio Generation

cs.CV · 2026-05-28 · unverdicted · novelty 7.0

FlatSounds benchmark shows state-of-the-art V2A models rely more on text captions than visual input for physical and semantic accuracy, with captions improving correctness but degrading temporal alignment.

Do Language Models Need Sleep? Offline Recurrence for Improved Online Inference

cs.CL · 2026-05-25 · unverdicted · novelty 7.0

A sleep mechanism with N offline recurrent passes consolidates context into fast weights, improving performance on reasoning tasks where standard transformers fail.

WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation

cs.CV · 2026-05-25 · unverdicted · novelty 7.0

WBench is a benchmark with 289 test cases and 1,058 turns for evaluating interactive world models using 22 automated metrics validated against human judgments.

UWM-JEPA: Predictive World Models That Imagine in Belief Space

cs.LG · 2026-05-25 · unverdicted · novelty 7.0

UWM-JEPA uses a density-matrix latent and unitary predictor in JEPA to preserve joint-state spectrum during blind rollouts, achieving 0.77 accuracy on a five-step hidden-velocity task versus 0.53 for an LSTM baseline.

Beyond Generative Priors: Minority Sampling with JEPA-Guided Diffusion

cs.LG · 2026-05-23 · unverdicted · novelty 7.0

JEPA guidance steers diffusion models toward low-density regions under an implicit density from a world model, producing minority samples with improved fidelity and semantic validity over generator-centric baselines.

SliceWorld: A Predictive and Controllable World-State Model for CT Report Generation

cs.CV · 2026-05-23 · unverdicted · novelty 7.0

SliceWorld introduces a world-state model for CT report generation that uses predictive and factor-aware objectives on axial slice sequences.

CRONOS: Benchmarking Counterfactual Physical Consistency in Video Models

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

CRONOS benchmark shows recent open-source video generators fail to preserve physical consistency under controlled changes to viewpoint, scene, object category, and appearance.

citing papers explorer

Showing 50 of 261 citing papers.

Do Vision-Language Models Understand 3D Scenes or Just Catalogue Objects? cs.CV · 2026-05-19 · unverdicted · none · ref 15 · 2 links · internal anchor
VLMs achieve 53-97% on rearrangement planning but only 6-45% on occlusion and under 7% on reflections, with failures localized to visual token compression after the vision encoder.
Latent Video Prediction Learns Better World Models cs.CV · 2026-05-15 · unverdicted · none · ref 11 · internal anchor
Latent prediction video models exhibit a distinct robustness profile across corruption, occlusion, fine-grained discrimination, and temporal sensitivity compared to other self-supervised video models when used as world models.
Neural Point-Forms cs.LG · 2026-05-15 · unverdicted · none · ref 51 · internal anchor
Neural point-forms are introduced as permutation-invariant neural layers that output learned form-comparison matrices for point clouds, with a claimed consistency proof under sampling and manifold assumptions and competitive results on synthetic and biological data.
EgoExo-WM: Unlocking Exo Video for Ego World Models cs.CV · 2026-05-14 · unverdicted · none · ref 31 · 2 links · internal anchor
Method converts exocentric videos to egocentric format via body-pose extraction and kinematics to improve egocentric world-model prediction and planning.
ReactiveGWM: Steering NPC in Reactive Game World Models cs.CV · 2026-05-14 · unverdicted · none · ref 12 · internal anchor
ReactiveGWM introduces a decoupled diffusion architecture for player-NPC interactions that learns game-agnostic response logic for zero-shot strategy transfer across games.
Slot-MPC: Goal-Conditioned Model Predictive Control with Object-Centric Representations cs.LG · 2026-05-14 · unverdicted · none · ref 3 · internal anchor
Slot-MPC learns slot representations to build a differentiable object-centric dynamics model that supports efficient gradient-based MPC for robotic manipulation in novel situations.
PriorZero: Bridging Language Priors and World Models for Decision Making cs.LG · 2026-05-12 · unverdicted · none · ref 1 · internal anchor
PriorZero uses root-only LLM prior injection in MCTS and alternating world-model training with LLM fine-tuning to raise exploration efficiency and final performance on Jericho text games and BabyAI gridworlds.
WorldComp2D: Spatio-semantic Representations of Object Identity and Location from Local Views cs.CV · 2026-05-12 · unverdicted · none · ref 34 · internal anchor
WorldComp2D explicitly structures latent space geometry by object identity and spatial proximity via a proximity-dependent encoder and localizer, cutting parameters up to 4X and FLOPs 2.2X versus state-of-the-art lightweight models on facial landmark localization while staying real-time on CPU.
Network-Efficient World Model Token Streaming cs.RO · 2026-05-11 · unverdicted · none · ref 1 · internal anchor
An adaptive delta-prioritization algorithm using cosine distance and Hamming-drift thresholds improves embedding distortion by 4.8-7.2% and next-token perplexity by 2.1-6.3% over periodic keyframing at matched low bitrates for tokenized driving world models.
Beyond Thinking: Imagining in 360$^\circ$ for Humanoid Visual Search cs.CV · 2026-05-09 · unverdicted · none · ref 92 · internal anchor
Imagining in 360° decouples visual search into a single-step probabilistic semantic layout predictor and an actor, removing the need for multi-turn CoT reasoning and trajectory annotations while improving efficiency in 360° environments.
MolWorld: Molecule World Models for Actionable Molecular Optimization cs.LG · 2026-05-09 · unverdicted · none · ref 17 · internal anchor
MolWorld expands a molecule-transfer graph using a world model to discover high-property molecules that maintain strong structural connectivity to known compounds for actionable optimization.
Latent Geometry Beyond Search: Amortizing Planning in World Models cs.RO · 2026-05-09 · unverdicted · none · ref 3 · 2 links · internal anchor
A Goal-Conditioned Inverse Dynamics Model amortizes planning in pretrained world model latents, matching or exceeding CEM in seven of eight settings at 100-130x lower per-decision cost.
Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners cs.AI · 2026-05-08 · unverdicted · none · ref 6 · internal anchor
Frontier LRMs match human game-learning behavior and predict fMRI signals an order of magnitude better than RL or Bayesian agents because of their in-context game-state representations.
Predictive but Not Plannable: RC-aux for Latent World Models cs.LG · 2026-05-08 · unverdicted · none · ref 15 · internal anchor
RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.
Three-in-One World Model: Energy-Based Consistency, Prediction, and Counterfactual Inference for Marketing Intervention cs.AI · 2026-05-08 · unverdicted · none · ref 8 · internal anchor
A DBM-based architecture learns consumer beliefs to enable consistent prediction and counterfactual inference for marketing interventions, outperforming baselines on heterogeneous treatment effects in simulation.
On Training in Imagination cs.LG · 2026-05-07 · unverdicted · none · ref 4 · 2 links · internal anchor
The work derives the optimal ratio of dynamics-to-reward samples that minimizes a bound on return error and characterizes the tradeoff between noisy but cheap rewards versus accurate but expensive ones in imagination-based policy optimization.
Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination cs.LG · 2026-05-06 · unverdicted · none · ref 2 · 2 links · internal anchor
Dream-MPC refines policy-generated trajectories by gradient ascent in a latent world model with uncertainty regularization and temporal amortization, improving base policy performance and beating gradient-free MPC on 24 continuous control tasks.
TRAP: Tail-aware Ranking Attack for World-Model Planning cs.LG · 2026-05-03 · unverdicted · none · ref 17 · internal anchor
TRAP is a tail-aware ranking attack that plants a backdoor in world models so that a trigger causes the model to reorder a few critical imagined trajectories and redirect planning while preserving normal behavior on clean inputs.
RAY-TOLD: Ray-Based Latent Dynamics for Dense Dynamic Obstacle Avoidance with TDMPC cs.RO · 2026-04-30 · unverdicted · none · ref 15 · internal anchor
RAY-TOLD combines ray-based latent dynamics from LiDAR with MPPI control and a learned policy prior via mixture sampling to lower collision rates in high-density dynamic obstacle environments compared to standard MPPI.
Data-Driven Open-Loop Simulation for Digital-Twin Operator Decision Support in Wastewater Treatment cs.LG · 2026-04-22 · unverdicted · none · ref 26 · internal anchor
CCSS-RS achieves RMSE 0.696 and CRPS 0.349 at 1000-step horizons on a large public WWTP benchmark with 43% missingness, outperforming Neural CDE baselines by 40-46% in RMSE.
Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation cs.CV · 2026-04-20 · unverdicted · none · ref 38 · 2 links · internal anchor
OneVL achieves superior accuracy to explicit chain-of-thought reasoning at answer-only latency by supervising latent tokens with a visual world model decoder that predicts future frames.
Human Cognition in Machines: A Unified Perspective of World Models cs.RO · 2026-04-17 · unverdicted · none · ref 60 · internal anchor
The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.
Learning Ad Hoc Network Dynamics via Graph-Structured World Models cs.LG · 2026-04-16 · unverdicted · none · ref 2 · internal anchor
G-RSSM learns per-node dynamics in wireless ad hoc networks via graph attention and trains clustering policies through imagined rollouts, generalizing from N=50 training to larger networks.
Robotic Manipulation is Vision-to-Geometry Mapping ($f(v) \rightarrow G$): Vision-Geometry Backbones over Language and Video Models cs.RO · 2026-04-14 · unverdicted · none · ref 17 · internal anchor
Vision-geometry backbones using pretrained 3D world models outperform vision-language and video models for robotic manipulation by enabling direct mapping from visual input to geometric actions.
LMGenDrive: Bridging Multimodal Understanding and Generative World Modeling for End-to-End Driving cs.CV · 2026-04-09 · unverdicted · none · ref 17 · internal anchor
LMGenDrive unifies LLM-based multimodal understanding with generative world models to output both future driving videos and control signals for end-to-end closed-loop autonomous driving.
GIRL: Generative Imagination Reinforcement Learning via Information-Theoretic Hallucination Control cs.LG · 2026-04-08 · unverdicted · none · ref 2 · internal anchor
GIRL reduces latent rollout drift by 38-61% versus DreamerV3 in MBRL by grounding transitions with DINOv2 embeddings and using an information-theoretic adaptive bottleneck, yielding better long-horizon returns on control benchmarks.
Veo-Act: How Far Can Frontier Video Models Advance Generalizable Robot Manipulation? cs.RO · 2026-04-06 · unverdicted · none · ref 18 · internal anchor
Veo-3 video predictions enable approximate task-level robot trajectories in zero-shot settings but require hierarchical integration with low-level VLA policies for reliable manipulation performance.
Behavior-Constrained Reinforcement Learning with Receding-Horizon Credit Assignment for High-Performance Control cs.RO · 2026-04-03 · unverdicted · none · ref 12 · internal anchor
A behavior-constrained RL framework with receding-horizon credit assignment learns high-performance control policies that stay aligned with expert behavior in race car simulation.
Safety, Security, and Cognitive Risks in World Models cs.CR · 2026-04-01 · unverdicted · none · ref 1 · internal anchor
World models enable efficient AI planning but create risks from adversarial corruption, goal misgeneralization, and human bias, demonstrated via attacks that amplify errors and reduce rewards on models like RSSM and DreamerV3.
Metriplector: From Field Theory to Neural Architecture cs.AI · 2026-03-31 · unverdicted · none · ref 9 · internal anchor
Metriplector treats neural computation as coupled metriplectic field dynamics whose stress-energy tensor readout achieves competitive results on vision, control, Sudoku, language modeling, and pathfinding with small parameter counts.
Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms eess.IV · 2026-03-30 · unverdicted · none · ref 16 · internal anchor
Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged via organized paradigms, architectures, and algorithms.
Dreamer-CDP: Improving Reconstruction-free World Models Via Continuous Deterministic Representation Prediction cs.LG · 2026-03-07 · unverdicted · none · ref 6 · internal anchor
Dreamer-CDP achieves reconstruction-free world modeling via a JEPA-style predictor on continuous deterministic representations and matches Dreamer's performance on Crafter.
HAIC: Humanoid Agile Object Interaction Control via Dynamics-Aware World Model cs.RO · 2026-02-12 · unverdicted · none · ref 10 · internal anchor
HAIC enables robust humanoid interactions with underactuated objects by predicting their dynamics from proprioceptive history and using a world model for adaptive control.
World model inspired sarcasm reasoning with large language model agents cs.CL · 2025-12-30 · unverdicted · none · ref 8 · internal anchor
WM-SAR decomposes sarcasm into LLM-agent components, quantifies literal-normative inconsistency deterministically, and integrates it with intention via logistic regression to outperform prior sarcasm detectors on benchmarks.
Cambrian-S: Towards Spatial Supersensing in Video cs.CV · 2025-11-06 · unverdicted · none · ref 49 · internal anchor
Cambrian-S introduces VSI-SUPER benchmarks for long-horizon spatial recall and counting, shows data scaling yields 30% gains on existing tests, and demonstrates a self-supervised next-latent predictor using surprise outperforms baselines on the new spatial supersensing tasks.
Co-Evolving Latent Action World Models cs.LG · 2025-10-30 · unverdicted · none · ref 14 · internal anchor
CoLA-World jointly trains latent action models and world models with a warm-up phase to achieve co-evolution, matching or exceeding prior two-stage methods in video simulation quality and visual planning performance.
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation cs.RO · 2025-08-07 · unverdicted · none · ref 13 · internal anchor
Genie Envisioner unifies robotic policy learning, simulation, and evaluation inside one instruction-conditioned video diffusion framework using GE-Base, GE-Act, and GE-Sim.
Vidar: Embodied Video Diffusion Model for Generalist Manipulation cs.LG · 2025-07-17 · unverdicted · none · ref 38 · internal anchor
Vidar shows that a video diffusion prior continuously pre-trained on 750K multi-view robot trajectories plus a label-free masked inverse dynamics adapter can generalize manipulation to new robot embodiments with 1% of typical demonstration data.
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling cs.CV · 2025-07-10 · unverdicted · none · ref 28 · internal anchor
Geometry Forcing aligns video diffusion representations with geometric foundation model features via angular cosine and scale regression objectives to improve 3D consistency in generated videos.
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning cs.AI · 2025-06-11 · unverdicted · none · ref 27 · internal anchor
V-JEPA 2 pre-trained on massive unlabeled video achieves strong results on motion understanding and action anticipation, SOTA video QA at 8B scale, and enables zero-shot robotic planning on Franka arms using only 62 hours of unlabeled robot video.
Physically Interpretable World Models via Weakly Supervised Representation Learning cs.LG · 2024-12-17 · unverdicted · none · ref 15 · internal anchor
PIWM aligns latent states in image-based world models with physical variables and constrains their dynamics to known equations via weak distribution supervision, yielding accurate long-horizon predictions and parameter recovery on Cart Pole, Lunar Lander, and Donkey Car.
GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation cs.RO · 2024-10-08 · unverdicted · none · ref 64 · internal anchor
GR-2 pre-trains on web-scale videos then fine-tunes on robot data to reach 97.7% average success across over 100 manipulation tasks with strong generalization to new scenes and objects.
Training Language Models to Self-Correct via Reinforcement Learning cs.LG · 2024-09-19 · unverdicted · none · ref 240 · internal anchor
SCoRe uses multi-turn online RL with regularization on self-generated traces to improve LLM self-correction, achieving 15.6% and 9.1% gains on MATH and HumanEval for Gemini models.
Reasoning with Language Model is Planning with World Model cs.CL · 2023-05-24 · unverdicted · none · ref 74 · internal anchor
RAP turns LLMs into dual world-model and planning agents via MCTS to generate better reasoning paths, outperforming CoT baselines and achieving 33% relative gains over GPT-4 CoT using LLaMA-33B on plan generation.
Learning World Graphs to Accelerate Hierarchical Reinforcement Learning cs.LG · 2019-07-01 · unverdicted · none · ref 37 · internal anchor
A two-stage framework learns a world graph of pivotal states task-agnostically via joint training of a latent model and curiosity-driven policy, then uses the graph to accelerate hierarchical RL on maze tasks.
Emergence of Exploratory Look-Around Behaviors through Active Observation Completion cs.CV · 2019-06-27 · unverdicted · none · ref 45 · internal anchor
An RL agent learns to actively explore by being rewarded for inferring unobserved scene parts after short glimpse sequences, with sidekick policy learning enabling generalization to other active perception tasks.
Path-Measure Dynamics of Attention-Driven World Models: A Nonlocal Onsager--Machlup Approach cond-mat.stat-mech · 2026-07-02 · unverdicted · none · ref 2 · internal anchor
Derives that attention-induced non-Markovian dynamics yield a nonlocal Onsager-Machlup action whose short-memory expansion recovers the local action of a companion paper.
PhysMani: Physics-principled 3D World Model for Dynamic Object Manipulation cs.RO · 2026-07-02 · unverdicted · none · ref 23 · internal anchor
PhysMani couples a physics-principled 3D Gaussian world model with a future-aware policy to achieve higher success rates on dynamic manipulation tasks in simulation and real robots.
Arachne: Orchestrating Cascades for Efficient Text-to-Video Model Training cs.DC · 2026-07-02 · unverdicted · none · ref 15 · internal anchor
Arachne orchestrates cascades for distributed T2V training and reports up to 65% lower iteration time with improving gains at larger scales compared to static bucketing approaches.
Valdi: Value Diffusion World Models cs.LG · 2026-07-01 · unverdicted · none · ref 7 · internal anchor
Valdi pairs a latent diffusion dynamics model with end-to-end MPC training and reports that one diffusion step matches an MLP baseline on CarRacing while exposing a multimodality-control trade-off.

World Models

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer