World Models

David Ha, Jürgen Schmidhuber · 2018 · cs.LG · arXiv 1803.10122

Canonical reference. 88% of citing Pith papers cite this work as background.

264 Pith papers citing it

Background 88% of classified citations

abstract

We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment. By using features extracted from the world model as inputs to an agent, we can train a very compact and simple policy that can solve the required task. We can even train our agent entirely inside of its own hallucinated dream generated by its world model, and transfer this policy back into the actual environment. An interactive version of this paper is available at https://worldmodels.github.io/
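The workflow in the abstract (learn a model of the environment, train a small policy inside that model, then transfer the policy to the real environment) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the one-dimensional environment `real_step`, the linear least-squares model that stands in for the generative neural network, and the random hill climbing that stands in for the policy optimizer.

```python
import random

def real_step(x, a):
    """Hypothetical true dynamics, hidden from the agent."""
    return 0.9 * x + 0.5 * a

def collect(n, rng):
    """Gather (x, a, x_next) transitions using random actions."""
    data = []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        a = rng.uniform(-1.0, 1.0)
        data.append((x, a, real_step(x, a)))
    return data

def fit_model(data):
    """Least-squares fit of x_next = wx * x + wa * a (2x2 normal equations)."""
    sxx = sum(x * x for x, a, y in data)
    sxa = sum(x * a for x, a, y in data)
    saa = sum(a * a for x, a, y in data)
    sxy = sum(x * y for x, a, y in data)
    say = sum(a * y for x, a, y in data)
    det = sxx * saa - sxa * sxa
    wx = (sxy * saa - say * sxa) / det
    wa = (say * sxx - sxy * sxa) / det
    return wx, wa

def rollout_cost(step, k, starts, horizon=20):
    """Sum of squared positions when the policy a = clip(-k * x) drives `step`."""
    total = 0.0
    for x in starts:
        for _ in range(horizon):
            a = max(-1.0, min(1.0, -k * x))
            x = step(x, a)
            total += x * x
    return total

def train_in_dream(model_step, rng, starts, iters=200):
    """Random hill climbing on the gain k, using only the learned model."""
    best_k = 0.0
    best_cost = rollout_cost(model_step, best_k, starts)
    for _ in range(iters):
        cand = best_k + rng.gauss(0.0, 0.5)
        cost = rollout_cost(model_step, cand, starts)
        if cost < best_cost:
            best_k, best_cost = cand, cost
    return best_k

rng = random.Random(0)
wx, wa = fit_model(collect(200, rng))

def model_step(x, a):
    """Learned stand-in for the environment (the 'dream')."""
    return wx * x + wa * a

starts = [-1.5, -0.5, 0.5, 1.5]
k = train_in_dream(model_step, rng, starts)

# Transfer: evaluate the dream-trained policy on the real dynamics.
baseline = rollout_cost(real_step, 0.0, starts)
transferred = rollout_cost(real_step, k, starts)
print("learned model:", wx, wa)
print("policy trained in the model beats doing nothing:", transferred < baseline)
```

Because the toy data is noise-free and the model class matches the true dynamics, the learned model is essentially exact here, so transfer is trivially successful. With an imperfect learned model, a policy can exploit model errors, and this sketch does not attempt to address that.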

citation-role summary

background 36 · method 3 · other 1

citation-polarity summary

background 35 · use method 3 · unclear 2

authors

David Ha, Jürgen Schmidhuber

representative citing papers

Textual Belief States for World Models: Identifiable Representation Learning Under Strict Mediation

cs.LG · 2026-06-26 · unverdicted · novelty 8.0

Introduces textual belief states and factorized GRPO to enforce strict latent state mediation in text-based world models, yielding preserved prediction accuracy with large gains in representation quality and rollout performance on TextWorld and ScienceWorld.

When Does LeJEPA Learn a World Model?

stat.ML · 2026-05-25 · unverdicted · novelty 8.0

LeJEPA achieves linear identifiability of latent variables uniquely when the latents are Gaussian in worlds with stationary additive-noise transitions.

From Generalist to Specialist Representation

cs.LG · 2026-05-12 · unverdicted · novelty 8.0

Task structure is identifiable across time steps and task-relevant representations are identifiable within steps in a nonparametric setting under sparsity regularization.

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.

A Model-Free Universal AI

cs.AI · 2026-02-26 · unverdicted · novelty 8.0

AIQI is the first model-free universal AI agent proven asymptotically ε-optimal in general RL by inducing over distributional Q-functions instead of policies or environments.

Pondering the Way: Spatial-perceiving World Action Model for Embodied Navigation

cs.RO · 2026-06-29 · unverdicted · novelty 7.0

SWAM jointly generates intermediate RGB-D sequences and action trajectories from monocular RGB start/goal observations for embodied navigation.

Grounded Iterative Language Planning: How Parameterized World Models Reduce Hallucination Propagation in LLM Agents

cs.AI · 2026-06-26 · unverdicted · novelty 7.0 · 2 refs

GILP trains a parameterized backbone for valid actions and state predictions, then uses a consistency gate with LLM drafts to reduce hallucinated-state rate from 0.176 to 0.035 on GPT-4o-mini while raising success from 0.668 to 0.838.

MemoBench: Benchmarking World Modeling in Dynamically Changing Environments

cs.CV · 2026-06-25 · unverdicted · novelty 7.0 · 4 refs

MemoBench is a new diagnostic benchmark with automated and VQA metrics that evaluates memory consistency in video models under disappear-and-reappear in dynamic environments.

Equilibrium World Models

econ.GN · 2026-06-22 · unverdicted · novelty 7.0

Equilibrium World Models are a deep-learning solver that enforces exact equilibrium conditions on broad model-generated state distributions to globally solve dynamic stochastic models featuring rare disasters, binding constraints, and counterfactual states.

Distilling a Modular Reservoir Through a Genomic Bottleneck

cs.NE · 2026-06-20 · unverdicted · novelty 7.0

Hypernetworks distill modular reservoir connectivity via a genomic bottleneck to generate sparse recurrent networks solving difficult temporal tasks with minimal training and maintained robustness.

Beyond the Next Step: Variable-Length Latent World Models for Long-Horizon Planning

cs.LG · 2026-06-19 · unverdicted · novelty 7.0

VLWMs learn variable-length action-conditioned dynamics in latent space with curriculum training, yielding 13% average gains over prior latent world models on long-horizon tasks.

PreAct: Computer-Using Agents that Get Faster on Repeated Tasks

cs.AI · 2026-06-16 · unverdicted · novelty 7.0

PreAct compiles successful agent executions into verifiable state-machine programs for 8.5-13x faster replay on repeated tasks, with an independent evaluator check before storing each program.

VISA: VLM-Guided Instance Semantic Auditing for 3D Occupancy World Models

cs.CV · 2026-06-11 · unverdicted · novelty 7.0

VISA improves closed-set 3D occupancy mIoU on nuScenes by using VLM instance audits as reliability-weighted semantic supervisors during training of existing world models.

World Model Self-Distillation: Training World Models to Solve General Tasks

cs.CV · 2026-06-10 · unverdicted · novelty 7.0

Self-distillation from a caption-conditioned video diffusion model to an image-and-prompt-conditioned executor, enhanced by RL from VLM feedback, enables task solving in world models.

Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

cs.CV · 2026-06-03 · unverdicted · novelty 7.0

Dream.exe evaluates 8 video generation models on 101 manipulation tasks by converting generated videos into executable robot trajectories in a simulator, finding measurable success rates that visual metrics do not predict.

SVI-Bench: A Dynamic Microworld for Strategic Video Intelligence

cs.CV · 2026-05-29 · unverdicted · novelty 7.0 · 2 refs

SVI-Bench provides 35K hours of sports video with 9 tasks across four cognitive levels, revealing models drop from ~74% on action QA to 5% on agentic evidence integration.

YoCausal: How Far is Video Generation from World Model? A Causality Perspective

cs.CV · 2026-05-28 · unverdicted · novelty 7.0

YoCausal benchmark shows video diffusion models detect the arrow of time but lack genuine causal understanding relative to humans.

Benchmarking Single-Factor Physical Video-to-Audio Generation

cs.CV · 2026-05-28 · unverdicted · novelty 7.0

FlatSounds benchmark shows state-of-the-art V2A models rely more on text captions than visual input for physical and semantic accuracy, with captions improving correctness but degrading temporal alignment.

Do Language Models Need Sleep? Offline Recurrence for Improved Online Inference

cs.CL · 2026-05-25 · unverdicted · novelty 7.0

A sleep mechanism with N offline recurrent passes consolidates context into fast weights, improving performance on reasoning tasks where standard transformers fail.

WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation

cs.CV · 2026-05-25 · unverdicted · novelty 7.0

WBench is a benchmark with 289 test cases and 1,058 turns for evaluating interactive world models using 22 automated metrics validated against human judgments.

UWM-JEPA: Predictive World Models That Imagine in Belief Space

cs.LG · 2026-05-25 · unverdicted · novelty 7.0

UWM-JEPA uses a density-matrix latent and unitary predictor in JEPA to preserve joint-state spectrum during blind rollouts, achieving 0.77 accuracy on a five-step hidden-velocity task versus 0.53 for an LSTM baseline.

Beyond Generative Priors: Minority Sampling with JEPA-Guided Diffusion

cs.LG · 2026-05-23 · unverdicted · novelty 7.0

JEPA guidance steers diffusion models toward low-density regions under an implicit density from a world model, producing minority samples with improved fidelity and semantic validity over generator-centric baselines.

SliceWorld: A Predictive and Controllable World-State Model for CT Report Generation

cs.CV · 2026-05-23 · unverdicted · novelty 7.0

SliceWorld introduces a world-state model for CT report generation that uses predictive and factor-aware objectives on axial slice sequences.

CRONOS: Benchmarking Counterfactual Physical Consistency in Video Models

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

CRONOS benchmark shows recent open-source video generators fail to preserve physical consistency under controlled changes to viewpoint, scene, object category, and appearance.

citing papers explorer

Showing 50 of 264 citing papers.

Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling
cs.AI · 2026-05-01 · unverdicted · none · ref 8 · 3 links · internal anchor
The paper introduces Hamiltonian World Models by encoding observations into structured latent phase space and evolving states via Hamiltonian-inspired dynamics for physically meaningful rollouts in embodied AI.

SPLICE: Latent Diffusion over JEPA Embeddings for Conformal Time-Series Inpainting
cs.LG · 2026-04-30 · unverdicted · none · ref 13 · internal anchor
SPLICE couples JEPA-based latent diffusion with adaptive conformal inference to deliver accurate time-series inpainting with 93-95% empirical coverage on load datasets.

Toward a Science of Intent: Closure Gaps and Delegation Envelopes for Open-World AI Agents
cs.AI · 2026-04-27 · unverdicted · none · ref 6 · internal anchor
Intent compilation turns vague human goals into verifiable artifacts, using closure-gap vectors and delegation envelopes to separate open-world agent challenges from closed-world solvers and to benchmark closure fixes against extra search.

Cortex 2.0: Grounding World Models in Real-World Industrial Deployment
cs.RO · 2026-04-22 · unverdicted · none · ref 24 · internal anchor
Cortex 2.0 introduces world-model-based planning that generates and scores future trajectories to outperform reactive vision-language-action baselines on industrial robotic tasks including pick-and-place, sorting, and unpacking.

CausalVAE as a Plug-in for World Models: Towards Reliable Counterfactual Dynamics
cs.LG · 2026-04-09 · unverdicted · none · ref 5 · internal anchor
CausalVAE plug-in for world models preserves factual prediction and boosts counterfactual retrieval, with large gains on physics benchmarks and recovered physical interaction trends.

Neural Computers
cs.LG · 2026-04-07 · unverdicted · none · ref 12 · internal anchor
Neural Computers are introduced as a new machine form where computation, memory, and I/O are unified in a learned runtime state, with initial video-model experiments showing acquisition of basic interface primitives from traces.

Designing Digital Humans with Ambient Intelligence
cs.HC · 2026-04-06 · unverdicted · none · ref 78 · internal anchor
Integrating ambient intelligence with digital humans creates context-aware virtual agents capable of anticipatory assistance based on the user's surroundings.

A Model of Understanding in Deep Learning Systems
cs.AI · 2026-04-05 · unverdicted · none · ref 3 · internal anchor
Deep learning systems achieve systematic understanding through internal models tracking regularities but exhibit fractured understanding due to symbolic misalignment, lack of explicit reduction, and weak unification.

UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics
cs.LG · 2026-02-11 · unverdicted · none · ref 16 · internal anchor
UI-Oceanus shows that continual pre-training on forward dynamics predictions from synthetic GUI exploration improves agent success rates by 7% offline and 16.8% online, with gains scaling by data volume.

Cloning Deterministic Worlds: The Critical Role of Latent Geometry in Long-Horizon World Models
cs.LG · 2025-10-30 · unverdicted · none · ref 8 · internal anchor
GRWM uses temporal contrastive learning to geometrically regularize latent spaces in world models for high-fidelity cloning of deterministic 3D worlds.

Bio-Inspired Topological Autonomous Navigation with Active Inference in Robotics
cs.RO · 2025-08-10 · unverdicted · none · ref 25 · internal anchor
An active-inference agent builds real-time topological maps and plans adaptive trajectories for exploration and goal-reaching in robotics without pre-training.

Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility
cs.CL · 2025-07-16 · unverdicted · none · ref 3 · internal anchor
Language models encode modal categories via linear difference vectors in their activations that predict fine-grained human plausibility judgments better than prior reports suggested.

WorldVLA: Towards Autoregressive Action World Model
cs.RO · 2025-06-26 · unverdicted · none · ref 12 · internal anchor
WorldVLA unifies VLA and world models in one autoregressive system, shows they boost each other, and adds an attention mask to stop error buildup when generating action chunks.

EvolvingAgent: Curriculum Self-evolving Agent with Continual World Model for Long-Horizon Tasks
cs.RO · 2025-02-09 · unverdicted · none · ref 5 · internal anchor
EvolvingAgent autonomously completes long-horizon tasks via a closed-loop planner-controller-reflector system with continual world model updates, reporting 111.74% higher success rates than baselines in Minecraft and human-level Atari performance.

The Platonic Representation Hypothesis
cs.LG · 2024-05-13 · unverdicted · none · ref 245 · internal anchor
Representations learned by large AI models are converging toward a shared statistical model of reality.

World Model on Million-Length Video And Language With Blockwise RingAttention
cs.LG · 2024-02-13 · unverdicted · none · ref 13 · internal anchor
Presents open-source 7B models for million-token video and language understanding via Blockwise RingAttention, setting new benchmarks in retrieval and long video tasks.

Supervise Thyself: Examining Self-Supervised Representations in Interactive Environments
cs.LG · 2019-06-27 · unverdicted · none · ref 12 · internal anchor
Empirical comparison finds that self-supervised representations vary in capturing agent state and generalizing to new levels or textures depending on environment visuals and dynamics.

DynoPlan: Combining Motion Planning and Deep Neural Network based Controllers for Safe HRL
cs.RO · 2019-06-24 · unverdicted · none · ref 4 · internal anchor
DynoPlan adds dynamics models and a demonstration-derived heuristic to the options framework so that hierarchical RL can switch between motion planning and DNN controllers via short-horizon model-predictive evaluation.

Shaping Belief States with Generative Environment Models for RL
cs.LG · 2019-06-21 · unverdicted · none · ref 30 · internal anchor
Multi-step predictive generative models form stable belief states capturing environment layout and agent pose, yielding higher data efficiency on RL tasks than model-free agents.

Bridge-WA: Predicting Where and How the World Changes for Robotic Action
cs.RO · 2026-07-02 · unverdicted · none · ref 16 · internal anchor
Bridge-WA introduces a lightweight distillation-based world-action model that uses future-change priors to improve robotic task success and robustness without deployment-time dense rollouts.

Evolving Intelligent Complex Systems via Intellicise Networks: Architecture, Technologies, and Pathways
eess.SP · 2026-07-01 · unverdicted · none · ref 118 · internal anchor
Proposes a cross-layer intellicise network architecture grounded in multiple theories to support intelligent complex systems, with reviews of enabling technologies and a case study.

Understanding Rollout Error in Graph World Models
cs.AI · 2026-06-26 · unverdicted · none · ref 9 · internal anchor
Develops graph rollout bounds separating topology and model error sources and proposes Error-Aware GWM with spectral regularization and consistency terms for dynamic graphs.

Risk-Aware Selective Multimodal Driver Monitoring with Driver-State World Modeling
cs.RO · 2026-06-25 · unverdicted · none · ref 11 · internal anchor
A cost-aware selective inference framework combines a lightweight multimodal student model and driver-state world modeling to reduce unsafe false negatives in driver monitoring while keeping low latency.

A Compositional Framework for Open-ended Intelligence
cs.LG · 2026-06-13 · unverdicted · none · ref 34 · internal anchor
Open-ended intelligence is formalized as the compositional closure L(P,C) of primitives P under operators C, with next primitive prediction proposed as an objective to acquire reusable primitives and grammar for lifelong adaptation.

Detecting Explanatory Insufficiency in Learned Representations: A Framework for Representational Vigilance
cs.LG · 2026-06-11 · unverdicted · none · ref 5 · internal anchor
Proposes the VER framework as a diagnostic sequence for identifying explanatory insufficiency in learned representations, distinguishing it from standard errors and shifts.

EWAM: An Enhanced World Action Model for Closed-Loop Online Adaptation in Embodied Intelligence
cs.RO · 2026-06-10 · unverdicted · none · ref 1 · internal anchor
EWAM adds four integrated neural modules for inference-time co-reasoning and anomaly handling in a frozen Cosmos3 backbone to reduce deployment data needs under zero-shot protocols.

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning
cs.CV · 2026-06-10 · unverdicted · none · ref 5 · internal anchor
InternVideo3 introduces Multimodal Contextual Reasoning and M^2LA attention to enable closed-loop evidence accumulation in long-video understanding and agentic tool use, reporting strong benchmark results.

Bootstrap Theory of Representational Emergence: Explanatory Insufficiency as a Driver of Representation Learning and World Models
cs.LG · 2026-06-05 · unverdicted · none · ref 16 · internal anchor
TBER describes representational emergence as a five-stage bootstrap process triggered by explanatory insufficiency in AI, biology, and science.

Towards World Models in Biomedical Research
cs.AI · 2026-06-04 · unverdicted · none · ref 30 · internal anchor
Proposes biomedical world models that learn latent states and intervention-conditioned dynamics to enable simulation of future biological trajectories for discovery in virtual cells, organoids, patients, and surgery.

Behavior-Invariant Task Representation Learning with Transformer-based World Models for Offline Meta-Reinforcement Learning
cs.LG · 2026-05-30 · unverdicted · none · ref 5 · internal anchor
The work introduces behavior-invariant latent task representations via information-theoretic learning in a Transformer world model plus conservative penalties on imagined rollouts to improve generalization in offline meta-RL.

Toward AI That Understands Self and Others: A World-Model Theory of Cognitive Diversity and Alignment
cs.AI · 2026-05-28 · unverdicted · none · ref 43 · internal anchor
The paper introduces the Multi-Phase Inference Assumption and Mechanism to frame cognitive diversity as arising from constrained construction of sufficient statistics and defines alignment as processability between heterogeneous world models via alignment maps and transformation loss.

Affective Music Recommendation: A Rollout-Based World Model for Offline Preference Optimization
cs.LG · 2026-05-27 · unverdicted · none · ref 6 · internal anchor
AMRS deploys a rollout-based causal transformer world model for offline DPO-based affective music recommendation under cold-start conditions on health platforms.

Can Predicted Dynamics Exist in the Physical World?
cs.RO · 2026-05-23 · unverdicted · none · ref 1 · internal anchor
Physical admissibility is defined as a prediction-control interface using kinematic, dynamic, and composed-horizon conditions to reject invalid dynamics proposals, with AUC 0.957 on LeRobot PushT and 87-89% prevention of invalid actions in interventions.

LASAR: Towards Spatio-temporal Reasoning with Latent Cognitive Map
cs.CV · 2026-05-16 · unverdicted · none · ref 14 · internal anchor
LASAR pairs a dual-memory system with spatio-temporal contrastive learning to induce latent cognitive maps, reporting 2-3.5% zero-shot gains on VLN-CE and VSI-Bench plus high map self-consistency.

EfficientTDMPC: Improved MPC Objectives for Sample-Efficient Continuous Control
cs.LG · 2026-05-15 · unverdicted · none · ref 32 · internal anchor
EfficientTDMPC extends the TD-MPC family with model ensembles, return averaging, and uncertainty penalties to reach SOTA sample efficiency on hard continuous control benchmarks in low-data regimes.

Agentifying Patient Dynamics within LLMs through Interacting with Clinical World Model
cs.AI · 2026-05-14 · unverdicted · none · ref 21 · internal anchor
SepsisAgent is a world-model-augmented LLM agent trained via supervised fine-tuning, behavior cloning, and agentic RL that outperforms RL and LLM baselines on MIMIC-IV sepsis trajectories in off-policy value and safety metrics.

Why We Need World Models for AGI: Where LLMs Fail and How World Models May Outperform
cs.AI · 2026-05-13 · unverdicted · none · ref 24 · internal anchor
In the Flux environment, RL agents with explicit latent state access achieve ~79% win rate versus ~11% for LLMs on long-horizon tasks, illustrating limitations of sequence prediction for dynamic reasoning.

Position: agentic AI orchestration should be Bayes-consistent
cs.AI · 2026-05-01 · unverdicted · none · ref 8 · internal anchor
Agentic AI orchestration should apply Bayesian principles for belief maintenance, updating from interactions, and utility-based action selection.

A Co-Evolutionary Theory of Human-AI Coexistence: Mutualism, Governance, and Dynamics in Complex Societies
cs.CY · 2026-04-24 · unverdicted · none · ref 15 · internal anchor
Human-AI coexistence is best modeled as conditional mutualism under governance, formalized as a multiplex dynamical system whose simulations show stable high-coexistence equilibria only under balanced institutional oversight.

The Global Neural World Model: Spatially Grounded Discrete Topologies for Action-Conditioned Planning
cs.LG · 2026-04-17 · unverdicted · none · ref 5 · internal anchor
GNWM maps environments to a discrete 2D grid with snapping to stabilize autoregressive planning and learns generalized dynamics from maximum-entropy random walks.

Dyadic Partnership(DP): A Missing Link Towards Full Autonomy in Medical Robotics
cs.RO · 2026-04-13 · unverdicted · none · ref 16 · internal anchor
The paper introduces Dyadic Partnership (DP) as an intermediate paradigm for robot-clinician collaboration that uses foundation models and multi-modal interfaces to enable safer gradual progress toward autonomous medical robotics.

Event-Centric World Modeling with Memory-Augmented Retrieval for Embodied Decision-Making
cs.LG · 2026-04-08 · unverdicted · none · ref 10 · internal anchor
An event-centric framework encodes environments as semantic events and retrieves weighted prior maneuvers from a knowledge bank to enable interpretable, physics-aware decision-making for UAVs.

Advancing Open-source World Models
cs.CV · 2026-01-28 · unverdicted · none · ref 24 · internal anchor
LingBot-World is presented as an open-source world model that delivers high-fidelity simulation, minute-level contextual consistency, and real-time interactivity under one second latency.

World Simulation with Video Foundation Models for Physical AI
cs.CV · 2025-10-28 · unverdicted · none · ref 27 · internal anchor
Cosmos-Predict2.5 unifies text-to-world, image-to-world, and video-to-world generation in one model trained on 200M clips with RL post-training, delivering improved quality and control for physical AI.

Edge Case Detection in Automated Driving: Methods, Challenges and Future Directions
cs.RO · 2024-10-11 · unverdicted · none · ref 133 · internal anchor
The paper delivers a two-level hierarchical classification of edge case detection methods in automated driving, covering AV modules and methodologies, plus evaluation metrics and open challenges.

Convolutional Reservoir Computing for World Models
cs.LG · 2019-07-18 · unverdicted · none · ref 10 · internal anchor
RCRC uses untrained random CNNs and reservoir computing plus evolution strategies to reach claimed state-of-the-art scores in reinforcement learning tasks while avoiding data storage and heavy training.

Situation Perception: A Necessary Primitive to Artificial Superintelligence
cs.CY · 2026-06-29 · unverdicted · none · ref 10 · internal anchor
Situation perception is proposed as a necessary primitive for artificial superintelligence, requiring abstract prediction, long-term compressed memory, and objective-guided active learning.

DreamForge-World 0.1 Preview: A Low-Compute Real-Time Controllable World Model
cs.LG · 2026-06-29 · unverdicted · none · ref 1 · internal anchor
A preview system demonstrates real-time controllable world modeling at 14-15 FPS on RTX 4090 by adapting open video backbones with action pathways for keyboard/mouse control and multimodal features.

Building a Scalable, Reproducible, Evaluatable, and Closed-Loop Simulation Environment Foundation for Embodied Intelligence
cs.RO · 2026-06-26 · unverdicted · none · ref 5 · 2 links · internal anchor
Presents a four-layer cloud-native framework for scalable, reproducible simulation-based training and evaluation in embodied AI.

World Models: A Comprehensive Survey of Architectures, Methodologies, Reasoning Paradigms, and Applications
cs.LG · 2026-05-28 · unverdicted · none · ref 33 · internal anchor
The paper delivers a multi-axis taxonomy for world models that maps architectures, training families, reasoning strategies, and domains from early cognitive foundations through systems such as Dreamer, MuZero, and Sora while noting evaluation gaps.
