hub

World Models

· 2018 · cs.LG · arXiv 1803.10122

75 Pith papers cite this work. Polarity classification is still indexing.

75 Pith papers citing it

open full Pith review browse 75 citing papers arXiv PDF

abstract

We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment. By using features extracted from the world model as inputs to an agent, we can train a very compact and simple policy that can solve the required task. We can even train our agent entirely inside of its own hallucinated dream generated by its world model, and transfer this policy back into the actual environment. An interactive version of this paper is available at https://worldmodels.github.io/

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 3

citation-polarity summary

background 3

claims ledger

abstract We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment. By using features extracted from the world model as inputs to an agent, we can train a very compact and simple policy that can solve the required task. We can even train our agent entirely inside of its own hallucinated dream generated by its world model, and transfer this policy back into the actual environment. An interactive version of this paper is

co-cited works

representative citing papers

From Generalist to Specialist Representation

cs.LG · 2026-05-12 · unverdicted · novelty 8.0

Task structure is identifiable across time steps and task-relevant representations are identifiable within steps in a nonparametric setting under sparsity regularization.

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.

Learning POMDP World Models from Observations with Language-Model Priors

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

Pinductor leverages language-model priors to learn POMDP world models from limited trajectories, matching privileged-access methods in performance and exceeding tabular baselines in sample efficiency.

JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.

Runtime Monitoring of Perception-Based Autonomous Systems via Embedding Temporal Logic

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Embedding Temporal Logic enables runtime monitoring of temporally extended perceptual behaviors by defining predicates via distances between observed and reference embeddings in learned spaces, with conformal calibration for reliable evaluation.

The Gordian Knot for VLMs: Diagrammatic Knot Reasoning as a Hard Benchmark

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

KnotBench benchmark shows state-of-the-art VLMs perform near random on diagrammatic knot reasoning tasks and lack ability to simulate structural moves.

SYNCR: A Cross-Video Reasoning Benchmark with Synthetic Grounding

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

SYNCR benchmark shows leading MLLMs reach only 52.5% average accuracy on cross-video reasoning tasks against an 89.5% human baseline, with major weaknesses in physical and spatial reasoning.

Learning Visual Feature-Based World Models via Residual Latent Action

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

RLA-WM predicts residual latent actions via flow matching to create visual feature world models that outperform prior feature-based and diffusion approaches while enabling offline video-based robot RL.

Operator-Guided Invariance Learning for Continuous Reinforcement Learning

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

VPSD-RL discovers exact and approximate value-preserving Lie-group operators in continuous RL to stabilize learning via transition augmentation and consistency regularization.

Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement

cs.CV · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

NOVA represents world states as INR weights for decoder-free rendering, compactness, and unsupervised disentanglement of background, foreground, and motion in video world models.

Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

Dream-MPC boosts underlying policies on 24 continuous control tasks by optimizing policy-generated trajectories with gradient ascent, uncertainty regularization, and temporal amortization inside a latent world model.

Counterfactual identifiability beyond global monotonicity: non-monotone triangular structural causal models

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

Non-monotone triangular SCMs with mechanism-wise invertibility and context-independent inverse transport are equivalent to exogenous isomorphism and achieve complete counterfactual identifiability, with supporting experiments on synthetic data and MuJoCo tasks.

Latent State Design for World Models under Sufficiency Constraints

cs.AI · 2026-05-03 · unverdicted · novelty 7.0

World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.

Graph World Models: Concepts, Taxonomy, and Future Directions

cs.AI · 2026-04-30 · unverdicted · novelty 7.0

The paper unifies emerging graph-based world models under a new paradigm and proposes a taxonomy organized by spatial, physical, and logical relational inductive biases.

Exploring Spatial Intelligence from a Generative Perspective

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

Fine-tuning multimodal models on a new synthetic spatial benchmark improves generative spatial compliance on real and synthetic tasks and transfers to better spatial understanding.

Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

cs.LG · 2026-04-20 · unverdicted · novelty 7.0

Curiosity-Critic rewards the improvement in cumulative prediction error via a tractable per-step surrogate (current error minus learned asymptotic baseline), outperforming prior curiosity methods in a stochastic grid world.

GTASA: Ground Truth Annotations for Spatiotemporal Analysis, Evaluation and Training of Video Models

cs.CV · 2026-04-12 · unverdicted · novelty 7.0

GTASA supplies annotated multi-actor videos with exact 3D spatial and temporal ground truth that outperforms neural video generators in physical and semantic validity while enabling new probes of video encoders.

EgoTL: Egocentric Think-Aloud Chains for Long-Horizon Tasks

cs.CV · 2026-04-10 · unverdicted · novelty 7.0

EgoTL provides a new egocentric dataset with think-aloud chains and metric labels that benchmarks VLMs on long-horizon tasks and improves their planning, reasoning, and spatial grounding after finetuning.

MotionScape: A Large-Scale Real-World Highly Dynamic UAV Video Dataset for World Models

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

MotionScape is a large-scale UAV video dataset with highly dynamic 6-DoF motions, geometric trajectories, and semantic annotations to train world models that better simulate complex 3D dynamics under large viewpoint changes.

Mastering Diverse Domains through World Models

cs.AI · 2023-01-10 · unverdicted · novelty 7.0

DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.

Dream to Control: Learning Behaviors by Latent Imagination

cs.LG · 2019-12-03 · accept · novelty 7.0

Dreamer learns to control from images by imagining and optimizing behaviors in a learned latent world model, outperforming prior methods on 20 visual tasks in data efficiency and final performance.

PriorZero: Bridging Language Priors and World Models for Decision Making

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

PriorZero uses root-only LLM prior injection in MCTS and alternating world-model training with LLM fine-tuning to raise exploration efficiency and final performance on Jericho text games and BabyAI gridworlds.

WorldComp2D: Spatio-semantic Representations of Object Identity and Location from Local Views

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

WorldComp2D explicitly structures latent space geometry by object identity and spatial proximity via a proximity-dependent encoder and localizer, cutting parameters up to 4X and FLOPs 2.2X versus state-of-the-art lightweight models on facial landmark localization while staying real-time on CPU.

Network-Efficient World Model Token Streaming

cs.RO · 2026-05-11 · unverdicted · novelty 6.0

An adaptive delta-prioritization algorithm using cosine distance and Hamming-drift thresholds improves embedding distortion by 4.8-7.2% and next-token perplexity by 2.1-6.3% over periodic keyframing at matched low bitrates for tokenized driving world models.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Designing Digital Humans with Ambient Intelligence cs.HC · 2026-04-06 · unverdicted · none · ref 78 · internal anchor
Integrating ambient intelligence with digital humans creates context-aware virtual agents capable of anticipatory assistance based on the user's surroundings.

World Models

hub tools

citation-role summary

citation-polarity summary

claims ledger

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer