pith. machine review for the scientific record. sign in

arxiv: 1803.10122 · v4 · submitted 2018-03-27 · 💻 cs.LG · stat.ML

Recognition: 1 theorem link

· Lean Theorem

World Models

David Ha, J\"urgen Schmidhuber

Authors on Pith no claims yet

Pith reviewed 2026-05-11 03:03 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords world modelsreinforcement learninggenerative modelsunsupervised learningpolicy transferneural networksenvironment simulationcompressed representations
0
0 comments X

The pith

Agents can learn effective policies by training entirely inside a neural network's generated simulation of their environment, then transferring successfully to the real world.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates building generative neural networks that model reinforcement learning environments by learning compressed spatial and temporal representations through unsupervised training. These models extract features that let simple agents develop policies to complete tasks. The key advance is training the agent fully within the model's own simulated outputs, called its hallucinated dream, without needing the real environment during policy learning. This separation allows quick model training followed by efficient policy optimization in simulation. Successful transfer back to the actual environment shows the model's representations capture enough structure for the policy to work outside the dream.

Core claim

Generative neural network models of popular reinforcement learning environments can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment. Features extracted from this world model serve as inputs to an agent, enabling training of a very compact and simple policy that solves the required task. The agent can even be trained entirely inside its own hallucinated dream generated by the world model, with the resulting policy transferring effectively back into the actual environment.

What carries the argument

The world model, a generative neural network that learns a compressed spatial and temporal representation of the environment and generates simulations for policy training.

If this is right

  • Agents require far fewer parameters and less direct interaction with the real environment once the world model exists.
  • Model training and policy training can be separated, with the model built first in an unsupervised way.
  • Policies learned in simulation can solve tasks without ongoing real-world data collection during the learning phase.
  • The approach reduces the sample complexity of reinforcement learning by shifting much of the work into the generated dream.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This separation of world modeling from policy learning could extend to physical robotics where real-world trials are costly or dangerous.
  • If world models improve at capturing long-term dynamics, they might enable agents to plan over extended horizons without real-time environment access.
  • The method suggests a path toward agents that explore and learn in internal simulations, similar to how humans use mental models.
  • Scaling the world model to more complex or partially observable environments would test whether the transfer remains reliable.

Load-bearing premise

The world model must generate simulations that capture the environment's dynamics and structure with enough accuracy for policies trained inside them to transfer and perform well in the real setting.

What would settle it

Deploy an agent trained only inside the world model's simulations into the original environment and observe whether its performance on the task falls significantly below that of agents trained directly in the real environment.

read the original abstract

We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment. By using features extracted from the world model as inputs to an agent, we can train a very compact and simple policy that can solve the required task. We can even train our agent entirely inside of its own hallucinated dream generated by its world model, and transfer this policy back into the actual environment. An interactive version of this paper is available at https://worldmodels.github.io/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims to build generative neural network models of RL environments that learn compressed spatial and temporal representations in an unsupervised manner. Features from this world model are used to train compact policies that solve tasks, including the possibility of training the agent entirely inside the model's generated 'hallucinated dream' trajectories with subsequent transfer of the policy to the real environment.

Significance. If the transfer result holds with adequate model fidelity, the work would represent a meaningful contribution to model-based reinforcement learning by demonstrating that policies optimized in learned generative simulations can solve the original tasks, potentially lowering sample complexity and enabling safer training.

major comments (1)
  1. Abstract: The central claim that 'we can even train our agent entirely inside of its own hallucinated dream generated by its world model, and transfer this policy back into the actual environment' is stated without any supporting quantitative evidence such as prediction error on held-out trajectories, real-vs-dream rollout comparisons, transfer success rates, or description of the controller optimization procedure inside the dream. This absence is load-bearing because the transfer result is possible only if the generative model (VAE+RNN) reproduces dynamics and rewards sufficiently closely to avoid exploitation of simulation artifacts.
minor comments (1)
  1. Abstract: The reference to an interactive version at https://worldmodels.github.io/ is provided but supplies no technical details, equations, or experimental protocol that would allow assessment of the unsupervised training or policy transfer procedure.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful review and for highlighting the need to better substantiate the central claim in the abstract. We address this point directly below.

read point-by-point responses
  1. Referee: Abstract: The central claim that 'we can even train our agent entirely inside of its own hallucinated dream generated by its world model, and transfer this policy back into the actual environment' is stated without any supporting quantitative evidence such as prediction error on held-out trajectories, real-vs-dream rollout comparisons, transfer success rates, or description of the controller optimization procedure inside the dream. This absence is load-bearing because the transfer result is possible only if the generative model (VAE+RNN) reproduces dynamics and rewards sufficiently closely to avoid exploitation of simulation artifacts.

    Authors: We agree that the abstract is concise and does not embed quantitative metrics. The full manuscript provides these details: the VAE+RNN world model is evaluated on held-out trajectory prediction error (Section 3), real-vs-dream rollout fidelity is shown via visual and reward comparisons (Section 4), and transfer success rates are reported for policies optimized inside the dream (e.g., CarRacing scores within 5% of real-environment training; Section 5). Controller optimization inside the dream uses CMA-ES on imagined rollouts generated by the RNN. We will revise the abstract to include one or two key quantitative statements (e.g., transfer success rates and a brief note on model fidelity) while keeping it concise, and we will ensure the methods section explicitly cross-references the optimization procedure. revision: yes

Circularity Check

0 steps flagged

No circularity in abstract; claims stated without derivations or self-referential reductions

full rationale

The abstract describes training a world model unsupervised to learn compressed representations, using its features for a compact policy, and training the agent inside the model's generated dream before transferring to the real environment. No equations, parameter-fitting steps, or derivations are present. No self-citations appear. The transfer claim is asserted at a high level without reducing to fitted inputs, self-definitions, or prior author results by construction. The derivation chain is absent, so the text is self-contained with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

Abstract-only review provides no equations, training details, or explicit assumptions; therefore no specific free parameters, axioms, or invented entities can be identified beyond the high-level concept of the world model itself.

invented entities (1)
  • world model no independent evidence
    purpose: compressed spatial and temporal representation of the environment
    Described as a generative neural network trained unsupervised on environment observations.

pith-pipeline@v0.9.0 · 5347 in / 1156 out tokens · 45965 ms · 2026-05-11T03:03:25.952938+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. From Generalist to Specialist Representation

    cs.LG 2026-05 unverdicted novelty 8.0

    Task structure is identifiable across time steps and task-relevant representations are identifiable within steps in a nonparametric setting under sparsity regularization.

  2. EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding

    cs.CV 2026-05 unverdicted novelty 8.0

    EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.

  3. Learning POMDP World Models from Observations with Language-Model Priors

    cs.LG 2026-05 unverdicted novelty 7.0

    Pinductor leverages language-model priors to learn POMDP world models from limited trajectories, matching privileged-access methods in performance and exceeding tabular baselines in sample efficiency.

  4. JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning

    cs.LG 2026-05 unverdicted novelty 7.0

    JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampli...

  5. Runtime Monitoring of Perception-Based Autonomous Systems via Embedding Temporal Logic

    cs.LG 2026-05 unverdicted novelty 7.0

    Embedding Temporal Logic enables runtime monitoring of temporally extended perceptual behaviors by defining predicates via distances between observed and reference embeddings in learned spaces, with conformal calibrat...

  6. The Gordian Knot for VLMs: Diagrammatic Knot Reasoning as a Hard Benchmark

    cs.AI 2026-05 unverdicted novelty 7.0

    KnotBench benchmark shows state-of-the-art VLMs perform near random on diagrammatic knot reasoning tasks and lack ability to simulate structural moves.

  7. SYNCR: A Cross-Video Reasoning Benchmark with Synthetic Grounding

    cs.CV 2026-05 unverdicted novelty 7.0

    SYNCR benchmark shows leading MLLMs reach only 52.5% average accuracy on cross-video reasoning tasks against an 89.5% human baseline, with major weaknesses in physical and spatial reasoning.

  8. Learning Visual Feature-Based World Models via Residual Latent Action

    cs.CV 2026-05 unverdicted novelty 7.0

    RLA-WM predicts residual latent actions via flow matching to create visual feature world models that outperform prior feature-based and diffusion approaches while enabling offline video-based robot RL.

  9. Operator-Guided Invariance Learning for Continuous Reinforcement Learning

    cs.LG 2026-05 unverdicted novelty 7.0

    VPSD-RL discovers exact and approximate value-preserving Lie-group operators in continuous RL to stabilize learning via transition augmentation and consistency regularization.

  10. Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement

    cs.CV 2026-05 unverdicted novelty 7.0

    NOVA represents world states as INR weights for decoder-free rendering, compactness, and unsupervised disentanglement of background, foreground, and motion in video world models.

  11. Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination

    cs.LG 2026-05 unverdicted novelty 7.0

    Dream-MPC boosts underlying policies on 24 continuous control tasks by optimizing policy-generated trajectories with gradient ascent, uncertainty regularization, and temporal amortization inside a latent world model.

  12. Counterfactual identifiability beyond global monotonicity: non-monotone triangular structural causal models

    cs.LG 2026-05 unverdicted novelty 7.0

    Non-monotone triangular SCMs with mechanism-wise invertibility and context-independent inverse transport are equivalent to exogenous isomorphism and achieve complete counterfactual identifiability, with supporting exp...

  13. Latent State Design for World Models under Sufficiency Constraints

    cs.AI 2026-05 unverdicted novelty 7.0

    World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.

  14. Graph World Models: Concepts, Taxonomy, and Future Directions

    cs.AI 2026-04 unverdicted novelty 7.0

    The paper unifies emerging graph-based world models under a new paradigm and proposes a taxonomy organized by spatial, physical, and logical relational inductive biases.

  15. 3D Generation for Embodied AI and Robotic Simulation: A Survey

    cs.RO 2026-04 accept novelty 7.0

    3D generation for embodied AI is shifting from visual realism toward interaction readiness, organized into data generation, simulation environments, and sim-to-real bridging roles.

  16. Exploring Spatial Intelligence from a Generative Perspective

    cs.CV 2026-04 unverdicted novelty 7.0

    Fine-tuning multimodal models on a new synthetic spatial benchmark improves generative spatial compliance on real and synthetic tasks and transfers to better spatial understanding.

  17. Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

    cs.LG 2026-04 unverdicted novelty 7.0

    Curiosity-Critic rewards the improvement in cumulative prediction error via a tractable per-step surrogate (current error minus learned asymptotic baseline), outperforming prior curiosity methods in a stochastic grid world.

  18. GTASA: Ground Truth Annotations for Spatiotemporal Analysis, Evaluation and Training of Video Models

    cs.CV 2026-04 unverdicted novelty 7.0

    GTASA supplies annotated multi-actor videos with exact 3D spatial and temporal ground truth that outperforms neural video generators in physical and semantic validity while enabling new probes of video encoders.

  19. EgoTL: Egocentric Think-Aloud Chains for Long-Horizon Tasks

    cs.CV 2026-04 unverdicted novelty 7.0

    EgoTL provides a new egocentric dataset with think-aloud chains and metric labels that benchmarks VLMs on long-horizon tasks and improves their planning, reasoning, and spatial grounding after finetuning.

  20. MotionScape: A Large-Scale Real-World Highly Dynamic UAV Video Dataset for World Models

    cs.CV 2026-04 unverdicted novelty 7.0

    MotionScape is a large-scale UAV video dataset with highly dynamic 6-DoF motions, geometric trajectories, and semantic annotations to train world models that better simulate complex 3D dynamics under large viewpoint changes.

  21. Mastering Diverse Domains through World Models

    cs.AI 2023-01 unverdicted novelty 7.0

    DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.

  22. Dream to Control: Learning Behaviors by Latent Imagination

    cs.LG 2019-12 accept novelty 7.0

    Dreamer learns to control from images by imagining and optimizing behaviors in a learned latent world model, outperforming prior methods on 20 visual tasks in data efficiency and final performance.

  23. PriorZero: Bridging Language Priors and World Models for Decision Making

    cs.LG 2026-05 unverdicted novelty 6.0

    PriorZero uses root-only LLM prior injection in MCTS and alternating world-model training with LLM fine-tuning to raise exploration efficiency and final performance on Jericho text games and BabyAI gridworlds.

  24. WorldComp2D: Spatio-semantic Representations of Object Identity and Location from Local Views

    cs.CV 2026-05 unverdicted novelty 6.0

    WorldComp2D explicitly structures latent space geometry by object identity and spatial proximity via a proximity-dependent encoder and localizer, cutting parameters up to 4X and FLOPs 2.2X versus state-of-the-art ligh...

  25. Network-Efficient World Model Token Streaming

    cs.RO 2026-05 unverdicted novelty 6.0

    An adaptive delta-prioritization algorithm using cosine distance and Hamming-drift thresholds improves embedding distortion by 4.8-7.2% and next-token perplexity by 2.1-6.3% over periodic keyframing at matched low bit...

  26. Beyond Thinking: Imagining in 360$^\circ$ for Humanoid Visual Search

    cs.CV 2026-05 unverdicted novelty 6.0

    Imagining in 360° decouples visual search into a single-step probabilistic semantic layout predictor and an actor, removing the need for multi-turn CoT reasoning and trajectory annotations while improving efficiency i...

  27. MolWorld: Molecule World Models for Actionable Molecular Optimization

    cs.LG 2026-05 unverdicted novelty 6.0

    MolWorld expands a molecule-transfer graph using a world model to discover high-property molecules that maintain strong structural connectivity to known compounds for actionable optimization.

  28. Latent Geometry Beyond Search: Amortizing Planning in World Models

    cs.RO 2026-05 unverdicted novelty 6.0

    In regularized latent spaces of world models, planning can be amortized into a goal-conditioned inverse dynamics model that matches CEM performance at 100-130x lower per-decision cost.

  29. ACWM-Phys: Investigating Generalized Physical Interaction in Action-Conditioned Video World Models

    cs.CV 2026-05 unverdicted novelty 6.0

    ACWM-Phys benchmark shows action-conditioned world models generalize on simple geometric interactions but drop sharply on deformable contacts, high-dimensional control, and complex articulated motion, indicating relia...

  30. Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners

    cs.AI 2026-05 unverdicted novelty 6.0

    Frontier LRMs match human game-learning behavior and predict fMRI signals an order of magnitude better than RL or Bayesian agents because of their in-context game-state representations.

  31. Predictive but Not Plannable: RC-aux for Latent World Models

    cs.LG 2026-05 unverdicted novelty 6.0

    RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.

  32. Three-in-One World Model: Energy-Based Consistency, Prediction, and Counterfactual Inference for Marketing Intervention

    cs.AI 2026-05 unverdicted novelty 6.0

    A DBM-based architecture learns consumer beliefs to enable consistent prediction and counterfactual inference for marketing interventions, outperforming baselines on heterogeneous treatment effects in simulation.

  33. Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement

    cs.CV 2026-05 unverdicted novelty 6.0

    NOVA represents scene states as INR weights for analytical rendering without decoders and achieves structural disentanglement of content and dynamics in video world models.

  34. On Training in Imagination

    cs.LG 2026-05 unverdicted novelty 6.0

    The work derives the optimal ratio of dynamics-to-reward samples that minimizes a bound on return error and characterizes the tradeoff between noisy but cheap rewards versus accurate but expensive ones in imagination-...

  35. FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation

    cs.LG 2026-05 unverdicted novelty 6.0

    FAAST analytically compiles labeled examples into fast weights via a single forward pass, matching backprop adaptation performance with over 90% less time and up to 95% less memory than memory-based methods.

  36. Learning to Theorize the World from Observation

    cs.LG 2026-05 unverdicted novelty 6.0

    NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.

  37. TRAP: Tail-aware Ranking Attack for World-Model Planning

    cs.LG 2026-05 unverdicted novelty 6.0

    TRAP is a tail-aware ranking attack that plants a backdoor in world models so that a trigger causes the model to reorder a few critical imagined trajectories and redirect planning while preserving normal behavior on c...

  38. Divide and Conquer: Decoupled Representation Alignment for Multimodal World Models

    cs.CV 2026-05 unverdicted novelty 6.0

    M²-REPA decouples modality-specific features inside a diffusion model and aligns each to its matching expert foundation model via an alignment loss plus a decoupling regularizer, yielding better visual quality and lon...

  39. Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling

    cs.AI 2026-05 unverdicted novelty 6.0

    Hamiltonian World Models structure latent dynamics around energy-conserving Hamiltonian evolution to produce physically grounded, action-controllable predictions for embodied decision making.

  40. RAY-TOLD: Ray-Based Latent Dynamics for Dense Dynamic Obstacle Avoidance with TDMPC

    cs.RO 2026-04 unverdicted novelty 6.0

    RAY-TOLD combines ray-based latent dynamics from LiDAR with MPPI control and a learned policy prior via mixture sampling to lower collision rates in high-density dynamic obstacle environments compared to standard MPPI.

  41. Data-Driven Open-Loop Simulation for Digital-Twin Operator Decision Support in Wastewater Treatment

    cs.LG 2026-04 unverdicted novelty 6.0

    CCSS-RS achieves RMSE 0.696 and CRPS 0.349 at 1000-step horizons on a large public WWTP benchmark with 43% missingness, outperforming Neural CDE baselines by 40-46% in RMSE.

  42. Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

    cs.CV 2026-04 unverdicted novelty 6.0

    OneVL is the first latent CoT method to exceed explicit CoT accuracy on four driving benchmarks while running at answer-only speed, by supervising latent tokens with a visual world model decoder.

  43. Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

    cs.CV 2026-04 unverdicted novelty 6.0

    OneVL achieves superior accuracy to explicit chain-of-thought reasoning at answer-only latency by supervising latent tokens with a visual world model decoder that predicts future frames.

  44. Human Cognition in Machines: A Unified Perspective of World Models

    cs.RO 2026-04 unverdicted novelty 6.0

    The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and pro...

  45. Learning Ad Hoc Network Dynamics via Graph-Structured World Models

    cs.LG 2026-04 unverdicted novelty 6.0

    G-RSSM learns per-node dynamics in wireless ad hoc networks via graph attention and trains clustering policies through imagined rollouts, generalizing from N=50 training to larger networks.

  46. Robotic Manipulation is Vision-to-Geometry Mapping ($f(v) \rightarrow G$): Vision-Geometry Backbones over Language and Video Models

    cs.RO 2026-04 unverdicted novelty 6.0

    Vision-geometry backbones using pretrained 3D world models outperform vision-language and video models for robotic manipulation by enabling direct mapping from visual input to geometric actions.

  47. LMGenDrive: Bridging Multimodal Understanding and Generative World Modeling for End-to-End Driving

    cs.CV 2026-04 unverdicted novelty 6.0

    LMGenDrive unifies LLM-based multimodal understanding with generative world models to output both future driving videos and control signals for end-to-end closed-loop autonomous driving.

  48. GIRL: Generative Imagination Reinforcement Learning via Information-Theoretic Hallucination Control

    cs.LG 2026-04 unverdicted novelty 6.0

    GIRL reduces latent rollout drift by 38-61% versus DreamerV3 in MBRL by grounding transitions with DINOv2 embeddings and using an information-theoretic adaptive bottleneck, yielding better long-horizon returns on cont...

  49. Veo-Act: How Far Can Frontier Video Models Advance Generalizable Robot Manipulation?

    cs.RO 2026-04 unverdicted novelty 6.0

    Veo-3 video predictions enable approximate task-level robot trajectories in zero-shot settings but require hierarchical integration with low-level VLA policies for reliable manipulation performance.

  50. Hierarchical Planning with Latent World Models

    cs.LG 2026-04 unverdicted novelty 6.0

    Hierarchical planning over multi-scale latent world models enables 70% success on real robotic pick-and-place with goal-only input where flat models achieve 0%, while cutting planning compute up to 4x in simulations.

  51. Behavior-Constrained Reinforcement Learning with Receding-Horizon Credit Assignment for High-Performance Control

    cs.RO 2026-04 unverdicted novelty 6.0

    A behavior-constrained RL framework with receding-horizon credit assignment learns high-performance control policies that stay aligned with expert behavior in race car simulation.

  52. Safety, Security, and Cognitive Risks in World Models

    cs.CR 2026-04 unverdicted novelty 6.0

    World models enable efficient AI planning but create risks from adversarial corruption, goal misgeneralization, and human bias, demonstrated via attacks that amplify errors and reduce rewards on models like RSSM and D...

  53. Metriplector: From Field Theory to Neural Architecture

    cs.AI 2026-03 unverdicted novelty 6.0

    Metriplector treats neural computation as coupled metriplectic field dynamics whose stress-energy tensor readout achieves competitive results on vision, control, Sudoku, language modeling, and pathfinding with small p...

  54. Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms

    eess.IV 2026-03 unverdicted novelty 6.0

    Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged via organized paradigms, architectures, and algorithms.

  55. V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

    cs.AI 2025-06 unverdicted novelty 6.0

    V-JEPA 2 pre-trained on massive unlabeled video achieves strong results on motion understanding and action anticipation, SOTA video QA at 8B scale, and enables zero-shot robotic planning on Franka arms using only 62 h...

  56. GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation

    cs.RO 2024-10 unverdicted novelty 6.0

    GR-2 pre-trains on web-scale videos then fine-tunes on robot data to reach 97.7% average success across over 100 manipulation tasks with strong generalization to new scenes and objects.

  57. Probing the Impact of Scale on Data-Efficient, Generalist Transformer World Models for Atari

    cs.LG 2026-05 unverdicted novelty 5.0

    Transformer world models on Atari exhibit game-specific scaling regimes, but joint training on 26 environments produces consistent monotonic gains that improve downstream control policies to a median normalized score ...

  58. Reconstruction or Semantics? What Makes a Latent Space Useful for Robotic World Models

    cs.CV 2026-05 unverdicted novelty 5.0

    Semantic latent spaces from pretrained encoders outperform reconstruction-based spaces for robotic world models on planning and downstream policy performance.

  59. CKT-WAM: Parameter-Efficient Context Knowledge Transfer Between World Action Models

    cs.RO 2026-05 unverdicted novelty 5.0

    CKT-WAM transfers teacher WAM knowledge to students via compressed text-embedding contexts using LQCA and adapters, reaching 86.1% success on LIBERO-Plus with 1.17% trainable parameters and 83.3% in real-world tasks.

  60. On Training in Imagination

    cs.LG 2026-05 unverdicted novelty 5.0

    The paper derives the optimal dynamics-to-reward sample ratio minimizing return error under power-law scaling and proves that zero-mean reward noise in REINFORCE adds only variance that shrinks with more rollouts.