Latent State Design for World Models under Sufficiency Constraints
Pith reviewed 2026-05-10 15:53 UTC · model grok-4.3
The pith
A world model is actionable when its latent state is constructed to match the agent's task rather than to retain the most information.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
World models matter to agents only through the states they construct, and these states must satisfy sufficiency constraints tied to concrete functions: prediction, control, planning, memory, grounding, or counterfactual reasoning. By grouping methods into roles such as predictive embedding, recurrent belief state, object or causal structure, latent action interface, grounded planning interface, and memory substrate, the paper shows that architecture-based classifications obscure important gaps, including the difference between predictive sufficiency and control sufficiency and between passive prediction and counterfactual modeling. Evaluation along the seven axes of representation, prediction, ...
What carries the argument
A functional taxonomy that classifies latent states by their intended role (predictive embedding, recurrent belief state, object/causal structure, latent action interface, grounded planning interface, memory substrate), supported by a seven-axis evaluation matrix that diagnoses preservation, discarding, and enabling capabilities.
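One way to picture the taxonomy and matrix together is as a small table keyed by method, with a role label and per-axis entries. The sketch below is only an illustration: the role and axis names come from the abstract, while the `MethodEntry` schema, the boolean coding of each axis, and the example method name are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass

# Role and axis names as listed in the paper's abstract.
ROLES = (
    "predictive embedding",
    "recurrent belief state",
    "object/causal structure",
    "latent action interface",
    "grounded planning interface",
    "memory substrate",
)
AXES = (
    "representation",
    "prediction",
    "planning",
    "controllability",
    "causal/counterfactual support",
    "memory",
    "uncertainty",
)


@dataclass
class MethodEntry:
    """One row of a diagnostic matrix (hypothetical schema, not the paper's)."""
    name: str
    role: str
    support: dict  # axis name -> bool; the paper's actual scale is not known here

    def __post_init__(self):
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role}")
        unknown = set(self.support) - set(AXES)
        if unknown:
            raise ValueError(f"unknown axes: {sorted(unknown)}")

    def gaps(self, required):
        """Axes the task requires that this latent state does not support."""
        return [axis for axis in required if not self.support.get(axis, False)]


# Placeholder method and values, for illustration only.
entry = MethodEntry(
    name="video-predictor-A",
    role="predictive embedding",
    support={"representation": True, "prediction": True},
)
print(entry.gaps(["prediction", "controllability", "planning"]))
```

Read this way, the matrix works as a diagnostic: the `gaps` call lists what a task needs that the state was never built to provide, independent of the method's architecture.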
If this is right
- Predictive sufficiency does not guarantee control sufficiency, so models built for video prediction often fail when actions must be chosen.
- Passive prediction models are distinguished from those that support counterfactual reasoning about interventions.
- Evaluation should focus on what a latent state enables for the agent rather than on raw information content.
- Methods can be compared directly by the sufficiency constraints they were designed to meet instead of by architectural similarity.
- The most useful world model for any given application is the one whose state construction is matched to that application's requirements.
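The first and third bullets can be illustrated with a toy example. The sketch below is not from the paper: the environment, both encoders, and all function names are assumptions made for illustration. Each observation has eight task-irrelevant background bits and one cue bit, and the reward depends only on the cue. A latent that keeps the background predicts most of the next observation but cannot choose the right action, while a latent that keeps only the cue predicts poorly and controls perfectly.

```python
from collections import defaultdict
from itertools import product

N_BG = 8  # number of task-irrelevant background bits (toy assumption)


def all_observations():
    """Enumerate every observation: N_BG background bits plus one cue bit."""
    return list(product((0, 1), repeat=N_BG + 1))


def next_obs(obs):
    """Toy dynamics: the background rotates by one position, the cue persists."""
    bg, cue = obs[:N_BG], obs[N_BG]
    return bg[1:] + bg[:1] + (cue,)


def reward(obs, action):
    """Reward 1 when the binary action matches the cue bit, else 0."""
    return 1.0 if action == obs[N_BG] else 0.0


def encode_background(obs):
    """Latent that keeps most of the information but drops the cue."""
    return obs[:N_BG]


def encode_cue(obs):
    """Minimal latent that keeps only the reward-relevant bit."""
    return (obs[N_BG],)


def group_by_latent(encode):
    """Group observations that the encoder maps to the same latent value."""
    groups = defaultdict(list)
    for obs in all_observations():
        groups[encode(obs)].append(obs)
    return groups


def prediction_accuracy(encode):
    """Per-entry accuracy of the best next-observation guess given the latent."""
    correct = total = 0
    for members in group_by_latent(encode).values():
        targets = [next_obs(o) for o in members]
        for i in range(N_BG + 1):
            ones = sum(t[i] for t in targets)
            correct += max(ones, len(targets) - ones)  # majority guess
            total += len(targets)
    return correct / total


def control_value(encode):
    """Mean reward of the best policy that sees only the latent."""
    total_reward, n = 0.0, 0
    for members in group_by_latent(encode).values():
        total_reward += max(sum(reward(o, a) for o in members) for a in (0, 1))
        n += len(members)
    return total_reward / n


for name, enc in [("background", encode_background), ("cue", encode_cue)]:
    print(name, round(prediction_accuracy(enc), 3), control_value(enc))
# background 0.944 0.5
# cue 0.556 1.0
```

The background latent has low prediction error rather than strict predictive sufficiency, so this is a simplified stand-in for the gap the paper describes, not a proof of it.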
Where Pith is reading between the lines
- The taxonomy could serve as a checklist for designing hybrid models that satisfy multiple sufficiency constraints at once.
- Benchmarks for world models might shift from generic reconstruction accuracy to targeted tests of each sufficiency axis.
- In embodied settings the framework suggests prioritizing minimal states that are grounded for planning over richer but ungrounded representations.
- Automated search over latent-state designs could optimize directly for the relevant subset of the seven axes rather than for a single reconstruction loss.
Load-bearing premise
That the proposed functional taxonomy of six roles and the seven-axis evaluation matrix capture the essential distinctions among world models, and that sufficiency constraints are the right primary lens for organizing the field.
What would settle it
A head-to-head comparison on a shared planning or control benchmark in which a high-capacity model that maximizes mutual information with observations is evaluated against several task-specific models built under the taxonomy; if the maximal-information model outperforms all others across the seven axes, the central claim is falsified.
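The decision rule in this test can be written down directly. The sketch below assumes each model receives one numeric score per axis with higher meaning better; that scoring convention, the function names, and the numbers in the usage example are placeholders, not results from the paper or any benchmark.

```python
AXES = (
    "representation",
    "prediction",
    "planning",
    "controllability",
    "causal/counterfactual support",
    "memory",
    "uncertainty",
)


def outperforms(a, b, axes=AXES):
    """True if score dict `a` is strictly higher than `b` on every axis."""
    return all(a[axis] > b[axis] for axis in axes)


def claim_falsified(max_info_scores, task_specific_scores):
    """Apply the stated rule: the claim fails only if the maximal-information
    model beats every task-specific model on all seven axes."""
    if not task_specific_scores:
        raise ValueError("need at least one task-specific model to compare")
    return all(
        outperforms(max_info_scores, scores)
        for scores in task_specific_scores.values()
    )


# Placeholder scores for illustration only (not measured results).
max_info = {axis: 0.7 for axis in AXES}
task_specific = {
    "belief-state model": {**{axis: 0.5 for axis in AXES}, "planning": 0.9},
    "object-centric model": {axis: 0.6 for axis in AXES},
}
print(claim_falsified(max_info, task_specific))  # False
```

With these placeholder numbers the claim survives, because the belief-state model is better on planning. A single axis on which a task-matched model wins is enough to block falsification under this rule, which makes the test strict.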
Original abstract
A world model matters to an agent only through the state it constructs. That state must preserve some information, discard other information, and support some future function: prediction, control, planning, memory, grounding, or counterfactual reasoning. This paper treats world-model research as latent state design under sufficiency constraints. We propose a functional taxonomy that groups methods by what their latent state is for, rather than by architecture or application domain: predictive embedding, recurrent belief state, object/causal structure, latent action interface, grounded planning interface, and memory substrate. These roles expose distinctions that architecture-based groupings hide, including the gap between predictive sufficiency and control sufficiency, and the gap between passive video prediction and counterfactual action modeling. The taxonomy supports an evaluation framework that judges a model by the sufficiency constraint its latent state was built to satisfy. We compare methods along seven axes: representation, prediction, planning, controllability, causal/counterfactual support, memory, and uncertainty. We use the resulting matrix as a diagnostic for what a latent state preserves, discards, and enables. The conclusion that follows is that an actionable world model is the one whose state construction matches the task, not the one that preserves the most information.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that world models should be understood through the lens of latent state design under sufficiency constraints. It proposes a functional taxonomy categorizing methods into predictive embedding, recurrent belief state, object/causal structure, latent action interface, grounded planning interface, and memory substrate. These are evaluated using a seven-axis matrix (representation, prediction, planning, controllability, causal/counterfactual support, memory, uncertainty) to determine what the states preserve, discard, and enable. The resulting insight is that actionable world models prioritize task-matched state construction over maximal information preservation.
Significance. This taxonomy offers a novel organizational tool for world-model research that could reveal overlooked distinctions, such as between passive prediction and counterfactual action modeling. By emphasizing functional roles over architectures, it may guide the development of more efficient, task-specific models. The framework's value lies in its potential as a diagnostic for latent state design, though its impact requires community testing and application.
minor comments (3)
- [Abstract] The phrase 'sufficiency constraints' is central but not defined in the provided abstract; adding a brief definition would aid readers unfamiliar with the concept.
- [Conclusion] The final claim about actionable world models could be illustrated with a brief example contrasting two methods to make the distinction concrete.
- [Evaluation framework] Ensure that the seven axes are clearly distinguished from one another to avoid overlap in the matrix.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the manuscript and for recommending minor revision. The referee's summary accurately reflects the paper's focus on functional taxonomy for latent state design under sufficiency constraints, and we appreciate the recognition of its potential value as a diagnostic tool. No specific major comments were raised in the report.
Circularity Check
No significant circularity in conceptual taxonomy
The paper is a conceptual taxonomy and evaluation framework for world models organized around sufficiency constraints on latent states. It proposes functional categories (predictive embedding, recurrent belief state, etc.) and a seven-axis diagnostic matrix without any formal derivations, equations, fitted parameters, or numerical predictions. The central interpretive claim—that actionable world models match task-specific sufficiency rather than maximize information preservation—follows directly from adopting the proposed lens and does not reduce to self-definition, fitted inputs, or load-bearing self-citations. No load-bearing steps exist that could be circular by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: A world model matters to an agent only through the state it constructs.
- domain assumption: The latent state must preserve some information, discard other information, and support some future function such as prediction, control, planning, memory, grounding, or counterfactual reasoning.
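The abstract does not define "sufficiency" formally, so the following is only one plausible way to write two of the constraints down. All notation here is an assumption made for illustration and is not taken from the paper: $h_t$ is the observation-action history, $z_t = \phi(h_t)$ the latent state, $\mu$ the data-collection policy, and $J$ the expected return.

```latex
% Hypothetical formalization; symbols are assumptions, not the paper's notation.
% Passive predictive sufficiency: under the data-collection policy \mu,
% the latent state predicts future observations as well as the full history.
p_{\mu}\left(o_{t+1:T} \mid z_t\right) = p_{\mu}\left(o_{t+1:T} \mid h_t\right)

% Control sufficiency: restricting policies to act on z_t loses no return.
\max_{\pi \,:\, z_t \mapsto a_t} J(\pi \circ \phi)
  = \max_{\pi \,:\, h_t \mapsto a_t} J(\pi)
```

Under this reading the two conditions constrain different quantities. The first concerns observation likelihoods under the data distribution, and the second concerns returns under policies the agent might choose. That difference is one way to see why the paper treats them as separate constraints.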