Looped World Models

Bowen Cao; Cenyuan Zhang; Haonan Yin; Haoran Xu; Hao Wei; Hao Yang; Hebin Wang; Hongyuan Adam Lu; Jian Chen; Jiawei Zhou

arXiv: 2606.18208 · v1 · pith:JW2YEJ5Nnew · submitted 2026-06-16 · 💻 cs.LG · cs.AI · cs.CL · cs.CV

Hongyuan Adam Lu, Z.L. Victor Wei, Qun Zhang, Jinrui Zeng, Bowen Cao, Lingwei Meng, Mocheng Li, Zezhong Wang, Haonan Yin, Naifu Xue, Minyu Chen, Cenyuan Zhang, Zefan Zhang, Hao Wei, Jiawei Zhou, Haoran Xu, Hao Yang, Ronglai Zuo, Tongda Xu, Yonghao Li, Jian Chen, Hebin Wang, Zeyu Gao, Yang Li, Wei Zhao, Qimin Zhong, Siqi Liu, Yumeng Zhang, Leyan Cui, Zhangyu Wang, Wai Lam

Pith reviewed 2026-06-27 01:47 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CL · cs.CV

keywords looped world models · parameter efficiency · adaptive computation · iterative refinement · latent states · transformer blocks · world simulation · scaling axis

The pith

Looped world models use a shared transformer block applied repeatedly to refine latent states, delivering up to 100x parameter efficiency and adaptive depth for each prediction step.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Looped World Models as the first looped architecture for world modelling. It works by taking a single transformer block and applying it multiple times to the same latent environment state until the prediction stabilizes. This setup automatically uses more iterations for complex steps and fewer for simple ones. A reader would care because conventional world models must grow deeper and more expensive to handle long sequences, while this method keeps the parameter count fixed and still scales computation to need. The authors position iterative refinement depth as a new direction for improving simulation quality alongside bigger models or more data.

Core claim

Looped World Models iteratively refine latent environment states through repeated application of a parameter-shared transformer block. This produces faithful long-horizon simulation while achieving up to 100x parameter efficiency over conventional approaches. The loop supplies adaptive computation that automatically increases depth for harder prediction steps and decreases it for easier ones. The approach treats iterative latent depth as an orthogonal scaling axis to model size and training data volume.

What carries the argument

A parameter-shared transformer block, applied repeatedly to refine a latent state, carries the argument: it holds the parameter count fixed and lets the iteration count vary.
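The mechanism described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the block is a toy tanh layer rather than a transformer, and the stopping rule (relative change below `tol`), the zero initial state, and all names (`make_shared_block`, `looped_refine`, `scale`, `tol`) are assumptions made here.

```python
import numpy as np

def make_shared_block(dim, seed=0, scale=0.2):
    """Toy stand-in for the shared block: one set of weights reused at every iteration."""
    rng = np.random.default_rng(seed)
    W = scale * rng.standard_normal((dim, dim)) / np.sqrt(dim)
    b = 0.1 * rng.standard_normal(dim)

    def block(z, x):
        # The same W and b are used on every call (parameter sharing).
        return np.tanh(z @ W + x + b)

    return block

def looped_refine(block, x, max_iters=64, tol=1e-5):
    """Apply `block` repeatedly until the latent state stops changing (assumed halting rule)."""
    z = np.zeros_like(x)
    for k in range(1, max_iters + 1):
        z_next = block(z, x)
        change = np.linalg.norm(z_next - z) / (np.linalg.norm(z) + 1e-8)
        z = z_next
        if change < tol:
            break
    return z, k

# Hypothetical usage: x plays the role of an encoded observation.
x = np.random.default_rng(1).standard_normal(16)
z, iters = looped_refine(make_shared_block(16), x)
```

Because the loop exits on a convergence test, inputs that settle quickly use fewer iterations. Whether a trained transformer block behaves this way is what the paper would need to demonstrate.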

If this is right

The same fixed parameter budget can support longer or more detailed environment simulations than fixed-depth models.
Computation cost per prediction step varies automatically with scene complexity instead of requiring manual architecture changes.
Iterative depth becomes a controllable variable for trading accuracy against speed during deployment.
World model training can focus resources on learning a strong shared block rather than stacking many unique layers.
The method opens a route to scale simulation quality without proportional increases in memory footprint.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Models built this way could run on hardware with tight memory limits while still handling occasional hard steps through extra iterations.
The loop structure might combine with existing techniques such as larger training datasets to produce multiplicative gains.
Stability of the refinement process over very long loops would need direct testing in domains where small errors grow quickly.
Similar looping could be tried in other sequence modelling tasks that currently rely on deep fixed stacks.

Load-bearing premise

Repeated passes of the same transformer block on a latent state will produce stable refinements without compounding errors across long prediction horizons.

What would settle it

Measure prediction error growth over hundreds of simulation steps in a standard environment benchmark and check whether error remains comparable to or lower than non-looped baselines of similar parameter count.
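The test proposed above can be sketched as an open-loop rollout that records the error at every step. This is an illustrative harness only: the 2-D rotation environment, the deliberately mis-specified model, and the names `rollout_error` and `rotation` are hypothetical stand-ins, not a benchmark or model from the paper.

```python
import numpy as np

def rollout_error(true_step, model_step, s0, horizon):
    """Open-loop rollout: the model is fed its own predictions, never the true state."""
    s_true, s_model = s0.copy(), s0.copy()
    errors = np.empty(horizon)
    for t in range(horizon):
        s_true = true_step(s_true)
        s_model = model_step(s_model)
        errors[t] = np.linalg.norm(s_model - s_true)
    return errors

def rotation(theta):
    """Hypothetical toy dynamics: rotate a 2-D state by a fixed angle per step."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return lambda state: R @ state

# The "model" rotates by a slightly wrong angle, so its error compounds over the horizon.
errors = rollout_error(rotation(0.10), rotation(0.101), np.array([1.0, 0.0]), horizon=300)
```

Plotting `errors` against the step index for a looped model and for a non-looped baseline of similar parameter count would give the comparison the paragraph asks for.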

Figures

Figures reproduced from arXiv: 2606.18208 by Bowen Cao, Cenyuan Zhang, Haonan Yin, Haoran Xu, Hao Wei, Hao Yang, Hebin Wang, Hongyuan Adam Lu, Jian Chen, Jiawei Zhou, Jinrui Zeng, Leyan Cui, Lingwei Meng, Minyu Chen, Mocheng Li, Naifu Xue, Qimin Zhong, Qun Zhang, Ronglai Zuo, Siqi Liu, Tongda Xu, Wai Lam, Wei Zhao, Yang Li, Yonghao Li, Yumeng Zhang, Zefan Zhang, Zeyu Gao, Zezhong Wang, Zhangyu Wang, Z.L. Victor Wei.

**Figure 2.** Relative increase over Qwen3.7-max on automatic online performance, compared against ...

**Figure 3.** Human evaluation performance with our model, compared against baselines. Note that ...

Original abstract

Current world models face a fundamental tension: faithful long-horizon simulation demands deep computation, but deeper models are expensive to deploy and prone to compounding errors. We resolve this by introducing Looped World Models (LoopWM), which are the first looped architectures for world modelling. Our method iteratively refines latent environment states through a parameter-shared transformer block. This yield up to 100x parameter efficiency over conventional approaches with adaptive computation that automatically scales depth to match the complexity of each prediction step. Orthogonal to scaling model size and training data, LoopWM establishes iterative latent depth as a new scaling axis for world simulation, which might significantly push the community forward.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

LoopWM claims looped parameter sharing gives 100x efficiency and a new scaling axis for world models, but the abstract supplies no experiments, derivations, or stability checks to support it.

The main thing to know is that this paper applies looped transformer blocks with shared parameters to world modelling, letting the same block refine latent states over multiple iterations for adaptive depth. It positions iterative latent depth as orthogonal to scaling model size or data.

The framing is reasonable: long-horizon simulation needs computation but deeper nets cost more and accumulate errors, so parameter sharing plus iteration count as a dial is a straightforward idea. Applying the loop trick from other domains to this setting is the clearest novelty.

The problems are straightforward and central. The abstract asserts up to 100x parameter efficiency and automatic depth scaling with no numbers, no rollouts, no error curves, and no convergence argument. The stress-test point holds: repeated application of one block can diverge or compound errors unless the iteration is provably contractive or stabilized, yet nothing in the text addresses Lipschitz constants, normalization per step, or ablation on iteration count versus horizon length. Without those, the efficiency claim cannot be separated from possible short-horizon testing or unstated tricks.
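The contractivity condition mentioned above can be written out. This is the standard fixed-point bound, not an analysis taken from the paper. The symbols $f_\theta$ (the shared block), $z_k$ (the latent state after $k$ iterations), $z^\star$ (a fixed point, assumed to exist) and $L$ (a Lipschitz constant of the block) are introduced here for illustration.

```latex
z_{k+1} = f_\theta(z_k), \qquad z^\star = f_\theta(z^\star), \qquad
\|f_\theta(z) - f_\theta(z')\| \le L \,\|z - z'\| \ \text{ for all } z, z'
\;\Longrightarrow\;
\|z_k - z^\star\| \le L^{k}\,\|z_0 - z^\star\| .
```

With $L < 1$ the right-hand side shrinks geometrically, so extra iterations refine the state. With $L > 1$ the bound gives no guarantee, and a perturbation can grow by a factor of up to $L^k$, which is the divergence risk the letter raises.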

The paper is aimed at researchers building world models for planning or RL who care about inference cost. A reader could extract the conceptual move, but the lack of any verifiable result means it does not yet support citation or serious discussion. It does not deserve peer review until the empirical and analytic gaps are closed; the idea is simple enough that experiments should have been included from the start.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Looped World Models (LoopWM) as the first looped architectures for world modelling. It claims that iteratively refining latent environment states through repeated application of a parameter-shared transformer block yields up to 100x parameter efficiency over conventional approaches, with adaptive computation that automatically scales depth to the complexity of each prediction step. It positions iterative latent depth as a new scaling axis orthogonal to model size and training data.

Significance. If the efficiency and stability claims hold with supporting experiments and analysis, the work could open a new scaling dimension for world models by trading parameters for iteration depth. No machine-checked proofs, reproducible code, or falsifiable predictions are evident in the provided text, so the significance remains conditional on future validation.

major comments (2)

Abstract: the central claim of 'up to 100x parameter efficiency' is presented without any experimental results, baselines, ablation studies, or error analysis, rendering the efficiency assertion unsupported and load-bearing for the paper's contribution.
Abstract: the stability of long-horizon simulation via repeated application of the shared transformer block is asserted without convergence analysis, Lipschitz bounds on the block, or examination of compounding error growth versus iteration count, which directly undermines the weakest assumption identified in the stress-test note.

minor comments (1)

Abstract: 'This yield up to' contains a subject-verb agreement error and should read 'This yields up to'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed feedback on our manuscript. We agree that the abstract claims require better support from the body of the paper or should be qualified. We will make revisions to address these issues. Below we respond point by point to the major comments.

Referee: Abstract: the central claim of 'up to 100x parameter efficiency' is presented without any experimental results, baselines, ablation studies, or error analysis, rendering the efficiency assertion unsupported and load-bearing for the paper's contribution.

Authors: We acknowledge this concern. The '100x parameter efficiency' claim in the abstract is based on the parameter-sharing across iterations, which in principle allows for significant reduction compared to deeper non-shared models. However, the manuscript text does not include the specific experimental results, baselines, or ablations to support the quantitative figure. We will revise the abstract to qualify this claim, for example by stating 'potentially up to 100x' or removing the specific number until supported by experiments. We plan to include such empirical validation in the revised manuscript. revision: yes

Referee: Abstract: the stability of long-horizon simulation via repeated application of the shared transformer block is asserted without convergence analysis, Lipschitz bounds on the block, or examination of compounding error growth versus iteration count, which directly undermines the weakest assumption identified in the stress-test note.

Authors: We agree that a rigorous analysis of stability is necessary for claims about long-horizon simulation. The current manuscript asserts stability without providing the requested analyses such as convergence or error growth studies. This is a valid point, and we will revise the abstract to remove or soften the assertion regarding stability of long-horizon simulation. We will also consider adding preliminary analysis if possible, but acknowledge this may require additional work. revision: yes
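The parameter-sharing argument in the first response can be made concrete with a rough count. The sketch below assumes a standard transformer block with a 4x MLP expansion and ignores biases, normalization layers and embeddings. The function names and the example width are hypothetical, and nothing here reproduces how the authors arrived at their figure.

```python
def block_params(d_model, mlp_ratio=4):
    """Approximate weight count of one transformer block (assumed standard layout)."""
    attn = 4 * d_model * d_model             # Q, K, V and output projections
    mlp = 2 * mlp_ratio * d_model * d_model  # up- and down-projection
    return attn + mlp

def stack_vs_loop(d_model, depth):
    """Weights in `depth` unshared blocks versus one shared block looped `depth` times."""
    per_block = block_params(d_model)
    return depth * per_block, per_block

# Hypothetical example: width 512, effective depth 100.
stacked, looped = stack_vs_loop(512, 100)
ratio = stacked // looped
```

Under these assumptions the ratio equals the depth of the unshared stack, so sharing alone saves parameters in proportion to the number of iterations. Whether the looped model matches the stacked model's accuracy at that depth is the empirical question the referee raises.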

Circularity Check

0 steps flagged

No circularity; architecture claim is independent of inputs

The provided abstract and visible text introduce Looped World Models as a novel parameter-shared iterative architecture without any equations, derivations, fitted parameters, or self-citations that reduce the efficiency claim to a construction or prior result by the authors. The 100x efficiency is stated as an outcome of the method rather than a mathematical identity or renamed input. No load-bearing steps match the enumerated circularity patterns; the derivation chain (if present in full text) is not visible here and cannot be shown to collapse by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The looped refinement mechanism is presented as the core contribution without detailing underlying assumptions.

pith-pipeline@v0.9.1-grok · 5742 in / 940 out tokens · 20824 ms · 2026-06-27T01:47:02.842845+00:00 · methodology
