Hierarchical Planning with Latent World Models
Recognition: 2 theorem links · Lean Theorem
Pith reviewed 2026-05-13 20:10 UTC · model grok-4.3
The pith
Learning latent world models at multiple temporal scales and planning hierarchically across them enables reliable long-horizon control with far less online computation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Training latent world models at multiple temporal scales and executing hierarchical planning across those scales lets agents solve long-horizon embodied control problems more reliably and with substantially lower inference-time cost than flat planning. The hierarchical planner reaches 70% success on real-robot pick-and-place using only a final goal image, while a single-level model reaches 0%. Across physics-based simulations the method improves success on push manipulation and maze navigation while requiring up to 4x less planning compute. The abstraction works as a modular layer on top of diverse latent world-model architectures.
What carries the argument
A hierarchy of latent world models, each trained to predict dynamics at a distinct temporal scale, with planning that optimizes coarse actions at long scales before refining them at shorter scales.
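This coarse-to-fine loop can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: the linear "latent dynamics", the cross-entropy-method optimizer, and every horizon and population size are invented stand-ins for learned multi-scale models.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.eye(2) + 0.05 * np.array([[0.0, 1.0], [-0.2, 0.0]])  # toy latent dynamics

def fine_step(z, a):
    """One primitive-action step of a stand-in fine-scale model."""
    return A @ z + 0.1 * a

def coarse_step(z, macro_a):
    """A stand-in coarse-scale model: predicts k=5 fine steps in one call."""
    for _ in range(5):
        z = fine_step(z, macro_a)
    return z

def rollout(step_fn, z, actions):
    for a in actions:
        z = step_fn(z, a)
    return z

def cem_plan(step_fn, z0, z_goal, horizon, iters=5, pop=64, elite=8):
    """Cross-entropy method minimizing the L1 distance of the rollout to the goal."""
    mu, sigma = np.zeros((horizon, 2)), np.ones((horizon, 2))
    for _ in range(iters):
        cand = mu + sigma * rng.standard_normal((pop, horizon, 2))
        costs = [np.abs(rollout(step_fn, z0, seq) - z_goal).sum() for seq in cand]
        best = cand[np.argsort(costs)[:elite]]
        mu, sigma = best.mean(axis=0), best.std(axis=0) + 1e-3
    return mu

z0, z_goal = np.zeros(2), np.array([1.0, 0.5])
macro = cem_plan(coarse_step, z0, z_goal, horizon=4)   # 1) plan long-scale macro-actions
subgoal = coarse_step(z0, macro[0])                    # first coarse prediction becomes a subgoal
fine = cem_plan(fine_step, z0, subgoal, horizon=5)     # 2) refine primitive actions toward it
```

Executing `fine`, then replanning toward the next subgoal, yields receding-horizon control; the coarse problem optimizes 4 decision vectors instead of 20, which is where the planning-compute saving comes from.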
If this is right
- Zero-shot control on real non-greedy robotic tasks becomes feasible using only a final goal specification.
- Planning-time compute drops by a factor of up to four while success rates increase in both real and simulated domains.
- The method functions as a modular planning layer compatible with many existing latent world-model architectures.
- Long-horizon reasoning is possible without the exponential growth in search space that limits flat model-predictive control.
Where Pith is reading between the lines
- The same multi-scale hierarchy could be applied to other sequential decision domains such as video-game planning or long-term scheduling.
- If the coarsest-scale model remains accurate, the approach may scale to horizons orders of magnitude longer than those tested.
- Lower planning cost could make model-based control practical on embedded hardware with limited onboard compute.
Load-bearing premise
The multi-scale models must predict future states accurately enough that planning across scales reduces rather than compounds long-horizon prediction error.
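The arithmetic behind this premise can be made concrete. A back-of-envelope sketch with invented error numbers (none of these values come from the paper): if each model application adds prediction error, a coarse model that covers k fine steps per application is applied only H/k times, so the hierarchy helps exactly when its per-application error stays below k times the fine model's.

```python
# Hypothetical per-application errors; invented for illustration, not measured values.
eps_fine, eps_coarse = 0.01, 0.03   # error added per model application
H, k = 100, 5                       # horizon in fine steps; fine steps per coarse step

flat_error = H * eps_fine           # fine model applied H = 100 times
hier_error = (H // k) * eps_coarse  # coarse model applied H / k = 20 times

print(flat_error > hier_error)      # hierarchy accumulates less error here...
print(eps_coarse < k * eps_fine)    # ...because its per-call error is under k times the fine error
```

If the coarse model's per-application error exceeds k times the fine model's, the inequality flips and the hierarchy compounds rather than reduces long-horizon error, which is exactly the failure mode the premise rules out.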
What would settle it
A controlled long-horizon comparison against a well-tuned single-scale planner: if the hierarchical planner yields lower task success or higher planning time, the core claim fails; matching or beating the flat baseline on both axes would confirm it.
read the original abstract
Model predictive control (MPC) with learned world models has emerged as a promising paradigm for embodied control, particularly for its ability to generalize zero-shot when deployed in new environments. However, learned world models often struggle with long-horizon control due to the accumulation of prediction errors and the exponentially growing search space. In this work, we address these challenges by learning latent world models at multiple temporal scales and performing hierarchical planning across these scales, enabling long-horizon reasoning while substantially reducing inference-time planning complexity. Our approach serves as a modular planning abstraction that applies across diverse latent world-model architectures and domains. We demonstrate that this hierarchical approach enables zero-shot control on real-world non-greedy robotic tasks, achieving a 70% success rate on pick-&-place using only a final goal specification, compared to 0% for a single-level world model. In addition, across physics-based simulated environments including push manipulation and maze navigation, hierarchical planning achieves higher success while requiring up to 4x less planning-time compute.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes learning latent world models at multiple temporal scales and performing hierarchical planning across them to enable long-horizon model predictive control while reducing inference-time compute. It claims this modular approach yields zero-shot real-world success on non-greedy robotic pick-and-place (70% vs 0% for single-level baselines) and higher success rates with up to 4x less planning compute in simulated push-manipulation and maze-navigation tasks.
Significance. If the central empirical claims hold after proper validation, the work would be significant for embodied control and RL. It offers a practical, architecture-agnostic way to scale planning in learned dynamics models without exponential search costs, directly addressing error accumulation in long-horizon MPC. The reported real-robot zero-shot results and compute savings would be impactful if reproducible.
major comments (3)
- [Abstract and Section 3] The headline claim that multi-scale latent models can be composed hierarchically without compounding prediction errors (rather than masking single-level failures) is load-bearing but unsupported by direct evidence; no per-level rollout error metrics, horizon-wise accuracy comparisons, or propagation analysis from coarse to fine scales are reported.
- [Section 4 (Experiments)] The 70% vs 0% real-robot success rates and simulated gains lack ablations on joint vs separate training of scales, number of trials, variance, or controls isolating hierarchy from other implementation details; without these, the improvements cannot be confidently attributed to the proposed mechanism.
- [Methods] The description of how coarse-scale plans constrain or refine fine-scale rollouts does not include any measurement of how approximation errors at higher temporal scales affect long-horizon accuracy at lower scales, leaving the weakest assumption untested.
minor comments (1)
- [Notation] The precise definition of temporal scales, their horizons, and the interface between planning levels would benefit from an explicit equation or pseudocode block for reproducibility.
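For illustration only, a hypothetical pseudocode sketch of the kind of interface this comment asks for; every name below is invented for the sketch rather than taken from the paper:

```python
# Pseudocode. Level 2 plans macro-actions with the coarse model; each predicted
# coarse latent state becomes a subgoal that level 1 refines into k primitive
# actions with the fine model. `optimize` and `predict_states` are placeholders.
def hierarchical_plan(z_start, z_goal, coarse_model, fine_model, k):
    macro_actions = optimize(coarse_model, z_start, z_goal)        # e.g. CEM / MPPI
    subgoals = predict_states(coarse_model, z_start, macro_actions)
    plan, z = [], z_start
    for g in subgoals:
        plan += optimize(fine_model, z, g, horizon=k)              # refine toward subgoal
        z = g                                                      # hand-off point between levels
    return plan
```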
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment below and describe the revisions we will incorporate to strengthen the manuscript.
read point-by-point responses
-
Referee: [Abstract and Section 3] The headline claim that multi-scale latent models can be composed hierarchically without compounding prediction errors (rather than masking single-level failures) is load-bearing but unsupported by direct evidence; no per-level rollout error metrics, horizon-wise accuracy comparisons, or propagation analysis from coarse to fine scales are reported.
Authors: We agree that direct per-level error metrics and propagation analysis would provide stronger support. In the revised manuscript we will add these measurements in Section 3, including horizon-wise prediction accuracy at each scale and an explicit comparison of error accumulation between hierarchical and flat rollouts. revision: yes
-
Referee: [Section 4 (Experiments)] The 70% vs 0% real-robot success rates and simulated gains lack ablations on joint vs separate training of scales, number of trials, variance, or controls isolating hierarchy from other implementation details; without these, the improvements cannot be confidently attributed to the proposed mechanism.
Authors: We will expand Section 4 with the requested ablations: joint versus separate training of the scales, the exact number of trials performed, standard deviations on success rates, and additional controls that isolate the hierarchical planning component from other implementation choices. revision: yes
-
Referee: [Methods] The description of how coarse-scale plans constrain or refine fine-scale rollouts does not include any measurement of how approximation errors at higher temporal scales affect long-horizon accuracy at lower scales, leaving the weakest assumption untested.
Authors: We will augment the Methods section with quantitative results that measure the effect of coarse-scale approximation error on fine-scale long-horizon accuracy. This will include controlled experiments that deliberately degrade coarse-scale predictions and report the resulting impact on overall task performance. revision: yes
Circularity Check
No circularity detected; empirical claims rest on experimental comparisons
full rationale
The paper's core contribution is an empirical demonstration of hierarchical planning over multi-scale latent world models, validated through success rates (70% real-robot pick-and-place vs 0% single-level) and compute reductions (up to 4x) in simulation environments. No load-bearing equations, fitted parameters renamed as predictions, or self-citation chains reduce the central result to its inputs by construction. The approach is presented as a modular abstraction applicable across architectures, with performance measured against independent baselines rather than derived tautologically from definitions or prior author work.
Axiom & Free-Parameter Ledger
free parameters (1)
- number of temporal scales and their horizons
axioms (1)
- Domain assumption: latent world models can be trained to predict dynamics reliably at multiple distinct temporal resolutions.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
unclear: Relation between the paper passage and the cited Recognition theorem.
"learning latent world models at multiple temporal scales and performing hierarchical planning across these scales... high-level planner optimizes macro-actions... low-level planner optimizes primitive actions... $E_2(\hat{l}_{1:H}; z_1, z_g) \triangleq \lVert z_g - P^{(2)}(\hat{l}_{1:H}; z_1) \rVert_1$"
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
unclear: Relation between the paper passage and the cited Recognition theorem.
"8-tick period... three spatial dimensions... J(x) = ½(x + x⁻¹) − 1"
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
Latent State Design for World Models under Sufficiency Constraints
World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.
Reference graph
Works this paper leans on
-
[2]
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
Mido Assran, Adrien Bardes, David Fan, Quentin Garrido, Russell Howes, Matthew Muckley, Ammar Rizvi, Claire Roberts, Koustuv Sinha, Artem Zholus, et al. V-JEPA 2: Self-supervised video models enable understanding, prediction and planning. arXiv preprint arXiv:2506.09985, 2025
-
[3]
RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies
Pranav Atreya, Karl Pertsch, Tony Lee, Moo Jin Kim, Arhan Jain, Artur Kuramshin, Clemens Eppner, Cyrus Neary, Edward Hu, Fabio Ramos, et al. RoboArena: Distributed real-world evaluation of generalist robot policies. In Proceedings of the Conference on Robot Learning (CoRL), 2025
-
[4]
The Option-Critic Architecture
Pierre-Luc Bacon, Jean Harb, and Doina Precup. The option-critic architecture. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017
-
[5]
TD-JEPA: Latent-Predictive Representations for Zero-Shot Reinforcement Learning
Marco Bagatella, Matteo Pirotta, Ahmed Touati, Alessandro Lazaric, and Andrea Tirinzoni. TD-JEPA: Latent-predictive representations for zero-shot reinforcement learning. In The Fourteenth International Conference on Learning Representations, 2026. https://openreview.net/forum?id=SzXDuBN8M1
-
[6]
Whole-Body Conditioned Egocentric Video Prediction
Yutong Bai, Danny Tran, Amir Bar, Yann LeCun, Trevor Darrell, and Jitendra Malik. Whole-body conditioned egocentric video prediction. arXiv preprint arXiv:2506.21552, 2025
-
[7]
Navigation World Models
Amir Bar, Gaoyue Zhou, Danny Tran, Trevor Darrell, and Yann LeCun. Navigation world models. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 15791--15801, 2025
-
[8]
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
Adrien Bardes, Jean Ponce, and Yann LeCun. VICReg: Variance-invariance-covariance regularization for self-supervised learning. arXiv preprint arXiv:2105.04906, 2021
-
[9]
Genie: Generative Interactive Environments
Jake Bruce, Michael D Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, et al. Genie: Generative interactive environments. In Forty-first International Conference on Machine Learning, 2024
-
[10]
Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
Cheng Chi, Zhenjia Xu, Siyuan Feng, Eric Cousineau, Yilun Du, Benjamin Burchfiel, Russ Tedrake, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. The International Journal of Robotics Research, 44(10-11):1684--1704, 2025
-
[11]
IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control
Rohan Chitnis, Yingchen Xu, Bobak Hashemi, Lucas Lehnert, Urun Dogan, Zheqing Zhu, and Olivier Delalleau. IQL-TD-MPC: Implicit Q-learning for hierarchical model predictive control. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 9154--9160. IEEE, 2024
-
[12]
PILCO: A Model-Based and Data-Efficient Approach to Policy Search
Marc Deisenroth and Carl E Rasmussen. PILCO: A model-based and data-efficient approach to policy search. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 465--472, 2011
-
[13]
Self-Supervised Visual Planning with Temporal Skip Connections
Frederik Ebert, Chelsea Finn, Alex X Lee, and Sergey Levine. Self-supervised visual planning with temporal skip connections. CoRL, 12(16):23, 2017
-
[14]
Dynamics Learning with Cascaded Variational Inference for Multi-Step Manipulation
Kuan Fang, Yuke Zhu, Animesh Garg, Silvio Savarese, and Li Fei-Fei. Dynamics learning with cascaded variational inference for multi-step manipulation. arXiv preprint arXiv:1910.13395, 2019
-
[15]
Learning Hierarchical World Models with Adaptive Temporal Abstractions from Discrete Latent Dynamics
Christian Gumbsch, Noor Sajid, Georg Martius, and Martin V Butz. Learning hierarchical world models with adaptive temporal abstractions from discrete latent dynamics. In The Twelfth International Conference on Learning Representations, 2023
-
[16]
World Models
David Ha and Jürgen Schmidhuber. World models. arXiv preprint arXiv:1803.10122, 2(3), 2018
-
[17]
Learning Latent Dynamics for Planning from Pixels
Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. In International Conference on Machine Learning, pages 2555--2565. PMLR, 2019
-
[18]
Deep Hierarchical Planning from Pixels
Danijar Hafner, Kuang-Huei Lee, Ian Fischer, and Pieter Abbeel. Deep hierarchical planning from pixels. Advances in Neural Information Processing Systems, 35:26091--26104, 2022
-
[19]
Mastering Diverse Domains through World Models
Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104, 2023
-
[20]
TD-MPC2: Scalable, Robust World Models for Continuous Control
Nicklas Hansen, Hao Su, and Xiaolong Wang. TD-MPC2: Scalable, robust world models for continuous control. arXiv preprint arXiv:2310.16828, 2023
-
[21]
Hierarchical World Models as Visual Whole-Body Humanoid Controllers
Nicklas Hansen, Jyothir SV, Vlad Sobal, Yann LeCun, Xiaolong Wang, and Hao Su. Hierarchical world models as visual whole-body humanoid controllers. arXiv preprint arXiv:2405.18418, 2024
-
[22]
GAIA-1: A Generative World Model for Autonomous Driving
Anthony Hu, Lloyd Russell, Hudson Yeo, Zak Murez, George Fedoseev, Alex Kendall, Jamie Shotton, and Gianluca Corrado. GAIA-1: A generative world model for autonomous driving. arXiv preprint arXiv:2309.17080, 2023
-
[23]
Broadly-Exploring, Local-Policy Trees for Long-Horizon Task Planning
Brian Ichter, Pierre Sermanet, and Corey Lynch. Broadly-exploring, local-policy trees for long-horizon task planning. arXiv preprint arXiv:2010.06491, 2020
-
[24]
$\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization
Physical Intelligence, Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, et al. $\pi_{0.5}$: a vision-language-action model with open-world generalization. arXiv preprint arXiv:2504.16054, 2025
-
[25]
When to Trust Your Model: Model-Based Policy Optimization
Michael Janner, Justin Fu, Marvin Zhang, and Sergey Levine. When to trust your model: Model-based policy optimization. Advances in Neural Information Processing Systems, 32, 2019
-
[26]
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, et al. DROID: A large-scale in-the-wild robot manipulation dataset. arXiv preprint arXiv:2403.12945, 2024
-
[27]
Safe Hierarchical Model Predictive Control and Planning for Autonomous Systems
Markus Kögel, Mohamed Ibrahim, Christian Kallies, and Rolf Findeisen. Safe hierarchical model predictive control and planning for autonomous systems. International Journal of Robust and Nonlinear Control, 35(7):2658--2676, 2025
-
[28]
Offline Reinforcement Learning with Implicit Q-Learning
Ilya Kostrikov, Ashvin Nair, and Sergey Levine. Offline reinforcement learning with implicit Q-learning. arXiv preprint arXiv:2110.06169, 2021
-
[29]
RoboHive: A Unified Framework for Robot Learning
Vikash Kumar, Rutav Shah, Gaoyue Zhou, Vincent Moens, Vittorio Caggiano, Abhishek Gupta, and Aravind Rajeswaran. RoboHive: A unified framework for robot learning. Advances in Neural Information Processing Systems, 36:44323--44340, 2023
-
[30]
Planning in Learned Latent Action Spaces for Generalizable Legged Locomotion
Tianyu Li, Roberto Calandra, Deepak Pathak, Yuandong Tian, Franziska Meier, and Akshara Rai. Planning in learned latent action spaces for generalizable legged locomotion. IEEE Robotics and Automation Letters, 6(2):2682--2689, 2021
-
[31]
stable-worldmodel-v1: Reproducible World Modeling Research and Evaluation
Lucas Maes, Quentin Le Lidec, Dan Haramati, Nassim Massaudi, Damien Scieur, Yann LeCun, and Randall Balestriero. stable-worldmodel-v1: Reproducible world modeling research and evaluation. arXiv preprint arXiv:2602.08968, 2026
-
[32]
R3M: A Universal Visual Representation for Robot Manipulation
Suraj Nair, Aravind Rajeswaran, Vikash Kumar, Chelsea Finn, and Abhinav Gupta. R3M: A universal visual representation for robot manipulation. arXiv preprint arXiv:2203.12601, 2022
-
[33]
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023
-
[34]
HIQL: Offline Goal-Conditioned RL with Latent States as Actions
Seohong Park, Dibya Ghosh, Benjamin Eysenbach, and Sergey Levine. HIQL: Offline goal-conditioned RL with latent states as actions. Advances in Neural Information Processing Systems, 36:34866--34891, 2023
-
[35]
OGBench: Benchmarking Offline Goal-Conditioned RL
Seohong Park, Kevin Frans, Benjamin Eysenbach, and Sergey Levine. OGBench: Benchmarking offline goal-conditioned RL. arXiv preprint arXiv:2410.20092, 2024a
-
[36]
Foundation Policies with Hilbert Representations
Seohong Park, Tobias Kreiman, and Sergey Levine. Foundation policies with Hilbert representations. arXiv preprint arXiv:2402.15567, 2024b
-
[37]
FAST: Efficient Action Tokenization for Vision-Language-Action Models
Karl Pertsch, Kyle Stachowicz, Brian Ichter, Danny Driess, Suraj Nair, Quan Vuong, Oier Mees, Chelsea Finn, and Sergey Levine. FAST: Efficient action tokenization for vision-language-action models. arXiv preprint arXiv:2501.09747, 2025
-
[38]
The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning
Reuven Y Rubinstein and Dirk P Kroese. The cross-entropy method: a unified approach to combinatorial optimization, Monte-Carlo simulation and machine learning. Springer Science & Business Media, 2004
-
[39]
Exploring the Limits of Hierarchical World Models in Reinforcement Learning
Robin Schiewer, Anand Subramoney, and Laurenz Wiskott. Exploring the limits of hierarchical world models in reinforcement learning. Scientific Reports, 14(1):26856, 2024
-
[40]
Data-Efficient Reinforcement Learning with Self-Predictive Representations
Max Schwarzer, Ankesh Anand, Rishab Goel, R Devon Hjelm, Aaron Courville, and Philip Bachman. Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929, 2020
-
[41]
Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models
Vlad Sobal, Wancong Zhang, Kyunghyun Cho, Randall Balestriero, Tim GJ Rudner, and Yann LeCun. Learning from reward-free offline data: A case for planning with latent dynamics models. arXiv preprint arXiv:2502.14819, 2025
-
[42]
An Adaptive Network that Constructs and Uses an Internal Model of Its World
Richard S Sutton. An adaptive network that constructs and uses an internal model of its world. Cognition and Brain Theory, 4(3):217--246, 1981
-
[43]
Dyna, an Integrated Architecture for Learning, Planning, and Reacting
Richard S Sutton. Dyna, an integrated architecture for learning, planning, and reacting. ACM SIGART Bulletin, 2(4):160--163, 1991
-
[44]
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
Richard S Sutton, Doina Precup, and Satinder Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1-2):181--211, 1999
-
[45]
Model Regularization for Stable Sample Rollouts
Erik Talvitie. Model regularization for stable sample rollouts. In UAI, pages 780--789, 2014
-
[46]
Octo: An Open-Source Generalist Robot Policy
Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, et al. Octo: An open-source generalist robot policy. arXiv preprint arXiv:2405.12213, 2024
-
[47]
A Lightweight Library for Energy-Based Joint-Embedding Predictive Architectures
Basile Terver, Randall Balestriero, Megi Dervishi, David Fan, Quentin Garrido, Tushar Nagarajan, Koustuv Sinha, Wancong Zhang, Mike Rabbat, Yann LeCun, et al. A lightweight library for energy-based joint-embedding predictive architectures. arXiv preprint arXiv:2602.03604, 2026a
-
[48]
What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?
Basile Terver, Tsung-Yen Yang, Jean Ponce, Adrien Bardes, and Yann LeCun. What drives success in physical planning with joint-embedding predictive world models?, 2026b. https://arxiv.org/abs/2512.24497
-
[49]
MuJoCo: A Physics Engine for Model-Based Control
Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In IROS, pages 5026--5033. IEEE, 2012. ISBN 978-1-4673-1737-5. http://dblp.uni-trier.de/db/conf/iros/iros2012.html#TodorovET12
-
[50]
Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images
Manuel Watter, Jost Springenberg, Joschka Boedecker, and Martin Riedmiller. Embed to control: A locally linear latent dynamics model for control from raw images. Advances in Neural Information Processing Systems, 28, 2015
-
[51]
Information Theoretic MPC for Model-Based Reinforcement Learning
Grady Williams, Nolan Wagener, Brian Goldfain, Paul Drews, James M Rehg, Byron Boots, and Evangelos A Theodorou. Information theoretic MPC for model-based reinforcement learning. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 1714--1721. IEEE, 2017
-
[52]
Learning Interactive Real-World Simulators
Mengjiao Yang, Yilun Du, Kamyar Ghasemipour, Jonathan Tompson, Dale Schuurmans, and Pieter Abbeel. Learning interactive real-world simulators. arXiv preprint arXiv:2310.06114, 1(2):6, 2023
-
[53]
Light-Weight Probing of Unsupervised Representations for Reinforcement Learning
Wancong Zhang, Anthony GX-Chen, Vlad Sobal, Yann LeCun, and Nicolas Carion. Light-weight probing of unsupervised representations for reinforcement learning. arXiv preprint arXiv:2208.12345, 2022
-
[54]
DINO-WM: World Models on Pre-Trained Visual Features Enable Zero-Shot Planning
Gaoyue Zhou, Hengkai Pan, Yann LeCun, and Lerrel Pinto. DINO-WM: World models on pre-trained visual features enable zero-shot planning. arXiv preprint arXiv:2411.04983, 2024
discussion (0)