MotionDisco: Motion Discovery for Extreme Humanoid Loco-Manipulation
Pith reviewed 2026-06-28 01:23 UTC · model grok-4.3
The pith
MotionDisco discovers long-horizon humanoid loco-manipulation motions via automated LLM-guided evolutionary search without human demonstrations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MotionDisco is the first framework to discover and deploy long-horizon humanoid loco-manipulation skills entirely through automated evolutionary search. It combines LLM guidance over interaction sequences with sequential kinodynamic trajectory optimization and pruning to produce viable whole-body trajectories.
What carries the argument
LLM-guided evolutionary search over sequences of interactions, combined with an efficient sequential kinodynamic trajectory optimizer and a pruning strategy.
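The loop below is a minimal sketch of how LLM-guided evolutionary search over interaction sequences could be organized. It is not the paper's implementation. A random edit (`propose_mutation`) stands in for the LLM proposal step, a toy cost (`evaluate`) stands in for the trajectory optimizer, and the interaction vocabulary is hypothetical. What the sketch does show is the elitist evolutionary pattern: mutate a surviving plan, discard children that are pruned, and keep only the best `pop_size` plans.

```python
import random
from typing import List, Optional, Tuple

Plan = Tuple[str, ...]

# Hypothetical interaction vocabulary; the paper's actual primitives are not listed here.
PRIMITIVES = ("walk_to_box", "grasp_box", "lift_box", "walk_to_table", "place_box", "release")
GOAL: Plan = PRIMITIVES  # toy target ordering, used only by the stand-in cost below
MAX_LEN = 2 * len(GOAL)


def propose_mutation(parent: Plan, rng: random.Random) -> Plan:
    """Stand-in for the LLM proposal step: insert, delete, or replace one interaction."""
    plan = list(parent)
    op = rng.choice(("insert", "delete", "replace"))
    if op == "insert" or not plan:
        plan.insert(rng.randrange(len(plan) + 1), rng.choice(PRIMITIVES))
    elif op == "delete":
        plan.pop(rng.randrange(len(plan)))
    else:
        plan[rng.randrange(len(plan))] = rng.choice(PRIMITIVES)
    return tuple(plan)


def evaluate(plan: Plan) -> Optional[float]:
    """Stand-in for trajectory optimization: a cost, or None if the plan is pruned."""
    if len(plan) > MAX_LEN:
        return None  # pruned before any (expensive) optimization would run
    mismatches = sum(a != b for a, b in zip(plan, GOAL))
    return float(mismatches + abs(len(plan) - len(GOAL)))


def evolve(generations: int = 2000, pop_size: int = 16, seed: int = 0) -> Tuple[Plan, float]:
    """Elitist loop: mutate a random survivor, keep the best `pop_size` plans."""
    rng = random.Random(seed)
    start: Plan = ("walk_to_box",)
    start_cost = evaluate(start)
    assert start_cost is not None
    population: List[Tuple[float, Plan]] = [(start_cost, start)]
    for _ in range(generations):
        _, parent = rng.choice(population)
        child = propose_mutation(parent, rng)
        cost = evaluate(child)
        if cost is None or any(child == p for _, p in population):
            continue  # pruned or duplicate: never enters the population
        population.append((cost, child))
        population.sort(key=lambda item: item[0])  # lower cost is better
        del population[pop_size:]
        if population[0][0] == 0.0:
            break  # toy goal reached
    best_cost, best_plan = population[0]
    return best_plan, best_cost
```

In a real system, `propose_mutation` would be a prompted LLM call that sees the parent plan and the solver feedback, and `evaluate` would run the trajectory optimizer. The parts meant to carry over are the elitist selection and the prune-before-optimizing check.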
If this is right
- Ablation studies show that the method finds successful trajectories for several challenging long-horizon tasks.
- Discovered motions can be used to train reinforcement learning policies that transfer to a real humanoid robot.
- The approach handles the combinatorial growth in possible contact interactions for tasks with multiple objects.
- Whole-body trajectories are generated without relying on teleoperation or motion retargeting.
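This review only summarizes the RL transfer step. As a hedged illustration of what a tracking objective for the discovered trajectories might look like, the sketch below uses the exponentiated tracking-error reward popularized by DeepMimic [8]. The two error terms, the weights, and the scales are hypothetical choices, not values reported for MotionDisco.

```python
import math
from typing import Sequence


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared element-wise differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tracking_reward(
    q: Sequence[float],
    q_ref: Sequence[float],
    base_pos: Sequence[float],
    base_pos_ref: Sequence[float],
    sigma_q: float = 0.5,
    sigma_base: float = 0.25,
    w_q: float = 0.6,
    w_base: float = 0.4,
) -> float:
    """Weighted exponentiated tracking reward; reaches w_q + w_base only under perfect tracking.

    q / q_ref: measured and reference joint positions (rad) at the current time step.
    base_pos / base_pos_ref: measured and reference base positions (m).
    All scales and weights are illustrative assumptions.
    """
    r_joint = math.exp(-squared_error(q, q_ref) / sigma_q**2)
    r_base = math.exp(-squared_error(base_pos, base_pos_ref) / sigma_base**2)
    return w_q * r_joint + w_base * r_base
```

In a tracking setup, the reference values would come from the discovered trajectory at the current time index. The exponential keeps the reward bounded and smooth, which is the usual reason for choosing this form.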
Where Pith is reading between the lines
- This search-based discovery might apply to other robot platforms facing similar contact-rich planning problems.
- Reducing dependence on human data could accelerate development of autonomous robot behaviors in unstructured environments.
- Further improvements in LLM guidance could allow handling even longer task horizons or more objects.
Load-bearing premise
The combinatorial space of contact interactions can be navigated effectively by LLM-guided evolutionary search plus pruning without missing viable long-horizon solutions.
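A deliberately crude model makes the combinatorial growth concrete. It is an assumption for illustration, not the paper's formulation: at each of H interaction steps, each of E end-effectors is either free or in contact with one of K surfaces. That gives (K+1)^E contact modes per step and ((K+1)^E)^H candidate sequences before any feasibility check. Adding objects (more surfaces) or steps (longer horizons) therefore grows the space exponentially, which is what the search and pruning must cope with.

```python
def modes_per_step(num_end_effectors: int, num_surfaces: int) -> int:
    """Each end-effector is either free or in contact with one of `num_surfaces` surfaces."""
    return (num_surfaces + 1) ** num_end_effectors


def num_contact_sequences(num_end_effectors: int, num_surfaces: int, horizon: int) -> int:
    """Unpruned count of mode sequences of length `horizon` (feasibility ignored)."""
    return modes_per_step(num_end_effectors, num_surfaces) ** horizon


if __name__ == "__main__":
    # Illustrative numbers only: 2 hands + 2 feet, and a few scene/object surfaces.
    for surfaces in (3, 5):
        for horizon in (2, 4, 8):
            count = num_contact_sequences(4, surfaces, horizon)
            print(f"surfaces={surfaces} horizon={horizon}: {count:.3e} sequences")
```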
What would settle it
Observing that the evolutionary search consistently fails to produce any valid trajectory for a task involving more than a certain number of sequential contacts or objects, or that the transferred RL policy fails to execute on the physical robot.
Original abstract
We present MotionDisco, a framework that discovers contact-rich, long-horizon humanoid loco-manipulation motions from scratch, without relying on teleoperation or motion retargeting from human demonstrations. This is challenging because the space of possible contact interactions grows combinatorially with the task horizon and the number of objects in the scene. MotionDisco enables rapid discovery of novel motions by coupling large language model (LLM)-guided evolutionary search over sequences of interactions with an efficient sequential kinodynamic trajectory optimizer and a pruning strategy. Through extensive ablation studies, we show that our LLM-guided search discovers successful whole-body trajectories across several challenging long-horizon tasks. Finally, by training reinforcement learning tracking policies on the discovered trajectories, we transfer the motions to a real humanoid robot. This is the first work to discover and deploy long-horizon humanoid loco-manipulation skills entirely through automated evolutionary search. Supplementary videos of the experiments are available at: https://youtu.be/DHiVz34QYlw.
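The abstract names a sequential kinodynamic trajectory optimizer and a pruning strategy but does not detail either. The sketch below shows one plausible arrangement, offered purely as an assumption. It optimizes a candidate interaction sequence one segment at a time, warm-starting each segment from the previous segment's terminal state. It also caches every prefix result, so a later candidate that shares an infeasible prefix is rejected without calling the solver again. `SequentialEvaluator` and `toy_solver` are hypothetical names, and `toy_solver` replaces real trajectory optimization with a one-line rule.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

State = Tuple[float, ...]
# A segment solver maps (start state, interaction) to a terminal state, or None if infeasible.
SegmentSolver = Callable[[State, str], Optional[State]]


class SequentialEvaluator:
    """Optimizes an interaction sequence segment by segment, caching results per prefix."""

    def __init__(self, initial_state: State, solve_segment: SegmentSolver) -> None:
        self.initial_state = initial_state
        self.solve_segment = solve_segment
        # Prefix -> terminal state, or None if that prefix was found infeasible.
        self.cache: Dict[Tuple[str, ...], Optional[State]] = {}
        self.solver_calls = 0

    def evaluate(self, sequence: Sequence[str]) -> Optional[State]:
        """Terminal state of the whole sequence, or None if any prefix is infeasible."""
        state: Optional[State] = self.initial_state
        for i in range(1, len(sequence) + 1):
            prefix = tuple(sequence[:i])
            if prefix in self.cache:
                state = self.cache[prefix]
            else:
                self.solver_calls += 1
                # Warm start: this segment begins where the previous one ended.
                state = self.solve_segment(state, sequence[i - 1])
                self.cache[prefix] = state
            if state is None:
                return None  # prune: no later segment can rescue an infeasible prefix
        return state


def toy_solver(state: State, interaction: str) -> Optional[State]:
    """Hypothetical stand-in for trajectory optimization on a 1-D base position."""
    if interaction == "jump":
        return None  # treat this interaction as always infeasible
    return (state[0] + 1.0,)
```

With `ev = SequentialEvaluator((0.0,), toy_solver)`, calling `ev.evaluate(["walk", "jump", "walk"])` spends two solver calls and fails. A later `ev.evaluate(["walk", "jump", "grasp"])` fails from the cache with no new calls, while `ev.evaluate(["walk", "grasp"])` reuses the cached `("walk",)` prefix and solves only the new segment.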
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents MotionDisco, a framework for discovering contact-rich, long-horizon humanoid loco-manipulation motions from scratch without teleoperation or human demonstrations. It couples LLM-guided evolutionary search over interaction sequences with sequential kinodynamic trajectory optimization and pruning to navigate the combinatorial contact space. It reports successful discovery across tasks via extensive ablations and transfers the motions to a real humanoid via RL tracking policies. The work claims to be the first to discover and deploy such skills entirely through automated evolutionary search.
Significance. If the results hold, this would advance automated skill discovery for humanoids by addressing combinatorial growth in contact interactions for long-horizon tasks. The LLM-guided search, pruning strategy, and real-robot transfer via RL are strengths. The manuscript emphasizes ablation studies showing successful whole-body trajectories, which, if quantitatively robust, would support the central claim of effective navigation without human input.
Simulated Author's Rebuttal
We thank the referee for their review and accurate summary of MotionDisco. The report lists no specific major comments despite the 'uncertain' recommendation, so we provide no point-by-point rebuttals. We stand by the manuscript's claims regarding automated discovery via LLM-guided search, trajectory optimization, and real-robot transfer, supported by the ablations described.
Circularity Check
No significant circularity
The paper presents an engineering framework for motion discovery via LLM-guided evolutionary search coupled with kinodynamic optimization and pruning. No equations, derivations, fitted parameters, or first-principles results are described in the abstract or the claimed central contributions. The method is evaluated through ablation studies and real-robot transfer. There are no self-referential definitions, no predictions that reduce to inputs by construction, and no load-bearing self-citations that substitute for independent justification. The 'first work' claim is a priority statement, not a derived result. The argument therefore rests on external evidence rather than reducing to circular inputs.
Reference graph
Works this paper leans on
- [1] S. Tonneau, A. Del Prete, J. Pettré, C. Park, D. Manocha, and N. Mansard. An efficient acyclic contact planner for multiped robots. IEEE Transactions on Robotics, 34(3):586–601, 2018.
- [2] M. A. Toussaint, K. R. Allen, K. A. Smith, and J. B. Tenenbaum. Differentiable physics and stable modes for tool-use and manipulation planning. In Robotics: Science and Systems Foundation, 2018.
- [3] M. Ciebielski, V. Dhédin, and M. Khadiv. Task and motion planning for humanoid loco-manipulation. In IEEE-RAS International Conference on Humanoid Robots (Humanoids), pages 1179–1186, 2025.
- [4] T. He, J. Gao, W. Xiao, Y. Zhang, Z. Wang, J. Wang, Z. Luo, G. He, N. Sobanbabu, C. Pan, Z. Yi, G. Qu, K. Kitani, J. Hodgins, L. J. Fan, Y. Zhu, C. Liu, and G. Shi. ASAP: Aligning simulation and real-world physics for learning agile humanoid whole-body skills. In Robotics: Science and Systems, 2025.
- [5] T. He, Z. Luo, X. He, W. Xiao, C. Zhang, W. Zhang, K. Kitani, C. Liu, and G. Shi. OmniH2O: Universal and dexterous human-to-humanoid whole-body teleoperation and learning. In Conference on Robot Learning, 2024.
- [6] L. Yang, X. Huang, Z. Wu, A. Kanazawa, P. Abbeel, C. Sferrazza, C. K. Liu, R. Duan, and G. Shi. OmniRetarget: Interaction-preserving data generation for humanoid whole-body loco-manipulation and scene interaction. arXiv preprint arXiv:2509.26633, 2025.
- [7] I. Taouil, H. Zhao, A. Dai, and M. Khadiv. Physically consistent humanoid loco-manipulation using latent diffusion models. In IEEE-RAS International Conference on Humanoid Robots (Humanoids), pages 1179–1186, 2025.
- [8] X. B. Peng, P. Abbeel, S. Levine, and M. Van de Panne. DeepMimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Transactions On Graphics (TOG), 37(4):1–14, 2018.
- [9] T. E. Truong, Q. Liao, X. Huang, G. Tevet, C. K. Liu, and K. Sreenath. BeyondMimic: From motion tracking to versatile humanoid control via guided diffusion. arXiv preprint arXiv:2508.08241, 2025.
- [10] H. Weng, Y. Li, N. Sobanbabu, Z. Wang, Z. Luo, T. He, D. Ramanan, and G. Shi. HDMI: Learning interactive humanoid whole-body control from human videos. arXiv preprint arXiv:2509.16757, 2025.
- [11] S. Zhao, Y. Ze, Y. Wang, C. K. Liu, P. Abbeel, G. Shi, and R. Duan. ResMimic: From general motion tracking to humanoid whole-body loco-manipulation via residual learning. arXiv preprint arXiv:2510.05070, 2025.
- [12] C. Pan, C. Wang, H. Qi, Z. Liu, H. Bharadhwaj, A. Sharma, T. Wu, G. Shi, J. Malik, and F. Hogan. SPIDER: Scalable physics-informed dexterous retargeting. arXiv preprint arXiv:2511.09484, 2025.
- [13] V. Dhedin, I. Taouil, S. Omar, D. Yu, K. Tao, A. Dai, and M. Khadiv. DynaRetarget: Dynamically-feasible retargeting using sampling-based trajectory optimization. arXiv preprint arXiv:2602.06827, 2026.
- [14] M. Ahn, A. Brohan, N. Brown, Y. Chebotar, O. Cortes, B. David, C. Finn, C. Fu, K. Gopalakrishnan, K. Hausman, et al. Do as I can, not as I say: Grounding language in robotic affordances. arXiv preprint arXiv:2204.01691, 2022.
- [15] J. Liang, W. Huang, F. Xia, P. Xu, K. Hausman, B. Ichter, P. Florence, and A. Zeng. Code as policies: Language model programs for embodied control. In IEEE International Conference on Robotics and Automation (ICRA), pages 9493–9500, 2023.
- [16] A. Curtis, N. Kumar, J. Cao, T. Lozano-Pérez, and L. P. Kaelbling. Trust the PRoC3S: Solving long-horizon robotics problems with LLMs and constraint satisfaction. In Conference on Robot Learning, 2024.
- [17] H. Chi, Z. Feng, Y. Lyu, C. Zheng, L. Luo, Y. S. Ong, I. Tsang, H. Chen, Y. Chang, and H. Yin. InstructFlow: Adaptive symbolic constraint-guided code generation for long-horizon planning. Advances in Neural Information Processing Systems, 38:2602–2632, 2026.
- [18] Y. J. Ma, W. Liang, G. Wang, D.-A. Huang, O. Bastani, D. Jayaraman, Y. Zhu, J. Fan, et al. Eureka: Human-level reward design via coding large language models. In International Conference on Learning Representations, volume 2024, pages 26516–26560, 2024.
- [19] Z. Wu, J. Li, P. Xu, and C. K. Liu. Human-object interaction from human-level instructions. In IEEE/CVF International Conference on Computer Vision, pages 11176–11186, 2025.
- [20] D. Shcherba, E. Cobo-Briesewitz, C. V. Braun, and M. Toussaint. Meta-optimization and program search using language models for task and motion planning. In Conference on Robot Learning, 2025.
- [21] A. Novikov, N. Vũ, M. Eisenberger, E. Dupont, P.-S. Huang, A. Z. Wagner, S. Shirobokov, B. Kozlovskii, F. J. Ruiz, A. Mehrabian, et al. AlphaEvolve: A coding agent for scientific and algorithmic discovery. arXiv preprint arXiv:2506.13131, 2025.
- [22] R. T. Lange, Y. Imajuku, and E. Cetin. ShinkaEvolve: Towards open-ended and sample-efficient program evolution. In International Conference on Learning Representations, 2026.
- [23]
- [24] S. Liu, S. Agarwal, M. Maheswaran, M. Cemri, Z. Li, Q. Mang, A. Naren, E. Boneh, A. Cheng, M. Z. Pan, et al. EvoX: Meta-evolution for automated discovery. arXiv preprint arXiv:2602.23413, 2026.
- [25] M. Ciebielski, H. Zhao, A. M. Johnson, and M. Khadiv. Discovery of dynamic loco-manipulation behaviors. In ICRA Workshop on Contact-Rich Control and Representation, 2026.
- [26] R. Verschueren, G. Frison, D. Kouzoupis, J. Frey, N. van Duijkeren, A. Zanelli, B. Novoselnik, T. Albin, R. Quirynen, and M. Diehl. acados – a modular open-source framework for fast embedded optimal control. Mathematical Programming Computation, 2021.
- [27] H. Zhao, L. Righetti, and M. Khadiv. Hippo: High-performance interior-point and projection-based solver for generic constrained trajectory optimization. IEEE Robotics and Automation Letters, 11(6):6752–6759, 2026.
- [28] Anthropic. Claude Opus 4.7, 2026. URL https://www.anthropic.com/news/claude-opus-4-7. Large language model.
- [29] C. Xia, K. Zhu, Z. Wang, F. Liu, Z. Zhang, and Y. Duan. SimRecon: SimReady compositional scene reconstruction from real videos. In Computer Vision and Pattern Recognition, 2026.
- [30] H. Xia, C.-H. Lin, H.-Y. Hsu, Q. Leboutet, K. Gao, M. Paulitsch, B. Ummenhofer, and S. Wang. HoloScene: Simulation-ready interactive 3D worlds from a single video. Advances in Neural Information Processing Systems, 38:32501–32524, 2026.
- [31] M. Dong, C. Xia, M. Jia, W. Lyu, L. Xu, Z. Zhu, and Y. Duan. ReplicateAnyScene: Zero-shot video-to-3D composition via textual-visual-spatial alignment. arXiv preprint arXiv:2604.10789, 2026.

Appendix A: LLM-guided Search Implementation
A.1 Prompt Details
The prompt provides the LLM with task-specific scene information, contact-surface identifiers, planning rul...
- Identify the final subgoal(s) the task requires
- List every movable object whose xy footprint lies between the robot’s start position and any point the base must visit. These are obstacles and need to be relocated first, even if the task description doesn’t mention them
- Write the plan: clearing subgoals first, then task subgoals.
# Output format
First state the high-level plan in one line, e.g.: "walk to box, two-EE grasp, lift, walk to table, place on table, release"
Then emit the Python body of generate_contact_plan(scene). It must start from get_initial_mode() and return the resulting dict.
# Feedback
If the IK solver...
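The prompt's obstacle rule leaves the geometric check to the LLM. The code below is a hedged sketch of how the same check could be computed directly, for example to validate an LLM-proposed plan. It assumes axis-aligned rectangular footprints, a straight-line base path from the start to each waypoint, and a clearance `margin`. `Footprint`, `find_obstacles`, and `order_subgoals` are hypothetical names, not part of the paper's code, and the paper's `generate_contact_plan(scene)` / `get_initial_mode()` interface is not modeled. `order_subgoals` simply encodes the prompt's rule of listing clearing subgoals before task subgoals.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Set, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Footprint:
    """Axis-aligned xy footprint of a movable object (hypothetical scene format)."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def segment_hits_box(p0: Point, p1: Point, box: Footprint, margin: float = 0.0) -> bool:
    """Slab test: does the segment p0 -> p1 intersect the footprint inflated by `margin`?"""
    lows = (box.xmin - margin, box.ymin - margin)
    highs = (box.xmax + margin, box.ymax + margin)
    t_enter, t_exit = 0.0, 1.0
    for axis in range(2):
        d = p1[axis] - p0[axis]
        if abs(d) < 1e-12:
            # Segment is parallel to this slab, so it must already lie inside it.
            if not lows[axis] <= p0[axis] <= highs[axis]:
                return False
            continue
        t0 = (lows[axis] - p0[axis]) / d
        t1 = (highs[axis] - p0[axis]) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter, t_exit = max(t_enter, t0), min(t_exit, t1)
        if t_enter > t_exit:
            return False
    return True


def find_obstacles(
    start: Point,
    base_waypoints: Iterable[Point],
    objects: Sequence[Footprint],
    task_objects: Set[str],
    margin: float = 0.3,
) -> List[str]:
    """Non-task movable objects crossed by a straight base path from `start` to any waypoint."""
    waypoints = list(base_waypoints)
    return [
        obj.name
        for obj in objects
        if obj.name not in task_objects
        and any(segment_hits_box(start, wp, obj, margin) for wp in waypoints)
    ]


def order_subgoals(obstacles: Sequence[str], task_subgoals: Sequence[str]) -> List[str]:
    """Clearing subgoals first, then task subgoals."""
    return [f"relocate {name}" for name in obstacles] + list(task_subgoals)
```

A straight-line path is a simplification. If a planner routes the base around objects, the check would need to run on each path segment instead.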