MotionDisco: Motion Discovery for Extreme Humanoid Loco-Manipulation

Aaron M. Johnson; Angela Dai; Haizhou Zhao; Ilyass Taouil; Majid Khadiv; Michal Ciebelski; Shafeef Omar

arxiv: 2606.06139 · v1 · pith:MKTUPF7Vnew · submitted 2026-06-04 · 💻 cs.RO

Ilyass Taouil, Michal Ciebelski, Shafeef Omar, Haizhou Zhao, Angela Dai, Aaron M. Johnson, Majid Khadiv

Pith reviewed 2026-06-28 01:23 UTC · model grok-4.3

keywords humanoid robot · loco-manipulation · motion discovery · evolutionary search · LLM guidance · trajectory optimization · reinforcement learning

The pith

MotionDisco discovers long-horizon humanoid loco-manipulation motions via automated LLM-guided evolutionary search without human demonstrations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MotionDisco as a way to generate complex whole-body motions for humanoid robots performing tasks that combine locomotion and manipulation over many steps. It couples an LLM-guided evolutionary search over possible contact sequences with an optimizer that plans the actual trajectories and a pruning step that discards infeasible candidates. This removes the usual requirement for human-provided demonstrations or teleoperation data. If the approach works, it opens the door to discovering motions for new tasks automatically and then deploying them on physical robots using reinforcement learning.

Core claim

MotionDisco is the first framework to discover and deploy long-horizon humanoid loco-manipulation skills entirely through automated evolutionary search. It does so by combining LLM guidance over interaction sequences with sequential kinodynamic trajectory optimization and pruning to produce viable whole-body trajectories.

What carries the argument

LLM-guided evolutionary search over sequences of interactions, combined with an efficient sequential kinodynamic trajectory optimizer and a pruning strategy.
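
As a rough illustration of how these pieces could fit together, the following Python sketch runs a propose, check, and prune loop on a toy pick-and-place problem. Everything in it is an assumption made for illustration: the action names, the `step` feasibility rules, and the random `propose` function, which merely stands in for the LLM. The paper's actual search, prompts, and kinodynamic optimizer are not reproduced here.

```python
import random

# Hypothetical interaction vocabulary (not taken from the paper).
ACTIONS = ["walk_to_box", "grasp", "lift", "walk_to_table", "place", "release"]
START = {"at": "start", "holding": False, "lifted": False, "placed": False}


def step(state, action):
    """Toy feasibility model: return the next state, or None if infeasible."""
    s = dict(state)
    if action == "walk_to_box" and s["at"] != "box" and not s["holding"]:
        s["at"] = "box"
    elif action == "grasp" and s["at"] == "box" and not s["holding"]:
        s["holding"] = True
    elif action == "lift" and s["holding"] and not s["lifted"] and not s["placed"]:
        s["lifted"] = True
    elif action == "walk_to_table" and s["lifted"] and s["at"] != "table":
        s["at"] = "table"
    elif action == "place" and s["lifted"] and s["at"] == "table":
        s["lifted"], s["placed"] = False, True
    elif action == "release" and s["placed"] and s["holding"]:
        s["holding"] = False
    else:
        return None
    return s


def evaluate(plan):
    """Cheap check standing in for the planner: (feasible, goal_reached, feedback)."""
    state = dict(START)
    for i, action in enumerate(plan):
        nxt = step(state, action)
        if nxt is None:
            return False, False, f"step {i}: '{action}' is infeasible"
        state = nxt
    return True, state["placed"] and not state["holding"], "ok"


def propose(parent, failed, rng):
    """Stand-in for the LLM: extend the parent, avoiding actions that already failed."""
    options = [a for a in ACTIONS if a not in failed]
    return parent + [rng.choice(options)]


def search(budget=100, seed=0):
    rng = random.Random(seed)
    population = [[]]   # feasible plans found so far
    failures = {}       # parent plan -> actions that failed when appended
    for evals in range(1, budget + 1):
        parent = max(population, key=len)            # greedy elite selection
        failed = failures.setdefault(tuple(parent), set())
        child = propose(parent, failed, rng)
        feasible, done, feedback = evaluate(child)
        if not feasible:
            failed.add(child[-1])                    # prune; remember the failure
            continue
        population.append(child)
        if done:
            return child, evals
    return None, budget


if __name__ == "__main__":
    plan, evals = search()
    print(evals, plan)
```

In this toy problem each intermediate state admits exactly one feasible action, so remembering failed extensions bounds the search at six evaluations per step. Real contact spaces are far less forgiving, which is why the quality of the proposals and of the pruning matters.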

If this is right

Ablation studies show that the method finds successful trajectories for several challenging long-horizon tasks.
Discovered motions can be used to train reinforcement learning policies that transfer to a real humanoid robot.
The approach handles the combinatorial growth in possible contact interactions for tasks with multiple objects.
Whole-body trajectories are generated without relying on teleoperation or motion retargeting.
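
To give a feel for the combinatorial growth mentioned in this list, here is a small Python sketch under a deliberately simple counting model. The model is an assumption for illustration, not the paper's formulation: at every step, each end-effector is either free or touching one of the available surfaces, and all combinations are counted without any feasibility filtering.

```python
def count_contact_sequences(num_effectors: int, num_surfaces: int, horizon: int) -> int:
    """Count raw contact sequences under the toy model.

    Each end-effector has (num_surfaces + 1) choices per step: one per
    surface, plus "no contact". Choices are independent across
    end-effectors and across steps.
    """
    choices_per_step = (num_surfaces + 1) ** num_effectors
    return choices_per_step ** horizon


if __name__ == "__main__":
    # Four end-effectors (two hands, two feet) and five candidate surfaces.
    for horizon in (1, 5, 10):
        print(horizon, count_contact_sequences(4, 5, horizon))
```

Even this crude count grows exponentially in both the horizon and the number of end-effectors, which is the kind of growth that guided search and pruning are meant to tame.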

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This search-based discovery might apply to other robot platforms facing similar contact-rich planning problems.
Reducing dependence on human data could accelerate development of autonomous robot behaviors in unstructured environments.
Further improvements in LLM guidance could allow handling even longer task horizons or more objects.

Load-bearing premise

The combinatorial space of contact interactions can be navigated effectively by LLM-guided evolutionary search plus pruning without missing viable long-horizon solutions.

What would settle it

Observing that the evolutionary search consistently fails to produce any valid trajectory for a task involving more than a certain number of sequential contacts or objects, or that the transferred RL policy fails to execute on the physical robot.

Figures

Figures reproduced from arXiv: 2606.06139 by Aaron M. Johnson, Angela Dai, Haizhou Zhao, Ilyass Taouil, Majid Khadiv, Michal Ciebelski, Shafeef Omar.

**Figure 1.** Real-world humanoid motion from MotionDisco. Snapshots of a real-world experiment executing a trajectory discovered by our evolutionary tree search, deployed zero-shot on the robot.

**Figure 2.** MotionDisco couples LLM-guided evolutionary discovery of contact plans (left) with contact-explicit trajectory optimization (right). Each search node proposes a mutation of its parent program, conditioned on the goal prompt and the parent’s feasibility feedback; executing the mutated program yields a discrete contact plan (denoted by the blue dots). Plans that pass a kinematic feasibility check are sent to…

**Figure 3.** Motion diversity discovered by the search on Parkour Pick & Place 2. Top: xyz hand and foot trajectories across all valid contact plans found by MotionDisco (MD). Bottom: whole-body trajectory snapshots of two distinct solutions. The variation in both panels arises from the diversity of the underlying contact plans found in a single run.

**Figure 4.** Experiment scenes. The eight loco-manipulation evaluation tasks, numbered as referenced in the text: (1) Banana, (2) Box Stacking, (3) Climb Table w/ Box, (4) Long-Dist. Pick & Place, (5) Move Through Clutter, (6) Parkour Pick & Place 1, (7) Parkour Pick & Place 2, and (8) Under-Table Pick & Place.

**Figure 5.** Under-Table Pick & Place. Real-world snapshots of the humanoid robot reaching into a confined space beneath a table to retrieve the box, demonstrating the low-clearance whole-body posture discovered by MotionDisco.

**Figure 6.** Move Through Clutter. Real-world snapshots of the humanoid robot traversing a corridor obstructed by two boxes, interacting with the obstacles along the path to make way for passage.

**Figure 7.** Parkour Pick & Place 2. Real-world snapshots of the humanoid robot climbing onto the table using its feet, while carrying the box, before placing it on a distant table after taking some steps.

**Figure 8.** Long-Distance Pick & Place. Real-world snapshots of the humanoid robot transporting a box and placing it on a distant table, combining extended locomotion with manipulation.

**Figure 9.** Climb Table w/ Box. Real-world snapshots of the humanoid robot climbing onto a table while moving a box, using its hands as support on the table, and lifting up the box.

Original abstract

We present MotionDisco, a framework that discovers contact-rich, long-horizon humanoid loco-manipulation motions from scratch, without relying on teleoperation or motion retargeting from human demonstrations. This is challenging because the space of possible contact interactions grows combinatorially with the task horizon and the number of objects in the scene. MotionDisco enables rapid discovery of novel motions by coupling a large language model (LLM) guided evolutionary search over sequences of interactions with an efficient sequential kinodynamic trajectory optimizer and pruning strategy, enabling the rapid discovery of novel skills. Through extensive ablation studies, we show that our LLM-guided search discovers successful whole-body trajectories across several challenging long-horizon tasks. Finally, by training reinforcement learning tracking policies on the discovered trajectories, we transfer the motions to a real humanoid robot. This is the first work to discover and deploy long-horizon humanoid loco-manipulation skills entirely through automated evolutionary search. Supplementary videos of the experiments are available at: https://youtu.be/DHiVz34QYlw.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

They automated discovery of long-horizon humanoid loco-manipulation motions with LLM-guided evolutionary search and transferred the results to hardware via RL.

The main point is that this paper claims to discover and run long-horizon contact-rich motions on a humanoid without any teleoperation or human motion data. They guide an evolutionary search over interaction sequences with an LLM, then apply sequential kinodynamic optimization and pruning to handle the combinatorial growth in possible contacts.

The work does a few things cleanly. It shows the pipeline on multiple tasks, includes ablations that separate the LLM guidance and pruning effects, and closes the loop by training RL trackers on the discovered trajectories to get them working on a physical robot. That hardware step gives the result some weight beyond simulation.

The softer part is how reliably the LLM-guided search covers the space. The abstract states that ablations confirm success across tasks, but the real test is whether viable long sequences get missed or whether the LLM introduces systematic gaps. Without seeing the specific success rates, diversity metrics, or failure cases, it is hard to judge how robust the pruning actually is when the horizon or object count grows.

This is aimed at robotics researchers focused on humanoid loco-manipulation and automated skill generation. People already working with evolutionary methods or LLMs in planning would get the most from the implementation details.

The concrete real-robot transfer makes it worth sending out for peer review. The central claim is testable and the experiments provide a direct check on whether the approach works in practice.

Referee Report

0 major / 0 minor

Summary. The paper presents MotionDisco, a framework for discovering contact-rich, long-horizon humanoid loco-manipulation motions from scratch without teleoperation or human demonstrations. It couples LLM-guided evolutionary search over interaction sequences with sequential kinodynamic trajectory optimization and pruning to navigate the combinatorial contact space, reports successful discovery across tasks via extensive ablations, and transfers the motions to a real humanoid via RL tracking policies. The work claims to be the first to achieve and deploy such skills entirely through automated evolutionary search.

Significance. If the results hold, this would advance automated skill discovery for humanoids by addressing combinatorial growth in contact interactions for long-horizon tasks. The LLM-guided search, pruning strategy, and real-robot transfer via RL are strengths. The manuscript emphasizes ablation studies showing successful whole-body trajectories, which, if quantitatively robust, would support the central claim of effective navigation without human input.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their review and accurate summary of MotionDisco. The report lists no specific major comments despite the 'uncertain' recommendation, so we provide no point-by-point rebuttals. We stand by the manuscript's claims regarding automated discovery via LLM-guided search, trajectory optimization, and real-robot transfer, supported by the ablations described.

Circularity Check

0 steps flagged

No significant circularity

The paper presents an engineering framework for motion discovery via LLM-guided evolutionary search coupled with kinodynamic optimization and pruning. No equations, derivations, fitted parameters, or first-principles results are described in the abstract or claimed central contributions. The method is evaluated through ablation studies and real-robot transfer, with no self-referential definitions, predictions that reduce to inputs by construction, or load-bearing self-citations that substitute for independent justification. The 'first work' claim is a priority statement, not a derived result. The derivation chain is therefore self-contained against external benchmarks with no reductions to circular inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No details available from abstract to identify free parameters, axioms, or invented entities; ledger left empty.

pith-pipeline@v0.9.1-grok · 5729 in / 1050 out tokens · 31586 ms · 2026-06-28T01:23:45.873939+00:00 · methodology

Reference graph

Works this paper leans on

34 extracted references · 6 linked inside Pith

[1] S. Tonneau, A. Del Prete, J. Pettré, C. Park, D. Manocha, and N. Mansard. An efficient acyclic contact planner for multiped robots. IEEE Transactions on Robotics, 34(3):586–601, 2018.

[2] M. A. Toussaint, K. R. Allen, K. A. Smith, and J. B. Tenenbaum. Differentiable physics and stable modes for tool-use and manipulation planning. In Robotics: Science and Systems Foundation, 2018.

[3] M. Ciebielski, V. Dhédin, and M. Khadiv. Task and motion planning for humanoid loco-manipulation. In IEEE-RAS International Conference on Humanoid Robots (Humanoids), pages 1179–1186, 2025.

[4] T. He, J. Gao, W. Xiao, Y. Zhang, Z. Wang, J. Wang, Z. Luo, G. He, N. Sobanbab, C. Pan, Z. Yi, G. Qu, K. Kitani, J. Hodgins, L. J. Fan, Y. Zhu, C. Liu, and G. Shi. ASAP: Aligning simulation and real-world physics for learning agile humanoid whole-body skills. In Robotics: Science and Systems, 2025.

[5] T. He, Z. Luo, X. He, W. Xiao, C. Zhang, W. Zhang, K. Kitani, C. Liu, and G. Shi. OmniH2O: Universal and dexterous human-to-humanoid whole-body teleoperation and learning. In Conference on Robot Learning, 2024.

[6] L. Yang, X. Huang, Z. Wu, A. Kanazawa, P. Abbeel, C. Sferrazza, C. K. Liu, R. Duan, and G. Shi. OmniRetarget: Interaction-preserving data generation for humanoid whole-body loco-manipulation and scene interaction. arXiv preprint arXiv:2509.26633, 2025.

[7] I. Taouil, H. Zhao, A. Dai, and M. Khadiv. Physically consistent humanoid loco-manipulation using latent diffusion models. In IEEE-RAS International Conference on Humanoid Robots (Humanoids), pages 1179–1186, 2025.

[8] X. B. Peng, P. Abbeel, S. Levine, and M. Van de Panne. DeepMimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Transactions On Graphics (TOG), 37(4):1–14, 2018.

[9] T. E. Truong, Q. Liao, X. Huang, G. Tevet, C. K. Liu, and K. Sreenath. BeyondMimic: From motion tracking to versatile humanoid control via guided diffusion. arXiv preprint arXiv:2508.08241, 2025.

[10] H. Weng, Y. Li, N. Sobanbabu, Z. Wang, Z. Luo, T. He, D. Ramanan, and G. Shi. HDMI: Learning interactive humanoid whole-body control from human videos. arXiv preprint arXiv:2509.16757, 2025.

[11] S. Zhao, Y. Ze, Y. Wang, C. K. Liu, P. Abbeel, G. Shi, and R. Duan. ResMimic: From general motion tracking to humanoid whole-body loco-manipulation via residual learning. arXiv preprint arXiv:2510.05070, 2025.

[12] C. Pan, C. Wang, H. Qi, Z. Liu, H. Bharadhwaj, A. Sharma, T. Wu, G. Shi, J. Malik, and F. Hogan. SPIDER: Scalable physics-informed dexterous retargeting. arXiv preprint arXiv:2511.09484, 2025.

[13] V. Dhedin, I. Taouil, S. Omar, D. Yu, K. Tao, A. Dai, and M. Khadiv. DynaRetarget: Dynamically-feasible retargeting using sampling-based trajectory optimization. arXiv preprint arXiv:2602.06827, 2026.

[14] M. Ahn, A. Brohan, N. Brown, Y. Chebotar, O. Cortes, B. David, C. Finn, C. Fu, K. Gopalakrishnan, K. Hausman, et al. Do as I can, not as I say: Grounding language in robotic affordances. arXiv preprint arXiv:2204.01691, 2022.

[15] J. Liang, W. Huang, F. Xia, P. Xu, K. Hausman, B. Ichter, P. Florence, and A. Zeng. Code as policies: Language model programs for embodied control. In IEEE International Conference on Robotics and Automation (ICRA), pages 9493–9500, 2023.

[16] A. Curtis, N. Kumar, J. Cao, T. Lozano-Pérez, and L. P. Kaelbling. Trust the PRoC3S: Solving long-horizon robotics problems with LLMs and constraint satisfaction. In Conference on Robot Learning, 2024.

[17] H. Chi, Z. Feng, Y. Lyu, C. Zheng, L. Luo, Y. S. Ong, I. Tsang, H. Chen, Y. Chang, and H. Yin. InstructFlow: Adaptive symbolic constraint-guided code generation for long-horizon planning. Advances in Neural Information Processing Systems, 38:2602–2632, 2026.

[18] Y. J. Ma, W. Liang, G. Wang, D.-A. Huang, O. Bastani, D. Jayaraman, Y. Zhu, J. Fan, et al. Eureka: Human-level reward design via coding large language models. In International Conference on Learning Representations, volume 2024, pages 26516–26560, 2024.

[19] Z. Wu, J. Li, P. Xu, and C. K. Liu. Human-object interaction from human-level instructions. In IEEE/CVF International Conference on Computer Vision, pages 11176–11186, 2025.

[20] D. Shcherba, E. Cobo-Briesewitz, C. V. Braun, and M. Toussaint. Meta-optimization and program search using language models for task and motion planning. In Conference on Robot Learning, 2025.

[21] A. Novikov, N. Vũ, M. Eisenberger, E. Dupont, P.-S. Huang, A. Z. Wagner, S. Shirobokov, B. Kozlovskii, F. J. Ruiz, A. Mehrabian, et al. AlphaEvolve: A coding agent for scientific and algorithmic discovery. arXiv preprint arXiv:2506.13131, 2025.

[22] R. T. Lange, Y. Imajuku, and E. Cetin. ShinkaEvolve: Towards open-ended and sample-efficient program evolution. In International Conference on Learning Representations, 2026.

[23] M. Cemri, S. Agrawal, A. Gupta, S. Liu, A. Cheng, Q. Mang, A. Naren, L. E. Erdogan, K. Sen, M. Zaharia, et al. AdaEvolve: Adaptive LLM driven zeroth-order optimization. arXiv preprint arXiv:2602.20133, 2026.

[24] S. Liu, S. Agarwal, M. Maheswaran, M. Cemri, Z. Li, Q. Mang, A. Naren, E. Boneh, A. Cheng, M. Z. Pan, et al. EvoX: Meta-evolution for automated discovery. arXiv preprint arXiv:2602.23413, 2026.

[25] M. Ciebielski, H. Zhao, A. M. Johnson, and M. Khadiv. Discovery of dynamic loco-manipulation behaviors. In ICRA Workshop on Contact-Rich Control and Representation, 2026.

[26] R. Verschueren, G. Frison, D. Kouzoupis, J. Frey, N. van Duijkeren, A. Zanelli, B. Novoselnik, T. Albin, R. Quirynen, and M. Diehl. acados – a modular open-source framework for fast embedded optimal control. Mathematical Programming Computation, 2021.

[27] H. Zhao, L. Righetti, and M. Khadiv. Hippo: High-performance interior-point and projection-based solver for generic constrained trajectory optimization. IEEE Robotics and Automation Letters, 11(6):6752–6759, 2026.

[28] Anthropic. Claude opus 4.7, 2026. URL https://www.anthropic.com/news/claude-opus-4-7. Large language model.

[29] C. Xia, K. Zhu, Z. Wang, F. Liu, Z. Zhang, and Y. Duan. SimRecon: SimReady compositional scene reconstruction from real videos. In Computer Vision and Pattern Recognition, 2026.

[30] H. Xia, C.-H. Lin, H.-Y. Hsu, Q. Leboutet, K. Gao, M. Paulitsch, B. Ummenhofer, and S. Wang. HoloScene: Simulation-ready interactive 3D worlds from a single video. Advances in Neural Information Processing Systems, 38:32501–32524, 2026.

[31] M. Dong, C. Xia, M. Jia, W. Lyu, L. Xu, Z. Zhu, and Y. Duan. ReplicateAnyScene: Zero-shot video-to-3D composition via textual-visual-spatial alignment. arXiv preprint arXiv:2604.10789, 2026.

A LLM-guided Search Implementation

A.1 Prompt Details

The prompt provides the LLM with task-specific scene information, contact-surface identifiers, planning rul...

[32] Identify the final subgoal(s) the task requires.

[33] List every movable object whose xy footprint lies between the robot’s start position and any point the base must visit. These are obstacles and need to be relocated first, even if the task description doesn’t mention them.

[34] Write the plan: clearing subgoals first, then task subgoals.

# Output format

First state the high-level plan in one line, e.g.: "walk to box, two-EE grasp, lift, walk to table, place on table, release". Then emit the Python body of generate_contact_plan(scene). It must start from get_initial_mode() and return the resulting dict.

# Feedback

If the IK solver...
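
The output format requested in the prompt excerpt above can be pictured with a small Python sketch. Only the names `generate_contact_plan(scene)` and `get_initial_mode()` come from the excerpt. The dictionary layout, the end-effector keys, the surface identifiers, and the shape of `scene` are all hypothetical stand-ins chosen for illustration, since the excerpt does not show them.

```python
def get_initial_mode():
    """Hypothetical stand-in: maps each end-effector to its contact surface.

    None means the end-effector is not in contact with anything.
    """
    return {"left_foot": "floor", "right_foot": "floor",
            "left_hand": None, "right_hand": None}


def generate_contact_plan(scene):
    """Toy plan body; `scene` is assumed to map object names to surface ids."""
    mode = get_initial_mode()
    steps = [{"label": "start", "contacts": dict(mode)}]

    # Two-end-effector grasp: both hands make contact with the box.
    mode["left_hand"] = mode["right_hand"] = scene["box"]
    steps.append({"label": "two-EE grasp", "contacts": dict(mode)})

    # Release after placing: both hands break contact again.
    mode["left_hand"] = mode["right_hand"] = None
    steps.append({"label": "release", "contacts": dict(mode)})

    return {"plan": "walk to box, two-EE grasp, place on table, release",
            "steps": steps}


if __name__ == "__main__":
    for entry in generate_contact_plan({"box": "box_sides"})["steps"]:
        print(entry["label"], entry["contacts"])
```

Copying the mode at every step keeps each entry an independent snapshot, so later contact changes do not silently rewrite earlier steps.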
