Recognition: 2 theorem links
SUMO: Dynamic and Generalizable Whole-Body Loco-Manipulation
Pith reviewed 2026-05-10 17:32 UTC · model grok-4.3
The pith
Steering a pre-trained whole-body control policy with a sample-based planner at test time lets legged robots manipulate large unseen objects dynamically.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By performing test-time steering of a pre-trained whole-body control policy with a sample-based planner, legged robots can solve a variety of dynamic loco-manipulation tasks. The approach generalizes to a diverse set of objects and tasks with no additional tuning or training and can be further enhanced by flexibly adjusting the cost function at test time. Real-world demonstrations on a quadruped include uprighting a tire heavier than the robot's nominal lifting capacity and dragging a crowd-control barrier larger and taller than the robot itself, while the method also applies to humanoid loco-manipulation tasks such as opening a door and pushing a table in simulation.
What carries the argument
Test-time steering of a pre-trained whole-body control policy by a sample-based planner, which generates action sequences to guide the policy toward successful contact-rich loco-manipulation outcomes.
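The mechanism described above can be pictured as a loop that perturbs a candidate command sequence, rolls each sample through a simulator running the frozen policy, and keeps the lowest-cost candidate. A minimal sketch under stated assumptions: `rollout_fn` and `cost_fn` are hypothetical stand-ins for the paper's simulator-in-the-loop rollouts and task cost, and none of these names come from the paper's code.

```python
import numpy as np

def steer(base_seq, rollout_fn, cost_fn, num_samples=64, sigma=0.3, seed=0):
    """One planning step of sampling-based steering (sketch).

    rollout_fn(seq) -> states: simulates the *frozen* pre-trained policy
    under a candidate command sequence; cost_fn scores the resulting
    state trajectory. The policy itself is never updated.
    """
    rng = np.random.default_rng(seed)
    base = np.asarray(base_seq, dtype=float)       # (horizon, dim)
    best_seq = base
    best_cost = cost_fn(rollout_fn(base))          # keep the nominal plan as fallback
    for _ in range(num_samples):
        cand = base + sigma * rng.standard_normal(base.shape)
        c = cost_fn(rollout_fn(cand))
        if c < best_cost:
            best_cost, best_seq = c, cand
    return best_seq, best_cost
```

In a receding-horizon deployment this step would run repeatedly, warm-starting `base_seq` from the previous solution; more elaborate sample-based planners (MPPI, CMA-ES, predictive sampling) replace the argmin with weighted averaging or covariance adaptation.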
If this is right
- Robots can manipulate objects that exceed their nominal lifting or pushing capacity.
- The same pre-trained policy works on new objects and tasks without retraining.
- Cost-function adjustments at runtime provide task-specific flexibility.
- The sim-to-real pipeline succeeds for contact-rich dynamic behaviors.
- The steering technique applies across both quadrupedal and humanoid platforms.
Where Pith is reading between the lines
- Separating learned policy skills from runtime planning could lower the data volume needed to achieve broad robot competence.
- The same pre-trained policy might support a wider range of robot body types if the planner accounts for kinematic differences.
- Integrating real-time perception with the planner could enable fully autonomous operation on novel objects in unstructured settings.
- Similar test-time steering might compose basic skills into longer manipulation sequences without additional policy training.
Load-bearing premise
The pre-trained whole-body policy already encodes sufficient dynamics and contact behaviors so that test-time planning can reliably steer it to success on unseen objects without model mismatch or instability in real-world execution.
What would settle it
Apply the steered policy to a new object with substantially different mass distribution, geometry, or surface friction in a real-world trial and observe whether the robot completes the loco-manipulation task or instead exhibits instability or failure to make progress.
Original abstract
This paper presents a sim-to-real approach that enables legged robots to dynamically manipulate large and heavy objects with whole-body dexterity. Our key insight is that by performing test-time steering of a pre-trained whole-body control policy with a sample-based planner, we can enable these robots to solve a variety of dynamic loco-manipulation tasks. Interestingly, we find our method generalizes to a diverse set of objects and tasks with no additional tuning or training, and can be further enhanced by flexibly adjusting the cost function at test time. We demonstrate the capabilities of our approach through a variety of challenging loco-manipulation tasks on a Spot quadruped robot in the real world, including uprighting a tire heavier than the robot's nominal lifting capacity and dragging a crowd-control barrier larger and taller than the robot itself. Additionally, we show that the same approach can be generalized to humanoid loco-manipulation tasks, such as opening a door and pushing a table, in simulation. Project code and videos are available at https://sumo.rai-inst.com/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents SUMO, a sim-to-real approach for dynamic whole-body loco-manipulation on legged robots. The core idea is to steer a pre-trained whole-body control policy at test time using a sample-based planner, enabling tasks such as uprighting heavy tires and dragging large barriers on a Spot quadruped in the real world. The authors claim that this method generalizes to diverse objects and tasks without additional training or tuning, and can be enhanced by adjusting the cost function at test time. They also demonstrate extension to humanoid robots for tasks like door opening and table pushing in simulation.
Significance. If the generalization without tuning holds, this approach would be significant for robotics by allowing pre-trained policies to handle a range of loco-manipulation tasks through online planning rather than retraining. The real-world demonstrations on challenging physical tasks provide direct evidence of practical applicability, and the flexibility in cost function adjustment is a notable feature. The provision of project code and videos is a strength for reproducibility.
major comments (2)
- [Abstract] The generalization claim ('generalizes to a diverse set of objects and tasks with no additional tuning or training') is not accompanied by quantitative metrics, success rates over multiple trials, ablation studies on planner parameters, or analysis of failure cases. This makes it difficult to assess the reliability of the test-time steering for out-of-distribution objects beyond the two qualitative real-world examples.
- [Method/Experiments] The central claim requires that the pre-trained policy has internalized sufficiently accurate contact and friction models so that sample-based rollouts can steer it to success on unseen objects (e.g., a tire exceeding nominal lift capacity) without instability. No sensitivity analysis, model-mismatch experiments, or out-of-distribution testing under variations in mass distribution or surface properties is reported, leaving the no-tuning generalization dependent on an unverified transfer property.
minor comments (2)
- [Abstract] The abstract refers to 'a variety of challenging loco-manipulation tasks' but provides details on only two real-world examples; listing the full set of evaluated tasks with brief outcomes would improve clarity.
- Ensure that all statements about adjustable cost functions at test time include explicit references to the corresponding planner implementation details and any associated hyperparameters.
Simulated Author's Rebuttal
We thank the referee for the constructive review and positive assessment of the work's significance and reproducibility. We address each major comment point-by-point below, providing clarifications and indicating where revisions have been made to the manuscript.
Point-by-point responses
-
Referee: [Abstract] The generalization claim ('generalizes to a diverse set of objects and tasks with no additional tuning or training') is not accompanied by quantitative metrics, success rates over multiple trials, ablation studies on planner parameters, or analysis of failure cases. This makes it difficult to assess the reliability of the test-time steering for out-of-distribution objects beyond the two qualitative real-world examples.
Authors: We agree that quantitative metrics would provide stronger support for the generalization claim in the abstract. The manuscript presents qualitative real-world demonstrations on two challenging tasks (uprighting a heavy tire and dragging a large barrier) plus simulated humanoid extensions to illustrate flexibility without retraining. In the revised manuscript, we have added success rates over multiple trials for the primary real-world tasks, an analysis of observed failure cases, and ablations on planner parameters (number of samples and planning horizon) in the supplementary material. The abstract has been updated to reference these additions. revision: yes
-
Referee: [Method/Experiments] The central claim requires that the pre-trained policy has internalized sufficiently accurate contact and friction models so that sample-based rollouts can steer it to success on unseen objects (e.g., a tire exceeding nominal lift capacity) without instability. No sensitivity analysis, model-mismatch experiments, or out-of-distribution testing under variations in mass distribution or surface properties is reported, leaving the no-tuning generalization dependent on an unverified transfer property.
Authors: The referee correctly identifies that the approach depends on the pre-trained policy having learned sufficiently accurate contact and friction behaviors from simulation. The policy is trained in a physics-based simulator that models these dynamics, and the sample-based planner performs rollouts using the identical simulator and policy. The successful sim-to-real transfer on objects exceeding nominal capacities provides supporting evidence, but we acknowledge the absence of explicit sensitivity analyses or model-mismatch experiments in the original submission. The revised manuscript includes an expanded discussion section that explains the simulator's contact model assumptions and the boundaries of the observed generalization. Comprehensive OOD testing with controlled variations in mass distribution and surface properties was not performed and would require additional hardware setups. revision: partial
Circularity Check
No significant circularity; empirical method with external real-world validation
full rationale
The paper presents an empirical sim-to-real method: a pre-trained whole-body policy is steered at test time by a sample-based planner, with optional cost-function adjustment. Generalization to unseen objects and tasks is asserted via real-world demonstrations on a Spot robot (e.g., tire uprighting, barrier dragging) and simulated humanoid tasks, without any reported equations, fitted parameters, or derivations that reduce the claimed outcomes to quantities defined by the same evaluation data. No self-citation chains, self-definitional loops, or renamed known results appear in the provided text; the pre-training step is treated as an external input whose dynamics are assumed sufficient but are not derived within the paper itself. This is the common case of a non-circular empirical contribution.
Axiom & Free-Parameter Ledger
free parameters (1)
- planner cost function weights
axioms (2)
- domain assumption The pre-trained policy captures transferable whole-body dynamics sufficient for steering on novel objects
- domain assumption Simulation-to-real transfer gap is small enough that test-time planning succeeds in hardware
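The ledger's lone free-parameter family, the planner cost weights, is exactly what the paper says can be adjusted at test time. A sketch of how such weights could be exposed without touching the policy; the weight names here are illustrative, not the paper's actual parameters:

```python
from dataclasses import dataclass, field

@dataclass
class SteeringCost:
    """Test-time adjustable planner cost (sketch; illustrative weight names).

    Each rollout produces a dict of unweighted cost terms; the planner
    scores it as a weighted sum. Re-weighting changes behavior with no
    retraining of the underlying policy.
    """
    weights: dict = field(default_factory=lambda: {
        "goal": 1.0,    # distance to task goal
        "orient": 0.5,  # object orientation error
        "ctrl": 0.01,   # control-effort regularizer
    })

    def __call__(self, terms: dict) -> float:
        return sum(self.weights[k] * terms[k] for k in self.weights)

cost = SteeringCost()
cost.weights["orient"] = 2.0  # emphasize uprighting at test time
```

This mirrors the review's point that runtime cost adjustment, rather than policy retraining, carries the task-specific flexibility claim.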
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
unclear: relation between the paper passage and the cited Recognition theorem.
Our key insight is that by performing test-time steering of a pre-trained whole-body control policy with a sample-based planner, we can enable these robots to solve a variety of dynamic loco-manipulation tasks.
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
unclear: relation between the paper passage and the cited Recognition theorem.
hierarchical framework that combines a pre-trained generalist whole-body control policy and test-time planning
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Juan Alvarez-Padilla, John Z. Zhang, Sofia Kwok, John M. Dolan, and Zachary Manchester. Real-time whole-body control of legged robots with model-predictive path integral control. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 14721-14727. IEEE, 2025. doi: 10.1109/ICRA55743.2025.11128271.
- [2] Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. Hindsight experience replay. In Advances in Neural Information Processing Systems, volume 30, pages 5048-5058, 2017.
- [3] Philip Arm, Mayank Mittal, Hendrik Kolvenbach, and Marco Hutter. Pedipulate: Enabling manipulation skills using a quadruped robot's leg. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 5717-5723, 2024. doi: 10.1109/ICRA57147.2024.10611307.
- [4] Gerardo Bledt, Matthew J. Powell, Benjamin Katz, Jared Di Carlo, Patrick M. Wensing, and Sangbae Kim. MIT Cheetah 3: Design and control of a robust, dynamic quadruped robot. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2245-2252. doi: 10.1109/IROS.2018.8593885.
- [6] Jin Cheng, Dongho Kang, Gabriele Fadini, Guanya Shi, and Stelian Coros. RAMBO: RL-augmented model-based whole-body control for loco-manipulation. IEEE Robotics and Automation Letters, 10(9):9462-9469. doi: 10.1109/LRA.2025.3594984.
- [8] Xuxin Cheng, Ashish Kumar, and Deepak Pathak. Legs as manipulator: Pushing quadrupedal agility beyond locomotion. In 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023.
- [9] Xuxin Cheng, Kexin Shi, Ananye Agarwal, and Deepak Pathak. Extreme parkour with legged robots. IEEE International Conference on Robotics and Automation (ICRA), pages 11443-11450, 2024. doi: 10.1109/ICRA57147.2024.10610200.
- [10] Cheng Chi, Zhenjia Xu, Siyuan Feng, Eric Cousineau, Yilun Du, Benjamin Burchfiel, Russ Tedrake, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. The International Journal of Robotics Research, 2024.
- [11] Simon Le Cleac'h, Taylor Howell, Shuo Yang, Chi-Yen Lee, John Zhang, Arun Bishop, Mac Schwager, and Zachary Manchester. Fast contact-implicit model-predictive control. IEEE Transactions on Robotics and Automation, January 2024. doi: 10.48550/arXiv.2107.05616.
- [12] Google DeepMind and NVIDIA. MuJoCo Warp: GPU-optimized version of the MuJoCo physics simulator. https://github.com/google-deepmind/mujoco_warp, 2025. Accessed: 2025-11-19.
- [13] Ruben Grandia, Fabian Jenelten, Shaohui Yang, Farbod Farshidian, and Marco Hutter. Perceptive locomotion through nonlinear model-predictive control. IEEE Transactions on Robotics, 39(5):3402-3421, 2023. doi: 10.1109/TRO.2023.3275384.
- [14] Nikolaus Hansen. The CMA evolution strategy: A tutorial. arXiv preprint arXiv:1604.00772, 2016. doi: 10.48550/arXiv.1604.00772.
- [15] Taylor Howell, Nimrod Gileadi, Saran Tunyasuvunakool, Kevin Zakka, Tom Erez, and Yuval Tassa. Predictive sampling: Real-time behaviour synthesis with MuJoCo. arXiv preprint arXiv:2212.00541, 2022.
- [16] Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy, Dario C. Bellicoso, Vassilios Tsounis, Vladlen Koltun, and Marco Hutter. Learning agile and dynamic motor skills for legged robots. Science Robotics, 4(26), 2019. doi: 10.1126/scirobotics.aau5872.
- [17] D. H. Jacobson and D. Q. Mayne. Differential Dynamic Programming. Elsevier, 1970.
- [18] Arnav Kumar Jain, Vibhakar Mohta, Subin Kim, Atiksh Bhardwaj, Juntao Ren, Yunhai Feng, Sanjiban Choudhury, and Gokul Swamy. A smooth sea never made a skilled sailor: Robust imitation via learning to search. arXiv preprint arXiv:2506.05294, 2025. URL https://arxiv.org/pdf/2506.05294.
- [19] Michael Janner, Yilun Du, Joshua Tenenbaum, and Sergey Levine. Planning with diffusion for flexible behavior synthesis. In International Conference on Machine Learning, 2022.
- [20] Fabian Jenelten, Junzhe He, Farbod Farshidian, and Marco Hutter. DTC: Deep tracking control. Science Robotics, 9(86):eadh5401, 2024. doi: 10.1126/scirobotics.adh5401.
- [21] Se Hwan Jeon, Ho Jae Lee, Seungwoo Hong, and Sangbae Kim. Residual MPC: Blending reinforcement learning with GPU-parallelized model predictive control. arXiv preprint arXiv:2510.12717, 2025. URL https://arxiv.org/abs/2510.12717.
- [22] Matthew Kelly. An introduction to trajectory optimization: How to do your own direct collocation. SIAM Review, 59(4):849-904, 2017. doi: 10.1137/16M1062569.
- [23] Albert H. Li, Preston Culbertson, Vince Kurtz, and Aaron D. Ames. DROP: Dexterous reorientation via online planning. arXiv preprint arXiv:2409.14562, 2024. URL https://arxiv.org/abs/2409.14562.
- [24] Albert H. Li, Brandon Hung, Aaron D. Ames, Jiuguang Wang, Simon Le Cleac'h, and Preston Culbertson. Judo: A user-friendly open-source package for sampling-based model predictive control. In Proceedings of the Workshop on Fast Motion Planning and Control in the Era of Parallelism at Robotics: Science and Systems (RSS). URL https://github.com/bdaiinstitute/judo.
- [26] Weiwei Li and Emanuel Todorov. Iterative linear quadratic regulator design for nonlinear biological movement systems. In Proceedings of the First International Conference on Informatics in Control, Automation and Robotics (ICINCO), volume 1, pages 222-229. INSTICC, SciTePress, 2004. doi: 10.5220/0001143902220229.
- [27] Yuhan Li, Peiyuan Zhi, Yunshen Wang, Tengyu Liu, Sixu Yan, Wenyu Liu, Xinggang Wang, Baoxiong Jia, and Siyuan Huang. OmniTrack: General motion tracking via physics-consistent reference. arXiv preprint arXiv:2602.23832, 2026. URL https://arxiv.org/pdf/2602.23832.
- [28] Qiayuan Liao, Takara E. Truong, Xiaoyu Huang, Yuman Gao, Guy Tevet, Koushil Sreenath, and C. Karen Liu. BeyondMimic: From motion tracking to versatile humanoid control via guided diffusion. arXiv preprint arXiv:2508.08241, 2025.
- [29] Zhengyi Luo, Ye Yuan, Tingwu Wang, Chenran Li, Sirui Chen, Fernando Castañeda, Zi-Ang Cao, Jiefeng Li, David Minor, Qingwei Ben, Xingye Da, Runyu Ding, Cyrus Hogg, Lina Song, Edy Lim, Eugene Jeong, Tairan He, Haoru Xue, Wenli Xiao, Zi Wang, Simon Yuen, Jan Kautz, Yan Chang, Umar Iqbal, Linxi "Jim" Fan, and Yuke Zhu. Sonic: Supersizing motion tracking fo...
- [30] Yecheng Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi Fan, and Anima Anandkumar. Eureka: Human-level reward design via coding large language models. arXiv preprint arXiv:2310.12931, 2023.
- [31] Mayank Mittal, Calvin Yu, Qinxi Yu, Jingzhou Liu, Nikita Rudin, David Hoeller, Jia Lin Yuan, Ritvik Singh, Yunrong Guo, Hammad Mazhar, Ajay Mandlekar, Buck Babich, Gavriel State, Marco Hutter, and Animesh Garg. Orbit: A unified simulation framework for interactive robot learning environments. IEEE Robotics and Automation Letters, 8(6):3740-3747, 2023.
- [32] MuJoCo Lab Contributors. mjlab: Isaac Lab API, powered by MuJoCo Warp, for RL and robotics research. https://github.com/mujocolab/mjlab, 2025. Accessed: 2025-11-20.
- [33] Mitsuhiko Nakamoto, Oier Mees, Aviral Kumar, and Sergey Levine. Steering your generalists: Improving robotic foundation models via value guidance. arXiv preprint arXiv:2410.13816, 2024. URL https://arxiv.org/abs/2410.13816.
- [34] Andrew Y. Ng, Daishi Harada, and Stuart Russell. Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the Sixteenth International Conference on Machine Learning (ICML), pages 278-287, 1999.
- [35] Chaoyi Pan, Zeji Yi, Guanya Shi, and Guannan Qu. Model-based diffusion for trajectory optimization. 2024. URL https://arxiv.org/abs/2407.01573.
- [36] Haozhi Qi, Brent Yi, Mike Lambeta, Yi Ma, Roberto Calandra, and Jitendra Malik. From simple to complex skills: The case of in-hand object reorientation. arXiv preprint arXiv:2501.05439, 2025.
- [37] Junbin Qiu, Zhengpeng Xie, Xiangda Yan, Yongjie Yang, and Yao Shu. Zeroth-order optimization is secretly single-step policy optimization. arXiv preprint arXiv:2506.14460, 2025.
- [38] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
- [39] Carmelo Sferrazza, Dun-Ming Huang, Xingyu Lin, Youngwoon Lee, and Pieter Abbeel. HumanoidBench: Simulated humanoid benchmark for whole-body locomotion and manipulation. 2024.
- [40] Jean-Pierre Sleiman, Farbod Farshidian, and Marco Hutter. Versatile multicontact planning and control for legged loco-manipulation. Science Robotics, 8(81):eadg5014. doi: 10.1126/scirobotics.adg5014.
- [42] H. J. Terry Suh, Tao Pang, Tong Zhao, and Russ Tedrake. Dexterous contact-rich manipulation via the contact trust region. arXiv preprint arXiv:2505.02291, 2025.
- [43] Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 23-30. IEEE, 2017.
- [44] Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026-5033, 2012. doi: 10.1109/IROS.2012.6386109.
- [45] Kevin Tracy, John Z. Zhang, Jon Arrizabalaga, Stefan Schaal, Yuval Tassa, Tom Erez, and Zachary Manchester. The trajectory bundle method: Unifying sequential-convex programming and sampling-based trajectory optimization. arXiv preprint arXiv:2509.26575, 2025.
- [46] Grady Williams, Paul Drews, Brian Goldfain, James M. Rehg, and Evangelos A. Theodorou. Aggressive driving with model predictive path integral control. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 1433-1440, May 2016. doi: 10.1109/ICRA.2016.7487277.
- [47] Grady Williams, Paul Drews, Brian Goldfain, James M. Rehg, and Evangelos A. Theodorou. Information-theoretic model predictive control: Theory and applications to autonomous driving. IEEE Transactions on Robotics, 34(6):1603-1622, December 2018. doi: 10.1109/TRO.2018.2865891.
- [49] Lujie Yang, Xiaoyu Huang, Zhen Wu, Angjoo Kanazawa, Pieter Abbeel, Carmelo Sferrazza, C. Karen Liu, Rocky Duan, and Guanya Shi. OmniRetarget: Interaction-preserving data generation for humanoid whole-body loco-manipulation and scene interaction. 2025. URL https://arxiv.org/abs/2509.26633.
- [50] Yuxiang Yang, Guanya Shi, Xiangyun Meng, Wenhao Yu, Tingnan Zhang, Jie Tan, and Byron Boots. CAJun: Continuous adaptive jumping using a learned centroidal controller. arXiv preprint arXiv:2306.09557, 2023. URL https://arxiv.org/abs/2306.09557.
- [51] Wenhao Yu, Nimrod Gileadi, Chuyuan Fu, Sean Kirmani, Kuang-Huei Lee, Montse Gonzalez Arenas, Hao-Tien Lewis Chiang, Tom Erez, Leonard Hasenclever, Jan Humplik, Brian Ichter, Ted Xiao, Peng Xu, Andy Zeng, Tingnan Zhang, Nicolas Heess, Dorsa Sadigh, Jie Tan, Yuval Tassa, and Fei Xia. Language to rewards for robotic skill synthesis. arXiv preprint arXiv:2306...
- [52] John Z. Zhang, Shuo Yang, Gengshan Yang, Arun L. Bishop, Swaminathan Gurumurthy, Deva Ramanan, and Zachary Manchester. SLoMo: A general system for legged robot motion imitation from casual videos. 2023. URL https://ieeexplore.ieee.org/abstract/document/10246373.
- [53] John Z. Zhang, Taylor A. Howell, Zeji Yi, Chaoyi Pan, Guanya Shi, Guannan Qu, Tom Erez, Yuval Tassa, and Zachary Manchester. Whole-body model-predictive control of legged robots with MuJoCo. 2025. URL https://arxiv.org/abs/2503.04613.
- [54] Wei Zhang, Han Wang, Carsten Hartmann, Marcus Weber, and Christof Schütte. Applications of the cross-entropy method to importance sampling and optimal control of diffusions. SIAM Journal on Scientific Computing, 36(6):A2654-A2672, 2014. doi: 10.1137/14096493X.
- [55] Xinghao Zhu, Yuxin Chen, Lingfeng Sun, Farzad Niroui, Simon Le Cleac'h, Jiuguang Wang, and Kuan Fang. ReLIC: Versatile loco-manipulation through flexible interlimb coordination. arXiv preprint arXiv:2506.07876, 2025.
Appendix excerpts captured with the references
Appendix A (Task Rewards) of the paper details the task costs from the demonstration section as deployed on the Spot and G1 robots.
Spot Tasks, Tire Upright: the cost combines proximity terms that guide the robot's end-effectors toward the object, an orientation term to upright the tire, and regularization terms:
J_TireUpright = w_orient · exp(|y_tire,z| / σ) + w_gripper · ||p_gripper − p_gripper^des||² + w_foot · min(||p_fr − p_fr^des||², ||p_fl − p_fl^des||²) + w_torso · ...
The desired positions p^des are computed dynamically from the tire position, encouraging the robot to position itself around the tire for manipulation.
G1 Tasks, Box Pushing: the cost uses goal-reaching, orientation, and bimanual proximity terms:
J_G1Box = w_goal · ||p_box − p_goal||² + w_orient · |1 − y_box · z_world| + w_hand · min(||p_left − p_box||, ||p_right − p_box||) − w_pelvis · ||p_pelvis − p_box||² − w_facing · (x_robot · x_world) + w_ctrl · (||v_base||² + ||q_arm − q_arm^default||²) + J_safety    (14)
The control cost penalizes base velocity and arm deviation from the default pose.
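A hedged sketch of how weighted cost terms like the quoted tire-upright objective could be assembled in code; the weights, σ, and function signature here are illustrative stand-ins, not the paper's deployed values:

```python
import numpy as np

def tire_upright_cost(y_tire_z, p_gripper, p_des_gripper,
                      p_fr, p_des_fr, p_fl, p_des_fl,
                      w_orient=1.0, w_gripper=1.0, w_foot=1.0, sigma=0.5):
    """Weighted sum mirroring the quoted terms (illustrative only):
    an orientation term that grows as the tire's y-axis z-component
    leaves zero, a gripper proximity term, and a foot term taking the
    better of the two front-foot placements. The w_torso term from the
    excerpt is truncated in the source and omitted here."""
    orient = w_orient * np.exp(abs(y_tire_z) / sigma)
    gripper = w_gripper * np.sum(
        (np.asarray(p_gripper) - np.asarray(p_des_gripper)) ** 2)
    foot = w_foot * min(
        np.sum((np.asarray(p_fr) - np.asarray(p_des_fr)) ** 2),
        np.sum((np.asarray(p_fl) - np.asarray(p_des_fl)) ** 2))
    return float(orient + gripper + foot)
```

Because the planner only ever evaluates such a scalar, re-weighting or swapping terms at test time changes behavior without touching the pre-trained policy, which is the flexibility the review's cost-function discussion turns on.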