HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers
Pith reviewed 2026-06-28 01:09 UTC · model grok-4.3
The pith
A single distilled controller lets humanoids perform diverse loco-manipulation tasks from natural language without task-specific fine-tuning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HANDOFF is a single humanoid whole-body controller: a mixture-of-experts student distilled from three complementary specialist teachers (whole-body motion tracking with safety-filtered data, locomotion, and fall recovery) via multi-teacher KL distillation under a context-conditioned gating scheme. On the Unitree G1 it matches state-of-the-art velocity tracking and offers one of the largest robust manipulation workspaces. It also supports multiple natural-language-driven task roll-outs, powered by a VLM-driven agentic planner, with no task-specific data or controller fine-tuning.
What carries the argument
The context-conditioned gating scheme that routes among the distilled behaviors of the three specialist teachers inside the mixture-of-experts student policy.
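To make the routing idea concrete, here is a minimal sketch of a context-conditioned gate blending expert outputs. It is not the paper's implementation: the linear gate, the soft (softmax) blending, the three-dimensional context vector, and the names `gate_weights` and `moe_action` are all assumptions chosen for illustration, since the summary does not specify the gate's form or inputs.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gate_weights(context, gate_matrix):
    """Assumed linear gate: one logit per expert, computed from the context vector."""
    logits = [sum(w * c for w, c in zip(row, context)) for row in gate_matrix]
    return softmax(logits)

def moe_action(context, expert_actions, gate_matrix):
    """Blend per-expert action vectors using the context-conditioned weights."""
    weights = gate_weights(context, gate_matrix)
    dim = len(expert_actions[0])
    return [sum(w * a[i] for w, a in zip(weights, expert_actions)) for i in range(dim)]

# Toy values (hypothetical): a context that favors the first expert.
context = [1.0, 0.0, 0.0]
gate_matrix = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
expert_actions = [[0.1, 0.2], [0.5, -0.3], [-0.4, 0.9]]

weights = gate_weights(context, gate_matrix)
action = moe_action(context, expert_actions, gate_matrix)
```

With this toy context, nearly all of the weight falls on the first expert, so the blended action stays close to that expert's output. A hard (top-1) gate would be an equally plausible reading of the summary.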
If this is right
- The controller matches state-of-the-art performance on velocity tracking.
- It provides one of the largest robust manipulation workspaces demonstrated on the Unitree G1.
- It supports multiple natural-language-driven task executions through a VLM agentic planner.
- No task-specific data collection or controller fine-tuning is required for new behaviors.
Where Pith is reading between the lines
- The same distillation approach might scale to additional teachers covering skills such as precise object placement or dynamic balance recovery.
- The compact command interface could allow planners other than VLMs to generate whole-body references more easily than dense kinematic trajectories.
- Hardware success on the G1 suggests the method could transfer to other humanoid platforms that share similar actuation and sensing.
Load-bearing premise
The three specialist teachers are complementary enough that distilling them under the gating scheme yields one generalist controller able to handle combined loco-manipulation skills without any task-specific data or fine-tuning.
What would settle it
Hardware trials in which the distilled controller cannot execute a combined locomotion-plus-manipulation sequence that none of the individual teachers could produce on its own, even when the gating network is active.
Original abstract
For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial references that planners struggle to synthesize from task semantics. We instead propose a compact, explicit interface that is intuitive, general, modular, and expressive enough for diverse loco-manipulation skills. To this end, we introduce HANDOFF, a single humanoid whole-body controller that follows this interface and is distilled via multi-teacher KL distillation under a context-conditioned gating scheme into a mixture-of-experts student from three complementary specialists: whole-body motion tracking with safety-filtered data, locomotion, and fall-recovery. On the Unitree G1, HANDOFF matches state-of-the-art velocity tracking and offers one of the largest robust manipulation workspaces. We further demonstrate hardware feasibility through multiple natural-language-driven task roll-outs, powered by a VLM-driven agentic planner with no task-specific data or controller fine-tuning.
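The multi-teacher KL distillation named in the abstract can be sketched as follows. This is an illustrative reading, not the authors' objective: it assumes diagonal-Gaussian policies, a per-sample context label that selects which teacher supervises the student, and the KL(teacher || student) direction. The teacher keys, the toy policies, and the function names are hypothetical.

```python
import math

def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between diagonal Gaussians, summed over action dimensions."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
    return total

def distillation_loss(batch, teachers, student):
    """Mean KL(teacher || student); the context label picks the teacher per sample."""
    loss = 0.0
    for obs, context in batch:
        t_mu, t_std = teachers[context](obs)
        s_mu, s_std = student(obs)
        loss += gaussian_kl(t_mu, t_std, s_mu, s_std)
    return loss / len(batch)

# Toy stand-ins (hypothetical) for the three specialists and the student.
teachers = {
    "tracking": lambda obs: ([obs[0]], [0.5]),
    "locomotion": lambda obs: ([2.0 * obs[0]], [0.5]),
    "recovery": lambda obs: ([-obs[0]], [0.5]),
}
student = lambda obs: ([obs[0]], [0.5])

batch = [([1.0], "tracking"), ([1.0], "locomotion")]
loss = distillation_loss(batch, teachers, student)
```

In this toy batch the student already matches the tracking teacher, so only the locomotion sample contributes to the loss. In training, the student's parameters would be updated to reduce this quantity across all contexts.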
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces HANDOFF, a single humanoid whole-body controller for the Unitree G1 that follows a compact task-space interface and is obtained by distilling three complementary specialist teachers (whole-body motion tracking with safety-filtered data, locomotion, and fall-recovery) via multi-teacher KL distillation under a context-conditioned gating scheme into a mixture-of-experts student policy. The central claims are that this yields state-of-the-art velocity tracking, one of the largest robust manipulation workspaces, and successful hardware roll-outs of diverse natural-language-driven loco-manipulation tasks powered by a VLM-based agentic planner, all without task-specific data collection or controller fine-tuning.
Significance. If the hardware results and generality claims hold, the work would be significant for humanoid robotics by providing an intuitive, modular command interface that bridges high-level task planners (including VLMs) with low-level whole-body control, thereby reducing reliance on dense kinematic references or per-task retraining. The distillation approach from complementary teachers and the reported absence of task-specific fine-tuning are notable strengths that could enable more scalable agentic deployment on physical platforms.
minor comments (2)
- [Abstract] The abstract states that HANDOFF 'matches state-of-the-art velocity tracking' and offers 'one of the largest robust manipulation workspaces' but provides no numerical values, baselines, or error metrics; adding these (even in summary form) would strengthen the presentation of the empirical claims.
- The description of the context-conditioned gating scheme and the mixture-of-experts student would benefit from an explicit diagram or pseudocode in the methods section to clarify how the three teachers are combined at inference time.
Simulated Author's Rebuttal
We thank the referee for their review. The report accurately summarizes the HANDOFF controller, its distillation method, and the hardware results on the Unitree G1. We appreciate the positive assessment of the work's significance, conditional on the claims holding, and the recognition of two strengths: distillation from complementary teachers and the absence of task-specific fine-tuning. The report lists no major comments.
Circularity Check
No significant circularity identified
The paper describes an empirical method for distilling a generalist whole-body controller from three specialist teachers (motion tracking, locomotion, fall recovery) via KL distillation and context-conditioned gating, with claims validated through hardware experiments on the Unitree G1. No mathematical derivations, equations, or parameter-fitting steps are presented that reduce any prediction or result to its inputs by construction. The approach relies on standard distillation techniques and external VLM planning, without self-referential definitions or load-bearing self-citations that would collapse the central claim. The claims rest on empirical benchmarks rather than on a derivation chain.
Appendix A: Observations
This section enumerates the actor and critic observation groups used by each policy. Asymmetric actor-critic is in force throughout: anything in a critic-only group is privileged information used to fit the value function and is unavailable...
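The actor/critic split described in the appendix excerpt can be illustrated with a small sketch. The group names, the example values, and the choice to give the critic the actor's groups as well are assumptions for illustration; the paper's actual observation groups are not listed in this excerpt.

```python
def build_observations(groups, actor_keys, critic_only_keys):
    """Concatenate observation groups into actor and critic input vectors.

    Assumption: the critic sees everything the actor sees, plus the
    critic-only (privileged) groups.
    """
    actor_obs = [x for k in actor_keys for x in groups[k]]
    critic_obs = actor_obs + [x for k in critic_only_keys for x in groups[k]]
    return actor_obs, critic_obs

# Hypothetical observation groups; names and values are placeholders.
groups = {
    "joint_pos": [0.1, -0.2],
    "base_ang_vel": [0.0, 0.0, 0.3],
    "true_base_lin_vel": [0.5, 0.0, 0.0],  # privileged, e.g. available only in simulation
}

actor_obs, critic_obs = build_observations(
    groups,
    actor_keys=["joint_pos", "base_ang_vel"],
    critic_only_keys=["true_base_lin_vel"],
)
```

Only `actor_obs` would be fed to the policy. The longer `critic_obs` vector is used solely to fit the value function during training.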