HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers

Aaron Ames; Georgia Gkioxari; Gio Huh; Junheng Li; Lizhi Yang; Nehar Poddar; Robert Griffin; Yiling Hou

arxiv: 2606.06493 · v3 · pith:JWJLYJ4Fnew · submitted 2026-06-04 · 💻 cs.RO · cs.AI · cs.LG


Lizhi Yang, Junheng Li, Nehar Poddar, Yiling Hou, Gio Huh, Robert Griffin, Georgia Gkioxari, Aaron Ames

Pith reviewed 2026-06-28 01:09 UTC · model grok-4.3

classification 💻 cs.RO · cs.AI · cs.LG

keywords: humanoid robot, whole-body control, knowledge distillation, locomotion, manipulation, mixture of experts, agentic planner, task-space control


The pith

A single distilled controller lets humanoids perform diverse loco-manipulation tasks from natural language without task-specific fine-tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents HANDOFF as a whole-body controller for humanoids that accepts a compact task-space command interface and is created by distilling three specialist teachers into one mixture-of-experts student. The teachers cover whole-body motion tracking with safety filtering, locomotion, and fall recovery; a context-conditioned gating scheme combines their behaviors during distillation via KL divergence. A sympathetic reader would care because this setup aims to let high-level planners, including vision-language models, generate commands directly from semantics and produce working hardware behaviors on the Unitree G1 across multiple skills.

Core claim

HANDOFF is a single humanoid whole-body controller: a mixture-of-experts student distilled from three complementary specialists (whole-body motion tracking with safety-filtered data, locomotion, and fall recovery) via multi-teacher KL distillation under a context-conditioned gating scheme. On the Unitree G1 it matches state-of-the-art velocity tracking and offers one of the largest robust manipulation workspaces, and it supports multiple natural-language-driven task roll-outs powered by a VLM-driven agentic planner with no task-specific data or controller fine-tuning.
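To make "multi-teacher KL distillation under a context-conditioned gating scheme" concrete, the following is a minimal Python sketch, not the paper's loss. It assumes diagonal-Gaussian action distributions, a KL(teacher ‖ student) direction, and per-teacher weights that some gate has already produced; none of these details are stated on this page, and the names `gaussian_kl` and `gated_distillation_loss` are hypothetical.

```python
import math

def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between diagonal Gaussians, summed over action dimensions."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
    return total

def gated_distillation_loss(student, teachers, weights):
    """Gate-weighted sum of KL(teacher || student) over all teachers.

    student:  (mean, std) of the student action distribution
    teachers: dict mapping teacher name -> (mean, std)
    weights:  dict mapping teacher name -> nonnegative weight (assumed to sum to 1)
    """
    s_mu, s_std = student
    return sum(
        weights[name] * gaussian_kl(t_mu, t_std, s_mu, s_std)
        for name, (t_mu, t_std) in teachers.items()
    )

# Toy 2-D actions for readability (the page describes 29-DoF actions).
student = ([0.0, 0.0], [1.0, 1.0])
teachers = {
    "tracking":   ([0.0, 0.0], [1.0, 1.0]),  # identical to the student, KL = 0
    "locomotion": ([1.0, 0.0], [1.0, 1.0]),  # mean shifted by 1, KL = 0.5
    "recovery":   ([0.0, 2.0], [1.0, 1.0]),  # mean shifted by 2, KL = 2.0
}
# Assumed gate output for a "mostly walking" context.
weights = {"tracking": 0.2, "locomotion": 0.7, "recovery": 0.1}

loss = gated_distillation_loss(student, teachers, weights)
```

In this toy setting the loss is 0.7 × 0.5 + 0.1 × 2.0 = 0.55; a gate that shifts weight toward the recovery teacher would pull the student toward that teacher's actions instead.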

What carries the argument

The context-conditioned gating scheme that routes among the distilled behaviors of the three specialist teachers inside the mixture-of-experts student policy.
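As a rough illustration of what routing among distilled behaviors can look like, here is a small Python sketch of a soft gate over three experts. It assumes a linear gate followed by a softmax and a dense (non-sparse) mixture of expert actions; whether the student's experts map one-to-one onto the teachers, and whether the actual gate is soft or sparse, is not stated on this page. The context features, gate matrix, and function names are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gate_weights(context, gate_matrix):
    """Linear gate: one row of gate_matrix per expert, dotted with the context."""
    logits = [sum(w * c for w, c in zip(row, context)) for row in gate_matrix]
    return softmax(logits)

def mix_actions(weights, expert_actions):
    """Convex combination of per-expert action vectors."""
    dim = len(expert_actions[0])
    return [sum(w * a[i] for w, a in zip(weights, expert_actions)) for i in range(dim)]

# Hypothetical 2-feature context: [commanded speed, fallen flag].
gate_matrix = [
    [0.0, 0.0],  # expert 0: baseline, e.g. standing manipulation
    [4.0, 0.0],  # expert 1: favored as commanded speed grows
    [0.0, 8.0],  # expert 2: favored when the fallen flag is set
]
expert_actions = [[0.1, 0.0], [0.5, 0.2], [-0.3, 0.9]]  # toy 2-D actions

w_walk = gate_weights([1.0, 0.0], gate_matrix)  # walking context
w_fall = gate_weights([0.0, 1.0], gate_matrix)  # fallen context
action = mix_actions(w_walk, expert_actions)
```

With these placeholder numbers the walking context puts most of its weight on expert 1 and the fallen context on expert 2, which is the qualitative behavior a context-conditioned gate is meant to provide.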

If this is right

The controller matches state-of-the-art performance on velocity tracking.
It provides one of the largest robust manipulation workspaces demonstrated on the Unitree G1.
It supports multiple natural-language-driven task executions through a VLM agentic planner.
No task-specific data collection or controller fine-tuning is required for new behaviors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same distillation approach might scale to additional teachers covering skills such as precise object placement or dynamic balance recovery.
The compact command interface could allow planners other than VLMs to generate whole-body references more easily than dense kinematic trajectories.
Hardware success on the G1 suggests the method could transfer to other humanoid platforms that share similar actuation and sensing.

Load-bearing premise

The three specialist teachers are complementary enough that distilling them under the gating scheme yields one generalist controller able to handle combined loco-manipulation skills without any task-specific data or fine-tuning.

What would settle it

Hardware trials in which the distilled controller cannot execute a combined locomotion-plus-manipulation sequence that none of the individual teachers could produce on its own, even when the gating network is active.

Figures

Figures reproduced from arXiv: 2606.06493 by Aaron Ames, Georgia Gkioxari, Gio Huh, Junheng Li, Lizhi Yang, Nehar Poddar, Robert Griffin, Yiling Hou.

**Figure 1.** HANDOFF is a whole-body controller distilled from multiple teachers that accepts a compact, explicit 10-D planner-facing command. We demonstrate its effectiveness using a VLM-powered agentic planner that does not need extensive demonstration collection or model fine-tuning.

**Figure 3.** System overview. We train 3 teachers separately: a 29-DoF WBC motion-tracking teacher on CoP-filtered retargeted clips, a 15-DoF body-slice locomotion teacher on flat terrain with curriculum-blended arm perturbations, and a 29-DoF fall-recovery teacher on locomotion + paired fall-recovery clips. The MoE student maps a 10-D command and 11-frame proprioception history to 29-DoF actions. Context-based action-…

**Figure 2.** CoP filtering. The raw retargeted motion dataset contains dynamically infeasible frames, which we correct with a closed-form CBF projection on the static CoP margin before training. An example is shown here where the corrected reference (left) stays within the support polygon and the unfiltered version (right) drifts.
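For readers unfamiliar with closed-form CBF projections, the sketch below shows the general idea in Python under strong simplifying assumptions: a discrete-time barrier condition h(x_next) ≥ (1 − α)·h(x_prev), a single affine barrier h(x) = a·x − b standing in for one edge of the support polygon, and a 2-D point standing in for the static CoP. The paper's actual barrier function and margin definition are not given on this page, and `cbf_project` is a hypothetical name.

```python
def _dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cbf_project(x_prev, x_ref, a, b, alpha=0.5):
    """Minimum-norm correction of x_ref so that h(x) >= (1 - alpha) * h(x_prev).

    h(x) = a . x - b, and h >= 0 means "inside the margin".
    With one affine constraint, the projection has a closed form:
    move x_ref along a by exactly the amount of constraint violation.
    """
    required = (1.0 - alpha) * (_dot(a, x_prev) - b)
    violation = required - (_dot(a, x_ref) - b)
    if violation <= 0.0:
        return list(x_ref)  # reference already satisfies the condition
    scale = violation / _dot(a, a)
    return [xi + scale * ai for xi, ai in zip(x_ref, a)]

# Assumed forward edge at x = 0.10 m: h(x) = 0.10 - x, so a = [-1, 0], b = -0.10.
a, b = [-1.0, 0.0], -0.10
x_prev = [0.06, 0.00]  # previous CoP proxy, 4 cm inside the edge
x_ref = [0.12, 0.01]   # raw reference, 2 cm outside the edge

x_safe = cbf_project(x_prev, x_ref, a, b)
x_same = cbf_project([0.0, 0.0], [0.01, 0.0], a, b)  # feasible reference
```

Here the infeasible reference is pulled back to roughly x = 0.08 m while its lateral component is untouched, and a reference that already satisfies the condition passes through unchanged. A real support polygon has several edges, for which a single-constraint closed form like this one is no longer exact.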

**Figure 4.** Agentic deployment pipeline. A natural-language instruction (0.001 Hz) is decomposed into atomic tasks by a high-level reasoner (regex + LLM fallback). A VLM (0.1 Hz) projects 2D detections onto the RGB-D point cloud to emit pelvis-frame waypoints; a waypoint tracker produces (vx, vy, ωz), and near the target a skill selector emits root height z and bilateral wrist targets p^P_{R/L} (1–50 Hz) with a kinemati…

**Figure 5.** Agentic deployment snapshots with VLM text prompt. The same 10-D controller, driven by the agentic planner, executing a range of loco-manipulation tasks on the Unitree G1 hardware and in simulation: pick-and-place, pick-transport-place, squat-pick, bimanual-pick-and-hand-off, bilateral pick-and-place, and task continuation after fall recovery. No controller-side change, data collection or model fine-tunin…

**Figure 6.** Ablation progression: per-axis realized-versus-commanded velocity sweep across …

**Figure 7.** SOTA comparison: per-axis realized-versus-commanded velocity sweep across …

**Figure 8.** Bimanual wrist-workspace hulls in three orthogonal pelvis-frame views, forward half …

**Figure 9.** Real-robot deployment platform. (a) Unitree G1 with bilateral Dex1-1 grippers and a head-mounted ZED-M stereo RGB-D camera. (b) Close-up of one Dex1-1 gripper, replacing the stock 3-finger hand. (c) Head-mounted ZED-M stereo RGB-D camera providing the RGB and depth frames consumed by the VLM. (d) Back-mounted Nvidia Jetson Thor and 140 W USB-PD powerbank that together drive the onboard RL controller, the …
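The Figure 1 and Figure 4 captions mention a 10-D command and name its ingredients: (vx, vy, ωz), root height z, and bilateral wrist targets. One reading that adds up to ten dimensions is 3 + 1 + 3 + 3, and the Python sketch below packs a command that way. The field order, the proportional waypoint tracker, and every gain, limit, and numeric value are assumptions for illustration, not details taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Command10D:
    """Hypothetical layout of the planner-facing command."""
    vx: float            # forward velocity
    vy: float            # lateral velocity
    wz: float            # yaw rate
    root_height: float   # root height z
    wrist_right: Vec3    # right wrist target, pelvis frame
    wrist_left: Vec3     # left wrist target, pelvis frame

    def to_vector(self) -> List[float]:
        return [self.vx, self.vy, self.wz, self.root_height,
                *self.wrist_right, *self.wrist_left]

def clamp(value: float, limit: float) -> float:
    return max(-limit, min(limit, value))

def track_waypoint(wx, wy, k_lin=1.0, k_yaw=1.5, v_max=0.6, w_max=1.0):
    """Proportional tracker: pelvis-frame waypoint (wx, wy) -> (vx, vy, wz)."""
    vx = clamp(k_lin * wx, v_max)
    vy = clamp(k_lin * wy, v_max)
    wz = clamp(k_yaw * math.atan2(wy, wx), w_max)
    return vx, vy, wz

# Waypoint 2 m straight ahead; wrist targets and height are placeholders.
vx, vy, wz = track_waypoint(2.0, 0.0)
cmd = Command10D(vx, vy, wz, root_height=0.75,
                 wrist_right=(0.3, -0.2, 0.1), wrist_left=(0.3, 0.2, 0.1))
vector = cmd.to_vector()
```

A flat 10-element vector like this is the kind of input a planner could emit without synthesizing dense joint-level references, which is the motivation the page gives for the compact interface.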

Original abstract

For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial references that planners struggle to synthesize from task semantics. We instead propose a compact, explicit interface that is intuitive, general, modular, and expressive enough for diverse loco-manipulation skills. To this end, we introduce HANDOFF, a single humanoid whole-body controller that follows this interface and is distilled via multi-teacher KL distillation under a context-conditioned gating scheme into a mixture-of-experts student from three complementary specialists: whole-body motion tracking with safety-filtered data, locomotion, and fall-recovery. On the Unitree G1, HANDOFF matches state-of-the-art velocity tracking and offers one of the largest robust manipulation workspaces. We further demonstrate hardware feasibility through multiple natural-language-driven task roll-outs, powered by a VLM-driven agentic planner with no task-specific data or controller fine-tuning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

HANDOFF distills three specialist teachers into a single policy via context-gated KL for a compact humanoid interface, with hardware rollouts on the G1 that look practically useful.


The core contribution is a compact task-space interface for whole-body humanoid control, achieved by distilling a mixture-of-experts student from three teachers (motion tracking with safety filter, locomotion, and fall recovery) using multi-teacher KL and context-conditioned gating. This lets the controller handle diverse loco-manipulation from high-level commands without task-specific data or fine-tuning.

The paper does a few things cleanly. The interface choice is explicit and modular, which addresses a real pain point when connecting planners to low-level controllers. The hardware section shows multiple natural-language tasks executed on the Unitree G1 via a VLM planner, and it reports matching SOTA velocity tracking plus a large robust workspace. Those results are the strongest part of the abstract.

The soft spots are mostly around evidence. The abstract gives no quantitative numbers, error bars, or ablation details on how much each teacher contributes or whether the gating actually prevents interference. Without those, it is difficult to judge how general the single policy really is or how close the workspace claim is to prior work. The complementarity assumption is stated but not stress-tested in the provided text.

This paper is for people working on humanoid loco-manipulation and planner-controller interfaces. A reader who needs a working compact command space and is willing to implement the distillation pipeline could extract value. The hardware feasibility is worth seeing even if the method is incremental.

I would send it to peer review. The engineering is grounded enough and the hardware results are concrete enough that referees can evaluate the claims properly, though they will likely ask for more ablations and metrics.

Referee Report

0 major / 2 minor

Summary. The paper introduces HANDOFF, a single humanoid whole-body controller for the Unitree G1 that follows a compact task-space interface and is obtained by distilling three complementary specialist teachers (whole-body motion tracking with safety-filtered data, locomotion, and fall-recovery) via multi-teacher KL distillation under a context-conditioned gating scheme into a mixture-of-experts student policy. The central claims are that this yields state-of-the-art velocity tracking, one of the largest robust manipulation workspaces, and successful hardware roll-outs of diverse natural-language-driven loco-manipulation tasks powered by a VLM-based agentic planner, all without task-specific data collection or controller fine-tuning.

Significance. If the hardware results and generality claims hold, the work would be significant for humanoid robotics by providing an intuitive, modular command interface that bridges high-level task planners (including VLMs) with low-level whole-body control, thereby reducing reliance on dense kinematic references or per-task retraining. The distillation approach from complementary teachers and the reported absence of task-specific fine-tuning are notable strengths that could enable more scalable agentic deployment on physical platforms.

minor comments (2)

[Abstract] The abstract states that HANDOFF 'matches state-of-the-art velocity tracking' and offers 'one of the largest robust manipulation workspaces' but provides no numerical values, baselines, or error metrics; adding these (even in summary form) would strengthen the presentation of the empirical claims.
The description of the context-conditioned gating scheme and the mixture-of-experts student would benefit from an explicit diagram or pseudocode in the methods section to clarify how the three teachers are combined at inference time.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their review. The report accurately summarizes the HANDOFF controller, its distillation method, and the hardware results on the Unitree G1. We appreciate the positive note on significance if the claims hold, and the recognition of the distillation from complementary teachers and lack of task-specific fine-tuning as strengths. No major comments were listed under the MAJOR COMMENTS section.

Circularity Check

0 steps flagged

No significant circularity identified


The paper describes an empirical method for distilling a generalist whole-body controller from three specialist teachers (motion tracking, locomotion, fall-recovery) via KL distillation and context-conditioned gating, with claims validated through hardware experiments on the Unitree G1. No mathematical derivations, equations, or parameter-fitting steps are presented that reduce any prediction or result to its inputs by construction. The approach relies on standard distillation techniques and external VLM planning without self-referential definitions or load-bearing self-citations that collapse the central claim. The derivation chain is self-contained against empirical benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the central claim rests on unstated assumptions about teacher complementarity and distillation effectiveness.

pith-pipeline@v0.9.1-grok · 5735 in / 1210 out tokens · 36023 ms · 2026-06-28T01:09:34.115768+00:00 · methodology


Reference graph

Works this paper leans on

51 extracted references · 1 canonical work pages

[1] Z. Gu, J. Li, W. Shen, W. Yu, Z. Xie, S. McCrory, X. Cheng, A. Shamsah, R. Griffin, C. K. Liu, et al. Humanoid locomotion and manipulation: Current progress and challenges in control, planning, and learning. IEEE/ASME Transactions on Mechatronics, 31(2):2300–2330, 2026

[2] Z. Luo, Y. Yuan, T. Wang, C. Li, S. Chen, F. Castaneda, Z.-A. Cao, J. Li, D. Minor, Q. Ben, et al. Sonic: Supersizing motion tracking for natural humanoid whole-body control. arXiv preprint arXiv:2511.07820, 2025

[3] Q. Liao, T. E. Truong, X. Huang, Y. Gao, G. Tevet, K. Sreenath, and C. K. Liu. Beyondmimic: From motion tracking to versatile humanoid control via guided diffusion. arXiv preprint arXiv:2508.08241, 2025

[4] Y. Ze, S. Zhao, W. Wang, A. Kanazawa, R. Duan, P. Abbeel, G. Shi, J. Wu, and C. K. Liu. Twist2: Scalable, portable, and holistic humanoid data collection system. arXiv preprint arXiv:2511.02832, 2025

[5] B. Ichter, A. Brohan, Y. Chebotar, C. Finn, K. Hausman, A. Herzog, D. Ho, J. Ibarz, A. Irpan, E. Jang, et al. Do as i can, not as i say: Grounding language in robotic affordances. In K. Liu, D. Kulic, and J. Ichnowski, editors, Proceedings of The 6th Conference on Robot Learning, volume 205 of Proceedings of Machine Learning Research, pages 287–318. PMLR, 2023

[6] D. Driess, F. Xia, M. S. M. Sajjadi, C. Lynch, A. Chowdhery, B. Ichter, A. Wahid, J. Tompson, Q. Vuong, T. Yu, et al. PaLM-e: An embodied multimodal language model. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, and J. Scarlett, editors, Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine...

[7] B. Zitkovich, T. Yu, S. Xu, P. Xu, T. Xiao, F. Xia, J. Wu, P. Wohlhart, S. Welker, A. Wahid, et al. Rt-2: Vision-language-action models transfer web knowledge to robotic control. In J. Tan, M. Toussaint, and K. Darvish, editors, Proceedings of The 7th Conference on Robot Learning, volume 229 of Proceedings of Machine Learning Research, pages 2165–2183. PMLR, 2023

[8] M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. P. Foster, P. R. Sanketi, Q. Vuong, et al. Openvla: An open-source vision-language-action model. In P. Agrawal, O. Kroemer, and W. Burgard, editors, Proceedings of The 8th Conference on Robot Learning, volume 270 of Proceedings of Machine Learning Research, pages 2679–27...

[9] Bones Studio. BONES-SEED: Skeletal everyday embodiment dataset. https://bones.studio/datasets/seed, 2026
[10] L. Yang, B. Werner, M. de Sa, and A. D. Ames. Cbf-rl: Safety filtering reinforcement learning in training with control barrier functions. arXiv preprint arXiv:2510.14959, 2025

[11] X. B. Peng, Z. Ma, P. Abbeel, S. Levine, and A. Kanazawa. Amp: Adversarial motion priors for stylized physics-based character control. ACM Transactions on Graphics (ToG), 40(4):1–20, 2021

[12] ccrpRepo. AMP mjlab: G1 AMP motion control on mjlab + rsl rl. https://github.com/ccrpRepo/AMP_mjlab, 2025

[13] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017

[14] G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network. stat, 1050:9, 2015

[15] N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. Le, G. Hinton, and J. Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538, 2017

[16] Q. Ben, F. Jia, J. Zeng, J. Dong, D. Lin, and J. Pang. HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit. In Proceedings of Robotics: Science and Systems, Los Angeles, CA, USA, June 2025

[17] Y. Zhang, Y. Yuan, P. Gurunath, I. Gupta, S. Omidshafiei, A.-a. Agha-mohammadi, M. Vazquez-Chanlatte, L. Pedersen, T. He, and G. Shi. Falcon: Learning force-adaptive humanoid loco-manipulation. arXiv preprint arXiv:2505.06776, 2025

[18] H. Xue, X. Huang, D. Niu, Q. Liao, T. Kragerud, J. T. Gravdahl, X. B. Peng, G. Shi, T. Darrell, K. Sreenath, et al. Leverb: Humanoid whole-body control with latent vision-language instruction. arXiv preprint arXiv:2506.13751, 2025

[19] X. B. Peng, P. Abbeel, S. Levine, and M. Van de Panne. Deepmimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Transactions On Graphics (TOG), 37(4):1–14, 2018

[20] Z. Fu, Q. Zhao, Q. Wu, G. Wetzstein, and C. Finn. Humanplus: Humanoid shadowing and imitation from humans. In Conference on Robot Learning, pages 2828–2844. PMLR, 2025
[21] M. Ji, X. Peng, F. Liu, J. Li, G. Yang, X. Cheng, and X. Wang. Exbody2: Advanced expressive humanoid whole-body control. arXiv preprint arXiv:2412.13196, 2024

[22] Z. Chen, M. Ji, X. Cheng, X. Peng, X. B. Peng, and X. Wang. Gmt: General motion tracking for humanoid whole-body control. arXiv preprint arXiv:2506.14770, 2025

[23] T. He, W. Xiao, T. Lin, Z. Luo, Z. Xu, Z. Jiang, J. Kautz, C. Liu, G. Shi, X. Wang, et al. Hover: Versatile neural whole-body controller for humanoid robots. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 9989–9996. IEEE, 2025

[24] S. Zhao, Y. Ze, Y. Wang, C. K. Liu, P. Abbeel, G. Shi, and R. Duan. Resmimic: From general motion tracking to humanoid whole-body loco-manipulation via residual learning. arXiv preprint arXiv:2510.05070, 2025

[25] S. Yin, Y. Ze, H.-X. Yu, C. K. Liu, and J. Wu. Visualmimic: Visual humanoid loco-manipulation via motion tracking and generation. arXiv preprint arXiv:2509.20322, 2025

[26] F. Liu, Z. Gu, Y. Cai, Z. Zhou, H. Jung, J. Jang, S. Zhao, S. Ha, Y. Chen, D. Xu, et al. Opt2skill: Imitating dynamically-feasible whole-body trajectories for versatile humanoid loco-manipulation. IEEE Robotics and Automation Letters, 2025

[27] L. Yang, X. Huang, Z. Wu, A. Kanazawa, P. Abbeel, C. Sferrazza, C. K. Liu, R. Duan, and G. Shi. Omniretarget: Interaction-preserving data generation for humanoid whole-body loco-manipulation and scene interaction. arXiv preprint arXiv:2509.26633, 2025

[28] L. Penco, B. Clément, V. Modugno, E. M. Hoffman, G. Nava, D. Pucci, N. G. Tsagarakis, J.-B. Mouret, and S. Ivaldi. Robust real-time whole-body motion retargeting from human to humanoid. In 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids), pages 425–432. IEEE, 2018

[29] J. P. Araujo, Y. Ze, P. Xu, J. Wu, and C. K. Liu. Retargeting matters: General motion retargeting for humanoid motion tracking. arXiv preprint arXiv:2510.02252, 2025

[30] J. Li, X. Cheng, T. Huang, S. Yang, R.-Z. Qiu, and X. Wang. AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control. In Proceedings of Robotics: Science and Systems, Los Angeles, CA, USA, June 2025. doi:10.15607/RSS.2025.XXI.061

[31] T. He, Z. Luo, X. He, W. Xiao, C. Zhang, W. Zhang, K. M. Kitani, C. Liu, and G. Shi. Omnih2o: Universal and dexterous human-to-humanoid whole-body teleoperation and learning. In Conference on Robot Learning, pages 1516–1540. PMLR, 2025

[32] R. Dong, Z. Li, X. He, and S. Gupta. Learning humanoid end-effector control for open-vocabulary visual loco-manipulation. arXiv preprint arXiv:2602.16705, 2026
[33] Z. Zhang, C. Chen, H. Xue, J. Wang, S. Liang, Y. Liu, Z. Zhang, H. Wang, and L. Yi. Unleashing humanoid reaching potential via real-world-ready skill space. IEEE Robotics and Automation Letters, 11(2):2082–2089, 2025

[34] Y. Fu, F. Xie, C. Xu, J. Xiong, H. Yuan, and Z. Lu. Demohlm: From one demonstration to generalizable humanoid loco-manipulation. arXiv preprint arXiv:2510.11258, 2025

[35] R. Nai, B. Zheng, J. Zhao, H. Zhu, S. Dai, Z. Chen, Y. Hu, Y. Hu, T. Zhang, C. Wen, et al. Humanoid manipulation interface: Humanoid whole-body manipulation from robot-free demonstrations. arXiv preprint arXiv:2602.06643, 2026

[36] Z. Su, B. Zhang, N. Rahmanian, Y. Gao, Q. Liao, C. Regan, K. Sreenath, and S. S. Sastry. Hitter: A humanoid table tennis robot via hierarchical planning and learning. arXiv preprint arXiv:2508.21043, 2025

[37] J. Dao, H. Duan, and A. Fern. Sim-to-real learning for humanoid box loco-manipulation. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 16930–16936. IEEE, 2024

[38] H. Jiang, J. Chen, Q. Bu, L. Chen, M. Shi, Y. Zhang, D. Li, C. Suo, C. Wang, Z. Peng, et al. Wholebodyvla: Towards unified latent vla for whole-body loco-manipulation control. arXiv preprint arXiv:2512.11047, 2025

[39] S. Wei, H. Jing, B. Li, Z. Zhao, J. Mao, Z. Ni, S. He, J. Liu, X. Liu, K. Kang, et al. Ψ0: An open foundation model towards universal humanoid loco-manipulation. arXiv preprint arXiv:2603.12263, 2026

[40] H. Yuan, Y. Bai, Y. Fu, B. Zhou, Y. Feng, X. Xu, Y. Zhan, B. F. Karlsson, and Z. Lu. Being-0: A humanoid robotic agent with vision-language models and modular skills. arXiv preprint arXiv:2503.12533, 2025

[41] Y. Zhao, X. Wang, D. Wang, X. Liu, D. Lu, Q. Han, P. Liu, and C. Bai. Towards adaptive humanoid control via multi-behavior distillation and reinforced fine-tuning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 18818–18826, 2026

[42] Y. Wang, M. Yang, G. Ding, Y. Zhang, W. Zeng, X. Xu, H. Jiang, and Z. Lu. From experts to a generalist: Toward general whole-body control for humanoid robots. Advances in Neural Information Processing Systems, 38:147748–147772, 2026

[43] Q. Peng, Y. Lin, Y. Xue, J. Pang, and W. Zhang. Embodiment-aware generalist specialist distillation for unified humanoid whole-body control. arXiv preprint arXiv:2602.02960, 2026

[44] Z. Wu, X. Huang, L. Yang, Y. Zhang, K. Sreenath, X. Chen, P. Abbeel, R. Duan, A. Kanazawa, C. Sferrazza, et al. Perceptive humanoid parkour: Chaining dynamic human skills via motion matching. arXiv preprint arXiv:2602.15827, 2026
[45] J. Li, B. Tang, and F. Wu. Telegate: Whole-body humanoid teleoperation via gated expert selection with motion prior. arXiv preprint arXiv:2602.09628, 2026

[46] C. Tessler, Y. Guo, O. Nabati, G. Chechik, and X. B. Peng. Maskedmimic: Unified physics-based character control through masked motion inpainting. ACM Transactions On Graphics (TOG), 43(6):1–21, 2024

[47] L. Pinto, M. Andrychowicz, P. Welinder, W. Zaremba, and P. Abbeel. Asymmetric actor critic for image-based robot learning. Robotics: Science and Systems XIV, 2018

[48] N. Poddar, S. McCrory, L. Penco, G. Clark, H. E. Svil, and R. Griffin. Embedding classical balance control principles in reinforcement learning for humanoid recovery. arXiv preprint arXiv:2603.08619, 2026

[49] K. Zakka, Q. Liao, B. Yi, L. L. Lay, K. Sreenath, and P. Abbeel. mjlab: A lightweight framework for gpu-accelerated robot learning. arXiv preprint arXiv:2601.22074, 2026

[50] C. Schwarke, M. Mittal, N. Rudin, D. Hoeller, and M. Hutter. Rsl-rl: A learning library for robotics research. arXiv preprint arXiv:2509.10771, 2025

[51] K. Zakka. mink: Python inverse kinematics based on MuJoCo. https://github.com/kevinzakka/mink, 2024

A Observations. This section enumerates the actor and critic observation groups used by each policy. Asymmetric actor-critic is in force throughout: anything in a critic-only group is privileged information used to fit the value function and is unavailable...

[1] [1]

Z. Gu, J. Li, W. Shen, W. Yu, Z. Xie, S. McCrory, X. Cheng, A. Shamsah, R. Griffin, C. K. Liu, et al. Humanoid locomotion and manipulation: Current progress and challenges in control, planning, and learning.IEEE/ASME Transactions on Mechatronics, 31(2):2300–2330, 2026

2026

[2] [2]

Z. Luo, Y . Yuan, T. Wang, C. Li, S. Chen, F. Castaneda, Z.-A. Cao, J. Li, D. Minor, Q. Ben, et al. Sonic: Supersizing motion tracking for natural humanoid whole-body control.arXiv preprint arXiv:2511.07820, 2025

Pith/arXiv arXiv 2025

[3] [3]

Q. Liao, T. E. Truong, X. Huang, Y . Gao, G. Tevet, K. Sreenath, and C. K. Liu. Beyondmimic: From motion tracking to versatile humanoid control via guided diffusion.arXiv preprint arXiv:2508.08241, 2025

Pith/arXiv arXiv 2025

[4] [4]

Y . Ze, S. Zhao, W. Wang, A. Kanazawa, R. Duan, P. Abbeel, G. Shi, J. Wu, and C. K. Liu. Twist2: Scalable, portable, and holistic humanoid data collection system.arXiv preprint arXiv:2511.02832, 2025

arXiv 2025

[5] [5]

Ichter, A

B. Ichter, A. Brohan, Y . Chebotar, C. Finn, K. Hausman, A. Herzog, D. Ho, J. Ibarz, A. Irpan, E. Jang, et al. Do as i can, not as i say: Grounding language in robotic affordances. In K. Liu, D. Kulic, and J. Ichnowski, editors,Proceedings of The 6th Conference on Robot Learning, volume 205 ofProceedings of Machine Learning Research, pages 287–318. PMLR, 2023

2023

[6] [6]

Driess, F

D. Driess, F. Xia, M. S. M. Sajjadi, C. Lynch, A. Chowdhery, B. Ichter, A. Wahid, J. Tompson, Q. Vuong, T. Yu, et al. PaLM-e: An embodied multimodal language model. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, and J. Scarlett, editors,Proceedings of the 40th International Conference on Machine Learning, volume 202 ofProceedings of Machine...

2023

[7] [7]

Zitkovich, T

B. Zitkovich, T. Yu, S. Xu, P. Xu, T. Xiao, F. Xia, J. Wu, P. Wohlhart, S. Welker, A. Wahid, et al. Rt-2: Vision-language-action models transfer web knowledge to robotic control. In J. Tan, M. Toussaint, and K. Darvish, editors,Proceedings of The 7th Conference on Robot Learning, volume 229 ofProceedings of Machine Learning Research, pages 2165–2183. PMLR, 2023

2023

[8] [8]

M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. P. Foster, P. R. Sanketi, Q. Vuong, et al. Openvla: An open-source vision-language-action model. In P. Agrawal, O. Kroemer, and W. Burgard, editors,Proceedings of The 8th Conference on Robot Learning, volume 270 ofProceedings of Machine Learning Research, pages 2679–27...

2025

[9] [9]

BONES-SEED: Skeletal everyday embodiment dataset.https://bones

Bones Studio. BONES-SEED: Skeletal everyday embodiment dataset.https://bones. studio/datasets/seed, 2026

2026

[10] [10]

L. Yang, B. Werner, M. de Sa, and A. D. Ames. Cbf-rl: Safety filtering reinforcement learning in training with control barrier functions.arXiv preprint arXiv:2510.14959, 2025

Pith/arXiv arXiv 2025

[11] [11]

X. B. Peng, Z. Ma, P. Abbeel, S. Levine, and A. Kanazawa. Amp: Adversarial motion priors for stylized physics-based character control.ACM Transactions on Graphics (ToG), 40(4): 1–20, 2021

2021

[12] [12]

AMP mjlab: G1 AMP motion control on mjlab + rsl rl.https://github.com/ ccrpRepo/AMP_mjlab, 2025

ccrpRepo. AMP mjlab: G1 AMP motion control on mjlab + rsl rl.https://github.com/ ccrpRepo/AMP_mjlab, 2025. 9

2025

[13] [13]

Schulman, F

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017

Pith/arXiv arXiv 2017

[14] [14]

Hinton, O

G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network.stat, 1050: 9, 2015

2015

[15] [15]

Shazeer, A

N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. Le, G. Hinton, and J. Dean. Outra- geously large neural networks: The sparsely-gated mixture-of-experts layer.arXiv preprint arXiv:1701.06538, 2017

Pith/arXiv arXiv 2017

[16] [16]

Q. Ben, F. Jia, J. Zeng, J. Dong, D. Lin, and J. Pang. HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit. InProceedings of Robotics: Science and Systems, LosAngeles, CA, USA, June 2025

2025

[17] [17]

Zhang, Y

Y . Zhang, Y . Yuan, P. Gurunath, I. Gupta, S. Omidshafiei, A.-a. Agha-mohammadi, M. Vazquez-Chanlatte, L. Pedersen, T. He, and G. Shi. Falcon: Learning force-adaptive hu- manoid loco-manipulation.arXiv preprint arXiv:2505.06776, 2025

arXiv 2025

[18] [18]

H. Xue, X. Huang, D. Niu, Q. Liao, T. Kragerud, J. T. Gravdahl, X. B. Peng, G. Shi, T. Dar- rell, K. Sreenath, et al. Leverb: Humanoid whole-body control with latent vision-language instruction.arXiv preprint arXiv:2506.13751, 2025

arXiv 2025

[19] [19]

X. B. Peng, P. Abbeel, S. Levine, and M. Van de Panne. Deepmimic: Example-guided deep re- inforcement learning of physics-based character skills.ACM Transactions On Graphics (TOG), 37(4):1–14, 2018

2018

[20] [20]

Z. Fu, Q. Zhao, Q. Wu, G. Wetzstein, and C. Finn. Humanplus: Humanoid shadowing and imitation from humans. InConference on Robot Learning, pages 2828–2844. PMLR, 2025

2025

[21] [21]

M. Ji, X. Peng, F. Liu, J. Li, G. Yang, X. Cheng, and X. Wang. Exbody2: Advanced expressive humanoid whole-body control.arXiv preprint arXiv:2412.13196, 2024

arXiv 2024

[22] [22]

Z. Chen, M. Ji, X. Cheng, X. Peng, X. B. Peng, and X. Wang. Gmt: General motion tracking for humanoid whole-body control.arXiv preprint arXiv:2506.14770, 2025

arXiv 2025

[23] T. He, W. Xiao, T. Lin, Z. Luo, Z. Xu, Z. Jiang, J. Kautz, C. Liu, G. Shi, X. Wang, et al. Hover: Versatile neural whole-body controller for humanoid robots. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 9989–9996. IEEE, 2025.

[24] S. Zhao, Y. Ze, Y. Wang, C. K. Liu, P. Abbeel, G. Shi, and R. Duan. Resmimic: From general motion tracking to humanoid whole-body loco-manipulation via residual learning. arXiv preprint arXiv:2510.05070, 2025.

[25] S. Yin, Y. Ze, H.-X. Yu, C. K. Liu, and J. Wu. Visualmimic: Visual humanoid loco-manipulation via motion tracking and generation. arXiv preprint arXiv:2509.20322, 2025.

[26] F. Liu, Z. Gu, Y. Cai, Z. Zhou, H. Jung, J. Jang, S. Zhao, S. Ha, Y. Chen, D. Xu, et al. Opt2skill: Imitating dynamically-feasible whole-body trajectories for versatile humanoid loco-manipulation. IEEE Robotics and Automation Letters, 2025.

[27] L. Yang, X. Huang, Z. Wu, A. Kanazawa, P. Abbeel, C. Sferrazza, C. K. Liu, R. Duan, and G. Shi. Omniretarget: Interaction-preserving data generation for humanoid whole-body loco-manipulation and scene interaction. arXiv preprint arXiv:2509.26633, 2025.

[28] L. Penco, B. Clément, V. Modugno, E. M. Hoffman, G. Nava, D. Pucci, N. G. Tsagarakis, J.-B. Mouret, and S. Ivaldi. Robust real-time whole-body motion retargeting from human to humanoid. In 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids), pages 425–432. IEEE, 2018.

[29] J. P. Araujo, Y. Ze, P. Xu, J. Wu, and C. K. Liu. Retargeting matters: General motion retargeting for humanoid motion tracking. arXiv preprint arXiv:2510.02252, 2025.

[30] J. Li, X. Cheng, T. Huang, S. Yang, R.-Z. Qiu, and X. Wang. AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control. In Proceedings of Robotics: Science and Systems, Los Angeles, CA, USA, June 2025. doi:10.15607/RSS.2025.XXI.061.

[31] T. He, Z. Luo, X. He, W. Xiao, C. Zhang, W. Zhang, K. M. Kitani, C. Liu, and G. Shi. Omnih2o: Universal and dexterous human-to-humanoid whole-body teleoperation and learning. In Conference on Robot Learning, pages 1516–1540. PMLR, 2025.

[32] R. Dong, Z. Li, X. He, and S. Gupta. Learning humanoid end-effector control for open-vocabulary visual loco-manipulation. arXiv preprint arXiv:2602.16705, 2026.

[33] Z. Zhang, C. Chen, H. Xue, J. Wang, S. Liang, Y. Liu, Z. Zhang, H. Wang, and L. Yi. Unleashing humanoid reaching potential via real-world-ready skill space. IEEE Robotics and Automation Letters, 11(2):2082–2089, 2025.

[34] Y. Fu, F. Xie, C. Xu, J. Xiong, H. Yuan, and Z. Lu. Demohlm: From one demonstration to generalizable humanoid loco-manipulation. arXiv preprint arXiv:2510.11258, 2025.

[35] R. Nai, B. Zheng, J. Zhao, H. Zhu, S. Dai, Z. Chen, Y. Hu, Y. Hu, T. Zhang, C. Wen, et al. Humanoid manipulation interface: Humanoid whole-body manipulation from robot-free demonstrations. arXiv preprint arXiv:2602.06643, 2026.

[36] Z. Su, B. Zhang, N. Rahmanian, Y. Gao, Q. Liao, C. Regan, K. Sreenath, and S. S. Sastry. Hitter: A humanoid table tennis robot via hierarchical planning and learning. arXiv preprint arXiv:2508.21043, 2025.

[37] J. Dao, H. Duan, and A. Fern. Sim-to-real learning for humanoid box loco-manipulation. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 16930–16936. IEEE, 2024.

[38] H. Jiang, J. Chen, Q. Bu, L. Chen, M. Shi, Y. Zhang, D. Li, C. Suo, C. Wang, Z. Peng, et al. Wholebodyvla: Towards unified latent vla for whole-body loco-manipulation control. arXiv preprint arXiv:2512.11047, 2025.

[39] S. Wei, H. Jing, B. Li, Z. Zhao, J. Mao, Z. Ni, S. He, J. Liu, X. Liu, K. Kang, et al. Ψ0: An open foundation model towards universal humanoid loco-manipulation. arXiv preprint arXiv:2603.12263, 2026.

[40] H. Yuan, Y. Bai, Y. Fu, B. Zhou, Y. Feng, X. Xu, Y. Zhan, B. F. Karlsson, and Z. Lu. Being-0: A humanoid robotic agent with vision-language models and modular skills. arXiv preprint arXiv:2503.12533, 2025.

[41] Y. Zhao, X. Wang, D. Wang, X. Liu, D. Lu, Q. Han, P. Liu, and C. Bai. Towards adaptive humanoid control via multi-behavior distillation and reinforced fine-tuning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 18818–18826, 2026.

[42] Y. Wang, M. Yang, G. Ding, Y. Zhang, W. Zeng, X. Xu, H. Jiang, and Z. Lu. From experts to a generalist: Toward general whole-body control for humanoid robots. Advances in Neural Information Processing Systems, 38:147748–147772, 2026.

[43] Q. Peng, Y. Lin, Y. Xue, J. Pang, and W. Zhang. Embodiment-aware generalist specialist distillation for unified humanoid whole-body control. arXiv preprint arXiv:2602.02960, 2026.

[44] Z. Wu, X. Huang, L. Yang, Y. Zhang, K. Sreenath, X. Chen, P. Abbeel, R. Duan, A. Kanazawa, C. Sferrazza, et al. Perceptive humanoid parkour: Chaining dynamic human skills via motion matching. arXiv preprint arXiv:2602.15827, 2026.

[45] J. Li, B. Tang, and F. Wu. Telegate: Whole-body humanoid teleoperation via gated expert selection with motion prior. arXiv preprint arXiv:2602.09628, 2026.

[46] C. Tessler, Y. Guo, O. Nabati, G. Chechik, and X. B. Peng. Maskedmimic: Unified physics-based character control through masked motion inpainting. ACM Transactions On Graphics (TOG), 43(6):1–21, 2024.

[47] L. Pinto, M. Andrychowicz, P. Welinder, W. Zaremba, and P. Abbeel. Asymmetric actor critic for image-based robot learning. Robotics: Science and Systems XIV, 2018.

[48] N. Poddar, S. McCrory, L. Penco, G. Clark, H. E. Svil, and R. Griffin. Embedding classical balance control principles in reinforcement learning for humanoid recovery. arXiv preprint arXiv:2603.08619, 2026.

[49] K. Zakka, Q. Liao, B. Yi, L. L. Lay, K. Sreenath, and P. Abbeel. mjlab: A lightweight framework for gpu-accelerated robot learning. arXiv preprint arXiv:2601.22074, 2026.

[50] C. Schwarke, M. Mittal, N. Rudin, D. Hoeller, and M. Hutter. Rsl-rl: A learning library for robotics research. arXiv preprint arXiv:2509.10771, 2025.

[51] K. Zakka. mink: Python inverse kinematics based on MuJoCo. https://github.com/kevinzakka/mink, 2024.

A Observations

This section enumerates the actor and critic observation groups used by each policy. Asymmetric actor-critic is in force throughout: anything in a critic-only group is privileged information used to fit the value function and is unavailable...
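As a minimal sketch of the asymmetric actor-critic split described above, the following Python snippet assembles separate actor and critic observation vectors from named groups. The group names, their sizes, and the choice to give the critic the actor's observations plus the critic-only groups are illustrative assumptions, not the paper's actual observation layout.

```python
# Hypothetical observation groups; names and sizes are illustrative only.
ACTOR_GROUPS = ["proprioception", "command"]
CRITIC_ONLY_GROUPS = ["privileged_state"]


def build_observations(raw):
    """Return (actor_obs, critic_obs) from a dict of named observation lists."""
    actor_obs = []
    for name in ACTOR_GROUPS:
        actor_obs.extend(raw[name])

    # Assumption: the critic sees everything the actor sees, plus the
    # critic-only (privileged) groups that are only used to fit the value function.
    critic_obs = list(actor_obs)
    for name in CRITIC_ONLY_GROUPS:
        critic_obs.extend(raw[name])

    return actor_obs, critic_obs


raw = {
    "proprioception": [0.1] * 6,     # e.g. joint-level readings
    "command": [0.5, 0.0, 0.0],      # e.g. a task-space command
    "privileged_state": [1.0] * 4,   # simulator-only quantities
}
actor_obs, critic_obs = build_observations(raw)
```

Under these assumptions the policy network would consume only `actor_obs`, while `critic_obs` would be fed to the value function during training.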
