
arxiv: 2606.06493 · v3 · pith:JWJLYJ4Fnew · submitted 2026-06-04 · 💻 cs.RO · cs.AI · cs.LG

HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers

Pith reviewed 2026-06-28 01:09 UTC · model grok-4.3

classification: cs.RO · cs.AI · cs.LG
keywords: humanoid robot, whole-body control, knowledge distillation, locomotion, manipulation, mixture of experts, agentic planner, task-space control

The pith

A single distilled controller lets humanoids perform diverse loco-manipulation tasks from natural language without task-specific fine-tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents HANDOFF as a whole-body controller for humanoids that accepts a compact task-space command interface and is created by distilling three specialist teachers into one mixture-of-experts student. The teachers cover whole-body motion tracking with safety filtering, locomotion, and fall recovery; a context-conditioned gating scheme combines their behaviors during distillation via KL divergence. A sympathetic reader would care because this setup aims to let high-level planners, including vision-language models, generate commands directly from semantics and produce working hardware behaviors on the Unitree G1 across multiple skills.

Core claim

HANDOFF is a single humanoid whole-body controller: a mixture-of-experts student distilled from three complementary specialists (whole-body motion tracking with safety-filtered data, locomotion, and fall recovery) using multi-teacher KL distillation under a context-conditioned gating scheme. On the Unitree G1 it matches state-of-the-art velocity tracking and offers one of the largest robust manipulation workspaces. It also supports multiple natural-language-driven task roll-outs through a VLM-driven agentic planner, with no task-specific data or controller fine-tuning.

What carries the argument

The context-conditioned gating scheme that routes among the distilled behaviors of the three specialist teachers inside the mixture-of-experts student policy.
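
The distillation objective behind this mechanism can be pictured with the sketch below. It is a minimal illustration under stated assumptions, not the paper's implementation: it assumes diagonal-Gaussian action distributions, a KL(teacher || student) direction, and per-teacher context weights that are non-negative and sum to one. The function names, teacher labels, and numbers are hypothetical.

```python
import math


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between diagonal Gaussians, summed over action dimensions."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
    return total


def gated_distillation_loss(student, teachers, context_weights):
    """Context-weighted sum of KL(teacher || student) terms.

    student:         (mean, std) of the student's action distribution
    teachers:        dict mapping a teacher name to its (mean, std)
    context_weights: dict mapping a teacher name to its weight for the
                     current context (assumed here: non-negative, sums to 1)
    """
    s_mu, s_std = student
    loss = 0.0
    for name, (t_mu, t_std) in teachers.items():
        weight = context_weights.get(name, 0.0)
        if weight > 0.0:  # teachers that are irrelevant in this context are skipped
            loss += weight * gaussian_kl(t_mu, t_std, s_mu, s_std)
    return loss


# Toy 2-D example (hypothetical numbers, not from the paper).
student = ([0.0, 0.0], [1.0, 1.0])
teachers = {
    "tracking": ([0.0, 0.0], [1.0, 1.0]),
    "locomotion": ([1.0, 0.0], [1.0, 1.0]),
}
loss_walking = gated_distillation_loss(student, teachers, {"locomotion": 1.0})
loss_reaching = gated_distillation_loss(student, teachers, {"tracking": 1.0})
```

With one-hot weights the loss reduces to ordinary single-teacher distillation against whichever specialist the context selects. Soft weights would blend the supervision instead.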

If this is right

  • The controller matches state-of-the-art performance on velocity tracking.
  • It provides one of the largest robust manipulation workspaces demonstrated on the Unitree G1.
  • It supports multiple natural-language-driven task executions through a VLM agentic planner.
  • No task-specific data collection or controller fine-tuning is required for new behaviors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same distillation approach might scale to additional teachers covering skills such as precise object placement or dynamic balance recovery.
  • The compact command interface could allow planners other than VLMs to generate whole-body references more easily than dense kinematic trajectories.
  • Hardware success on the G1 suggests the method could transfer to other humanoid platforms that share similar actuation and sensing.

Load-bearing premise

The three specialist teachers are complementary enough that distilling them under the gating scheme yields one generalist controller able to handle combined loco-manipulation skills without any task-specific data or fine-tuning.

What would settle it

Hardware trials in which the distilled controller cannot execute a combined locomotion-plus-manipulation sequence that none of the individual teachers could produce on its own, even when the gating network is active.

Figures

Figures reproduced from arXiv: 2606.06493 by Aaron Ames, Georgia Gkioxari, Gio Huh, Junheng Li, Lizhi Yang, Nehar Poddar, Robert Griffin, Yiling Hou.

Figure 1: HANDOFF is a whole-body controller distilled from multiple teachers that accepts a compact, explicit 10-D planner-facing command. We demonstrate its effectiveness using a VLM-powered agentic planner that does not need extensive demonstration collection or model fine-tuning.
Figure 3: System overview. We train 3 teachers separately: a 29-DoF WBC motion-tracking teacher on CoP-filtered retargeted clips, a 15-DoF body-slice locomotion teacher on flat terrain with curriculum-blended arm perturbations, and a 29-DoF fall-recovery teacher on locomotion + paired fall-recovery clips. The MoE student maps a 10-D command and 11-frame proprioception history to 29-DoF actions. Context-based action-…
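
The input and output shapes in the Figure 3 caption can be made concrete with a toy mixture-of-experts forward pass. This is a sketch only: the 10-D command, 11-frame history, and 29-DoF action come from the caption, while the number of experts, the per-frame proprioception size, the linear experts, and the dense softmax gate are all assumptions chosen for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def moe_forward(command, history, expert_weights, gate_weights):
    """Blend expert actions using gate probabilities computed from the same input."""
    x = list(command) + [v for frame in history for v in frame]
    probs = softmax(linear(gate_weights, x))
    expert_actions = [linear(w, x) for w in expert_weights]
    action = [
        sum(p * a[j] for p, a in zip(probs, expert_actions))
        for j in range(len(expert_actions[0]))
    ]
    return action, probs


# From the caption: 10-D command, 11 history frames, 29-DoF action.
CMD_DIM, HISTORY_LEN, ACT_DIM = 10, 11, 29
# Assumptions for illustration only: per-frame proprioception size and expert count.
PROPRIO_DIM, N_EXPERTS = 4, 3

rng = random.Random(0)
in_dim = CMD_DIM + HISTORY_LEN * PROPRIO_DIM


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


experts = [rand_matrix(ACT_DIM, in_dim) for _ in range(N_EXPERTS)]
gate = rand_matrix(N_EXPERTS, in_dim)

command = [0.0] * CMD_DIM
history = [[0.0] * PROPRIO_DIM for _ in range(HISTORY_LEN)]
action, probs = moe_forward(command, history, experts, gate)
```

A sparse top-k gate, as in the mixture-of-experts literature, would be a drop-in alternative to the dense blend shown here; the source text does not say which variant the student uses.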
Figure 2: CoP filtering. The raw retargeted motion dataset contains dynamically infeasible frames, which we correct with a closed-form CBF projection on the static-CoP margin before training. An example is shown here where the corrected reference (left) stays within the support polygon and the unfiltered version (right) drifts.
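
To give a feel for what a closed-form projection of this kind looks like, the sketch below projects a 2-D point onto a single half-space constraint, which is the standard closed-form solution of a minimum-deviation problem with one linear constraint. It is an assumption-laden illustration: the paper's actual barrier function, margin definition, and support-polygon handling are not given in this summary, and the edge normal, offset, and sample points are hypothetical.

```python
def project_to_halfspace(point, normal, offset):
    """Closest point to `point` satisfying normal . p <= offset.

    If the constraint already holds, the point is returned unchanged;
    otherwise it is moved along `normal` onto the boundary.
    """
    violation = sum(n * p for n, p in zip(normal, point)) - offset
    if violation <= 0.0:
        return list(point)
    scale = violation / sum(n * n for n in normal)
    return [p - scale * n for p, n in zip(point, normal)]


# Hypothetical support-polygon edge: CoP x-coordinate must stay below 0.10 m.
edge_normal, edge_offset = (1.0, 0.0), 0.10

inside = project_to_halfspace((0.05, 0.02), edge_normal, edge_offset)
corrected = project_to_halfspace((0.25, 0.02), edge_normal, edge_offset)
```

A full support polygon is an intersection of several such half-spaces, so a real filter would need to handle multiple edges and map the corrected CoP back to a feasible reference pose; neither step is shown here.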
Figure 4: Agentic deployment pipeline. A natural-language instruction (0.001 Hz) is decomposed into atomic tasks by a high-level reasoner (regex + LLM fallback). A VLM (0.1 Hz) projects 2D detections onto the RGB-D point cloud to emit pelvis-frame waypoints; a waypoint tracker produces (vx, vy, ωz), and near the target a skill selector emits root height z and bilateral wrist targets p^P_{R/L} (1–50 Hz) with a kinemati…
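
The quantities named in the Figure 4 caption add up to ten numbers: three base velocities, one root height, and two 3-D wrist targets. The sketch below packs them into a single vector. The field names, the ordering, and the sample values are assumptions made for illustration; the summary does not state the actual layout of the 10-D command.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class TaskSpaceCommand:
    """Hypothetical container for the planner-facing command."""

    vx: float            # forward velocity
    vy: float            # lateral velocity
    wz: float            # yaw rate
    root_height: float   # root height z
    wrist_right: Vec3    # right wrist target position
    wrist_left: Vec3     # left wrist target position

    def to_vector(self) -> List[float]:
        """Flatten to a 10-element list (ordering is an assumption)."""
        return [
            self.vx,
            self.vy,
            self.wz,
            self.root_height,
            *self.wrist_right,
            *self.wrist_left,
        ]


# Illustrative values only.
cmd = TaskSpaceCommand(
    vx=0.3,
    vy=0.0,
    wz=0.1,
    root_height=0.75,
    wrist_right=(0.35, -0.20, 0.10),
    wrist_left=(0.35, 0.20, 0.10),
)
vector = cmd.to_vector()
```

An interface this small is what lets a planner emit commands directly: the waypoint tracker fills the velocity fields and the skill selector fills the height and wrist fields.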
Figure 5: Agentic deployment snapshots with VLM text prompt. The same 10-D controller, driven by the agentic planner, executing a range of loco-manipulation tasks on the Unitree G1 hardware and in simulation: pick-and-place, pick-transport-place, squat-pick, bimanual-pick-and-hand-off, bilateral pick-and-place, and task continuation after fall recovery. No controller-side change, data collection or model fine-tunin…
Figure 6: Ablation progression: per-axis realized-versus-commanded velocity sweep across …
Figure 7: SOTA comparison: per-axis realized-versus-commanded velocity sweep across …
Figure 8: Bimanual wrist-workspace hulls in three orthogonal pelvis-frame views, forward half …
Figure 9: Real-robot deployment platform. (a) Unitree G1 with bilateral Dex1-1 grippers and a head-mounted ZED-M stereo RGB-D camera. (b) Close-up of one Dex1-1 gripper, replacing the stock 3-finger hand. (c) Head-mounted ZED-M stereo RGB-D camera providing the RGB and depth frames consumed by the VLM. (d) Back-mounted Nvidia Jetson Thor and 140 W USB-PD power bank that together drive the onboard RL controller, the …
Original abstract

For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial references that planners struggle to synthesize from task semantics. We instead propose a compact, explicit interface that is intuitive, general, modular, and expressive enough for diverse loco-manipulation skills. To this end, we introduce HANDOFF, a single humanoid whole-body controller that follows this interface and is distilled via multi-teacher KL distillation under a context-conditioned gating scheme into a mixture-of-experts student from three complementary specialists: whole-body motion tracking with safety-filtered data, locomotion, and fall-recovery. On the Unitree G1, HANDOFF matches state-of-the-art velocity tracking and offers one of the largest robust manipulation workspaces. We further demonstrate hardware feasibility through multiple natural-language-driven task roll-outs, powered by a VLM-driven agentic planner with no task-specific data or controller fine-tuning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper introduces HANDOFF, a single humanoid whole-body controller for the Unitree G1 that follows a compact task-space interface and is obtained by distilling three complementary specialist teachers (whole-body motion tracking with safety-filtered data, locomotion, and fall-recovery) via multi-teacher KL distillation under a context-conditioned gating scheme into a mixture-of-experts student policy. The central claims are that this yields state-of-the-art velocity tracking, one of the largest robust manipulation workspaces, and successful hardware roll-outs of diverse natural-language-driven loco-manipulation tasks powered by a VLM-based agentic planner, all without task-specific data collection or controller fine-tuning.

Significance. If the hardware results and generality claims hold, the work would be significant for humanoid robotics by providing an intuitive, modular command interface that bridges high-level task planners (including VLMs) with low-level whole-body control, thereby reducing reliance on dense kinematic references or per-task retraining. The distillation approach from complementary teachers and the reported absence of task-specific fine-tuning are notable strengths that could enable more scalable agentic deployment on physical platforms.

minor comments (2)
  1. [Abstract] The abstract states that HANDOFF 'matches state-of-the-art velocity tracking' and offers 'one of the largest robust manipulation workspaces' but provides no numerical values, baselines, or error metrics; adding these (even in summary form) would strengthen the presentation of the empirical claims.
  2. The description of the context-conditioned gating scheme and the mixture-of-experts student would benefit from an explicit diagram or pseudocode in the methods section to clarify how the three teachers are combined at inference time.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their review. The report accurately summarizes the HANDOFF controller, its distillation method, and the hardware results on the Unitree G1. We appreciate the positive assessment of the work's significance, conditional on the claims holding, and the recognition of the distillation from complementary teachers and the lack of task-specific fine-tuning as strengths. The report lists no major comments.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper describes an empirical method for distilling a generalist whole-body controller from three specialist teachers (motion tracking, locomotion, fall-recovery) via KL distillation and context-conditioned gating, with claims validated through hardware experiments on the Unitree G1. No mathematical derivations, equations, or parameter-fitting steps are presented that reduce any prediction or result to its inputs by construction. The approach relies on standard distillation techniques and external VLM planning without self-referential definitions or load-bearing self-citations that collapse the central claim. The derivation chain is self-contained against empirical benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the central claim rests on unstated assumptions about teacher complementarity and distillation effectiveness.

pith-pipeline@v0.9.1-grok · 5735 in / 1210 out tokens · 36023 ms · 2026-06-28T01:09:34.115768+00:00 · methodology

