CEER: Compliant End-Effector and Root Control as a Unified Interface for Hierarchical Humanoid Loco-Manipulation

Boxi Xia; Boyuan Chen; Hongxuan Wu; Jinzhou Li; Xianyi Cheng; Xingrui Chen; Xinyuan Luo; Xunjian Yin; Zhuoqun Chen

arXiv: 2605.19981 · v1 · pith:XIZ3N7O5 · submitted 2026-05-19 · 💻 cs.RO


Xinyuan Luo, Xingrui Chen, Xunjian Yin, Hongxuan Wu, Boxi Xia, Zhuoqun Chen, Jinzhou Li, Boyuan Chen, Xianyi Cheng


Pith reviewed 2026-05-20 04:54 UTC · model grok-4.3

classification 💻 cs.RO

keywords: humanoid robotics · loco-manipulation · compliant control · hierarchical planning · end-effector control · whole-body control · teacher-student distillation · modular interfaces


The pith

Compliant end-effector and root control provides a unified interface for humanoid loco-manipulation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that compliant end-effector and root commands can serve as a single abstraction layer for humanoid robots to handle both walking and contact-rich manipulation together. A teacher-student distillation converts a full whole-body controller into a low-level policy that accepts only these high-level commands while keeping the original compliance. This setup lets different planners and task modules plug in without retraining the core controller. A sympathetic reader would care because the method points toward more scalable humanoid systems that combine many skills over long horizons instead of building separate controllers for each new task.

Core claim

The authors show that distilling a general motion-tracking controller via a teacher-student framework produces a low-level policy driven only by end-effector pose targets and root motion commands; this policy retains compliance-aware whole-body behavior and supports a hierarchical framework in which heterogeneous planners integrate through the same EE-root interface for diverse loco-manipulation tasks.
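The core claim depends on distilling a teacher that tracks full-body references into a student that sees only EE-root commands. The toy below illustrates that idea and is not the authors' training procedure. The one-dimensional "actions", the linear student, the plain SGD regression onto teacher labels, and the names `teacher_action`, `hidden_reference`, and `distill_student` are all hypothetical. Real pipelines of this kind often roll out the student and relabel its states with the teacher (DAgger-style); the abstract does not state which scheme the paper uses.

```python
import random


def teacher_action(command: float, hidden_reference: float) -> float:
    """Hypothetical teacher: a tracking controller that also sees a full-body
    reference signal (hidden_reference) in addition to the EE-root command."""
    return 2.0 * command + 0.5 * hidden_reference


def distill_student(steps: int = 2000, lr: float = 0.01, seed: int = 0) -> tuple[float, float]:
    """Fit a linear student a = w * command + b to teacher actions with SGD.

    The student never observes hidden_reference. This mirrors a policy that
    consumes only EE-root commands.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        command = rng.uniform(-1.0, 1.0)
        hidden_reference = rng.uniform(-1.0, 1.0)  # visible to the teacher only
        target = teacher_action(command, hidden_reference)
        err = (w * command + b) - target
        # Gradient step on the squared action error 0.5 * err**2.
        w -= lr * err * command
        b -= lr * err
    return w, b


w, b = distill_student()
print(f"student weight {w:.2f}, bias {b:.2f}")
```

The student cannot see `hidden_reference`, so its fitted weight should settle near the teacher's command gain of 2.0 while the `0.5 * hidden_reference` term is averaged away. That averaged-away part is the behavior a student driven only by EE-root commands cannot reproduce.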

What carries the argument

The CEER abstraction, which defines an interpretable task space from root motion commands and end-effector pose targets to drive compliance-aware whole-body control.
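The sketch below shows one possible shape for such a task-space command in code. It rests on several assumptions. The fields are a planar root velocity, a yaw rate, a root height, and one 6-DoF pose per hand, with orientations as unit quaternions expressed in the root frame. The abstract does not give the paper's exact command layout, frames, or units. The names `EERootCommand`, `Pose`, and `to_vector` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    """Hypothetical EE pose: position in metres, unit quaternion (w, x, y, z), root frame."""
    position: tuple[float, float, float]
    quaternion: tuple[float, float, float, float] = (1.0, 0.0, 0.0, 0.0)


@dataclass
class EERootCommand:
    """Hypothetical EE-root command; the field set is illustrative, not the paper's."""
    root_lin_vel_xy: tuple[float, float]  # planar root velocity, m/s
    root_yaw_rate: float                  # rad/s
    root_height: float                    # m
    left_hand: Pose
    right_hand: Pose

    def to_vector(self) -> list[float]:
        """Flatten into the fixed-size vector a low-level policy would consume."""
        vec = [*self.root_lin_vel_xy, self.root_yaw_rate, self.root_height]
        for pose in (self.left_hand, self.right_hand):
            vec.extend(pose.position)
            vec.extend(pose.quaternion)
        return vec


cmd = EERootCommand(
    root_lin_vel_xy=(0.3, 0.0),
    root_yaw_rate=0.0,
    root_height=0.75,
    left_hand=Pose((0.35, 0.2, 0.1)),
    right_hand=Pose((0.35, -0.2, 0.1)),
)
print(len(cmd.to_vector()))  # 18
```

A fixed, flat layout like this lets different planners and the low-level policy agree on one vector without sharing any internal state.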

If this is right

Heterogeneous planners and task modules can integrate through the EE-root interface without retraining the underlying whole-body policy.
End-effector tracking reaches 3.3 cm accuracy with substantially lower jerk than baseline methods.
Stable contact-rich manipulation is maintained under teleoperation on hardware.
Success rates reach up to 70 percent in simulated single-object loco-manipulation tasks inside room-scale environments.
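The abstract does not say how tracking accuracy or jerk are computed. One common pair of definitions, offered here as an assumption, is the mean Euclidean distance between actual and target end-effector positions, and the mean norm of the third time derivative of position estimated with third-order forward differences. The function names below are hypothetical.

```python
import math


def mean_tracking_error(actual, target):
    """Mean Euclidean distance between actual and target EE positions (input units)."""
    if len(actual) != len(target) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(math.dist(a, t) for a, t in zip(actual, target)) / len(actual)


def mean_jerk_magnitude(positions, dt):
    """Mean jerk norm from third-order forward differences of sampled 3-D positions."""
    if len(positions) < 4:
        raise ValueError("need at least 4 samples for a third difference")
    norms = []
    for i in range(len(positions) - 3):
        p0, p1, p2, p3 = positions[i:i + 4]
        # Third forward difference per axis, divided by dt**3, approximates d^3x/dt^3.
        jerk = [(d - 3 * c + 3 * b - a) / dt**3 for a, b, c, d in zip(p0, p1, p2, p3)]
        norms.append(math.hypot(*jerk))
    return sum(norms) / len(norms)


# Example: a hand target offset by 3 cm in z, and a cubic trajectory x(t) = t**3.
dt = 0.1
times = [i * dt for i in range(11)]
print(mean_tracking_error([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                          [(0.0, 0.0, 0.03), (1.0, 0.0, 0.03)]))
print(mean_jerk_magnitude([(t**3, 0.0, 0.0) for t in times], dt))
```

The first call reports the constant 3 cm offset. The second returns approximately 6, the exact jerk of `t**3`. Finite-difference jerk amplifies sensor noise, so real logs are usually filtered before this kind of comparison.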

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same EE-root interface could simplify transferring loco-manipulation skills across different humanoid platforms if the command space is treated as a standard.
Pairing the interface with vision-based high-level planners might enable closed-loop autonomous operation without teleoperation.
Scaling the hierarchy to multi-step or multi-object tasks would test whether the abstraction remains stable over longer horizons.

Load-bearing premise

The distilled low-level policy preserves the compliance and stability of the original whole-body controller when it receives only end-effector and root commands, especially during contact-rich tasks on physical hardware.
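One way to make this premise precise assumes that distillation minimizes the squared difference between teacher and student actions; the abstract does not state the actual loss. Let $a_{\text{teacher}}$ be the teacher's action and $c$ everything the student observes (the EE-root command plus its own proprioception). Standard least-squares reasoning then gives the best achievable student and its irreducible error:

```latex
\pi^\star(c) = \arg\min_{\pi} \mathbb{E}\left[\lVert a_{\text{teacher}} - \pi(c) \rVert^{2}\right]
             = \mathbb{E}\left[a_{\text{teacher}} \mid c\right],
\qquad
\mathbb{E}\left[\lVert a_{\text{teacher}} - \pi^\star(c) \rVert^{2}\right]
  = \mathbb{E}\left[\operatorname{tr}\operatorname{Cov}\left(a_{\text{teacher}} \mid c\right)\right].
```

Any part of the teacher's compliant response that depends on information not recoverable from $c$ is averaged out. The premise therefore holds only to the extent that EE-root commands and proprioception determine what the teacher would do.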

What would settle it

Hardware trials that show instability, loss of compliance, or markedly higher jerk and tracking error than the reported 3.3 cm during contact-rich manipulation would falsify the claim that the EE-root policy works as a reliable abstraction.

Figures

Figures reproduced from arXiv: 2605.19981 by Boxi Xia, Boyuan Chen, Hongxuan Wu, Jinzhou Li, Xianyi Cheng, Xingrui Chen, Xinyuan Luo, Xunjian Yin, Zhuoqun Chen.

**Figure 2.** Overview of the proposed three-layer hierarchical system. At the high level, a language instruction is interpreted by an LLM-based …

**Figure 3.** Network architecture. We introduce an additional target joint …

**Figure 4.** Real-world teleoperation evaluation across task categories.

**Figure 5.** End-effector trajectory generated by diffusion policy (left) …

**Figure 6.** Screenshots of spatial relation and long-horizon tasks in …

**Figure 7.** Failure analysis of robot and human. … that ensures higher consistency, potentially explaining its superior grasping performance relative to human operators. However, in long-horizon tasks, humans demonstrate stronger recovery ability, compensating through more timely and reactive adjustment despite longer execution time. Overall, CEER enables reliable execution for moderate-complexity tasks, while long-hor…

Original abstract

Humanoid robots have achieved impressive locomotion performance, yet contact-rich and long-horizon manipulation remains a major bottleneck. Manipulation is inherently contact-rich and demands compliant whole-body control for stable interaction, while its diversity and long-horizon nature favor modular, planner-compatible interfaces over joint-space tracking. We propose CEER, a compliant end-effector-root (EE-root) control abstraction for modular humanoid loco-manipulation within a hierarchical planning framework. CEER enables compliance-aware whole-body control in an interpretable task space defined by root motion commands and end-effector pose targets, and supports plug-and-play integration with heterogeneous high-level planners. A teacher-student framework is adopted to distill a general motion-tracking controller into a low-level policy that consumes only EE-root commands. We further construct a hierarchical system that integrates heterogeneous planners and task modules through the EE-root interface, enabling diverse manipulation tasks without retraining the underlying whole-body policy. Experiments in simulation and on hardware demonstrate 3.3 cm end-effector tracking accuracy with substantially reduced jerk compared to baselines, stable contact-rich manipulation under teleoperation, and up to 70% success in simulated single-object loco-manipulation tasks within a room-scale environment. These results indicate that compliant EE-root control provides a practical abstraction for humanoid loco-manipulation, enabling modular and scalable integration of diverse skills.
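The plug-and-play claim amounts to a fixed contract between layers: any planner that emits EE-root commands can drive the same frozen low-level policy. The sketch below expresses that contract with Python's `typing.Protocol`, under several assumptions. The 18-value command layout, the planner classes, and `FrozenPolicyStub` are hypothetical placeholders, not the authors' code. The stub also returns zeros instead of running a learned network.

```python
from typing import Protocol, Sequence

# Hypothetical flattened command layout: 4 root terms + 7 per hand (position + quaternion).
COMMAND_DIM = 18


class HighLevelPlanner(Protocol):
    def next_command(self, observation: dict) -> list[float]: ...


class LowLevelPolicy(Protocol):
    def act(self, command: Sequence[float], proprioception: Sequence[float]) -> list[float]: ...


class ScriptedWalkPlanner:
    """Toy planner: constant forward walk with both hands held in front of the torso."""

    def next_command(self, observation: dict) -> list[float]:
        root = [0.3, 0.0, 0.0, 0.75]
        left = [0.35, 0.2, 0.1, 1.0, 0.0, 0.0, 0.0]
        right = [0.35, -0.2, 0.1, 1.0, 0.0, 0.0, 0.0]
        return root + left + right


class TeleopPlanner:
    """Toy stand-in for teleoperation: replays whatever targets the operator set."""

    def __init__(self, operator_command: Sequence[float]):
        self.operator_command = list(operator_command)

    def next_command(self, observation: dict) -> list[float]:
        return list(self.operator_command)


class FrozenPolicyStub:
    """Placeholder for the distilled whole-body policy; it is never retrained."""

    def __init__(self, action_dim: int = 12):  # hypothetical action size
        self.action_dim = action_dim

    def act(self, command: Sequence[float], proprioception: Sequence[float]) -> list[float]:
        if len(command) != COMMAND_DIM:
            raise ValueError(f"expected {COMMAND_DIM}-dim EE-root command")
        return [0.0] * self.action_dim


def run_episode(planner: HighLevelPlanner, policy: LowLevelPolicy, steps: int) -> list[list[float]]:
    """Query the planner each step and pass its command to the same low-level policy."""
    actions = []
    for t in range(steps):
        command = planner.next_command({"step": t})
        actions.append(policy.act(command, proprioception=[]))
    return actions


policy = FrozenPolicyStub()
walk = run_episode(ScriptedWalkPlanner(), policy, steps=5)
teleop = run_episode(TeleopPlanner(ScriptedWalkPlanner().next_command({})), policy, steps=3)
print(len(walk), len(teleop))
```

Swapping `ScriptedWalkPlanner` for `TeleopPlanner` changes nothing on the policy side. This is the property the hierarchical system relies on when it adds new planners or task modules.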

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CEER gives a workable EE-root task-space interface for mixing planners with compliant humanoid control via distillation, but the hardware compliance claim rests on indirect metrics.


Colleague, the core idea here is to turn compliant whole-body control into a simple plug-in interface of end-effector poses plus root commands. Different high-level planners can then drive the robot without retraining the low-level policy. The authors distill a teacher motion tracker into a student that sees only those EE-root targets. They then wire the result into a hierarchical stack for tasks such as teleoperated manipulation and room-scale loco-manipulation. The reported outcomes show that the interface can actually be used: 3.3 cm tracking accuracy, lower jerk than baselines, up to 70% success in simulation, and stable hardware contact under teleoperation. That combination of abstraction, distillation, and plug-and-play integration is the concrete step forward from separate locomotion or manipulation controllers. The experiments give enough evidence that the approach is worth trying in other humanoid setups.

The soft spot is whether the student policy keeps the teacher's compliance and stability once it receives only EE-root commands instead of full-body joint references. Position accuracy and jerk help, but they do not directly test force behavior or recovery from unexpected contacts on hardware. If the distilled policy turns out stiffer, the modular advantage weakens exactly where contact-rich work matters most. The abstract also omits baseline implementation details and any statistical checks, so those need to be clear in the full methods.

This paper is for people building hierarchical humanoid systems who want a reusable control layer between planning and whole-body execution. A reader working on similar modular control would get usable implementation pointers. The paper has enough experimental grounding, and addresses a practical enough bottleneck, to deserve peer review. Referees should still press on compliance preservation and baseline comparisons. I would send it forward.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes CEER, a compliant end-effector and root (EE-root) control abstraction as a unified interface for hierarchical humanoid loco-manipulation. It distills a whole-body teacher controller into a low-level policy that accepts only EE-root commands, enabling plug-and-play integration with heterogeneous high-level planners for diverse tasks. Simulation and hardware results report 3.3 cm end-effector tracking accuracy, reduced jerk relative to baselines, stable contact-rich teleoperated manipulation, and up to 70% success in room-scale single-object loco-manipulation.

Significance. If the distillation successfully transfers compliance and stability, CEER could offer a practical, interpretable task-space layer that decouples high-level planning from low-level whole-body control, supporting modular and scalable humanoid systems without per-task retraining. The hierarchical integration of diverse planners is a notable strength if the closed-loop behavior under contact is validated.

major comments (2)

Section 4 (teacher-student distillation): The central claim requires that the low-level policy, driven only by EE-root commands, reproduces the compliance, contact-force behavior, and root stability of the original whole-body controller. The reported 3.3 cm tracking accuracy and jerk reduction are necessary but do not directly measure force compliance, contact-force variance, or recovery from unexpected collisions. Additional hardware metrics (e.g., force/torque residuals or perturbation recovery) are needed to confirm preservation for contact-rich tasks.
Experiments and abstract: The 70% success rate and accuracy figures are presented without trial counts, error bars, statistical tests, or explicit baseline implementations. This omission prevents independent assessment of robustness and undermines the quantitative support for the modular abstraction claim.

minor comments (2)

The abstract states 'substantially reduced jerk compared to baselines' without naming the baselines or providing quantitative deltas; clarify these in the results section.
Notation for the EE-root interface (root motion commands and end-effector pose targets) could be formalized earlier with a clear diagram of the command space.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and indicate the specific revisions planned for the next version of the manuscript.


Referee: Section 4 (teacher-student distillation): The central claim requires that the low-level policy, driven only by EE-root commands, reproduces the compliance, contact-force behavior, and root stability of the original whole-body controller. The reported 3.3 cm tracking accuracy and jerk reduction are necessary but do not directly measure force compliance, contact-force variance, or recovery from unexpected collisions. Additional hardware metrics (e.g., force/torque residuals or perturbation recovery) are needed to confirm preservation for contact-rich tasks.

Authors: We agree that direct measurements of force compliance, contact-force variance, and perturbation recovery would strengthen the evidence that the distilled policy preserves the teacher's behavior. The manuscript currently supports this indirectly through hardware demonstrations of stable contact-rich teleoperation and reduced jerk. In the revision we will add analysis of force/torque residuals measured during the existing hardware contact tasks and include simulation results showing recovery from unexpected external forces while tracking EE-root commands. We note that new hardware perturbation experiments were not performed for this study. revision: partial
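The referee asks for force/torque residuals and perturbation recovery. The response commits to residual analysis on hardware and to recovery results in simulation. Neither metric is defined in the text here, so the sketch below offers one plausible pair of definitions. The tolerance, the settling rule ("enters and stays within tolerance"), and the function names are assumptions for illustration.

```python
import math


def recovery_time(errors, dt, tol, perturb_index=0):
    """Seconds from the perturbation until the error enters and stays within tol.

    Returns None if the error never settles inside the recorded window.
    """
    settled_at = None
    for i in range(perturb_index, len(errors)):
        if errors[i] <= tol:
            if settled_at is None:
                settled_at = i
        else:
            settled_at = None  # left the band again, so restart the clock
    return None if settled_at is None else (settled_at - perturb_index) * dt


def rms_residual(measured, commanded):
    """Root-mean-square difference between measured and commanded force or torque."""
    if len(measured) != len(commanded) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((m - c) ** 2 for m, c in zip(measured, commanded)) / len(measured))


# Synthetic EE error (m) decaying after a push at step 0, sampled at 100 Hz.
dt = 0.01
errors = [0.2 * math.exp(-i * dt / 0.1) for i in range(100)]
print(recovery_time(errors, dt, tol=0.01))
print(rms_residual([1.0, 2.0, 3.0], [1.0, 2.0, 1.0]))
```

The "stays within tolerance" rule matters. Without it, a trajectory that briefly crosses the band and then overshoots would be credited with an unrealistically fast recovery.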
Referee: Experiments and abstract: The 70% success rate and accuracy figures are presented without trial counts, error bars, statistical tests, or explicit baseline implementations. This omission prevents independent assessment of robustness and undermines the quantitative support for the modular abstraction claim.

Authors: We acknowledge that the absence of trial counts, error bars, statistical tests, and explicit baseline details limits the ability to assess robustness. In the revised manuscript we will update the Experiments section and abstract to report the exact number of trials for each metric, include error bars or standard deviations on the 3.3 cm accuracy and success-rate figures, describe the statistical tests applied, and provide clear specifications of all baseline controllers and their implementations. revision: yes
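The point about trial counts is easy to see numerically. The sketch below uses the Wilson score interval, a standard choice for binomial success rates; the paper does not say which interval, if any, it uses. It shows how much the uncertainty around a 70% success rate depends on the number of trials. The counts 7/10 and 70/100 are hypothetical, since the abstract does not report the actual trial counts.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate (z=1.96 gives ~95% coverage)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


for successes, trials in ((7, 10), (70, 100)):
    lo, hi = wilson_interval(successes, trials)
    print(f"{successes}/{trials} successes: [{lo:.2f}, {hi:.2f}]")
```

With 7 of 10 successes the interval spans roughly 0.40 to 0.89. With 70 of 100 it narrows to roughly 0.60 to 0.78. That difference is why the promised trial counts matter for interpreting the headline figure.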

Circularity Check

0 steps flagged

No circularity: the claims rest on experimental validation of the EE-root interface rather than on self-referential definitions or fitted predictions.


The paper introduces CEER as a task-space abstraction for humanoid control and distills a whole-body teacher into an EE-root student policy. It validates the approach via simulation and hardware experiments that report 3.3 cm tracking accuracy, reduced jerk, and task success rates. No equations, parameter fits, or definitions are shown to reduce to their own inputs by construction. The teacher-student step is presented as a training procedure whose closed-loop properties are assessed empirically rather than assumed tautologically. No load-bearing self-citations or uniqueness theorems imported from prior author work are invoked to force the central result. The argument therefore does not loop back on itself: its support comes from external experimental benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach relies on the domain assumption that root and end-effector commands suffice to encode compliant whole-body behavior; no free parameters or invented physical entities are explicitly introduced in the abstract.

axioms (1)

domain assumption: End-effector pose targets and root motion commands are sufficient to produce stable, compliant whole-body control for contact-rich tasks.
This premise underpins the design of the EE-root interface and the claim that the distilled policy can replace the full teacher controller.

invented entities (1)

CEER interface (no independent evidence)
purpose: Unified task-space abstraction for modular loco-manipulation
New control abstraction introduced by the paper; no independent evidence outside this work is provided in the abstract.

pith-pipeline@v0.9.0 · 5805 in / 1341 out tokens · 35093 ms · 2026-05-20T04:54:57.468213+00:00


Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages · 4 internal anchors

