pith. machine review for the scientific record.

arxiv: 2604.21351 · v1 · submitted 2026-04-23 · 💻 cs.RO

Recognition: unknown

Learn Weightlessness: Imitate Non-Self-Stabilizing Motions on Humanoid Robot

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 21:45 UTC · model grok-4.3

classification 💻 cs.RO
keywords humanoid robots · imitation learning · weightlessness mechanism · non-self-stabilizing motions · environmental interaction · whole-body control · contact-rich tasks

The pith

A weightlessness mechanism lets humanoid robots generalize non-self-stabilizing motions by dynamically relaxing specific joints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that imitating the human practice of selectively relaxing joints to enter a weightless state during motions that need environmental support enables robots to complete tasks like sitting on chairs of different heights or leaning on walls. This matters because standard imitation and reinforcement learning approaches rely on rigid trajectory tracking and therefore struggle when the body must interact adaptively with surroundings. The authors introduce an auto-labeling method for weightless states together with a Weightlessness Mechanism that decides which joints to relax and by how much; the resulting controller trains on single-action demonstrations yet produces stable behavior across varied environmental setups. In this way the work connects precise motion reproduction with flexible physical contact.

Core claim

The central claim is that a Weightlessness Mechanism (WM) dynamically selects which joints to relax and to what degree, based on an auto-labeling strategy that identifies weightless states in human demonstrations, thereby allowing stable imitation of non-self-stabilizing motions. When trained on single-action demonstrations without task-specific tuning, the mechanism produces strong generalization across different chair heights, bed inclinations, and wall-leaning configurations while preserving motion stability on a humanoid robot.

What carries the argument

The Weightlessness Mechanism (WM), which dynamically determines which joints to relax and to what level during non-self-stabilizing motions to permit passive body-environment contact while still executing the target action.
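The abstract does not spell out the control law behind "which joints to relax and to what level." One common way to realize graded relaxation, sketched here with hypothetical gains and joint indexing (an assumption, not the paper's stated method), is to scale a PD tracking torque by a per-joint weight from the WM, so a weight of zero leaves the joint passive:

```python
# Hypothetical sketch: a per-joint relaxation weight w[j] in [0, 1] from
# the WM scales a PD tracking torque. w[j] = 0 leaves the joint fully
# passive ("weightless"); w[j] = 1 tracks the reference motion fully.
# Gains kp, kd are illustrative, not taken from the paper.
def relaxed_pd_torques(q, qd, q_ref, relax_weights, kp=80.0, kd=2.0):
    torques = []
    for j, w in enumerate(relax_weights):
        pd = kp * (q_ref[j] - q[j]) - kd * qd[j]
        torques.append(w * pd)
    return torques

# Example: joint 0 fully tracking, joint 1 fully relaxed.
tau = relaxed_pd_torques(q=[0.0, 0.5], qd=[0.0, 0.0],
                         q_ref=[0.2, 0.5], relax_weights=[1.0, 0.0])
```

Under this reading, "level" of relaxation is just the continuous weight, which is what lets the mechanism interpolate between rigid tracking and passive environmental contact.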

If this is right

  • Generalization holds for sitting on chairs of varying heights, lying on beds with different inclinations, and leaning against walls via shoulder or elbow contact.
  • Training on single-action demonstrations is sufficient to achieve stable performance across these tasks.
  • Motion stability is preserved even when the robot must rely on passive environmental support.
  • The approach reduces the separation between rigid trajectory tracking and adaptive environmental interaction in humanoid whole-body control.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This selective relaxation approach could be applied to other contact-rich tasks such as climbing or carrying objects where passive support is useful.
  • Energy use might decrease in extended operations because joints are relaxed rather than actively controlled at every instant.
  • Integration with online feedback could allow the mechanism to adjust relaxation levels in real time when environments change unexpectedly.

Load-bearing premise

That the auto-labeling strategy for weightless states accurately captures the relevant human biological process, and that the resulting labels transfer to robot dynamics, so that generalization occurs without further tuning or data.
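Reading the premise the way Figure 6 suggests, a weightless interval is bounded by the moment the CoM leaves the support region and the first environment contact, plus a temporal shift. A minimal sketch of that reading follows; the rule, flag names, and shift handling are assumptions for illustration, not the paper's stated algorithm:

```python
# Hypothetical reading of the auto-labeling rule in Figure 6: label frames
# as "weightless" from the moment the CoM projection leaves the support
# region until the first environment contact, with the start jittered by a
# temporal shift (fixed here rather than random, for reproducibility).
def label_weightless_interval(com_supported, contact, shift=0):
    # com_supported[t]: True while the CoM projects inside the support region
    # contact[t]: True once the body touches the environment (chair/bed/wall)
    start = next(t for t, s in enumerate(com_supported) if not s)
    end = next(t for t, c in enumerate(contact) if c)
    return max(0, start + shift), end

start, end = label_weightless_interval(
    com_supported=[True, True, False, False, False, False],
    contact=[False, False, False, False, True, True],
)
# CoM leaves support at frame 2, first contact at frame 4.
```

Whether labels produced by a rule like this transfer across chair heights or bed inclinations is exactly the untested premise flagged above.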

What would settle it

A direct test in which the robot loses balance or fails to complete the motion on a new chair height or bed inclination while the Weightlessness Mechanism is active would falsify the generalization claim.

Figures

Figures reproduced from arXiv: 2604.21351 by Bin Zhao, Dong Wang, Haoran Yang, Jiacheng Bao, Junbo Tan, Wenqiang Que, Xuelong Li, Xueqian Wang, Yucheng Xin.

Figure 1. Our work introduces a human-inspired weightlessness mechanism that controls robot joints to selectively relax when …
Figure 3. WM in Sitting and Lying-down. During typical human sitting and lying-down motions, the body often relaxes after reaching a certain critical point. For example, after attaining a specific extreme position during sitting b), the lower limbs become fully relaxed, allowing the upper body to free-fall onto the chair to complete the sitting motion. Similarly, when leaning back to a certain limit c), the upper b…
Figure 4. WM in leaning against the wall. During the action of leaning against a wall, humans typically relax and undergo a controlled free-fall toward the wall after tilting to a certain critical angle in b). Only the arm and leg in contact with the wall remain actively engaged to provide support, while the remaining arm and leg are free to move in c). Physics simulators such as Isaac Gym [20] and MuJoCo [21] are …
Figure 5. Framework. The method consists of four components: (a) We collected videos of sitting, lying, and leaning (shoulder and elbow) motions, from which SMPL motions and terrains were extracted, followed by motion retargeting and terrain reconstruction. (b) The robot’s joint weightlessness states and weightless time interval are labeled via the annotation method, and a WM network is trained using an LSTM. (c) Train the action policy u…
Figure 6. Confirmation of Weightless Time Interval. The weightless interval I is related to contacting time C(t), CoM offset time {P(t) ∉ S(t)}, and a random temporal shift ∆t. Joint Weightless States. The set of active joints A(t) is determined using a limb-node hierarchy inspired by human motor control. Let C(t) again denote the set of environment-contact frames at time t. Then A(t) = {c_waist} ∪ ⋃_{c∈C(t)} ParentPa…
Figure 7. Confirmation of Joint Weightless States. The left image illustrates the limb-node relationship; the right image presents a case study. Table I (LSTM Network Configurations), which spills into this extract, specifies an input dimension of 230 per frame (4 historical steps, 1 current step, and 5 future steps of joint positions q ∈ R²³, giving 23 × 10 = 230) and hidden sizes of [256, 256, 64] un…
Figure 8. For different motions, the weightlessness network …
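The active-joint rule excerpted in Figure 6 builds A(t) from the waist node plus the parent chains of the links currently in contact, using the limb-node hierarchy from Figure 7. A minimal sketch of that rule follows; the hierarchy and link names here are hypothetical stand-ins, not the paper's:

```python
# Hypothetical limb-node hierarchy: each node maps to its parent toward
# the waist root, loosely following Figure 7's limb-node relationship.
PARENT = {
    "left_foot": "left_knee", "left_knee": "left_hip", "left_hip": "waist",
    "left_hand": "left_elbow", "left_elbow": "left_shoulder",
    "left_shoulder": "waist",
}

def active_joints(contact_links):
    # A(t) = {waist} ∪ (union over contact links c of the chain from c
    # up to the waist); all joints outside A(t) are left weightless.
    active = {"waist"}
    for link in contact_links:
        node = link
        while node != "waist":
            active.add(node)
            node = PARENT[node]
    return active
```

Leaning on a wall via the left elbow, for example, keeps only the left-arm chain and waist active while the remaining joints relax, which matches the behavior described in the Figure 4 caption.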
read the original abstract

The integration of imitation and reinforcement learning has enabled remarkable advances in humanoid whole-body control, facilitating diverse human-like behaviors. However, research on environment-dependent motions remains limited. Existing methods typically enforce rigid trajectory tracking while neglecting physical interactions with the environment. We observe that humans naturally exploit a "weightless" state during non-self-stabilizing (NSS) motions--selectively relaxing specific joints to allow passive body--environment contact, thereby stabilizing the body and completing the motion. Inspired by this biological mechanism, we design a weightlessness-state auto-labeling strategy for dataset annotation; and we propose the Weightlessness Mechanism (WM), a method that dynamically determines which joints to relax and to what level, together enabling effective environmental interaction while executing target motions. We evaluate our approach on 3 representative NSS tasks: sitting on chairs of varying heights, lying down on beds with different inclinations, and leaning against walls via shoulder or elbow. Extensive experiments in simulation and on the Unitree G1 robot demonstrate that our WM method, trained on single-action demonstrations without any task-specific tuning, achieves strong generalization across diverse environmental configurations while maintaining motion stability. Our work bridges the gap between precise trajectory tracking and adaptive environmental interaction, offering a biologically-inspired solution for contact-rich humanoid control.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that humans exploit a 'weightless' state by selectively relaxing joints during non-self-stabilizing motions to enable passive environmental contact and stabilization. Inspired by this, it introduces a weightlessness-state auto-labeling strategy for annotating single-action demonstrations and proposes the Weightlessness Mechanism (WM) that dynamically determines joint relaxation levels. The method is evaluated on three NSS tasks (sitting on chairs of varying heights, lying on inclined beds, leaning on walls via shoulder/elbow) and is reported to achieve strong generalization across environmental configurations without task-specific tuning, with validation in simulation and on the Unitree G1 robot.

Significance. If the results hold, the work could meaningfully advance contact-rich humanoid control by bridging rigid trajectory tracking with adaptive, biologically-inspired environmental interaction. The real-robot deployment on Unitree G1 provides practical grounding, and the single-demonstration training setup is a positive aspect for scalability.

major comments (3)
  1. [Method (weightlessness-state auto-labeling)] The central generalization claim without task-specific tuning rests on the weightlessness-state auto-labeling strategy (described in the method as 'inspired by' human behavior). However, no quantitative validation against human motion-capture data, no ablation on labeling variants, and no analysis of label transfer across configuration changes (e.g., chair height or bed inclination) are provided, leaving open whether the labels robustly capture the intended mechanism or are specific to the demonstration motions.
  2. [Experiments and Evaluation] The experimental results claim 'strong generalization' and maintained stability across diverse configurations, yet the manuscript reports no error bars, statistical tests, number of trials per configuration, or full details on data collection and variation ranges. This makes it impossible to assess whether the reported performance is reliable or could be explained by the base controller alone.
  3. [§4 (Evaluation)] No ablation studies isolate the contribution of the WM (joint relaxation determination) versus the underlying imitation/reinforcement learning pipeline or the auto-labeling. Without these, the load-bearing role of the proposed mechanism for the no-tuning generalization result cannot be verified.
minor comments (2)
  1. [Abstract] The abstract uses 'strong generalization' without a quantitative definition or baseline comparison; adding a specific metric (e.g., success rate over configuration ranges) would improve clarity.
  2. [Method] Notation for the relaxation level computation in the WM could be formalized with an equation to aid reproducibility.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive review and the recommendation for major revision. The comments highlight important areas for strengthening the validation and evaluation of our method. We address each point below and indicate the planned revisions.

read point-by-point responses
  1. Referee: [Method (weightlessness-state auto-labeling)] The central generalization claim without task-specific tuning rests on the weightlessness-state auto-labeling strategy (described in the method as 'inspired by' human behavior). However, no quantitative validation against human motion-capture data, no ablation on labeling variants, and no analysis of label transfer across configuration changes (e.g., chair height or bed inclination) are provided, leaving open whether the labels robustly capture the intended mechanism or are specific to the demonstration motions.

    Authors: The auto-labeling is presented as inspired by human behavior rather than quantitatively derived from motion-capture data, as the work centers on robotic implementation. We do not possess human mocap datasets for these NSS motions. However, we will add an analysis of label transfer by applying the strategy to varied configuration demonstrations in simulation and reporting label consistency. We will also include ablations on labeling variants (e.g., alternative thresholds and detection rules) to assess robustness. These additions will be made to the method and experiments sections. revision: partial

  2. Referee: [Experiments and Evaluation] The experimental results claim 'strong generalization' and maintained stability across diverse configurations, yet the manuscript reports no error bars, statistical tests, number of trials per configuration, or full details on data collection and variation ranges. This makes it impossible to assess whether the reported performance is reliable or could be explained by the base controller alone.

    Authors: We agree that more rigorous reporting is required. In the revision we will specify the number of trials per configuration, include error bars on key metrics (success rate, stability), detail data collection procedures and exact variation ranges (e.g., chair heights 0.4–0.6 m, bed angles 0–30°), and add statistical tests such as paired t-tests against baselines to demonstrate that gains are not attributable to the base controller alone. revision: yes
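For reference, the paired t-test proposed here needs only per-configuration paired scores and no external dependencies; the success rates below are hypothetical, purely to show the computation:

```python
import math

# Paired t-test statistic: t = mean(d) / (s_d / sqrt(n)), where d are the
# per-configuration differences (WM minus baseline) and s_d is the sample
# standard deviation of d. Scores here are hypothetical illustrations.
def paired_t_statistic(wm_scores, baseline_scores):
    d = [a - b for a, b in zip(wm_scores, baseline_scores)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

t = paired_t_statistic([0.9, 0.8, 0.95, 0.85], [0.6, 0.55, 0.7, 0.5])
```

The statistic is then compared against the t-distribution with n − 1 degrees of freedom; reporting n per configuration, as promised above, is what makes that comparison meaningful.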

  3. Referee: [§4 (Evaluation)] No ablation studies isolate the contribution of the WM (joint relaxation determination) versus the underlying imitation/reinforcement learning pipeline or the auto-labeling. Without these, the load-bearing role of the proposed mechanism for the no-tuning generalization result cannot be verified.

    Authors: We recognize the need to isolate WM's role. The revised §4 will include ablations comparing (i) full WM with auto-labeling, (ii) WM with fixed/heuristic labels, (iii) imitation learning without WM relaxation, and (iv) the RL pipeline alone. Results will quantify WM's contribution to generalization without task-specific tuning. revision: yes

standing simulated objections not resolved
  • Quantitative validation of the weightlessness-state auto-labeling strategy against human motion-capture data, which would require new human data collection outside the scope of the present study.

Circularity Check

0 steps flagged

No circularity: empirical imitation method with independent experimental validation

full rationale

The paper describes an imitation-learning approach that augments demonstrations with a biologically inspired auto-labeling heuristic for weightless states, then trains a WM policy to relax joints during NSS motions. No equations, uniqueness theorems, or self-citations are invoked to derive the central result; generalization across chair heights, bed angles, and wall contacts is presented as an empirical outcome measured in simulation and on the Unitree G1. The training data and labeling rule are external inputs to the learned policy, not definitions that force the reported performance by construction. This is a standard empirical robotics contribution whose validity rests on experimental controls rather than any self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the biological observation of weightless states and the effectiveness of the proposed labeling and mechanism; without full text these cannot be audited for hidden fitted parameters or unstated assumptions.

axioms (1)
  • domain assumption Humans naturally exploit a weightless state during non-self-stabilizing motions by selectively relaxing joints
    Stated directly in the abstract as the core inspiration for the labeling strategy.
invented entities (1)
  • Weightlessness Mechanism (WM) no independent evidence
    purpose: Dynamically determines which joints to relax and to what level for environmental interaction
    New method introduced in the paper; no independent evidence or external validation provided in abstract.

pith-pipeline@v0.9.0 · 5551 in / 1304 out tokens · 44434 ms · 2026-05-09T21:45:51.394495+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

57 extracted references · 20 canonical work pages · 1 internal anchor

  1. [1]

    Humanplus: Humanoid shadowing and imitation from humans,

Z. Fu, Q. Zhao, Q. Wu, G. Wetzstein, and C. Finn, “Humanplus: Humanoid shadowing and imitation from humans,” arXiv preprint arXiv:2406.10454, 2024

  2. [2]

    ExBody2: Advanced expressive humanoid whole-body control,

M. Ji, X. Peng, F. Liu, J. Li, G. Yang, X. Cheng, and X. Wang, “Exbody2: Advanced expressive humanoid whole-body control,” arXiv preprint arXiv:2412.13196, 2024

  3. [3]

Asap: Aligning simulation and real-world physics for learning agile humanoid whole-body skills,

T. He, J. Gao, W. Xiao, Y. Zhang, Z. Wang, J. Wang, Z. Luo, G. He, N. Sobanbab, C. Pan et al., “Asap: Aligning simulation and real-world physics for learning agile humanoid whole-body skills,” arXiv preprint arXiv:2502.01143, 2025

  4. [4]

    Omnih2o: Universal and dexterous human-to-humanoid whole-body teleoperation and learning,

T. He, Z. Luo, X. He, W. Xiao, C. Zhang, W. Zhang, K. Kitani, C. Liu, and G. Shi, “Omnih2o: Universal and dexterous human-to-humanoid whole-body teleoperation and learning,” arXiv preprint arXiv:2406.08858, 2024

  5. [5]

Embrace collisions: Humanoid shadowing for deployable contact-agnostics motions,

Z. Zhuang and H. Zhao, “Embrace collisions: Humanoid shadowing for deployable contact-agnostics motions,” arXiv preprint arXiv:2502.01465, 2025

  6. [6]

    KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills,

    W. Xie, J. Han, J. Zheng, H. Li, X. Liu, J. Shi, W. Zhang, C. Bai, and X. Li, “KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills,” Jun. 2025

  7. [7]

Humanoid-vla: Towards universal humanoid control with visual integration,

P. Ding, J. Ma, X. Tong, B. Zou, X. Luo, Y. Fan, T. Wang, H. Lu, P. Mo, J. Liu et al., “Humanoid-vla: Towards universal humanoid control with visual integration,” arXiv preprint arXiv:2502.14795, 2025

  8. [8]

    Grab: A dataset of whole-body human grasping of objects,

O. Taheri, N. Ghorbani, M. J. Black, and D. Tzionas, “Grab: A dataset of whole-body human grasping of objects,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IV 16. Springer, 2020, pp. 581–600

  9. [9]

    Behave: Dataset and method for tracking human object interactions,

B. L. Bhatnagar, X. Xie, I. A. Petrov, C. Sminchisescu, C. Theobalt, and G. Pons-Moll, “Behave: Dataset and method for tracking human object interactions,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 15935–15946

  10. [10]

    Intercap: Joint markerless 3d tracking of humans and objects in interaction,

Y. Huang, O. Taheri, M. J. Black, and D. Tzionas, “Intercap: Joint markerless 3d tracking of humans and objects in interaction,” in DAGM German Conference on Pattern Recognition. Springer, 2022, pp. 281–299

  11. [11]

    Object motion guided human motion synthesis,

J. Li, J. Wu, and C. K. Liu, “Object motion guided human motion synthesis,” ACM Transactions on Graphics (TOG), vol. 42, no. 6, pp. 1–11, 2023

  12. [12]

    Parahome: Parameterizing everyday home activities towards 3d generative modeling of human-object interactions,

J. Kim, J. Kim, J. Na, and H. Joo, “Parahome: Parameterizing everyday home activities towards 3d generative modeling of human-object interactions,” arXiv preprint arXiv:2401.10232, 2024

  13. [13]

    Mimicking-bench: A benchmark for generalizable humanoid-scene interaction learning via human mimicking,

Y. Liu, B. Yang, L. Zhong, H. Wang, and L. Yi, “Mimicking-bench: A benchmark for generalizable humanoid-scene interaction learning via human mimicking,” arXiv preprint arXiv:2412.17730, 2024

  14. [14]

    Taco: Benchmarking generalizable bimanual tool-action-object understanding,

Y. Liu, H. Yang, X. Si, L. Liu, Z. Li, Y. Zhang, Y. Liu, and L. Yi, “Taco: Benchmarking generalizable bimanual tool-action-object understanding,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 21740–21751

  15. [15]

    Himo: A new benchmark for full-body human interacting with multiple objects,

X. Lv, L. Xu, Y. Yan, X. Jin, C. Xu, S. Wu, Y. Liu, L. Li, M. Bi, W. Zeng et al., “Himo: A new benchmark for full-body human interacting with multiple objects,” in European Conference on Computer Vision. Springer, 2024, pp. 300–318

  16. [16]

    Intertrack: Tracking human object interaction without object templates,

X. Xie, J. E. Lenssen, and G. Pons-Moll, “Intertrack: Tracking human object interaction without object templates,” arXiv preprint arXiv:2408.13953, 2024

  17. [17]

    Core4d: A 4d human-object-human interaction dataset for collaborative object rearrangement,

Y. Liu, C. Zhang, R. Xing, B. Tang, B. Yang, and L. Yi, “Core4d: A 4d human-object-human interaction dataset for collaborative object rearrangement,” arXiv preprint arXiv:2406.19353, 2024

  18. [18]

    Full-body articulated human-object interaction,

N. Jiang, T. Liu, Z. Cao, J. Cui, Z. Zhang, Y. Chen, H. Wang, Y. Zhu, and S. Huang, “Full-body articulated human-object interaction,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 9365–9376

  19. [19]

    Couch: Towards controllable human-chair interactions,

X. Zhang, B. L. Bhatnagar, S. Starke, V. Guzov, and G. Pons-Moll, “Couch: Towards controllable human-chair interactions,” in European Conference on Computer Vision. Springer, 2022, pp. 518–535

  20. [20]

    Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning

V. Makoviychuk, L. Wawrzyniak, Y. Guo, M. Lu, K. Storey, M. Macklin, D. Hoeller, N. Rudin, A. Allshire, A. Handa et al., “Isaac gym: High performance gpu-based physics simulation for robot learning,” arXiv preprint arXiv:2108.10470, 2021

  21. [21]

    Mujoco: A physics engine for model-based control,

E. Todorov, T. Erez, and Y. Tassa, “Mujoco: A physics engine for model-based control,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 5026–5033

  22. [22]

    Improving sampling-based motion control,

L. Liu, K. Yin, and B. Guo, “Improving sampling-based motion control,” in Computer Graphics Forum, vol. 34, no. 2. Wiley Online Library, 2015, pp. 415–423

  23. [23]

Deepmimic: Example-guided deep reinforcement learning of physics-based character skills,

X. B. Peng, P. Abbeel, S. Levine, and M. Van de Panne, “Deepmimic: Example-guided deep reinforcement learning of physics-based character skills,” ACM Transactions On Graphics (TOG), vol. 37, no. 4, pp. 1–14, 2018

  24. [24]

    Learning to sit: Synthesizing human-chair interactions via hierarchical control,

Y.-W. Chao, J. Yang, W. Chen, and J. Deng, “Learning to sit: Synthesizing human-chair interactions via hierarchical control,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 7, 2021, pp. 5887–5895

  25. [25]

Hierarchical planning and control for box loco-manipulation,

Z. Xie, J. Tseng, S. Starke, M. van de Panne, and C. K. Liu, “Hierarchical planning and control for box loco-manipulation,” Proceedings of the ACM on Computer Graphics and Interactive Techniques, vol. 6, no. 3, pp. 1–18, 2023

  26. [26]

    Humanvla: Towards vision-language directed object rearrangement by physical humanoid,

X. Xu, Y. Zhang, Y.-L. Li, L. Han, and C. Lu, “Humanvla: Towards vision-language directed object rearrangement by physical humanoid,” arXiv preprint arXiv:2406.19972, 2024

  27. [27]

Amp: Adversarial motion priors for stylized physics-based character control,

X. B. Peng, Z. Ma, P. Abbeel, S. Levine, and A. Kanazawa, “Amp: Adversarial motion priors for stylized physics-based character control,” ACM Transactions on Graphics (ToG), vol. 40, no. 4, pp. 1–20, 2021

  28. [28]

Padl: Language-directed physics-based character control,

J. Juravsky, Y. Guo, S. Fidler, and X. B. Peng, “Padl: Language-directed physics-based character control,” in SIGGRAPH Asia 2022 Conference Papers, 2022, pp. 1–9

  29. [29]

Ase: Large-scale reusable adversarial skill embeddings for physically simulated characters,

X. B. Peng, Y. Guo, L. Halper, S. Levine, and S. Fidler, “Ase: Large-scale reusable adversarial skill embeddings for physically simulated characters,” ACM Transactions On Graphics (TOG), vol. 41, no. 4, pp. 1–17, 2022

  30. [30]

    Synthesizing physical character-scene interactions,

M. Hassan, Y. Guo, T. Wang, M. Black, S. Fidler, and X. B. Peng, “Synthesizing physical character-scene interactions,” in ACM SIGGRAPH 2023 Conference Proceedings, 2023, pp. 1–9

  31. [31]

    Synthesizing physically plausible human motions in 3d scenes,

L. Pan, J. Wang, B. Huang, J. Zhang, H. Wang, X. Tang, and Y. Wang, “Synthesizing physically plausible human motions in 3d scenes,” in 2024 International Conference on 3D Vision (3DV). IEEE, 2024, pp. 1498–1507

  32. [32]

Unified human-scene interaction via prompted chain-of-contacts,

Z. Xiao, T. Wang, J. Wang, J. Cao, W. Zhang, B. Dai, D. Lin, and J. Pang, “Unified human-scene interaction via prompted chain-of-contacts,” arXiv preprint arXiv:2309.07918, 2023

  33. [33]

HumanoidBench: Simulated humanoid benchmark for whole-body locomotion and manipulation

C. Sferrazza, D.-M. Huang, X. Lin, Y. Lee, and P. Abbeel, “HumanoidBench: Simulated humanoid benchmark for whole-body locomotion and manipulation,” arXiv preprint arXiv:2403.10506, 2024

  34. [34]

    Visual Imitation Enables Contextual Humanoid Control,

    A. Allshire, H. Choi, J. Zhang, D. McAllister, A. Zhang, C. M. Kim, T. Darrell, P. Abbeel, J. Malik, and A. Kanazawa, “Visual Imitation Enables Contextual Humanoid Control,” May 2025

  35. [35]

    On real-time whole-body human to humanoid motion transfer,

    F.-J. Montecillo-Puente, M. Narsipura Sreenivasa, and J.-P. Laumond, “On real-time whole-body human to humanoid motion transfer,” 2010

  36. [36]

    Online human walking imitation in task and joint space based on quadratic programming,

K. Hu, C. Ott, and D. Lee, “Online human walking imitation in task and joint space based on quadratic programming,” in 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 3458–3464

  37. [37]

    Teleoperation of a humanoid robot using full-body motion capture, example movements, and machine learning,

C. Stanton, A. Bogdanovych, and E. Ratanasena, “Teleoperation of a humanoid robot using full-body motion capture, example movements, and machine learning,” in Proc. Australasian Conference on Robotics and Automation, vol. 8, 2012, p. 51

  38. [38]

    Teleoperation of a humanoid robot using an optical motion capture system,

D. Dajles, F. Siles et al., “Teleoperation of a humanoid robot using an optical motion capture system,” in 2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI). IEEE, 2018, pp. 1–8

  39. [39]

Homie: Humanoid loco-manipulation with isomorphic exoskeleton cockpit,

    Q. Ben, F. Jia, J. Zeng, J. Dong, D. Lin, and J. Pang, “Homie: Humanoid loco-manipulation with isomorphic exoskeleton cockpit,” arXiv preprint arXiv:2502.13013, 2025

  40. [40]

    Questsim: Human motion tracking from sparse sensors with simulated avatars,

A. Winkler, J. Won, and Y. Ye, “Questsim: Human motion tracking from sparse sensors with simulated avatars,” in SIGGRAPH Asia 2022 Conference Papers, 2022, pp. 1–8

  41. [41]

    Questenvsim: Environment-aware simulated motion tracking from sparse sensors,

S. Lee, S. Starke, Y. Ye, J. Won, and A. Winkler, “Questenvsim: Environment-aware simulated motion tracking from sparse sensors,” in ACM SIGGRAPH 2023 Conference Proceedings, 2023, pp. 1–9

  42. [42]

    Neural3points: Learning to generate physically realistic full-body motion for virtual reality users,

Y. Ye, L. Liu, L. Hu, and S. Xia, “Neural3points: Learning to generate physically realistic full-body motion for virtual reality users,” in Computer Graphics Forum, vol. 41, no. 8. Wiley Online Library, 2022, pp. 183–194

  43. [43]

    Universal humanoid motion representations for physics-based control,

Z. Luo, J. Cao, J. Merel, A. Winkler, J. Huang, K. Kitani, and W. Xu, “Universal humanoid motion representations for physics-based control,” arXiv preprint arXiv:2310.04582, 2023

  44. [44]

    Open-television: Teleoperation with immersive active visual feedback,

X. Cheng, J. Li, S. Yang, G. Yang, and X. Wang, “Open-television: Teleoperation with immersive active visual feedback,” arXiv preprint arXiv:2407.01512, 2024

  45. [45]

    Humanoid locomotion as next token prediction,

I. Radosavovic, B. Zhang, B. Shi, J. Rajasegaran, S. Kamat, T. Darrell, K. Sreenath, and J. Malik, “Humanoid locomotion as next token prediction,” in The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

  46. [46]

    Expressive whole-body control for humanoid robots,

X. Cheng, Y. Ji, J. Chen, R. Yang, G. Yang, and X. Wang, “Expressive whole-body control for humanoid robots,” arXiv preprint arXiv:2402.16796, 2024

  47. [47]

    Okami: Teaching humanoid robots manipulation skills through single video imitation,

J. Li, Y. Zhu, Y. Xie, Z. Jiang, M. Seo, G. Pavlakos, and Y. Zhu, “Okami: Teaching humanoid robots manipulation skills through single video imitation,” in 8th Annual Conference on Robot Learning, 2024

  48. [48]

    BeyondMimic: From Motion Tracking to Versatile Humanoid Control via Guided Diffusion,

    T. E. Truong, Q. Liao, X. Huang, G. Tevet, C. K. Liu, and K. Sreenath, “BeyondMimic: From Motion Tracking to Versatile Humanoid Control via Guided Diffusion,” Aug. 2025

  49. [49]

    GBC: Generalized Behavior-Cloning Framework for Whole-Body Humanoid Imitation,

Y. Yao, C. Luo, J. Du, W. He, and J.-G. Lu, “GBC: Generalized Behavior-Cloning Framework for Whole-Body Humanoid Imitation,” Aug. 2025

  50. [50]

    Learning human-to-humanoid real-time whole-body teleoperation,

T. He, Z. Luo, W. Xiao, C. Zhang, K. Kitani, C. Liu, and G. Shi, “Learning human-to-humanoid real-time whole-body teleoperation,” in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024, pp. 8944–8951

  51. [51]

    HOVER: Versatile neural whole-body controller for humanoid robots,

T. He, W. Xiao, T. Lin, Z. Luo, Z. Xu, Z. Jiang, J. Kautz, C. Liu, G. Shi, X. Wang et al., “Hover: Versatile neural whole-body controller for humanoid robots,” arXiv preprint arXiv:2410.21229, 2024

  52. [52]

    UniTracker: Learning Universal Whole-Body Motion Tracker for Humanoid Robots,

    K. Yin, W. Zeng, K. Fan, Z. Wang, Q. Zhang, Z. Tian, J. Wang, J. Pang, and W. Zhang, “UniTracker: Learning Universal Whole-Body Motion Tracker for Humanoid Robots,” Jul. 2025

  53. [53]

World-Grounded Human Motion Recovery via Gravity-View Coordinates,

Z. Shen, H. Pi, Y. Xia, Z. Cen, S. Peng, Z. Hu, H. Bao, R. Hu, and X. Zhou, “World-Grounded Human Motion Recovery via Gravity-View Coordinates,” in SIGGRAPH Asia 2024 Conference Papers, Dec. 2024, pp. 1–11

  54. [54]

    MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos,

Z. Li, R. Tucker, F. Cole, Q. Wang, L. Jin, V. Ye, A. Kanazawa, A. Holynski, and N. Snavely, “MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos,” Dec. 2024

  55. [55]

    Neural kernel surface reconstruction,

J. Huang, Z. Gojcic, M. Atzmon, O. Litany, S. Fidler, and F. Williams, “Neural kernel surface reconstruction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 4369–4379

  56. [56]

Retargeting Matters: General Motion Retargeting for Humanoid Motion Tracking,

J. P. Araujo, Y. Ze, P. Xu, J. Wu, and C. K. Liu, “Retargeting matters: General motion retargeting for humanoid motion tracking,” arXiv preprint arXiv:2510.02252, 2025

  57. [57]

    Perpetual Humanoid Control for Real-time Simulated Avatars,

    Z. Luo, J. Cao, A. Winkler, K. Kitani, and W. Xu, “Perpetual Humanoid Control for Real-time Simulated Avatars,” Sep. 2023