arxiv: 2606.03476 · v1 · pith:STN6WW7Tnew · submitted 2026-06-02 · 💻 cs.RO

Human2Humanoid: Physics-Aware Cross-Morphology Motion Retargeting for Humanoid Robots

Pith reviewed 2026-06-28 10:12 UTC · model grok-4.3

classification 💻 cs.RO
keywords motion retargeting · humanoid robots · unsupervised learning · CycleGAN · physics constraints · cross-morphology transfer · human-robot interaction · graph convolutional networks

The pith

An unsupervised CycleGAN framework retargets human motions to humanoid robots like the Unitree G1 without needing paired training data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Human and robot bodies differ in skeletal topology, limb proportions, and degrees of freedom, and paired human-robot motion data is scarce. The paper introduces Human2Humanoid, which combines a CycleGAN architecture with a skeleton-aware graph convolutional network to handle topology, a consistency loss on normalized end-effector trajectories to handle scale differences, and physics-aware constraints that encourage the robot motion to reproduce the contact patterns of the source motion. This lets human motion be retargeted to the robot using only unpaired data. According to the abstract, experiments on the Unitree G1 show better downstream controllability and physical feasibility than existing methods.

Core claim

The Human2Humanoid method transfers human motion to humanoid robot behaviors with high fidelity by adopting a CycleGAN-based architecture equipped with a skeleton-aware graph convolutional network to capture topology-dependent motion features, a morphology-invariant end-effector consistency loss to align normalized end-effector trajectories, and explicit physics-aware feasibility constraints to encourage reproduction of contact patterns.
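The page does not give the formula for the end-effector consistency loss, so the following is only a minimal NumPy sketch of one plausible reading: express each end-effector position relative to the root, divide by a per-body scale, and penalize the squared difference between the human and robot trajectories. The function names, the choice of root-relative coordinates, and the use of a single scalar scale (for example, leg length) are assumptions, not the authors' definition.

```python
import numpy as np

def normalize_end_effectors(ee_pos, root_pos, body_scale):
    """Root-relative end-effector positions divided by a body scale.

    ee_pos: (T, E, 3) positions of E end effectors over T frames
    root_pos: (T, 3) root (pelvis) position per frame
    body_scale: positive scalar, e.g. leg length (hypothetical choice)
    """
    return (ee_pos - root_pos[:, None, :]) / body_scale

def ee_consistency_loss(src_ee, src_root, src_scale, tgt_ee, tgt_root, tgt_scale):
    """Mean squared distance between the two normalized trajectories."""
    src_n = normalize_end_effectors(src_ee, src_root, src_scale)
    tgt_n = normalize_end_effectors(tgt_ee, tgt_root, tgt_scale)
    return float(np.mean(np.sum((src_n - tgt_n) ** 2, axis=-1)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    human_root = rng.normal(size=(30, 3))
    human_ee = human_root[:, None, :] + rng.normal(size=(30, 4, 3))
    # A robot that is an exact 0.75x scaled copy of the human (toy case).
    robot_root, robot_ee = 0.75 * human_root, 0.75 * human_ee
    loss = ee_consistency_loss(human_ee, human_root, 0.9,
                               robot_ee, robot_root, 0.9 * 0.75)
    print(loss)  # close to zero: a pure rescaling is not penalized
```

The toy case illustrates the intent of the normalization: a motion that differs from the source only by a uniform body scale incurs essentially no penalty, while a change in where the hands or feet go relative to body size does.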

What carries the argument

CycleGAN architecture with skeleton-aware GCN, morphology-invariant end-effector consistency loss, and physics-aware feasibility constraints
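To make the first of these components concrete, here is a minimal NumPy sketch of a single graph-convolution layer over a skeleton graph. It uses the generic normalized-adjacency propagation rule (the Kipf and Welling form), which is an assumption: the page does not specify the layer, and the authors' skeleton-aware operator may differ. The five-joint toy skeleton, the function names, and the feature sizes are hypothetical.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Symmetric-normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    adj = np.eye(num_joints)
    for parent, child in edges:
        adj[parent, child] = 1.0
        adj[child, parent] = 1.0
    inv_sqrt_deg = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]

def graph_conv(features, a_hat, weight):
    """One layer: mix each joint with its skeleton neighbors, project, apply ReLU.

    features: (T, J, C_in), a_hat: (J, J), weight: (C_in, C_out) -> (T, J, C_out)
    """
    mixed = np.einsum("ij,tjc->tic", a_hat, features)
    return np.maximum(mixed @ weight, 0.0)

# Hypothetical toy skeleton: pelvis(0)-spine(1)-head(2) and pelvis(0)-hip(3)-foot(4).
TOY_EDGES = [(0, 1), (1, 2), (0, 3), (3, 4)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a_hat = normalized_adjacency(TOY_EDGES, num_joints=5)
    motion = rng.normal(size=(16, 5, 6))    # 16 frames, 5 joints, 6 features each
    weight = rng.normal(size=(6, 8)) * 0.1
    print(graph_conv(motion, a_hat, weight).shape)  # (16, 5, 8)
```

Because the adjacency matrix is built from the skeleton's own edge list, the same layer code can run on a human skeleton and a robot skeleton with different joint counts, which is the property that makes graph convolutions a natural fit for cross-morphology features.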

If this is right

  • Retargeting works without any paired human-robot motion examples.
  • Normalized end-effector trajectories preserve motion intent across different body sizes.
  • Physics constraints reduce contact artifacts in the generated robot motions.
  • Downstream robot control tasks show improved performance compared to existing retargeting methods.
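The third point depends on being able to say when a contact pattern has been reproduced. The sketch below shows one common way to check this, assuming per-frame foot positions with z pointing up: label a foot as in contact when it is both low and slow, then compare the labels of the source and retargeted motions. The thresholds, function names, and the mismatch and skate metrics are illustrative assumptions; the page does not state how the paper measures contact reproduction.

```python
import numpy as np

def detect_contacts(foot_pos, dt, height_thresh=0.05, speed_thresh=0.2):
    """Boolean (T, F) contact labels from foot height and speed.

    foot_pos: (T, F, 3) foot positions in meters, z up; dt: frame time in seconds.
    Thresholds are hypothetical and would need tuning per skeleton.
    """
    velocity = np.gradient(foot_pos, dt, axis=0)
    speed = np.linalg.norm(velocity, axis=-1)
    return (foot_pos[..., 2] < height_thresh) & (speed < speed_thresh)

def contact_mismatch(src_contacts, tgt_contacts):
    """Fraction of (frame, foot) labels that differ between the two motions."""
    return float(np.mean(src_contacts != tgt_contacts))

def foot_skate(foot_pos, contacts, dt):
    """Mean horizontal foot speed over frames labeled as contact (0 if none)."""
    velocity = np.gradient(foot_pos[..., :2], dt, axis=0)
    speed = np.linalg.norm(velocity, axis=-1)
    return float(speed[contacts].mean()) if contacts.any() else 0.0

if __name__ == "__main__":
    dt = 1.0 / 30.0
    planted = np.zeros((10, 2, 3))      # both feet resting on the ground
    lifted = planted.copy()
    lifted[..., 2] = 0.3                # both feet held 30 cm in the air
    src = detect_contacts(planted, dt)
    tgt = detect_contacts(lifted, dt)
    print(contact_mismatch(src, src), contact_mismatch(src, tgt))
    print(foot_skate(planted, src, dt))
```

A mismatch of zero means the retargeted motion plants and lifts its feet on the same frames as the source; a nonzero skate value flags feet that slide while nominally in contact.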

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar techniques could extend to retargeting between different robot morphologies without new data collection.
  • The approach might reduce the need for expensive motion capture sessions in robot training pipelines.
  • Contact pattern preservation could improve safety in physical human-robot interactions.

Load-bearing premise

The CycleGAN equipped with the skeleton GCN, end-effector loss, and physics constraints can bridge the morphological differences using only unpaired data.

What would settle it

The claim would be undermined if applying the method to human motions produced Unitree G1 trajectories that violate the contact patterns of the source motion, or that yield downstream control performance worse than baseline methods.

Figures

Figures reproduced from arXiv: 2606.03476 by Feiyang Yuan, Junchi Gu, Shiwu Zhang, Shurui Fang, Tianchen Huang, Wei Gao, Xiaohu Zhang, Yu Wang.

Figure 1: Overview of the proposed Human2Humanoid framework.
Figure 2: Representative failure cases of the optimization-based …
Figure 3: Qualitative comparisons between Human2Humanoid …

Original abstract

Retargeting human motion to humanoid robots is critical for teleoperation, imitation learning and human-robot interaction. However, it remains challenging because of substantial morphological discrepancies between humans and robots, including differences in skeletal topology, limb proportions and degrees of freedom, as well as the scarcity of paired motion data. This paper presents Human2Humanoid, an unsupervised motion retargeting framework that transfers human motions to humanoid robot behaviors with high fidelity. To bridge the domain gap under unpaired data, we adopt a CycleGAN-based architecture equipped with a skeleton-aware graph convolutional network to capture topology-dependent motion features. To address cross-domain scale mismatches, we introduce a morphology-invariant end-effector consistency loss that aligns normalized end-effector trajectories to preserve motion semantics across embodiments. To improve physical plausibility and reduce contact artifacts, we impose explicit physics-aware feasibility constraints to encourage reproduction of the contact patterns in the source motion. Experimental results show that the proposed method successfully retargets human motion to the Unitree G1 humanoid robot without paired data, and outperforms existing methods in both downstream controllability and physical feasibility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper presents Human2Humanoid, an unsupervised motion retargeting framework that uses a CycleGAN-based architecture with a skeleton-aware graph convolutional network, a morphology-invariant end-effector consistency loss, and explicit physics-aware feasibility constraints to transfer human motions to the Unitree G1 humanoid robot without paired data. It claims successful retargeting and outperformance over existing methods in downstream controllability and physical feasibility.

Significance. If the experimental claims hold, this would represent a meaningful advance in cross-morphology retargeting for humanoid robots, particularly by removing the need for paired human-robot motion datasets and incorporating physics constraints for feasibility. Such a method could directly benefit teleoperation and imitation learning pipelines.

major comments (1)
  1. [Abstract] Abstract: the central claim of outperformance on controllability and physical feasibility is asserted without any quantitative metrics, baselines, dataset sizes, or ablation results. This prevents evaluation of whether the CycleGAN + GCN + end-effector loss + physics constraints actually deliver the stated gains.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their feedback. We address the concern about the abstract below.

  1. Referee: [Abstract] Abstract: the central claim of outperformance on controllability and physical feasibility is asserted without any quantitative metrics, baselines, dataset sizes, or ablation results. This prevents evaluation of whether the CycleGAN + GCN + end-effector loss + physics constraints actually deliver the stated gains.

    Authors: We agree that the abstract would be strengthened by including concrete quantitative indicators. The full manuscript reports these details in Sections 4–5, including baseline comparisons, dataset sizes (e.g., number of human motion sequences), ablation studies on the GCN, end-effector loss, and physics constraints, plus metrics for controllability (success rates on downstream tasks) and physical feasibility (contact reproduction accuracy). In the revised version we will update the abstract to cite the key quantitative gains (e.g., percentage improvements over baselines) while remaining within length limits.

    revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper describes an unsupervised CycleGAN-based retargeting architecture with skeleton-aware GCN, morphology-invariant end-effector loss, and physics-aware constraints. All claims of successful retargeting and outperformance rest on experimental results rather than any derivation chain. No equations, self-definitional steps, fitted inputs presented as predictions, or load-bearing self-citations appear in the provided text; the central result is an empirical demonstration of the proposed model on the Unitree G1, which is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the effectiveness of CycleGAN for unpaired domain translation in motion space and the sufficiency of the added losses to enforce semantic and physical consistency; these are domain assumptions rather than derived results.
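For readers who want the assumption in symbols, the standard CycleGAN cycle-consistency term (Zhu et al.) can be written for this setting as below, with G mapping human motion to robot motion and F mapping back. The second line is only a guess at how the added terms might be combined: the names of the end-effector and physics terms, the discriminators D_R and D_H, and the weights λ are hypothetical, since the page gives no equations.

```latex
\begin{aligned}
\mathcal{L}_{\text{cyc}}(G, F)
  &= \mathbb{E}_{x \sim p_H}\big[\lVert F(G(x)) - x \rVert_1\big]
   + \mathbb{E}_{y \sim p_R}\big[\lVert G(F(y)) - y \rVert_1\big], \\
\mathcal{L}_{\text{total}}
  &= \mathcal{L}_{\text{adv}}(G, D_R) + \mathcal{L}_{\text{adv}}(F, D_H)
   + \lambda_{\text{cyc}}\,\mathcal{L}_{\text{cyc}}
   + \lambda_{\text{ee}}\,\mathcal{L}_{\text{ee}}
   + \lambda_{\text{phys}}\,\mathcal{L}_{\text{phys}}.
\end{aligned}
```

Here p_H and p_R denote the unpaired human and robot motion distributions. The cycle term is what removes the need for paired data: it only asks that a motion survive a round trip through both generators, not that any specific human clip match any specific robot clip.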

axioms (2)
  • domain assumption: CycleGAN can learn bidirectional mappings between unpaired human and robot motion domains.
    Invoked to enable retargeting without paired data; stated in the abstract's description of the architecture.
  • domain assumption: Normalized end-effector trajectories preserve motion semantics across different morphologies.
    Basis for the morphology-invariant consistency loss.

pith-pipeline@v0.9.1-grok · 5749 in / 1302 out tokens · 21277 ms · 2026-06-28T10:12:45.009647+00:00 · methodology
