Human2Humanoid: Physics-Aware Cross-Morphology Motion Retargeting for Humanoid Robots
Pith reviewed 2026-06-28 10:12 UTC · model grok-4.3
The pith
An unsupervised CycleGAN framework retargets human motions to humanoid robots like the Unitree G1 without needing paired training data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Human2Humanoid method transfers human motion to humanoid robot behaviors with high fidelity by adopting a CycleGAN-based architecture equipped with a skeleton-aware graph convolutional network to capture topology-dependent motion features, a morphology-invariant end-effector consistency loss to align normalized end-effector trajectories, and explicit physics-aware feasibility constraints to encourage reproduction of contact patterns.
What carries the argument
CycleGAN architecture with skeleton-aware GCN, morphology-invariant end-effector consistency loss, and physics-aware feasibility constraints
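To make "skeleton-aware GCN" concrete, here is a minimal sketch of one graph-convolution step over a skeleton. It is an illustration only: the mean aggregation, the self-loops, and the names `skeleton_graph_conv`, `chain_edges`, and `joint_feats` are assumptions, not the layer the paper defines.

```python
def skeleton_graph_conv(features, edges, weight):
    """One mean-aggregation graph-convolution step over a skeleton (assumed form).

    features: per-joint feature vectors, shape [J][C_in]
    edges:    (joint_a, joint_b) index pairs, one per bone
    weight:   shared projection matrix, shape [C_in][C_out]
    """
    num_joints = len(features)
    # Each joint's neighborhood is itself (self-loop) plus the joints it shares a bone with.
    neighbors = [{j} for j in range(num_joints)]
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    c_in = len(weight)
    c_out = len(weight[0])
    out = []
    for j in range(num_joints):
        # Average the features over the neighborhood (row-normalized adjacency).
        agg = [
            sum(features[n][c] for n in neighbors[j]) / len(neighbors[j])
            for c in range(c_in)
        ]
        # Apply the same linear projection at every joint.
        out.append([sum(agg[c] * weight[c][k] for c in range(c_in)) for k in range(c_out)])
    return out


# Hypothetical 3-joint chain (hip, knee, ankle) with 1-D features.
chain_edges = [(0, 1), (1, 2)]
joint_feats = [[0.0], [3.0], [6.0]]
smoothed = skeleton_graph_conv(joint_feats, chain_edges, [[1.0]])
```

Because the aggregation follows the bone list, the same function runs on a human skeleton and on a robot skeleton with a different joint count. That is the sense in which such features are topology-dependent.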
If this is right
- Retargeting works without any paired human-robot motion examples.
- Normalized end-effector trajectories preserve motion intent across different body sizes.
- Physics constraints reduce contact artifacts in the generated robot motions.
- Downstream robot control tasks show improved performance compared to existing retargeting methods.
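The idea of comparing normalized end-effector trajectories across body sizes can be sketched as follows. The normalization shown (root-relative position divided by a limb length) and all function and variable names are assumptions for illustration; the reviewed text does not specify the paper's exact normalization.

```python
def normalize_trajectory(ee_positions, root_positions, limb_length):
    """Express end-effector positions relative to the root, in units of limb length."""
    return [
        tuple((e - r) / limb_length for e, r in zip(ee, root))
        for ee, root in zip(ee_positions, root_positions)
    ]


def end_effector_consistency(human_norm, robot_norm):
    """Mean squared distance between two normalized trajectories of equal length."""
    total = 0.0
    for h, r in zip(human_norm, robot_norm):
        total += sum((hc - rc) ** 2 for hc, rc in zip(h, r))
    return total / len(human_norm)


# Hypothetical data: a human hand on a 0.5 m arm and a robot hand on a 0.25 m arm
# performing the same gesture at half the scale.
human_hand = [(0.5, 0.0, 1.0), (0.0, 0.5, 1.0)]
human_root = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
robot_hand = [(0.25, 0.0, 0.5), (0.0, 0.25, 0.5)]
robot_root = [(0.0, 0.0, 0.5), (0.0, 0.0, 0.5)]

h = normalize_trajectory(human_hand, human_root, limb_length=0.5)
r = normalize_trajectory(robot_hand, robot_root, limb_length=0.25)
loss = end_effector_consistency(h, r)
```

In this toy case the two trajectories coincide after normalization, so the loss is zero even though the raw positions differ by a factor of two.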
Where Pith is reading between the lines
- Similar techniques could extend to retargeting between different robot morphologies without new data collection.
- The approach might reduce the need for expensive motion capture sessions in robot training pipelines.
- Contact pattern preservation could improve safety in physical human-robot interactions.
Load-bearing premise
The CycleGAN equipped with the skeleton GCN, end-effector loss, and physics constraints can bridge the morphological differences using only unpaired data.
What would settle it
The claim would be undermined if applying the method to human motions on the Unitree G1 produced robot trajectories that violate the source contact patterns, or downstream control that is less stable than with baseline methods.
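One way such a check could be run is to compare per-frame foot-contact labels between the source and the retargeted motion. The sketch below is an assumed metric, not the paper's evaluation protocol: the height-threshold labeling, the 0.05 m default, and the names `contact_labels` and `contact_agreement` are all hypothetical.

```python
def contact_labels(foot_heights, height_threshold=0.05):
    """Label a frame as in contact when the foot height (meters) is at or below the threshold."""
    return [h <= height_threshold for h in foot_heights]


def contact_agreement(source_labels, target_labels):
    """Fraction of frames on which two contact label sequences agree."""
    if len(source_labels) != len(target_labels):
        raise ValueError("sequences must have the same number of frames")
    matches = sum(s == t for s, t in zip(source_labels, target_labels))
    return matches / len(source_labels)


# Hypothetical foot heights over five frames.
human_foot = [0.0, 0.02, 0.20, 0.30, 0.01]
robot_foot = [0.0, 0.10, 0.15, 0.25, 0.02]
agreement = contact_agreement(contact_labels(human_foot), contact_labels(robot_foot))
```

Here the robot foot lifts one frame early, so the sequences agree on four of five frames. A real evaluation would likely also use foot velocity or simulator contact forces rather than height alone.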
Original abstract
Retargeting human motion to humanoid robots is critical for teleoperation, imitation learning and human-robot interaction. However, it remains challenging because of substantial morphological discrepancies between humans and robots, including differences in skeletal topology, limb proportions and degrees of freedom, as well as the scarcity of paired motion data. This paper presents Human2Humanoid, an unsupervised motion retargeting framework that transfers human motions to humanoid robot behaviors with high fidelity. To bridge the domain gap under unpaired data, we adopt a CycleGAN-based architecture equipped with a skeleton-aware graph convolutional network to capture topology-dependent motion features. To address cross-domain scale mismatches, we introduce a morphology-invariant end-effector consistency loss that aligns normalized end-effector trajectories to preserve motion semantics across embodiments. To improve physical plausibility and reduce contact artifacts, we impose explicit physics-aware feasibility constraints to encourage reproduction of the contact patterns in the source motion. Experimental results show that the proposed method successfully retargets human motion to the Unitree G1 humanoid robot without paired data, and outperforms existing methods in both downstream controllability and physical feasibility.
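The abstract names four training signals: adversarial, cycle, end-effector, and physics. One plausible way to combine them is a weighted sum, written below. The additive form, the weights, and the symbol names are assumptions made for illustration; the abstract does not state the paper's actual objective.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{adv}}
  + \lambda_{\text{cyc}}\,\mathcal{L}_{\text{cyc}}
  + \lambda_{\text{ee}}\,\mathcal{L}_{\text{ee}}
  + \lambda_{\text{phys}}\,\mathcal{L}_{\text{phys}}
```

Here $\mathcal{L}_{\text{adv}}$ and $\mathcal{L}_{\text{cyc}}$ stand for the usual CycleGAN adversarial and cycle-consistency terms, $\mathcal{L}_{\text{ee}}$ for the morphology-invariant end-effector consistency loss, and $\mathcal{L}_{\text{phys}}$ for the physics-aware feasibility constraints. The $\lambda$ values are hypothetical trade-off weights.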
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents Human2Humanoid, an unsupervised motion retargeting framework that uses a CycleGAN-based architecture with a skeleton-aware graph convolutional network, a morphology-invariant end-effector consistency loss, and explicit physics-aware feasibility constraints to transfer human motions to the Unitree G1 humanoid robot without paired data. It claims successful retargeting and outperformance over existing methods in downstream controllability and physical feasibility.
Significance. If the experimental claims hold, this would represent a meaningful advance in cross-morphology retargeting for humanoid robots, particularly by removing the need for paired human-robot motion datasets and incorporating physics constraints for feasibility. Such a method could directly benefit teleoperation and imitation learning pipelines.
major comments (1)
- [Abstract] Abstract: the central claim of outperformance on controllability and physical feasibility is asserted without any quantitative metrics, baselines, dataset sizes, or ablation results. This prevents evaluation of whether the CycleGAN + GCN + end-effector loss + physics constraints actually deliver the stated gains.
Simulated Author's Rebuttal
We thank the referee for their feedback. We address the concern about the abstract below.
Referee: [Abstract] Abstract: the central claim of outperformance on controllability and physical feasibility is asserted without any quantitative metrics, baselines, dataset sizes, or ablation results. This prevents evaluation of whether the CycleGAN + GCN + end-effector loss + physics constraints actually deliver the stated gains.
Authors: We agree that the abstract would be strengthened by including concrete quantitative indicators. The full manuscript reports these details in Sections 4–5, including baseline comparisons, dataset sizes (e.g., number of human motion sequences), ablation studies on the GCN, end-effector loss, and physics constraints, plus metrics for controllability (success rates on downstream tasks) and physical feasibility (contact reproduction accuracy). In the revised version we will update the abstract to cite the key quantitative gains (e.g., percentage improvements over baselines) while remaining within length limits. revision: yes
Circularity Check
No significant circularity detected
The paper describes an unsupervised CycleGAN-based retargeting architecture with skeleton-aware GCN, morphology-invariant end-effector loss, and physics-aware constraints. All claims of successful retargeting and outperformance rest on experimental results rather than any derivation chain. No equations, self-definitional steps, fitted inputs presented as predictions, or load-bearing self-citations appear in the provided text; the central result is an empirical demonstration of the proposed model on the Unitree G1, which is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: CycleGAN can learn bidirectional mappings between unpaired human and robot motion domains
- domain assumption: Normalized end-effector trajectories preserve motion semantics across different morphologies
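The first assumption rests on cycle consistency: mapping a motion to the other domain and back should recover the input. A minimal sketch of that round-trip penalty follows, using toy scalar maps in place of the learned generators. The functions `human_to_robot`, `robot_to_human`, and `lossy_robot_to_human` are hypothetical stand-ins, not the paper's networks.

```python
def cycle_consistency_l1(batch, forward, backward):
    """Mean absolute error between each sample and its round trip backward(forward(x))."""
    total, count = 0.0, 0
    for x in batch:
        x_rec = backward(forward(x))
        total += sum(abs(a - b) for a, b in zip(x, x_rec))
        count += len(x)
    return total / count


# Toy stand-ins for the two generators: a pure scale change between domains.
def human_to_robot(pose):
    return [0.5 * v for v in pose]


def robot_to_human(pose):
    # Exact inverse of human_to_robot.
    return [2.0 * v for v in pose]


def lossy_robot_to_human(pose):
    # Not an inverse: adds a constant offset to every coordinate.
    return [2.0 * v + 1.0 for v in pose]


human_batch = [[1.0, 2.0], [3.0, 4.0]]
perfect = cycle_consistency_l1(human_batch, human_to_robot, robot_to_human)
imperfect = cycle_consistency_l1(human_batch, human_to_robot, lossy_robot_to_human)
```

The penalty is zero only when the two maps invert each other on the data. Without paired examples, this is the signal that discourages the generators from discarding motion content.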