Switch: Learning Agile Skills Switching for Humanoid Robots
Pith reviewed 2026-05-10 11:02 UTC · model grok-4.3
The pith
A motion-similarity graph plus online search lets humanoid robots switch between locomotion skills in real time while staying balanced.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Switch constructs a Skill Graph from kinematic similarity in multi-skill motion capture data, trains a single whole-body tracking policy on that graph with deep reinforcement learning, and runs an online scheduler that performs graph search at every step or deviation to select the next feasible skill. This combination produces high-success-rate transitions between distinct locomotion skills while preserving accurate motion imitation.
What carries the argument
The Skill Graph (SG), which encodes potential cross-skill transitions by measuring kinematic similarity across the motion dataset and supplies the feasible paths that the online scheduler searches in real time.
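The construction step can be sketched as a nearest-pose edge test over the motion data. The distance metric, threshold, and pose representation below are illustrative assumptions, not the paper's specification:

```python
import numpy as np

def build_skill_graph(clips, threshold=0.5):
    """Sketch of a Skill Graph: nodes are (skill, frame) poses; a cross-skill
    edge is added when two poses are kinematically close. The Euclidean
    metric and fixed threshold here are illustrative stand-ins.

    clips: dict mapping skill name -> array of shape (T, D) of joint poses.
    Returns an adjacency dict over (skill, frame) nodes.
    """
    graph = {}
    nodes = [(s, t) for s, frames in clips.items() for t in range(len(frames))]
    for s, t in nodes:
        # Within-skill edge: the next frame of the same clip.
        succ = [(s, t + 1)] if t + 1 < len(clips[s]) else []
        # Cross-skill edges: poses of other skills within the threshold.
        for s2, frames2 in clips.items():
            if s2 == s:
                continue
            dists = np.linalg.norm(frames2 - clips[s][t], axis=1)
            succ += [(s2, int(t2)) for t2 in np.where(dists < threshold)[0]]
        graph[(s, t)] = succ
    return graph
```

With two toy clips, a walk frame gains an edge only to the run frame whose pose lies inside the threshold, which is the whole content of the kinematic-similarity criterion.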
If this is right
- Humanoid robots can change locomotion skills at any moment rather than being locked into one behavior until a fixed switch point.
- The same learned tracker works for many skills because the graph supplies the transition constraints instead of requiring separate policies per switch.
- Online graph search keeps computation low enough for real-time execution while still finding stable paths.
- Motion imitation quality remains high because the policy is trained to track the original reference motions within the graph.
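The scheduler's per-event search can be illustrated with an unweighted breadth-first search over the graph; the paper's actual edge costs and feasibility weighting are not specified here, so this is a minimal stand-in:

```python
from collections import deque

def find_transition_path(graph, start, goal_skill):
    """Minimal BFS sketch of the scheduler's online search: from the current
    (skill, frame) node, find the shortest edge-count path to any node of
    the requested target skill. Unweighted for illustration only."""
    queue = deque([start])
    parent = {start: None}
    while queue:
        node = queue.popleft()
        if node[0] == goal_skill:
            # Reconstruct the path by walking parents back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for succ in graph.get(node, []):
            if succ not in parent:
                parent[succ] = node
                queue.append(succ)
    return None  # no path found: the caller must fall back to a safe skill
```

BFS over a graph with a few thousand nodes completes in microseconds to milliseconds, which is consistent with the real-time claim even at control-loop rates.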
Where Pith is reading between the lines
- The same graph-plus-scheduler structure could be tested on other robots whose motion data already exists, without retraining the entire policy from scratch.
- If kinematic similarity alone proves insufficient in some edge cases, the graph could be augmented with dynamic or terrain information that the paper does not yet include.
- Real-world deployment would likely require monitoring how often the scheduler falls back to a safe idle skill when no good path exists.
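Such monitoring could be as simple as counting fallback events against total scheduling requests. The class, names, and thresholds below are hypothetical, not drawn from the paper:

```python
class SchedulerMonitor:
    """Hypothetical deployment monitor: counts how often the scheduler
    falls back to a safe idle skill because no path was found or the
    tracking error exceeded a bound."""

    def __init__(self, error_limit=0.3):
        self.error_limit = error_limit
        self.requests = 0
        self.fallbacks = 0

    def record(self, path, tracking_error):
        """path: list of (skill, frame) nodes or None; returns the skill
        actually commanded for this request."""
        self.requests += 1
        if path is None or tracking_error > self.error_limit:
            self.fallbacks += 1
            return "idle"  # safe fallback skill
        return path[-1][0]  # target skill of the scheduled path

    @property
    def fallback_rate(self):
        return self.fallbacks / max(self.requests, 1)
```

A rising fallback rate in the field would be a direct signal that the kinematic graph is missing edges the deployment environment needs.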
Load-bearing premise
That a graph built only from kinematic similarity in the training motions, searched online, will always produce transitions that the tracking policy can execute stably without extra safety checks or recovery behaviors.
What would settle it
Run the system on a pair of skills whose graph path looks short but causes the robot to fall or lose balance in repeated physical trials, even when the policy was trained on both skills individually.
Original abstract

Recent advancements in whole-body control through deep reinforcement learning have enabled humanoid robots to achieve remarkable progress in real-world chal lenging locomotion skills. However, existing approaches often struggle with flexible transitions between distinct skills, cre ating safety concerns and practical limitations. To address this challenge, we introduce a hierarchical multi-skill system, Switch, enabling seamless skill transitions at any moment. Our approach comprises three key components: (1) a Skill Graph (SG) that establishes potential cross-skill transitions based on kinematic similarity within multi-skill motion data, (2) a whole-body tracking policy trained on this skill graph through deep reinforcement learning, and (3) an online skill scheduler to drive the tracking policy for robust skill execution and smooth transitions. For skill switching or significant tracking deviations, the scheduler performs online graph search to find the optimal feasible path, which ensures efficient, stable, and real-time execution of diverse locomotion skills. Comprehensive experiments demonstrate that Switch empowers humanoid to execute agile skill transitions with high success rates while maintaining strong motion imitation performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Switch, a hierarchical multi-skill framework for humanoid robots comprising (1) a Skill Graph (SG) whose edges are added based on kinematic similarity extracted from multi-skill motion capture data, (2) a whole-body tracking policy trained via deep reinforcement learning to follow references on this graph, and (3) an online scheduler that, upon a switch request or large tracking error, runs graph search to select an 'optimal feasible path' before commanding the tracking policy. The central claim is that this architecture enables agile, stable, real-time skill transitions at arbitrary moments while preserving strong motion-imitation performance.
Significance. If the empirical claims are substantiated, the work would offer a practical route to versatile humanoid locomotion by decoupling discrete skill planning from continuous control. The explicit separation of a kinematic graph planner from a learned tracker is a clean architectural choice that could be reused in other multi-behavior settings. No machine-checked proofs or parameter-free derivations are present; the contribution is therefore judged on the strength of its experimental validation.
major comments (2)
- [Abstract] Abstract: the statement that 'comprehensive experiments demonstrate ... high success rates' is unsupported by any quantitative numbers, success-rate definitions, baseline comparisons, or ablation studies. Because the central claim of the paper is precisely that Switch 'empowers humanoid to execute agile skill transitions with high success rates,' the absence of these data is load-bearing.
- [Approach (Skill Graph and online scheduler)] Skill Graph construction (approach section): edges are added solely when kinematic poses are close; no forward-dynamics roll-out, contact-wrench-cone check, or momentum-continuity test is described. The online scheduler then treats any path found by graph search as 'feasible.' For humanoids this is problematic because kinematic proximity does not imply dynamic compatibility (different foot-placement timing or CoM velocity can produce unstable contacts even when end poses match). The DRL tracker is trained only to imitate the reference and therefore cannot be assumed to recover from such violations.
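One concrete way to operationalize this objection is to augment the kinematic edge test with a centroidal-velocity continuity check before admitting a cross-skill edge. The function below is a hypothetical sketch of such a filter, not a mechanism described in the paper:

```python
import numpy as np

def edge_passes_dynamic_check(pose_a, pose_b, vel_a, vel_b,
                              pose_tol=0.5, vel_tol=1.0):
    """Hypothetical augmentation of the kinematic edge test: require not
    only pose proximity but also centroidal-velocity continuity between the
    two frames. Tolerances are illustrative placeholders."""
    pose_close = np.linalg.norm(pose_a - pose_b) < pose_tol
    vel_close = np.linalg.norm(vel_a - vel_b) < vel_tol
    return bool(pose_close and vel_close)
```

The point of the check is exactly the referee's scenario: two end poses can match while CoM velocities differ enough to destabilize the contact sequence, and a velocity gate would prune those edges at construction time.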
minor comments (2)
- [Abstract] Abstract contains typographical spacing errors: 'chal lenging' and 'cre ating'.
- [Approach] The manuscript never states the concrete humanoid platform, its degrees of freedom, or the motion-capture dataset size used to build the Skill Graph.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback on our manuscript. We appreciate the recognition of the architectural separation between the kinematic graph planner and the learned tracker. We will revise the paper to strengthen the abstract with quantitative results and to clarify the design choices and limitations of the skill graph construction, as detailed below.
Point-by-point responses
Referee: [Abstract] Abstract: the statement that 'comprehensive experiments demonstrate ... high success rates' is unsupported by any quantitative numbers, success-rate definitions, baseline comparisons, or ablation studies. Because the central claim of the paper is precisely that Switch 'empowers humanoid to execute agile skill transitions with high success rates,' the absence of these data is load-bearing.
Authors: We agree that the abstract would be strengthened by including quantitative support for the central claim. The experimental section of the manuscript already contains success-rate metrics (defined as the percentage of trials where the robot completes the target skill without falling or violating joint limits), baseline comparisons against single-skill policies and naive switching methods, and ablation studies on graph density and scheduler frequency. In the revised version we will condense these results into the abstract (e.g., reporting overall transition success rates above 90% across locomotion skills with specific numbers and definitions) while preserving conciseness. revision: yes
Referee: [Approach (Skill Graph and online scheduler)] Skill Graph construction (approach section): edges are added solely when kinematic poses are close; no forward-dynamics roll-out, contact-wrench-cone check, or momentum-continuity test is described. The online scheduler then treats any path found by graph search as 'feasible.' For humanoids this is problematic because kinematic proximity does not imply dynamic compatibility (different foot-placement timing or CoM velocity can produce unstable contacts even when end poses match). The DRL tracker is trained only to imitate the reference and therefore cannot be assumed to recover from such violations.
Authors: The referee correctly notes that the Skill Graph is built exclusively from kinematic similarity of motion-capture poses and that no explicit dynamic feasibility checks (forward roll-outs, wrench cones, or momentum continuity) are performed at construction time. Our design choice relies on the DRL tracking policy, which is trained end-to-end on the full graph to minimize tracking error and recover from small deviations; the online scheduler then selects the shortest path whose reference the policy can follow with low error in real time. Empirical results in the paper show that this combination yields stable transitions, but we acknowledge that kinematic proximity alone does not guarantee dynamic compatibility in all cases. In the revision we will add an explicit discussion paragraph in the Approach section explaining this reliance on the learned tracker, citing the observed robustness in hardware experiments, and noting the absence of formal dynamic verification as a limitation and direction for future work. revision: partial
Circularity Check
No circularity: empirical DRL training and graph construction are independent of claimed outcomes
full rationale
The paper presents a hierarchical system whose core components (the kinematic-similarity Skill Graph, the DRL whole-body tracking policy, and the online graph-search scheduler) are constructed and trained from motion data and reinforcement learning objectives. No equations, fitted parameters, or self-citations appear that would reduce the reported success rates or stability claims to restatements of the inputs. Performance evidence is attributed to experimental validation rather than to any analytic identity or self-referential prediction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Kinematic similarity between skills in motion data is sufficient to define feasible transition edges in the skill graph.