Constrained Whole-Body Tracking for Humanoid Robots
Pith reviewed 2026-06-28 21:49 UTC · model grok-4.3
The pith
A control framework integrates operational space control and control barrier functions to enforce arbitrary runtime constraints on humanoid robot reinforcement learning policies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ConstrainedMimic leverages whole-body kinematics and dynamics for real-time constraint enforcement within RL tracking policies. By integrating principles from operational space control and control barrier functions, it enables the satisfaction of arbitrary runtime constraints on both the kinematic reference motion and the underlying dynamics while remaining consistent with the current contact mode and tracking objectives.
What carries the argument
ConstrainedMimic framework, which applies operational space control and control barrier functions to enforce constraints on kinematic references and dynamics inside RL tracking policies.
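To make the mechanism concrete, here is a minimal sketch of a velocity-level CBF safety filter for a single joint upper limit. It is not the paper's implementation: the single-integrator model (joint velocity is the control input), the barrier h(q) = q_max - q, the gain `alpha`, and the function name `cbf_filter` are all assumptions made for illustration. With only one affine constraint, the usual CBF quadratic program has the closed-form projection used below.

```python
import numpy as np

def cbf_filter(u_des, grad_h, h, alpha=5.0):
    """Minimally modify u_des so that grad_h @ u >= -alpha * h.

    Closed-form solution of  min ||u - u_des||^2  s.t.  a @ u >= b
    for a single affine constraint (a = grad_h, b = -alpha * h).
    """
    a = np.asarray(grad_h, dtype=float)
    u_des = np.asarray(u_des, dtype=float)
    b = -alpha * h
    slack = a @ u_des - b
    if slack >= 0.0:
        # Desired command already satisfies the barrier condition.
        return u_des.copy()
    # Project onto the constraint boundary along the constraint normal.
    return u_des - slack * a / (a @ a)

# Hypothetical 2-joint example: joint 0 is close to its upper limit.
q = np.array([0.9, 0.0])
q_max = 1.0
h = q_max - q[0]                 # barrier value, positive inside the safe set
grad_h = np.array([-1.0, 0.0])   # dh/dq
u_des = np.array([2.0, 0.3])     # command from the (hypothetical) policy
u_safe = cbf_filter(u_des, grad_h, h)
```

In this toy case only the velocity of joint 0 is reduced; the unconstrained joint keeps its desired command, which is the "minimally restrict" behavior in miniature.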
If this is right
- Collision avoidance with the robot body and external obstacles can be enforced during whole-body tracking.
- Joint limits and center-of-mass stability constraints can be satisfied at runtime.
- Policy capabilities are minimally restricted when constraints become active.
- The method remains fully differentiable and runs at frequencies up to 300-500 Hz on CPU, GPU, or TPU.
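As a quick worked number for the last point, the claimed rates translate into a per-step compute budget. The helper name below is hypothetical; the arithmetic only restates the 300-500 Hz figure.

```python
def control_period_ms(rate_hz: float) -> float:
    """Time available per control step, in milliseconds."""
    return 1000.0 / rate_hz

# Budget at the two ends of the reported range.
budgets = {hz: control_period_ms(hz) for hz in (300, 500)}
```

At 300 Hz each step has about 3.3 ms, and at 500 Hz exactly 2 ms, to cover kinematics, constraint terms, and the solve.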
Where Pith is reading between the lines
- The same constraint layer could be applied to policies trained for other contact-rich tasks such as locomotion or manipulation.
- Because the method is differentiable, it may support future end-to-end training that includes constraint satisfaction as an objective.
- Deployment on physical robots would require testing against model mismatch and sensor noise not present in simulation.
Load-bearing premise
The integration of operational space control and control barrier functions can enforce constraints while remaining consistent with the current contact mode and tracking objectives.
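One way to picture "consistent with the current contact mode" is a null-space projection that removes any commanded motion of the contact points. The sketch below is a simplified, purely kinematic version under stated assumptions: it uses an unweighted pseudoinverse, whereas operational space control typically uses an inertia-weighted (dynamically consistent) projector, and the Jacobian values are made up.

```python
import numpy as np

def contact_nullspace_projector(J_c):
    """Return N = I - pinv(J_c) @ J_c.

    Any velocity N @ v produces zero velocity at the contact points,
    i.e. J_c @ (N @ v) = 0.
    """
    n_dof = J_c.shape[1]
    return np.eye(n_dof) - np.linalg.pinv(J_c) @ J_c

# Hypothetical contact Jacobian: one contact constraint, three degrees of freedom.
J_c = np.array([[1.0, 1.0, 0.0]])
N = contact_nullspace_projector(J_c)

v_des = np.array([1.0, 0.0, 0.5])   # desired generalized velocity
v_consistent = N @ v_des            # component that does not move the contact
```

Whether a constraint correction applied after such a projection stays inside this null space is exactly what the premise above depends on.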
What would settle it
A run of the framework on the simulated Unitree G1 in which an active constraint, such as collision avoidance or a joint limit, is violated during motion tracking.
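Such a check could be automated over logged rollouts. The following is a sketch only: the log format (one barrier value and one activity flag per step), the tolerance, and the sample numbers are assumptions, not data from the paper.

```python
import numpy as np

def first_violation(h_values, active, tol=1e-6):
    """Return the index of the first step where an active constraint
    has barrier value h < -tol, or None if no such step exists."""
    h_values = np.asarray(h_values, dtype=float)
    active = np.asarray(active, dtype=bool)
    bad = np.flatnonzero(active & (h_values < -tol))
    return int(bad[0]) if bad.size else None

# Made-up log: the constraint is active throughout and dips below zero at step 3.
h_log = [0.20, 0.05, 0.00, -0.03, 0.01]
active_log = [True, True, True, True, True]
idx = first_violation(h_log, active_log)
```

A return value other than `None` on any tracked motion would count as the falsifying run described above.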
Original abstract
Recent advances in reinforcement learning (RL) have demonstrated impressive whole-body agility for humanoid robots, yet ensuring safety and satisfying constraints -- particularly those specified after training -- remains a challenge. Towards this goal, we present ConstrainedMimic, a control framework that leverages whole-body kinematics and dynamics for real-time constraint enforcement within RL tracking policies. By integrating principles from operational space control and control barrier functions (CBFs), we enable the satisfaction of arbitrary runtime constraints on both the kinematic reference motion and the underlying dynamics. In whole-body motion-tracking and teleoperation experiments on a (simulated) Unitree G1 with a learned policy, we demonstrate collision avoidance (both with the robot body and external obstacles), joint limits, and center of mass stability constraints. By remaining consistent with the current contact mode and tracking objectives, we minimally restrict the capabilities of the policy when constraints are active. Our method is fully differentiable, runs on CPU, GPU, and TPU, and can be deployed at up to 300-500 Hz. All software will be freely available upon publication.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ConstrainedMimic, a control framework integrating operational space control (OSC) and control barrier functions (CBFs) to enforce arbitrary runtime constraints on kinematic reference motion and underlying dynamics for RL-based whole-body tracking policies on humanoid robots. It reports simulation experiments on a Unitree G1 demonstrating collision avoidance (self and external), joint limits, and CoM stability, while claiming that the approach remains consistent with the current contact mode, minimally restricts policy capabilities when active, is fully differentiable, and runs at 300-500 Hz on CPU/GPU/TPU.
Significance. If the central integration of OSC and CBFs can be shown to enforce constraints while preserving contact-mode consistency and tracking objectives, the framework would provide a practical, post-training mechanism for adding safety constraints to learned humanoid policies without retraining. The emphasis on differentiability, high-frequency execution, and open-source release would strengthen reproducibility and applicability in real-time control.
major comments (2)
- [Abstract] Abstract: the central claim that the OSC+CBF integration 'remains consistent with the current contact mode' is load-bearing for the 'minimally restrict' guarantee, yet the abstract supplies no quantitative results, error analysis, or description of how the Lie-derivative condition is preserved across discrete contact-mode switches (stance/swing, unilateral forces) that alter Jacobians and dynamics.
- [Abstract] Abstract (weakest assumption): without explicit per-mode reformulation or mode-detection logic inside the barrier condition, standard CBFs defined on smooth continuous dynamics risk violation or overly conservative corrections at switches; the manuscript must demonstrate that the combined controller satisfies the barrier condition instantaneously at mode transitions while still tracking the reference.
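For readers unfamiliar with the condition both comments refer to, the standard CBF inequality for control-affine dynamics and a safe set {x : h(x) >= 0} is reproduced below. The mode index \sigma is notation introduced here for illustration, not taken from the paper; it marks where a contact switch enters the condition.

```latex
% Control-affine dynamics in contact mode \sigma:
\dot{x} = f_\sigma(x) + g_\sigma(x)\,u
% CBF condition with an extended class-K function \alpha:
\dot{h}(x,u) = L_{f_\sigma} h(x) + L_{g_\sigma} h(x)\,u \;\ge\; -\alpha\big(h(x)\big)
% Lie derivatives:
L_{f_\sigma} h = \frac{\partial h}{\partial x} f_\sigma(x), \qquad
L_{g_\sigma} h = \frac{\partial h}{\partial x} g_\sigma(x)
```

Because f_\sigma and g_\sigma change when \sigma switches, both Lie derivatives can jump at a contact transition even though h itself is continuous, which is why the referee asks for evidence at exactly those instants.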
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the abstract and the critical role of contact-mode consistency. We will revise the abstract to include quantitative metrics and add a dedicated clarification subsection on mode transitions to strengthen the presentation.
Point-by-point responses
- Referee: [Abstract] Abstract: the central claim that the OSC+CBF integration 'remains consistent with the current contact mode' is load-bearing for the 'minimally restrict' guarantee, yet the abstract supplies no quantitative results, error analysis, or description of how the Lie-derivative condition is preserved across discrete contact-mode switches (stance/swing, unilateral forces) that alter Jacobians and dynamics.
  Authors: We agree the abstract lacks supporting numbers. In revision we will add quantitative results from the G1 experiments (contact-force deviation < 5 N and tracking RMSE during stance/swing switches) and briefly note that the OSC null-space projection preserves the Lie-derivative condition by construction before the CBF correction is applied.
  Revision: yes
- Referee: [Abstract] Abstract (weakest assumption): without explicit per-mode reformulation or mode-detection logic inside the barrier condition, standard CBFs defined on smooth continuous dynamics risk violation or overly conservative corrections at switches; the manuscript must demonstrate that the combined controller satisfies the barrier condition instantaneously at mode transitions while still tracking the reference.
  Authors: The current formulation applies the CBF after the contact-consistent OSC projection, which empirically maintains the barrier condition at switches in our reported experiments. To make this explicit we will add a short analysis subsection showing instantaneous satisfaction (via recorded Lie-derivative values at detected transitions) and confirm that reference tracking error remains comparable to the unconstrained policy.
  Revision: yes
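The promised analysis of recorded values at detected transitions could look roughly like the sketch below. Everything here is assumed for illustration: the integer contact-mode labels, the linear class-K function alpha * h, the log layout, and the sample numbers, which are invented and deliberately include one failing transition.

```python
import numpy as np

def barrier_residual_at_switches(mode, h, h_dot, alpha=5.0):
    """For each step where the contact mode changes, return
    (step index, h_dot + alpha * h). A non-negative residual means the
    CBF condition with a linear class-K function held at that step."""
    mode = np.asarray(mode)
    h = np.asarray(h, dtype=float)
    h_dot = np.asarray(h_dot, dtype=float)
    # A switch occurs at step k when mode[k] differs from mode[k - 1].
    switch_idx = np.flatnonzero(mode[1:] != mode[:-1]) + 1
    residual = h_dot[switch_idx] + alpha * h[switch_idx]
    return list(zip(switch_idx.tolist(), residual.tolist()))

# Invented log with two contact-mode switches (at steps 2 and 4).
mode = [3, 3, 1, 1, 3]
h = [0.10, 0.08, 0.06, 0.05, 0.05]
h_dot = [-0.2, -0.2, -0.4, -0.1, 0.0]
report = barrier_residual_at_switches(mode, h, h_dot)
```

In this made-up log the residual is negative at the first switch and positive at the second, the kind of per-transition evidence the referee is asking to see.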
Circularity Check
No significant circularity; synthesis of established OSC and CBF methods remains self-contained
full rationale
The paper presents ConstrainedMimic as an integration of operational space control and control barrier functions to enforce runtime constraints on RL tracking policies. No derivation step reduces by construction to fitted parameters, self-defined quantities, or load-bearing self-citations; the central claim is a synthesis of prior independent principles applied to humanoid tracking, with experimental validation on a simulated Unitree G1. The framework is described as fully differentiable and deployable without reference to any internal fit or renaming that would force the result. This matches the expected non-circular case for a methods paper combining known techniques.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Principles from operational space control and control barrier functions can be integrated into RL policies for real-time constraint enforcement on kinematics and dynamics.
invented entities (1)
- ConstrainedMimic: no independent evidence
Reference graph
Works this paper leans on
- [1]
- [2] Q. Liao, T. E. Truong, X. Huang, Y. Gao, G. Tevet, K. Sreenath, and C. K. Liu. Beyondmimic: From motion tracking to versatile humanoid control via guided diffusion. arXiv preprint arXiv:2508.08241, 2025.
- [3] Z. Luo, Y. Yuan, T. Wang, C. Li, S. Chen, F. Castañeda, Z.-A. Cao, J. Li, D. Minor, Q. Ben, X. Da, R. Ding, C. Hogg, L. Song, E. Lim, E. Jeong, T. He, H. Xue, W. Xiao, Z. Wang, S. Yuen, J. Kautz, Y. Chang, U. Iqbal, L. Fan, and Y. Zhu. Sonic: Supersizing motion tracking for natural humanoid whole-body control. arXiv preprint arXiv:2511.07820, 2025.
- [4] Y. Ze, Z. Chen, J. P. Araujo, Z.-a. Cao, X. B. Peng, J. Wu, and K. Liu. Twist: Teleoperated whole-body imitation system. In J. Lim, S. Song, and H.-W. Park, editors, Proceedings of The 9th Conference on Robot Learning, volume 305 of Proceedings of Machine Learning Research, pages 2143–2154. PMLR, 27–30 Sep 2025. URL https://proceedings.mlr.press/v305/ze25a.html
- [5] T. He, Z. Luo, X. He, W. Xiao, C. Zhang, W. Zhang, K. Kitani, C. Liu, and G. Shi. Omnih2o: Universal and dexterous human-to-humanoid whole-body teleoperation and learning. In Conference on Robot Learning, 2024. URL https://api.semanticscholar.org/CorpusID:270440515
- [6] Q. Ben, F. Jia, J. Zeng, J. Dong, D. Lin, and J. Pang. HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit. In Proceedings of Robotics: Science and Systems, Los Angeles, CA, USA, June 2025. doi:10.15607/RSS.2025.XXI.070
- [7] T. He, W. Xiao, T. Lin, Z. Luo, Z. Xu, Z. Jiang, J. Kautz, C. Liu, G. Shi, X. Wang, L. J. Fan, and Y. Zhu. Hover: Versatile neural whole-body controller for humanoid robots. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 9989–9996, 2025. doi:10.1109/ICRA55743.2025.11128549
- [8] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada. Control barrier functions: Theory and applications. In 2019 18th European Control Conference (ECC), 2019. doi:10.23919/ECC.2019.8796030
- [9] S.-C. Hsu, X. Xu, and A. D. Ames. Control barrier function based quadratic programs with application to bipedal robotic walking. In 2015 American Control Conference (ACC), pages 4542–4548, 2015. doi:10.1109/ACC.2015.7172044
- [10] Q. Nguyen, A. Hereid, J. W. Grizzle, A. D. Ames, and K. Sreenath. 3d dynamic walking on stepping stones with control barrier functions. In 2016 IEEE 55th Conference on Decision and Control (CDC), pages 827–834, 2016. doi:10.1109/CDC.2016.7798370
- [11] C. Khazoom, D. Gonzalez-Diaz, Y. Ding, and S. Kim. Humanoid self-collision avoidance using whole-body control with control barrier functions. In 2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids), pages 558–565, 2022. doi:10.1109/Humanoids53995.2022.10000235
- [12] V. C. Paredes and A. Hereid. Safe whole-body task space control for humanoid robots. In 2024 American Control Conference (ACC), pages 949–956, 2024. doi:10.23919/ACC60939.2024.10644227
- [13] L. Yang, B. Werner, R. K. Cosner, D. Fridovich-Keil, P. Culbertson, and A. D. Ames. Shield: Safety on humanoids via cbfs in expectation on learned dynamics. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 203–210, 2025. doi:10.1109/IROS60139.2025.11247065
- [14] L. Yang, B. Werner, M. de Sa, and A. D. Ames. Cbf-rl: Safety filtering reinforcement learning in training with control barrier functions. arXiv preprint arXiv:2510.14959, 2025.
- [15] J. Park and O. Khatib. Contact consistent control framework for humanoid robots. In Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006., pages 1963–1969, 2006. doi:10.1109/ROBOT.2006.1641993
- [16] O. Khatib, M. Jorda, J. Park, L. Sentis, and S.-Y. Chung. Constraint-consistent task-oriented whole-body robot formulation: Task, posture, constraints, multiple contacts, and balance. The International Journal of Robotics Research, 41(13-14):1079–1098, 2022. doi:10.1177/02783649221120029. URL https://doi.org/10.1177/02783649221120029
- [17] L. Sentis. Synthesis and Control of Whole-Body Behaviors in Humanoid Systems. PhD thesis, Stanford University, Stanford, CA, July 2007.
- [18] S. Kuindersma, R. Deits, M. Fallon, A. Valenzuela, H. Dai, F. Permenter, T. Koolen, P. Marion, and R. Tedrake. Optimization-based locomotion planning, estimation, and control design for the atlas humanoid robot. Autonomous robots, 40(3):429–455, 2016.
- [19]
- [20]
- [21] L. Yang, X. Huang, Z. Wu, A. Kanazawa, P. Abbeel, C. Sferrazza, C. K. Liu, R. Duan, and G. Shi. Omniretarget: Interaction-preserving data generation for humanoid whole-body loco-manipulation and scene interaction. arXiv preprint arXiv:2509.26633, 2025.
- [22]
- [23] K. Zakka. Mink: Python inverse kinematics based on MuJoCo, Feb. 2026. URL https://github.com/kevinzakka/mink
- [24]
- [25] S. Chen, Z.-A. Cao, Z. Luo, F. Castañeda, C. Li, T. Wang, Y. Yuan, L. Fan, C. K. Liu, and Y. Zhu. Chip: Learning adaptive compliance for humanoid control through hindsight perturbation. arXiv preprint arXiv:2512.14689, 2025.
- [26]
- [27]
- [28]
- [29] P. Strauch, D. Müller, S. Christen, A. Serifi, R. Grandia, E. Knoop, and M. Bächer. Robot crash course: Learning soft and stylized falling. arXiv preprint arXiv:2511.10635, 2025.
- [30] Y. Sun, R. Chen, K. S. Yun, Y. Fang, S. Jung, F. Li, B. Li, W. Zhao, and C. Liu. SPARK: Safe protective and assistive robot kit. In IFAC Symposium on Robotics, 2025. URL https://intelligent-control-lab.github.io/spark/
- [31]
- [32]
- [33] D. Morton and M. Pavone. frax: Fast robot kinematics and dynamics in jax. arXiv preprint arXiv:2604.04310, 2026. ICRA 2026 Workshop on Frontiers of Optimization for Robotics.
- [34] D. Morton and M. Pavone. Safe, task-consistent manipulation with operational space control barrier functions. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 187–194, 2025. doi:10.1109/IROS60139.2025.11246389
- [35] J. Englsberger, A. Werner, C. Ott, B. Henze, M. A. Roa, G. Garofalo, R. Burger, A. Beyer, O. Eiberger, K. Schmid, and A. Albu-Schäffer. Overview of the torque-controlled humanoid robot toro. In 2014 IEEE-RAS International Conference on Humanoid Robots, pages 916–923, 2014. doi:10.1109/HUMANOIDS.2014.7041473
- [36] W. Xiao and C. Belta. High-order control barrier functions. IEEE Transactions on Automatic Control, 67(7), 2022. doi:10.1109/TAC.2021.3105491
- [37] A. Agrawal and K. Sreenath. Discrete control barrier functions for safety-critical control of discrete systems with application to bipedal robot navigation. In Proceedings of Robotics: Science and Systems, Cambridge, Massachusetts, July 2017. doi:10.15607/RSS.2017.XIII.073
- [38] D. R. Agrawal and D. Panagou. Safe control synthesis via input constrained control barrier functions. In 2021 60th IEEE Conference on Decision and Control (CDC), pages 6113–6118, 2021. doi:10.1109/CDC45484.2021.9682938
- [39] T. Flayols, A. Del Prete, P. Wensing, A. Mifsud, M. Benallegue, and O. Stasse. Experimental evaluation of simple estimators for humanoid robots. In 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids), pages 889–895, 2017. doi:10.1109/HUMANOIDS.2017.8246977
- [40] M. Baumgartner, D. Müller, A. Serifi, R. Grandia, E. Knoop, M. Gross, and M. Bächer. Coco-inekf: State estimation with learned contact covariances in dynamic, contact-rich scenarios. arXiv preprint arXiv:2605.15122, 2026.
- [41] PICO Immersive Pte. Ltd. PICO 4 Ultra: An All-New Mixed Reality Experience. https://www.picoxr.com/global/products/pico4-ultra, 2023.
- [42] Z. Zhao, L. Yu, K. Jing, and N. Yang. Xrobotoolkit: A cross-platform framework for robot teleoperation. 2026 IEEE/SICE International Symposium on System Integration (SII), pages 15–20, 2026.
- [43] URL https://api.semanticscholar.org/CorpusID:280417135
- [44] J. Arrizabalaga, K. Tracy, and Z. Manchester. A differentiable interior-point method in single precision,
- [45] URL https://arxiv.org/abs/2605.17913
- [46] K. Tracy and Z. Manchester. On the differentiability of the primal-dual interior-point method. arXiv preprint arXiv:2406.11749v2, 2024.
Appendix
A Background: Humanoid Kinematics and Dynamics
A.1 Kinematics
A.2 Contact Kinematics ...
- [47] --xla_cpu_multi_thread_eigen=false intra_op_parallelism_threads=1
or the comparisons between torque-control and velocity-control CBFs with torque limits in [34].
• Model mismatch, including imperfect actuators, miscalibrated inertial values, or unreliable contact mode estimation, can reduce the performance of the CBF when deployed on hardware.
C Additional Implementation Details
C.1 Timing and Performance
Desktop timing...
- [48] Record human reference data. We use a PICO 4 Ultra [41], similar to [1], with a custom C++ ROS2 interface for the XRoboToolkit SDK [42]. For teleoperation, high-frequency and smooth input data is critical to downstream performance, and this interface was designed to minimize latency and jitter. This custom software will also be made available on publication.
- [49] Adjust the desired orientations of the feet to be parallel with the floor, to better suit our planar contact model.
- [50] Compute the velocity and position of the feet and incorporate this into a simple contact estimation heuristic (described in Sec. C.2).
- [51] Rescale the positional data to approximately reflect the size difference between the human and Unitree G1.
- [52] Update the heights of all bodies to put the lowest point on the feet at z = 0. As previously mentioned, this assumes that mode 0 (no contact) is not considered.
Constructing and solving the QP
- [53] Compute the error dynamics for the frame correspondences between the (pre-processed) human data and the current robot state.
- [54] Compute the Jacobians for all frames on the robot body with frax [33].
- [55] Compute the CBF terms with cbfpy [34].
- [56] Construct the QP matrices and solve the problem with qpax [43, 44].
Post-processing (after the QP solve)
- [57] Integrate the optimal q̇ according to the constrained kinematics.
- [58] Apply an exponential moving average filter to the free-floating base velocities to ensure a smooth observation.
On initialization, the first human pose in the reference motion may be quite different from the default standing pose of the robot. To align the internal state of the solver with this initial pose, we iterate until convergence in a sequential qu...
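Three of the steps listed above (rescaling, ground-height alignment, and the exponential moving average) are simple enough to sketch. This is an illustrative reading of those steps, not the authors' code: the function names, the scale factor of 0.5, the filter weight `beta`, and the sample body positions are all assumptions.

```python
import numpy as np

def rescale_positions(p_human, scale):
    """Uniformly scale human body positions toward robot proportions."""
    return scale * np.asarray(p_human, dtype=float)

def shift_to_ground(p_bodies, foot_idx):
    """Translate all bodies vertically so the lowest foot point is at z = 0."""
    p = np.array(p_bodies, dtype=float)   # copy so the input is not modified
    p[:, 2] -= p[foot_idx, 2].min()
    return p

def ema(prev, new, beta=0.2):
    """Exponential moving average; beta is the weight on the new sample."""
    return (1.0 - beta) * np.asarray(prev, dtype=float) + beta * np.asarray(new, dtype=float)

# Made-up human frame: rows are left foot, right foot, pelvis (x, y, z in metres).
p_human = np.array([[0.0,  0.1, 0.10],
                    [0.0, -0.1, 0.14],
                    [0.0,  0.0, 1.00]])
p_robot_ref = shift_to_ground(rescale_positions(p_human, 0.5), foot_idx=[0, 1])

# One filter update of a base-velocity estimate, starting from rest.
v_filtered = ema(np.zeros(3), np.array([1.0, 0.0, 0.0]))
```

Forcing the lowest foot point to z = 0 on every frame is what rules out the no-contact mode mentioned in the list: a jump in the human data would be flattened onto the floor.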