Learning Reactive Dexterous Grasping via Hierarchical Task-Space RL Planning and Joint-Space QP Control
Pith reviewed 2026-05-07 15:52 UTC · model grok-4.3
The pith
A hybrid framework pairing a multi-agent RL planner with a QP controller decouples high-level task intent from low-level joint execution, enabling reactive dexterous grasping with zero-shot steerability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By training separate arm and hand reinforcement learning agents to produce task-space velocity commands and then feeding those commands into a GPU-parallelized quadratic programming controller that enforces kinematic limits and collision constraints, the framework achieves both faster policy learning and the ability to adjust safety margins or avoid dynamic obstacles at runtime without retraining.
What carries the argument
Multi-agent RL high-level planner that outputs task-space velocities, processed by a GPU-parallelized quadratic programming low-level controller that maps them to feasible joint velocities while enforcing safety constraints.
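The low-level layer described above can be sketched as a differential-IK quadratic program: minimize task-space tracking error subject to joint-velocity bounds and a collision half-space constraint. This is a minimal illustration, not the paper's formulation; the Jacobian, limits, and margin below are toy placeholders, and SLSQP stands in for the paper's GPU-parallelized solver.

```python
import numpy as np
from scipy.optimize import minimize

def qp_joint_velocities(J, v_task, qd_max, a_obs, b_obs):
    """Minimize ||J qd - v_task||^2 s.t. |qd| <= qd_max and a_obs @ qd <= b_obs."""
    n = J.shape[1]
    cost = lambda qd: np.sum((J @ qd - v_task) ** 2)
    # SLSQP inequality constraints require fun(x) >= 0.
    cons = [{"type": "ineq", "fun": lambda qd: b_obs - a_obs @ qd}]
    res = minimize(cost, np.zeros(n), bounds=[(-qd_max, qd_max)] * n,
                   constraints=cons, method="SLSQP")
    return res.x

J = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5]])       # toy 2x3 task Jacobian
v_task = np.array([0.3, -0.2])        # RL-commanded task-space velocity
a_obs = np.array([1.0, 1.0, 1.0])     # toy collision half-space normal
qd = qp_joint_velocities(J, v_task, qd_max=0.5, a_obs=a_obs, b_obs=0.4)
```

Because the safety margin enters only as the QP parameter `b_obs`, tightening it at runtime changes the controller without touching the policy, which is the mechanism behind the steerability claim.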
If this is right
- Training convergence accelerates because the RL agents only learn task-space behavior while the QP layer handles all joint-level constraints.
- Hardware safety is strictly enforced at every time step regardless of what the RL policy outputs.
- System operators can change collision avoidance margins or add new obstacles dynamically without retraining.
- The same policy transfers zero-shot to real hardware and recovers from unexpected physical disturbances on diverse unseen objects.
- The architecture isolates high-level spatial intent from low-level execution, allowing independent development or tuning of each layer.
Where Pith is reading between the lines
- The same separation of task-space planning from joint-space enforcement could apply to other contact-rich manipulation skills such as in-hand reorientation or tool use.
- Adding online perception to the high-level planner might allow the system to react to moving targets or changing object properties without altering the QP layer.
- The explicit safety layer may reduce the amount of reward shaping needed during RL training compared with end-to-end joint-space policies.
Load-bearing premise
The simulation environment matches real-world contact dynamics closely enough that the learned velocity commands remain feasible for the QP solver in real time.
What would settle it
A real-world trial in which the QP solver returns infeasible solutions or the robot collides or drops objects under the same velocity commands that worked in simulation.
Original abstract
In this work, we propose a hybrid hierarchical control framework for reactive dexterous grasping that explicitly decouples high-level spatial intent from low-level joint execution. We introduce a multi-agent reinforcement learning architecture, specialized into distinct arm and hand agents, that acts as a high-level planner by generating desired task-space velocity commands. These commands are then processed by a GPU-parallelized quadratic programming controller, which translates them into feasible joint velocities while strictly enforcing kinematic limits and collision avoidance. This structural isolation not only accelerates training convergence but also strictly enforces hardware safety. Furthermore, the architecture unlocks zero-shot steerability, allowing system operators to dynamically adjust safety margins and avoid dynamic obstacles without retraining the policy. We extensively validate the proposed framework through a rigorous simulation-to-reality pipeline. Real-world hardware experiments on a 7-DoF arm equipped with a 20-DoF anthropomorphic hand demonstrate highly robust zero-shot transferability for dexterous grasping to a diverse set of unseen objects, highlighting the system's ability to reactively recover from unexpected physical disturbances in unstructured environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a hybrid hierarchical framework for reactive dexterous grasping that decouples high-level task-space velocity planning (via multi-agent RL with separate arm and hand agents) from low-level joint-space execution (via a GPU-parallelized QP controller enforcing kinematic limits and collision avoidance). It claims this structure accelerates RL training, strictly enforces hardware safety, and enables zero-shot steerability: operators can dynamically tighten safety margins or introduce dynamic obstacles at runtime without retraining the policy. The approach is validated via a sim-to-real pipeline on a 7-DoF arm with 20-DoF anthropomorphic hand, demonstrating robust grasping of unseen objects and recovery from physical disturbances in unstructured environments.
Significance. If the empirical claims hold, the work offers a practical route to combining RL adaptability with optimization-based safety in dexterous manipulation. The explicit decoupling and resulting steerability could reduce the need for retraining when deployment conditions change, which is valuable for real-world robotics. The sim-to-real focus and hardware validation on a high-DoF system are positive, though the absence of detailed quantitative metrics limits immediate assessment of impact.
major comments (2)
- [abstract and §4 (experiments)] The central claim of zero-shot steerability (abstract and §4) rests on the QP controller remaining feasible and real-time solvable when safety margins are tightened or dynamic obstacles are added at runtime. However, the RL agents are trained exclusively under the nominal constraint set; no regularization, adversarial training, or post-training analysis is described that ensures the QP feasible set remains non-empty under operator-induced changes. This is load-bearing for the steerability result.
- [§5 and abstract] §5 (or equivalent results section) and the abstract assert successful sim-to-real transfer and disturbance recovery, yet no quantitative metrics (success rates, recovery times, failure rates), ablation studies on the hierarchical RL+QP split, or baseline comparisons are referenced. Without these, the robustness claims cannot be evaluated and the soundness of the sim-to-real pipeline remains unverified.
minor comments (2)
- [§3] Notation for the task-space velocity commands generated by the RL agents versus the QP inputs should be clarified in §3 to avoid ambiguity between planning and control layers.
- [§3] The description of the multi-agent RL architecture would benefit from an explicit diagram or pseudocode showing the information flow between arm and hand agents and the shared task-space output.
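The information flow the second comment asks for can be sketched in a few lines: two agents observe a shared state, each emits its own task-space velocity command, and the concatenated command is the planner's output to the QP layer. The class and function names here (`ConstantAgent`, `plan_step`) and the command dimensions are illustrative placeholders, not taken from the paper.

```python
import numpy as np

class ConstantAgent:
    """Stand-in for a trained RL policy mapping observation -> velocity command."""
    def __init__(self, dim):
        self.dim = dim

    def act(self, obs):
        return np.zeros(self.dim)  # a real agent would evaluate a learned network

def plan_step(obs, arm_agent, hand_agent):
    v_arm = arm_agent.act(obs)              # arm agent: end-effector twist (6-DoF)
    v_hand = hand_agent.act(obs)            # hand agent: fingertip velocity targets
    return np.concatenate([v_arm, v_hand])  # shared task-space output, fed to the QP

arm, hand = ConstantAgent(6), ConstantAgent(15)
v_cmd = plan_step(obs=np.zeros(10), arm_agent=arm, hand_agent=hand)
```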
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to incorporate the suggested improvements.
Point-by-point responses
Referee: [abstract and §4 (experiments)] The central claim of zero-shot steerability (abstract and §4) rests on the QP controller remaining feasible and real-time solvable when safety margins are tightened or dynamic obstacles are added at runtime. However, the RL agents are trained exclusively under the nominal constraint set; no regularization, adversarial training, or post-training analysis is described that ensures the QP feasible set remains non-empty under operator-induced changes. This is load-bearing for the steerability result.
Authors: We acknowledge that the manuscript does not provide an explicit post-training analysis of QP feasibility under modified constraints. The QP controller is formulated as a prioritized optimization problem that minimizes task-space tracking error subject to hard kinematic and collision constraints; in practice this remains feasible for moderate runtime changes because the high-level RL policy produces commands that are typically well inside the nominal feasible set. To strengthen the claim, we will add a new subsection in §4 with both theoretical conditions for feasibility (based on the null-space projection and slack prioritization) and empirical verification by re-running the QP solver offline on logged trajectories with tightened margins and injected obstacles. revision: yes
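The offline verification the authors propose could look like the following: replay logged task-space commands through the QP with a tightened margin and count how often a feasible joint velocity still exists. The logged commands, Jacobian, and margins here are synthetic placeholders standing in for real trajectory logs.

```python
import numpy as np
from scipy.optimize import minimize

def feasible(J, v_task, qd_max, a_obs, b_obs, tol=1e-6):
    """Solve the tracking QP and report whether constraints were satisfiable."""
    n = J.shape[1]
    cons = [{"type": "ineq", "fun": lambda qd: b_obs - a_obs @ qd}]
    res = minimize(lambda qd: np.sum((J @ qd - v_task) ** 2), np.zeros(n),
                   bounds=[(-qd_max, qd_max)] * n, constraints=cons,
                   method="SLSQP")
    return res.success and a_obs @ res.x <= b_obs + tol

rng = np.random.default_rng(0)
J = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5]])
a_obs = np.array([1.0, 1.0, 1.0])
logged_cmds = rng.uniform(-0.2, 0.2, size=(100, 2))      # synthetic command log
rate = np.mean([feasible(J, v, 0.5, a_obs, b_obs=0.1)    # tightened margin
                for v in logged_cmds])
```

A feasibility rate below 1.0 under the tightened margin would directly bound the steerability claim.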
Referee: [§5 and abstract] §5 (or equivalent results section) and the abstract assert successful sim-to-real transfer and disturbance recovery, yet no quantitative metrics (success rates, recovery times, failure rates), ablation studies on the hierarchical RL+QP split, or baseline comparisons are referenced. Without these, the robustness claims cannot be evaluated and the soundness of the sim-to-real pipeline remains unverified.
Authors: We agree that the current results section would benefit from more detailed quantitative reporting. We will expand §5 with tables reporting success rates (over at least 50 trials per object category), mean recovery times from external disturbances, and failure-mode breakdowns. We will also add ablation experiments that isolate the contribution of the multi-agent RL planner versus the QP layer, as well as comparisons against an end-to-end RL baseline and a pure model-based QP tracker. These additions will be included in the revised manuscript. revision: yes
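For the promised tables, reporting each success rate with a confidence interval would make the 50-trial counts interpretable; a Wilson score interval is one standard choice. The trial counts below are made up for illustration.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a binomial success rate over n trials."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

lo, hi = wilson_interval(successes=43, n=50)  # hypothetical 86% over 50 trials
```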
Circularity Check
No significant circularity; claims rest on empirical validation of the decoupled RL-QP architecture.
Full rationale
The paper presents a hybrid hierarchical framework with multi-agent RL generating task-space velocities that are then mapped by a QP controller enforcing constraints. The central claim of zero-shot steerability via runtime safety-margin adjustment is asserted as a consequence of the structural decoupling and is supported by sim-to-real hardware experiments on unseen objects and disturbances. No equations, fitted parameters, or self-citations are shown that reduce this claim to the training inputs by construction; the RL training occurs under nominal constraints while steerability is treated as an emergent property verified externally. The derivation chain is therefore self-contained as an empirical architecture proposal rather than a self-referential definition or renamed known result.
Axiom & Free-Parameter Ledger
free parameters (2)
- RL reward weights and learning rates
- QP objective weights
axioms (2)
- domain assumption: The physics simulator accurately reproduces real contact and friction behavior for the tested objects.
- domain assumption: The QP always admits a feasible solution within the real-time budget.
Reference graph
Works this paper leans on
- [1] K. M. Lynch and F. C. Park, Modern Robotics: Mechanics, Planning, and Control. Cambridge University Press, 2017, Chapter 12: Grasping and Manipulation.
- [2] R. Zurbrügg, A. Cramariuc, and M. Hutter, "GraspQP: Differentiable optimization of force closure for diverse and robust dexterous grasping," arXiv preprint arXiv:2508.15002, 2025.
- [3] J. Chen, Y. Ke, L. Peng, and H. Wang, "Dexonomy: Synthesizing all dexterous grasp types in a grasp taxonomy," arXiv preprint arXiv:2504.18829, 2025.
- [4] H. T. Suh, T. Pang, T. Zhao, and R. Tedrake, "Dexterous contact-rich manipulation via the contact trust region," The International Journal of Robotics Research, p. 02783649251398875, 2025.
- [5] A. Escande, A. Kheddar, and S. Miossec, "Planning contact points for humanoid robots," Robotics and Autonomous Systems, vol. 61, no. 5, pp. 428–442, 2013.
- [6] T. Pang, H. T. Suh, L. Yang, and R. Tedrake, "Global planning for contact-rich manipulation via local smoothing of quasi-dynamic contact models," IEEE Transactions on Robotics, vol. 39, no. 6, pp. 4691–4711, 2023.
- [7] M. Posa, C. Cantu, and R. Tedrake, "A direct method for trajectory optimization of rigid bodies through contact," The International Journal of Robotics Research, vol. 33, no. 1, pp. 69–81, 2014.
- [8] R. Singh, A. Allshire, A. Handa, N. Ratliff, and K. Van Wyk, "DextrAH-RGB: Visuomotor policies to grasp anything with dexterous hands," arXiv preprint arXiv:2412.01791, 2024.
- [9] H. Zhang, Z. Wu, L. Huang, S. Christen, and J. Song, "RobustDexGrasp: Robust dexterous grasping of general objects," arXiv preprint arXiv:2504.05287, 2025.
- [10] O. Khatib, "Real-time obstacle avoidance for manipulators and mobile robots," The International Journal of Robotics Research, vol. 5, no. 1, pp. 90–98, 1986.
- [11] L. Huber, J.-J. Slotine, and A. Billard, "Avoidance of concave obstacles through rotation of nonlinear dynamics," IEEE Transactions on Robotics, vol. 40, pp. 1983–2002, 2023.
- [12] B. Lee, Y. Lee, J. Ha, and F. C. Park, "Behavior-controllable stable dynamics models on Riemannian configuration manifolds," IEEE Transactions on Robotics, 2025.
- [13] M. S. Graziano and T. N. Aflalo, "Mapping behavioral repertoire onto the cortex," Neuron, vol. 56, no. 2, pp. 239–251, 2007.
- [14] M. Santello, M. Flanders, and J. F. Soechting, "Postural hand synergies for tool use," Journal of Neuroscience, vol. 18, no. 23, pp. 10105–10115, 1998.
- [15] M. Jeannerod, "The timing of natural prehension movements," Journal of Motor Behavior, vol. 16, no. 3, pp. 235–254, 1984.
- [16] U. Castiello, K. Bennett, C. Bonfiglioli, and R. Peppard, "The reach-to-grasp movement in Parkinson's disease before and after dopaminergic medication," Neuropsychologia, vol. 38, no. 1, pp. 46–59, 2000.
- [17] H. J. Lee, S. H. Jeon, and S. Kim, "Learning humanoid arm motion via centroidal momentum regularized multi-agent reinforcement learning," IEEE Robotics and Automation Letters, 2025.
- [18] T. Yu, S. Kumar, A. Gupta, S. Levine, K. Hausman, and C. Finn, "Gradient surgery for multi-task learning," Advances in Neural Information Processing Systems, vol. 33, pp. 5824–5836, 2020.
- [19] T. Schaul, D. Borsa, J. Modayil, and R. Pascanu, "Ray interference: A source of plateaus in deep reinforcement learning," arXiv preprint arXiv:1904.11455, 2019.
- [20] S. H. Jeon, S. Hong, H. J. Lee, C. Khazoom, and S. Kim, "CusADi: A GPU parallelization framework for symbolic expressions and optimal control," IEEE Robotics and Automation Letters, 2024.
- [21] E. Rimon, Exact Robot Navigation Using Artificial Potential Functions. Yale University, 1990.
- [22] B. Lacevic, P. Rocco, and A. M. Zanchettin, "Safety assessment and control of robotic manipulators using danger field," IEEE Transactions on Robotics, vol. 29, no. 5, pp. 1257–1270, 2013.
- [23] S. Paternain, D. E. Koditschek, and A. Ribeiro, "Navigation functions for convex potentials in a space with convex obstacles," IEEE Transactions on Automatic Control, vol. 63, no. 9, pp. 2944–2959, 2017.
- [24] V. M. Goncalves, L. C. Pimenta, C. A. Maia, B. C. Dutra, and G. A. Pereira, "Vector fields for robot navigation along time-varying curves in n-dimensions," IEEE Transactions on Robotics, vol. 26, no. 4, pp. 647–659, 2010.
- [25] S. M. Khansari-Zadeh and A. Billard, "A dynamical system approach to realtime obstacle avoidance," Autonomous Robots, vol. 32, pp. 433–454, 2012.
- [26] L. Huber, J.-J. Slotine, and A. Billard, "Avoiding dense and dynamic obstacles in enclosed spaces: Application to moving in crowds," IEEE Transactions on Robotics, vol. 38, no. 5, pp. 3113–3132, 2022.
- [27] A. Billard, S. Mirrazavi, and N. Figueroa, Learning for Adaptive and Reactive Robot Control: A Dynamical Systems Approach. MIT Press, 2022.
- [28] M. Koptev, N. Figueroa, and A. Billard, "Reactive collision-free motion generation in joint space via dynamical systems and sampling-based MPC," The International Journal of Robotics Research, vol. 43, no. 13, pp. 2049–2069, 2024.
- [29] Y. Lee, T.-Y. Lin, A. Alexiev, and S. Kim, "Hierarchical reactive grasping via task-space velocity fields and joint-space quadratic programming," arXiv preprint arXiv:2509.01044, 2025.
- [30] G. Kim, D. Kang, J.-H. Kim, and H.-W. Park, "Contact-implicit differential dynamic programming for model predictive control with relaxed complementarity constraints," in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2022, pp. 11978–11985.
- [31] C. Yu and P. Wang, "Dexterous manipulation for multi-fingered robotic hands with reinforcement learning: A review," Frontiers in Neurorobotics, vol. 16, p. 861825, 2022.
- [32] O. M. Andrychowicz et al., "Learning dexterous in-hand manipulation," The International Journal of Robotics Research, vol. 39, no. 1, pp. 3–20, 2020.
- [33] A. Rajeswaran et al., "Learning complex dexterous manipulation with deep reinforcement learning and demonstrations," arXiv preprint arXiv:1709.10087, 2017.
- [34] W. Wan et al., "UniDexGrasp++: Improving dexterous grasping policy learning via geometry-aware curriculum and iterative generalist-specialist learning," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3891–3902.
- [35] A. Agarwal, S. Uppal, K. Shaw, and D. Pathak, "Dexterous functional grasping," arXiv preprint arXiv:2312.02975, 2023.
- [36] S. Levine, C. Finn, T. Darrell, and P. Abbeel, "End-to-end training of deep visuomotor policies," Journal of Machine Learning Research, vol. 17, no. 39, pp. 1–40, 2016.
- [37] A. Mandlekar et al., "What matters in learning from offline human demonstrations for robot manipulation," arXiv preprint arXiv:2108.03298, 2021.
- [38] R. Martín-Martín, M. A. Lee, R. Gardner, S. Savarese, J. Bohg, and A. Garg, "Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks," in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2019, pp. 1010–1017.
- [39] C. Sancaktar, S. Blaes, and G. Martius, "Curious exploration via structured world models yields zero-shot object manipulation," Advances in Neural Information Processing Systems, vol. 35, pp. 24170–24183, 2022.
- [40] A. Gupta, V. Kumar, C. Lynch, S. Levine, and K. Hausman, "Relay policy learning: Solving long-horizon tasks via imitation and reinforcement learning," arXiv preprint arXiv:1910.11956, 2019.
- [41] T. G. W. Lum et al., "DextrAH-G: Pixels-to-action dexterous arm-hand grasping with geometric fabrics," arXiv preprint arXiv:2407.02274, 2024.
- [42] J. Kober, J. A. Bagnell, and J. Peters, "Reinforcement learning in robotics: A survey," The International Journal of Robotics Research, vol. 32, no. 11, pp. 1238–1274, 2013.
- [43] R. S. Sutton, A. G. Barto, et al., Reinforcement Learning: An Introduction. MIT Press, Cambridge, 1998, vol. 1.
- [44] S. Gronauer and K. Diepold, "Multi-agent deep reinforcement learning: A survey," Artificial Intelligence Review, vol. 55, no. 2, pp. 895–943, 2022.
- [45] R. Lowe, Y. I. Wu, A. Tamar, J. Harb, O. Pieter Abbeel, and I. Mordatch, "Multi-agent actor-critic for mixed cooperative-competitive environments," Advances in Neural Information Processing Systems, vol. 30, 2017.
- [46] J. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson, "Counterfactual multi-agent policy gradients," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, 2018.
- [47] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004, Chapter 11: Interior-point methods.
- [48] J. A. Andersson, J. Gillis, G. Horn, J. B. Rawlings, and M. Diehl, "CasADi: A software framework for nonlinear optimization and optimal control," Mathematical Programming Computation, vol. 11, pp. 1–36, 2019.
- [49] A. SaLoutos, H. Kim, E. Stanger-Jones, M. Guo, and S. Kim, "Towards robust autonomous grasping with reflexes using high-bandwidth sensing and actuation," arXiv preprint arXiv:2209.11367, 2022.
- [50] M. Mittal et al., "Isaac Lab: A GPU-accelerated simulation framework for multi-modal robot learning," arXiv preprint arXiv:2511.04831, 2025.
- [51] R. Grandia, F. Jenelten, S. Yang, F. Farshidian, and M. Hutter, "Perceptive locomotion through nonlinear model-predictive control," IEEE Transactions on Robotics, vol. 39, no. 5, pp. 3402–3421, 2023.
- [52] E. Todorov, T. Erez, and Y. Tassa, "MuJoCo: A physics engine for model-based control," in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, 2012, pp. 5026–5033.
- [53] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
- [54] B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, "OSQP: An operator splitting solver for quadratic programs," Mathematical Programming Computation, vol. 12, no. 4, pp. 637–672, 2020. DOI: 10.1007/s12532-020-00179-2.
- [55] B. Calli, A. Singh, A. Walsman, S. Srinivasa, P. Abbeel, and A. M. Dollar, "The YCB object and model set: Towards common benchmarks for manipulation research," in 2015 International Conference on Advanced Robotics (ICAR), IEEE, 2015, pp. 510–517.
- [56] T.-Y. Lin, H. J. Lee, K. Doherty, Y. Lee, and S. Kim, "Point2pose: Occlusion-recovering 6D pose tracking and 3D reconstruction for multiple unknown objects via 2D point trackers," arXiv preprint arXiv:2604.10415, 2026.
- [57] J. Carpentier et al., "The Pinocchio C++ library: A fast and flexible implementation of rigid body dynamics algorithms and their analytical derivatives," in 2019 IEEE/SICE International Symposium on System Integration (SII), IEEE, 2019, pp. 614–619.