A Differentiable Interior-Point Method in Single Precision
2 Pith papers cite this work.
abstract
Primal-dual interior-point methods solve constrained convex optimization problems to tight tolerances with speed and robustness. Their solutions are also efficiently differentiable with respect to the problem data through the implicit function theorem. However, the standard treatment of primal-dual complementarity makes the underlying linear systems increasingly ill-conditioned near the solution. While this ill-conditioning is often benign in double precision, it can be catastrophic in single precision, preventing interior-point methods from fully exploiting the accelerated hardware that underpins modern machine learning. This paper introduces a differentiable interior-point method designed for low-precision arithmetic. By using an alternative complementarity representation, we ensure that the underlying linear systems remain spectrally bounded -- even near the solution -- a property that is essential for computing accurate gradients and avoiding arithmetic exceptions. As a result, our method enables interior-point solvers to reliably solve and differentiate optimization problems in single precision that were previously confined to double precision. We demonstrate the approach through an ablation study against the standard interior-point formulation and applications in bilevel and end-to-end learning settings where differentiating through constrained optimization is essential. The source code is available at https://github.com/qpax-solver/qpax.
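The ill-conditioning described in the abstract can be illustrated with a toy calculation. The sketch below is not taken from the paper or from qpax. It assumes the standard complementarity condition s_i * z_i = mu and a hypothetical two-variable problem with Q = I and G = I, chosen so that the reduced matrix Q + G^T diag(z/s) G is diagonal and its condition number can be read off directly. One constraint is treated as active and one as inactive; the function name and default values are illustrative only.

```python
def reduced_hessian_condition(mu, z_active=1.0, s_inactive=1.0):
    """Condition number of Q + G^T diag(z/s) G for the toy case Q = I, G = I.

    Assumption: both constraints sit on the central path, s_i * z_i = mu.
    The active constraint has s -> 0 with z -> z_active; the inactive one
    has z -> 0 with s -> s_inactive.
    """
    s = [mu / z_active, s_inactive]
    z = [z_active, mu / s_inactive]
    # With Q = I and G = I the matrix is diagonal: entries are 1 + z_i / s_i.
    diag = [1.0 + zi / si for si, zi in zip(s, z)]
    return max(diag) / min(diag)


FLOAT32_EPS = 2.0 ** -23  # machine epsilon of IEEE 754 single precision

for mu in (1e-2, 1e-4, 1e-6, 1e-8):
    kappa = reduced_hessian_condition(mu)
    # A product near or above 1 suggests a float32 solve has no reliable digits.
    print(f"mu={mu:.0e}  cond={kappa:.3e}  cond*eps32={kappa * FLOAT32_EPS:.3e}")
```

In this toy setting the condition number grows roughly like 1/mu, so it exceeds the reciprocal of single-precision machine epsilon well before mu reaches tolerances that are routine in double precision. How the paper's alternative complementarity representation avoids this growth is not shown here.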
fields
cs.RO (2)
years
2026 (2)
verdicts
UNVERDICTED (2)
representative citing papers
ConstrainedMimic integrates operational space control and control barrier functions into RL tracking policies to enforce arbitrary runtime constraints on humanoid kinematics and dynamics while preserving contact modes and tracking goals.
citing papers explorer
- Accelerating and Scaling MPC-Guided Reinforcement Learning for Humanoid Locomotion and Manipulation
  MPC-RL combines a centroidal-dynamics MPC reward with a batched GPU solver (π^n MPC) to accelerate RL training for humanoid locomotion and manipulation tasks.
- Constrained Whole-Body Tracking for Humanoid Robots
  ConstrainedMimic integrates operational space control and control barrier functions into RL tracking policies to enforce arbitrary runtime constraints on humanoid kinematics and dynamics while preserving contact modes and tracking goals.