Toward Reliable Sim-to-Real Predictability for MoE-based Robust Quadrupedal Locomotion
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
Reinforcement learning has shown strong promise for agile quadrupedal locomotion, even with proprioception-only sensing. In practice, however, the sim-to-real gap and reward overfitting on complex terrains can produce policies that fail to transfer, while physical validation remains risky and inefficient. To address these challenges, we introduce a unified framework that pairs a Mixture-of-Experts (MoE) locomotion policy for robust multi-terrain representation with RoboGauge, a predictive assessment suite that quantifies sim-to-real transferability. The MoE policy employs a gated set of specialist experts to decompose latent terrain and command modeling, achieving superior deployment robustness and generalization from proprioception alone. RoboGauge further provides multi-dimensional proprioception-based metrics via sim-to-sim tests across terrains, difficulty levels, and domain randomizations, enabling reliable MoE policy selection without extensive physical trials. Experiments on a Unitree Go2 demonstrate robust locomotion on unseen challenging terrains, including snow, sand, stairs, slopes, and 30 cm obstacles. In dedicated high-speed tests, the robot reaches 4 m/s and exhibits an emergent narrow-width gait associated with improved stability at high velocity.
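The "gated set of specialist experts" mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes a dense softmax gate over linear experts, and the class name `MoEPolicy`, the 45-dimensional observation, the 4 experts, and the random weights are all hypothetical choices made for illustration. Only the 12-dimensional action (one target per leg joint) reflects the robot's actual joint count.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, inputs):
    """Apply a weight matrix (list of rows) to an input vector."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights; a trained policy would load learned values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MoEPolicy:
    """Hypothetical softmax-gated mixture of linear experts (illustration only)."""

    def __init__(self, obs_dim, act_dim, num_experts, seed=0):
        rng = random.Random(seed)
        self.gate = random_matrix(num_experts, obs_dim, rng)
        self.experts = [
            random_matrix(act_dim, obs_dim, rng) for _ in range(num_experts)
        ]

    def act(self, obs):
        # The gate maps the proprioceptive observation to one weight per expert.
        gate_weights = softmax(linear(self.gate, obs))
        # Every expert proposes an action from the same observation.
        expert_actions = [linear(expert, obs) for expert in self.experts]
        # The final action is the gate-weighted sum of the expert proposals.
        action = [
            sum(g * a[i] for g, a in zip(gate_weights, expert_actions))
            for i in range(len(expert_actions[0]))
        ]
        return action, gate_weights


# Assumed sizes: 45-dim observation, 12 joint targets, 4 experts.
policy = MoEPolicy(obs_dim=45, act_dim=12, num_experts=4)
action, gate_weights = policy.act([0.0] * 45)
```

Returning the gate weights alongside the action makes it possible to inspect which expert dominates for a given observation, which is one way to check whether experts specialize in practice.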
fields
cs.RO 1
years
2026 1
verdicts
UNVERDICTED 1
representative citing papers
citing papers explorer
Learning to Balance Motor Thermal Safety and Quadrupedal Locomotion Performance with Residual Policy
A two-stage RL framework with a thermal-aware residual policy enables a Unitree A1 quadruped to achieve over 13 minutes of stable locomotion under 3 kg payload versus 5 minutes before overheating with the nominal policy alone.