Learning quadrupedal locomotion over challenging terrain
3 Pith papers cite this work. Polarity classification is still indexing.
Representative citing papers (3, all from 2026; verdicts pending):
- Neuromorphic Reinforcement Learning for Quadruped Locomotion Control on Uneven Terrain
  An equilibrium-propagation-based PPO controller for a 12-DoF quadruped achieves locomotion performance comparable to backpropagation-trained PPO on uneven terrain while using 4.3 times less GPU memory.
- FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control
  FlashSAC scales up Soft Actor-Critic with fewer updates, larger models, higher data throughput, and norm bounds to deliver faster, more stable training than PPO on high-dimensional robot control tasks across dozens of simulators.
- Learning Locomotion on Complex Terrain for Quadrupedal Robots with Foot Position Maps and Stability Rewards
  Integrating foot position maps into heightmaps and adding a locomotion-stability reward in an attention-based RL framework improves quadrupedal success rates on both trained and out-of-domain complex terrains.