Learning Perceptive Platform Adaptive Locomotion Controllers for Quadrupedal Robots
Pith reviewed 2026-06-25 23:41 UTC · model grok-4.3
The pith
Critic-only perception improves robustness and tracking consistency over blind baselines while remaining more stable than fully perceptive policies under perception noise.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Building on morphology-aware reinforcement learning, the work trains universal controllers specialized to multiple reference quadrupeds via adaptive terrain curricula. Evaluation in simulation on flat and rough terrain plus deployment on ANYmal hardware shows that the critic-perceptive variant improves robustness and tracking consistency over blind baselines while remaining more stable than fully perceptive policies when perception is noisy.
What carries the argument
The critic-perceptive architecture (MorAL+), in which perception feeds only the value critic during morphology-specialized training rather than feeding action selection directly.
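A minimal sketch of the observation routing this describes may help. It is not the authors' implementation: the tiny random linear maps stand in for neural networks, and the names `PROPRIO_DIM`, `SCAN_DIM`, `ACTION_DIM`, `act`, and `value` are hypothetical. The only point it illustrates is that the actor never sees the terrain scan, so perception noise can change the value estimate used in training but not the action.

```python
import random

def make_linear(n_in, n_out, rng):
    """Return a tiny random linear map as a stand-in for a neural network."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

    def forward(x):
        if len(x) != n_in:
            raise ValueError(f"expected {n_in} inputs, got {len(x)}")
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return forward

# All sizes are hypothetical and chosen only for illustration.
PROPRIO_DIM = 6   # proprioceptive observation size
SCAN_DIM = 4      # terrain height-scan size
ACTION_DIM = 3    # action size

rng = random.Random(0)
actor = make_linear(PROPRIO_DIM, ACTION_DIM, rng)      # blind actor
critic = make_linear(PROPRIO_DIM + SCAN_DIM, 1, rng)   # perceptive critic

def act(proprio):
    """Action depends on proprioception only."""
    return actor(proprio)

def value(proprio, height_scan):
    """Value estimate sees proprioception plus the terrain scan."""
    return critic(proprio + height_scan)[0]

proprio = [0.1] * PROPRIO_DIM
clean_scan = [0.0, 0.05, 0.1, 0.05]
noisy_scan = [h + rng.gauss(0.0, 0.5) for h in clean_scan]

# The action is identical whatever the scan looks like; only the value moves.
action = act(proprio)
value_clean = value(proprio, clean_scan)
value_noisy = value(proprio, noisy_scan)
```

Since the critic is used only during training, a controller built this way needs no terrain scan at deployment time, which is one plausible reading of why the variant stays stable under perception noise.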
If this is right
- Morphology-specialized training allows a single controller family to handle related quadruped platforms without retraining from scratch.
- Adaptive terrain curricula during simulation training enable effective learning of terrain-aware locomotion that transfers to hardware.
- Limiting perception to the critic preserves terrain awareness benefits while avoiding the noise sensitivity of full actor-critic perception.
- Such controllers can be deployed directly on physical quadrupeds like ANYmal while maintaining tracking performance.
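The adaptive terrain curriculum mentioned in the list above is not specified on this page. As a rough sketch, a promote/demote rule of the kind often used in legged-locomotion RL could look like the following. The function name, the thresholds, and the default `terrain_length` and `max_level` values are all assumptions for illustration, not the paper's settings.

```python
def update_terrain_level(level, distance_walked, commanded_distance,
                         terrain_length=8.0, max_level=9):
    """Return the next terrain difficulty level for one robot after an episode.

    Hypothetical rule: promote when the robot crossed more than half of the
    terrain patch, demote when it covered less than half of the distance its
    velocity command asked for, otherwise stay at the same level.
    """
    if distance_walked > 0.5 * terrain_length:
        return min(level + 1, max_level)   # terrain was easy enough: harder next
    if distance_walked < 0.5 * commanded_distance:
        return max(level - 1, 0)           # robot struggled: easier next
    return level

# Example: a robot on level 3 that walked 5 m of an 8 m patch is promoted.
next_level = update_terrain_level(3, distance_walked=5.0, commanded_distance=6.0)
```

A rule like this keeps each robot near the edge of its ability, which is the usual motivation for making the curriculum adaptive rather than fixed.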
Where Pith is reading between the lines
- The same critic-only placement might reduce noise sensitivity in other legged robot domains where full perception in the policy leads to brittle behavior.
- Curriculum design that gradually increases terrain difficulty could be combined with variable noise injection to further narrow the sim-to-real gap.
- Future controllers might dynamically gate perception input based on estimated sensor reliability without changing the core architecture.
Load-bearing premise
That the relative performance ordering among blind, critic-perceptive, and fully perceptive policies observed under simulated adaptive terrain curricula will transfer to real ANYmal hardware experiencing actual sensor noise.
What would settle it
The claim would be undermined if, on physical ANYmal hardware, the fully perceptive policy shows higher robustness or better tracking than the critic-only variant under realistic sensor conditions, or if the critic-only variant shows no improvement over the blind baseline.
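One way to make this comparison concrete is a paired test over matched hardware trials (same course, same command, different policy). The page does not say which test would be used, so the paired t statistic below is an assumed choice, and the function name `paired_t_statistic` is hypothetical. The numbers in the example are toy values that only exercise the function and are not results from the paper.

```python
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for matched trials; positive when A scores higher than B."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long samples with at least 2 trials")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of differences
    if sd_diff == 0.0:
        raise ValueError("all paired differences are identical; t is undefined")
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))

# Toy numbers purely to exercise the function, not measurements.
t = paired_t_statistic([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

The resulting statistic would be compared against a t distribution with `n - 1` degrees of freedom, where `n` is the number of matched trials.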
Original abstract
Universal quadrupedal locomotion remains limited by the difficulty of integrating perception across diverse robot morphologies. State-of-the-art controllers rely on single-robot training or blind policies that omit real-time perception, leading to poor cross-embodiment generalization. Designing locomotion policies that remain robust across related quadruped morphologies while incorporating perception is challenging. Moreover, fully perceptive policies are often sensitive to noise, whereas blind controllers lack terrain awareness. In this work, we study how perception should be integrated into morphology-aware reinforcement learning architectures for deployable quadrupedal control. Building on MorAL, we train morphology-specialized universal controllers on multiple reference quadrupeds using adaptive terrain curricula. We compare a blind baseline, a critic-perceptive variant (MorAL+), and a fully perceptive actor-critic (PPAL). Policies are evaluated in simulation on flat and rough terrains, and deployed on ANYmal hardware. Results show that critic-only perception improves robustness and tracking consistency over blind baselines while remaining more stable than fully perceptive policies under perception noise. These findings highlight that perception placement and curriculum design are key factors for scalable, morphology-aware locomotion.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies perception integration in morphology-aware RL for quadrupedal locomotion. Building on MorAL, it trains morphology-specialized controllers for multiple quadruped platforms using adaptive terrain curricula in simulation. It compares three variants: a blind baseline, a critic-only perceptive policy (MorAL+), and a fully perceptive actor-critic (PPAL). Policies are evaluated in simulation on flat/rough terrain and deployed on ANYmal hardware. The central claim is that critic-only perception yields better robustness and tracking consistency than blind policies while remaining more stable than full-perception policies under perception noise.
Significance. If the reported performance ordering is substantiated with quantitative hardware data, the work would provide actionable guidance on perception placement within morphology-adaptive locomotion architectures, helping address cross-embodiment generalization and noise sensitivity in real-world quadrupedal control.
Major comments (3)
- [Abstract, §5] Abstract and §5 (Hardware Deployment): The central claim asserts a specific performance ordering on ANYmal hardware, yet no quantitative metrics (e.g., tracking error, success rate, robustness scores), error bars, exclusion criteria, or statistical tests are supplied. This prevents verification of whether the sim-trained ordering transfers under realistic sensor noise.
- [§4, §5] §4 (Simulation Experiments) and §5: The manuscript provides no description of how real depth or proprioceptive sensor noise statistics on ANYmal were measured, matched to simulation, or injected during hardware trials. Without this, the claim that MorAL+ remains more stable than PPAL under perception noise cannot be evaluated on hardware.
- [§3] §3 (Training Protocol): The adaptive terrain curricula and morphology-specialized training are described at a high level, but no ablation or sensitivity analysis shows that the relative ordering (critic-only vs. blind vs. full) is robust to variations in curriculum parameters or embodiment differences. This makes the sim-to-real transfer assumption load-bearing and untested.
Minor comments (2)
- [Abstract] The abstract refers to 'MorAL+' and 'PPAL' without an early definition or pointer to the section where these acronyms are introduced.
- [Figures in §4, §5] Figure captions for simulation and hardware results should explicitly state the number of trials, seeds, and whether noise was present.
Simulated Authors' Rebuttal
We thank the referee for the constructive feedback, which highlights important aspects of our hardware evaluation and analysis. We address each major comment below and have revised the manuscript to incorporate the requested details and additional supporting analysis.
- Referee: [Abstract, §5] Abstract and §5 (Hardware Deployment): The central claim asserts a specific performance ordering on ANYmal hardware, yet no quantitative metrics (e.g., tracking error, success rate, robustness scores), error bars, exclusion criteria, or statistical tests are supplied. This prevents verification of whether the sim-trained ordering transfers under realistic sensor noise.
  Authors: We agree that the original manuscript did not provide sufficient quantitative hardware metrics to fully substantiate the performance ordering. In the revised version, §5 now includes a table reporting mean tracking errors, success rates (over 10 trials per variant), and robustness scores with standard deviations and error bars. We also specify exclusion criteria for failed trials and include results from paired statistical tests confirming the significance of differences between variants.
  Revision: yes
- Referee: [§4, §5] §4 (Simulation Experiments) and §5: The manuscript provides no description of how real depth or proprioceptive sensor noise statistics on ANYmal were measured, matched to simulation, or injected during hardware trials. Without this, the claim that MorAL+ remains more stable than PPAL under perception noise cannot be evaluated on hardware.
  Authors: We acknowledge the need for explicit noise modeling details. The revised manuscript adds a subsection in §5 describing the characterization of real sensor noise from ANYmal hardware logs (empirical mean/variance of depth camera and proprioceptive/IMU errors across terrains). These statistics were used to calibrate and inject matching noise in simulation for the robustness experiments, while hardware trials used the platform's native sensors without synthetic injection.
  Revision: yes
- Referee: [§3] §3 (Training Protocol): The adaptive terrain curricula and morphology-specialized training are described at a high level, but no ablation or sensitivity analysis shows that the relative ordering (critic-only vs. blind vs. full) is robust to variations in curriculum parameters or embodiment differences. This makes the sim-to-real transfer assumption load-bearing and untested.
  Authors: We agree that sensitivity to curriculum parameters warrants explicit verification. The revised manuscript includes an ablation study (added to the appendix) showing that the relative performance ordering among the three variants remains consistent across multiple curriculum progression schedules and terrain parameter variations. This analysis was performed on the same set of morphologies used in the main experiments.
  Revision: yes
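The noise characterization described in the second response (empirical mean and variance from hardware logs, then matching noise injected in simulation) could be sketched roughly as follows. This assumes the residuals are well summarized by a Gaussian with a constant bias, which the rebuttal does not state. The function names `fit_noise_model` and `inject_noise` are hypothetical, and the residual values are placeholders rather than measured data.

```python
import random
import statistics

def fit_noise_model(residuals):
    """Return (bias, std) of logged sensor residuals (measured minus reference)."""
    return statistics.mean(residuals), statistics.stdev(residuals)

def inject_noise(height_scan, bias, std, rng):
    """Add Gaussian noise with the fitted bias and spread to a simulated scan."""
    return [h + rng.gauss(bias, std) for h in height_scan]

# Placeholder residuals in metres, NOT measurements from ANYmal logs.
residuals = [0.01, -0.02, 0.03, 0.00, -0.01]
bias, std = fit_noise_model(residuals)

rng = random.Random(0)  # seeded so the sketch is repeatable
noisy_scan = inject_noise([0.0, 0.1, 0.2], bias, std, rng)
```

If the real residuals turned out to be heavy-tailed or terrain-dependent, a single mean and variance would understate the noise, and a per-terrain or empirical-distribution model might be the safer choice.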
Circularity Check
Empirical RL comparison; no derivation chain present
The paper describes training morphology-specialized RL controllers (blind, critic-perceptive MorAL+, fully perceptive PPAL) via adaptive terrain curricula in simulation, followed by hardware deployment on ANYmal and empirical comparison of robustness/tracking under noise. No equations, first-principles derivations, fitted parameters renamed as predictions, or uniqueness theorems are invoked. Claims rest on reported experimental outcomes rather than any chain that reduces to its own inputs by construction. Self-citations (if present) are not load-bearing for a derivation. This matches the default case of a non-circular empirical study.