pith. machine review for the scientific record.

arxiv: 2605.09595 · v1 · submitted 2026-05-10 · 💻 cs.NE · cs.RO

Recognition: no theorem link

Neuromorphic Reinforcement Learning for Quadruped Locomotion Control on Uneven Terrain

Abhronil Sengupta, Zhuangyu Han

Pith reviewed 2026-05-12 04:18 UTC · model grok-4.3

classification 💻 cs.NE cs.RO
keywords neuromorphic reinforcement learning · equilibrium propagation · quadruped locomotion · uneven terrain · central pattern generator · proximal policy optimization · local learning · on-robot adaptation

The pith

Equilibrium propagation trains a reinforcement learning controller for quadruped locomotion on uneven terrain that matches backpropagation performance while using far less memory.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a local-learning approach to train policies for a quadruped robot walking on uneven ground. It replaces standard backpropagation with equilibrium propagation, a method whose updates depend only on local neuron states rather than on globally backpropagated gradients. A central pattern generator provides the base rhythmic motion, and a residual network adjusts posture. Experiments on a simulated 12-joint robot show that the new method reaches success rates, speeds, and stability comparable to a conventionally trained PPO agent. The key gain is a 4.3-fold reduction in GPU memory use during training, which could make it feasible to run learning directly on the robot.
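For readers new to the mechanism, the canonical equilibrium propagation estimator of Scellier and Bengio [32] makes the locality concrete; the paper derives a PPO-specific variant, so the form below is the textbook version rather than the authors' exact rule.

```latex
% Canonical equilibrium propagation gradient estimate
% (Scellier & Bengio, 2017). F is the network's energy function,
% s the neuron states, x the input, C the output cost being
% nudged toward, and beta the nudging strength.
%
% Free phase:    s^{0}     = \arg\min_{s} F(\theta, x, s)
% Nudged phase:  s^{\beta} = \arg\min_{s} \left[ F(\theta, x, s) + \beta\, C(s) \right]
\frac{\partial C}{\partial \theta} \;\approx\; \frac{1}{\beta}
  \left( \frac{\partial F}{\partial \theta}\bigl(\theta, x, s^{\beta}\bigr)
       - \frac{\partial F}{\partial \theta}\bigl(\theta, x, s^{0}\bigr) \right)
```

Both terms are evaluated at settled equilibria using only locally available states, which is why no unrolled computation graph has to be stored; that is the source of the memory gain discussed above.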

Core claim

The paper introduces an equilibrium-propagation-based proximal policy optimization algorithm for training continuous-control policies in a neuromorphic-compatible way. It combines this with a CPG-based policy for basic locomotion and a residual policy for terrain adaptation, deriving a nudging signal and a clipping rule that stabilize the updates. On a 12-DoF A1 quadruped in a two-stage uneven-terrain task, the resulting controller matches a backpropagation-trained PPO baseline in success rate, velocity tracking, power use, and stability, while cutting memory requirements by a factor of 4.3 relative to backpropagation through time (BPTT).

What carries the argument

The EP-compatible PPO output-nudging signal combined with a two-sided ratio clipping mechanism that stabilizes policy updates during the relaxation phase of equilibrium propagation.
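For reference, the standard PPO clipped surrogate [35] that the EP-compatible variant builds on; note that the clip in standard PPO is already two-sided, and the paper's contribution is carrying this ratio-clipping behavior into the nudging signal of the EP relaxation phase.

```latex
% Standard PPO clipped surrogate (Schulman et al., 2017),
% with probability ratio r_t and advantage estimate \hat{A}_t.
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
\qquad
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[
  \min\!\left( r_t(\theta)\,\hat{A}_t,\;
  \mathrm{clip}\bigl(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\bigr)\,\hat{A}_t \right) \right]
```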

If this is right

  • The controller achieves stable policy convergence in a two-stage uneven terrain locomotion task.
  • Locomotion performance matches a backpropagation-trained PPO baseline in success rate, velocity tracking, actuator power, and body stability.
  • GPU memory efficiency improves by a factor of 4.3 compared with backpropagation through time (a back-of-envelope sketch of the scaling follows this list).
  • Local equilibrium-based learning can support high-dimensional embodied locomotion and provide an algorithmic foundation for low-power on-robot adaptation.
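
A back-of-envelope sketch of where the memory gap comes from, using hypothetical sizes; the measured 4.3× figure reflects the paper's full training pipeline, not this toy calculation.

```python
# Why EP's training memory is horizon-independent (illustrative sizes only).
T = 128      # BPTT truncation horizon (assumed)
H = 512      # hidden units per network (assumed)
B = 4096     # parallel simulated environments (assumed)
FLOAT = 4    # bytes per float32

# BPTT caches every intermediate activation along the unrolled graph.
bptt_bytes = T * H * B * FLOAT

# EP keeps only the settled states of its two relaxation phases
# (free and nudged); nothing scales with the unroll length T.
ep_bytes = 2 * H * B * FLOAT

print(f"BPTT: {bptt_bytes / 2**20:.0f} MiB, EP: {ep_bytes / 2**20:.0f} MiB, "
      f"ratio: {bptt_bytes / ep_bytes:.0f}x")
```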

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This approach could allow robots to adapt their locomotion policies on the fly to changes like payload shifts or actuator wear without needing to return to a full simulation environment.
  • Integration with actual neuromorphic chips might further reduce power consumption for continuous learning on physical hardware.
  • Similar local-learning techniques might extend to other embodied control problems such as manipulation or navigation where backpropagation is costly.

Load-bearing premise

The assumption that the derived EP-compatible signals and clipping mechanism will transfer stably from simulation to real hardware despite sensor noise, actuator delays, and unmodeled dynamics.

What would settle it

A physical experiment on the A1 quadruped robot testing whether the policy updates remain stable, locomotion performance holds up, and the memory advantage carries over once sensor noise, actuator delays, and unmodeled dynamics are present.

Figures

Figures reproduced from arXiv: 2605.09595 by Abhronil Sengupta, Zhuangyu Han.

Figure 1. The control architecture of the EP-based quadruped locomotion control.
Figure 2. The convergence process of the locomotion training of BP and EP for stage-1 and stage-2.
Figure 3. Locomotion performance measured in the walking test. From (a) to (g), the indicators are …
Figure 4. Per-update KL-divergence of all 5 versions of the EP and BP algorithms.
Figure 5. The advantage versus r_t,nudging relationship for 1000 samples from the middle of a training process. The variation of log10(r_t,nudging(ξ_t,out)) in the positive and negative nudge phases is shown. Black dots mark log10(r_t(ξ*_t,out)) at the free-phase equilibrium; gray line segments mark the variation range of log10(r_t,nudging(ξ_t,out)) over an entire positive or negative phase, for 10 consecutive rollouts.
Figure 6. Statistics of the aggregated extreme log10(r_t,nudging(ξ_t,out)) across 10 rollouts. In the left panel, over the normalized advantage range [2.5, 4], the extreme log10(r_t,nudging(ξ_t,out)) is significantly lower for ϵ_rev = 1.0 than in the two other cases.
Figure 7. Convergence of stage-1 training for three values of …
Figure 8. Convergence of stage-2 training for the gradient scaling factor …
Figure 9. Convergence of stage-1 training for dynamic and static gradient masks.
Figure 10. Steps to convergence in the EP relaxation for both policy and value networks.
Original abstract

Reinforcement learning (RL) has enabled robust quadruped locomotion over complex terrain, but most learned controllers are trained offline with backpropagation in massively parallel simulation and deployed as fixed policies, limiting adaptation to terrain variation, payload changes, actuator wear, and other real-world conditions under onboard power constraints. Local learning provides a potential path toward energy-aware on-robot adaptation by replacing global backpropagation graphs with updates driven by local neural states, making the learning rule more compatible with neuromorphic and in-memory computing substrates. This work proposes an equilibrium-propagation (EP)-based proximal policy optimization (PPO) framework for uneven-terrain quadruped locomotion. The controller combines a bio-inspired central pattern generator (CPG) policy with a residual postural adjustment policy, while replacing conventional backpropagation-trained policy and value networks with EP-enabled local learning. To train stochastic continuous-control policies with EP, we derive an EP-compatible PPO output-nudging signal and introduce a two-sided ratio clipping mechanism that stabilizes policy updates during relaxation. Experiments on a 12-DoF A1 quadruped show that the proposed controller achieves stable policy convergence in a two-stage uneven terrain locomotion task. Its locomotion performance is comparable to a backpropagation-trained PPO baseline in success rate, velocity tracking, actuator power, and body stability, while improving GPU memory efficiency by 4.3× compared with backpropagation through time (BPTT). These results suggest that local equilibrium-based learning can support high-dimensional embodied locomotion and provide an algorithmic foundation for low-power on-robot adaptation and fine-tuning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes an equilibrium-propagation (EP)-based proximal policy optimization (PPO) framework for quadruped locomotion control. It combines a bio-inspired central pattern generator (CPG) policy with a residual postural adjustment policy, derives an EP-compatible output-nudging signal and two-sided ratio clipping mechanism to enable local learning for continuous-control policies, and reports simulation results on a 12-DoF A1 quadruped showing stable convergence on a two-stage uneven-terrain task with performance metrics comparable to a backpropagation-trained PPO baseline and 4.3× lower GPU memory usage than BPTT.

Significance. If the local EP updates remain stable, the work could support energy-efficient on-device adaptation for robotic systems by making RL compatible with neuromorphic substrates. Credit is due for the first-principles derivation of the nudging signal and clipping rule that allows EP to handle stochastic policies, as well as the concrete memory-efficiency demonstration in a high-dimensional locomotion task.

major comments (2)
  1. [Abstract and Experiments] The claim of 'comparable' performance in success rate, velocity tracking, actuator power, and body stability is presented without error bars, standard deviations across random seeds, or any statistical tests. This leaves the quantitative support for equivalence to the BPTT baseline moderate at best and weakens the central empirical claim.
  2. [Introduction and Conclusion] The manuscript states that the results 'provide an algorithmic foundation for low-power on-robot adaptation and fine-tuning.' However, all validation occurs in clean simulation; no analysis or experiments examine whether the derived EP-compatible PPO nudging signal and two-sided clipping remain stable under sensor noise, actuator delays, or unmodeled dynamics. This gap is load-bearing for the neuromorphic and on-robot positioning.
minor comments (1)
  1. [Methods] The two-sided ratio clipping mechanism is described in the methods but would benefit from an explicit equation or pseudocode block to improve reproducibility and clarity of the stabilization rule.
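
For concreteness, a minimal sketch of the kind of block the referee is asking for, based on the standard symmetric PPO clip; the paper's actual rule appears to add a separate reverse threshold (ϵ_rev in Figure 6), so this shows the mechanism's general shape, not the authors' exact formulation.

```python
import torch

def clipped_surrogate(log_prob_new: torch.Tensor,
                      log_prob_old: torch.Tensor,
                      advantage: torch.Tensor,
                      eps: float = 0.2) -> torch.Tensor:
    """Standard two-sided PPO clipped surrogate (Schulman et al., 2017).

    The probability ratio is clipped from both sides, so neither large
    positive nor large negative advantages can drive an unbounded update,
    the same stabilizing role the paper assigns to its clipping rule
    during the EP relaxation phase.
    """
    ratio = torch.exp(log_prob_new - log_prob_old)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the minimum keeps the objective pessimistic w.r.t. the clip.
    return -torch.min(unclipped, clipped).mean()
```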

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. We appreciate the positive assessment of the work's significance and the credit given for the derivation of the EP-compatible nudging signal and clipping mechanism. We address each major comment below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract and Experiments] The claim of 'comparable' performance in success rate, velocity tracking, actuator power, and body stability is presented without error bars, standard deviations across random seeds, or any statistical tests. This leaves the quantitative support for equivalence to the BPTT baseline moderate at best and weakens the central empirical claim.

    Authors: We agree that the absence of error bars, standard deviations, and statistical tests weakens the support for the comparability claims. In the revised manuscript, we will conduct additional runs with multiple random seeds (at least 5 per condition), report all metrics as mean ± standard deviation, and include statistical comparisons (e.g., paired t-tests or Mann-Whitney U tests with p-values) between the EP-PPO and BPTT-PPO results to quantify the degree of equivalence (a sketch of such an analysis follows these responses). revision: yes

  2. Referee: [Introduction and Conclusion] The manuscript states that the results 'provide an algorithmic foundation for low-power on-robot adaptation and fine-tuning.' However, all validation occurs in clean simulation; no analysis or experiments examine whether the derived EP-compatible PPO nudging signal and two-sided clipping remain stable under sensor noise, actuator delays, or unmodeled dynamics. This gap is load-bearing for the neuromorphic and on-robot positioning.

    Authors: We acknowledge that the experiments are confined to idealized simulation without explicit modeling of sensor noise, actuator delays, or unmodeled dynamics. The manuscript's core contribution is the first-principles derivation of EP-compatible mechanisms for stochastic continuous-control policies and their empirical validation in a 12-DoF locomotion task. While we maintain that these mechanisms provide an algorithmic foundation, we agree the on-robot and neuromorphic positioning would be strengthened by robustness analysis. In revision we will (i) add a Limitations section that explicitly states the simulation-only scope and (ii) moderate the language in the Introduction and Conclusion to frame the results as a necessary first step toward on-robot adaptation rather than a direct enabler. Full noise-robustness studies remain future work. revision: partial
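
A minimal sketch of the seed-level comparison the authors commit to in response 1, with hypothetical numbers; the SciPy routines named here are standard.

```python
import numpy as np
from scipy import stats

# Hypothetical per-seed success rates, 5 seeds per condition.
ep_ppo = np.array([0.91, 0.88, 0.93, 0.90, 0.89])
bptt_ppo = np.array([0.92, 0.90, 0.91, 0.93, 0.88])

print(f"EP-PPO:   {ep_ppo.mean():.3f} +/- {ep_ppo.std(ddof=1):.3f}")
print(f"BPTT-PPO: {bptt_ppo.mean():.3f} +/- {bptt_ppo.std(ddof=1):.3f}")

# Mann-Whitney U makes no normality assumption, which suits small seed counts.
u, p = stats.mannwhitneyu(ep_ppo, bptt_ppo, alternative="two-sided")
print(f"Mann-Whitney U = {u:.1f}, p = {p:.3f}")
```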

Circularity Check

0 steps flagged

No circularity: EP-PPO nudging signal and clipping derived independently

full rationale

The paper derives an EP-compatible PPO output-nudging signal and two-sided ratio clipping mechanism as first-principles constructions to adapt PPO for equilibrium propagation in continuous control. No equations or steps reduce the claimed results to fitted parameters, self-citations, or input data by construction. Performance equivalence to BPTT-PPO is shown via simulation experiments on the A1 quadruped, not by re-deriving the same quantities. The sim-to-real robustness concern is a generalization issue, not a circularity in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no explicit free parameters, axioms, or invented entities can be extracted; the work relies on standard RL assumptions and the existence of an EP relaxation process.

pith-pipeline@v0.9.0 · 5580 in / 1211 out tokens · 75914 ms · 2026-05-12T04:18:34.966667+00:00 · methodology


Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages

  1. [1]

    Cpg-rl: Learning central pattern generators for quadruped locomotion

    Guillaume Bellegarda and Auke Ijspeert. Cpg-rl: Learning central pattern generators for quadruped locomotion. IEEE Robotics and Automation Letters, 7(4): 12547--12554, 2022

  2. [2]

    Visual cpg-rl: Learning central pattern generators for visually-guided quadruped locomotion

    Guillaume Bellegarda, Milad Shafiee, and Auke Ijspeert. Visual cpg-rl: Learning central pattern generators for visually-guided quadruped locomotion. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 1420--1427. IEEE, 2024

  3. [3]

    Towards real robot learning in the wild: A case study in bipedal locomotion

    Michael Bloesch, Jan Humplik, Viorica Patraucean, Roland Hafner, Tuomas Haarnoja, Arunkumar Byravan, Noah Yamamoto Siegel, Saran Tunyasuvunakool, Federico Casarini, Nathan Batchelor, Francesco Romano, Stefano Saliceti, Martin Riedmiller, S. M. Ali Eslami, and Nicolas Heess. Towards real robot learning in the wild: A case study in bipedal locomotion. In Al...

  4. [4]

    Learning Torque Control for Quadrupedal Locomotion

    Shuxiao Chen, Bike Zhang, Mark W. Mueller, Akshara Rai, and Koushil Sreenath. Learning Torque Control for Quadrupedal Locomotion, March 2023

  5. [5]

    Robots that can adapt like animals

    Antoine Cully, Jeff Clune, Danesh Tarapore, and Jean-Baptiste Mouret. Robots that can adapt like animals. Nature, 521(7553): 503--507, May 2015. ISSN 0028-0836, 1476-4687. doi:10.1038/nature14422

  6. [6]

    Current Principles of Motor Control, with Special Reference to Vertebrate Locomotion

    Sten Grillner and Abdeljabbar El Manira. Current Principles of Motor Control, with Special Reference to Vertebrate Locomotion. Physiological Reviews, 100(1): 271--320, January 2020. ISSN 0031-9333, 1522-1210. doi:10.1152/physrev.00015.2019

  7. [7]

    The CPGs for Limbed Locomotion -- Facts and Fiction

    Sten Grillner and Alexander Kozlov. The CPGs for Limbed Locomotion -- Facts and Fiction. International Journal of Molecular Sciences, 22(11): 5882, May 2021. ISSN 1422-0067. doi:10.3390/ijms22115882

  8. [8]

    Learning to walk in the real world with minimal human effort

    Sehoon Ha, Peng Xu, Zhenyu Tan, Sergey Levine, and Jie Tan. Learning to walk in the real world with minimal human effort. arXiv preprint arXiv:2002.08550, 2020

  9. [9]

    Learning to walk via deep reinforcement learning

    Tuomas Haarnoja, Sehoon Ha, Aurick Zhou, Jie Tan, George Tucker, and Sergey Levine. Learning to walk via deep reinforcement learning. arXiv preprint arXiv:1812.11103, 2018

  10. [10]

    Learning quadrupedal high-speed running on uneven terrain

    Xinyu Han and Mingguo Zhao. Learning quadrupedal high-speed running on uneven terrain. Biomimetics, 9(1): 37, 2024

  11. [11]

    Anymal parkour: Learning agile navigation for quadrupedal robots

    David Hoeller, Nikita Rudin, Dhionis Sako, and Marco Hutter. Anymal parkour: Learning agile navigation for quadrupedal robots. Science Robotics, 9(88): eadi7566, 2024

  12. [12]

    Learning agile and dynamic motor skills for legged robots

    Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy, Dario Bellicoso, Vassilios Tsounis, Vladlen Koltun, and Marco Hutter. Learning agile and dynamic motor skills for legged robots. Science Robotics, 4(26): eaau5872, 2019

  13. [13]

    Central pattern generators for locomotion control in animals and robots: a review

    Auke Jan Ijspeert. Central pattern generators for locomotion control in animals and robots: a review. Neural Networks, 21(4): 642--653, 2008

  14. [14]

    Locomotor circuits in the mammalian spinal cord

    Ole Kiehn. Locomotor circuits in the mammalian spinal cord. Annual Review of Neuroscience, 29(1): 279--306, July 2006. ISSN 0147-006X, 1545-4126. doi:10.1146/annurev.neuro.29.051605.112910

  15. [15]

    Decoding the organization of spinal circuits that control locomotion

    Ole Kiehn. Decoding the organization of spinal circuits that control locomotion. Nature Reviews Neuroscience, 17(4): 224--238, April 2016. ISSN 1471-003X, 1471-0048. doi:10.1038/nrn.2016.9

  16. [16]

    Highly dynamic quadruped locomotion via whole-body impulse control and model predictive control

    Donghyun Kim, Jared Di Carlo, Benjamin Katz, Gerardo Bledt, and Sangbae Kim. Highly dynamic quadruped locomotion via whole-body impulse control and model predictive control. arXiv preprint arXiv:1909.06586, 2019

  17. [17]

    Biologically inspired adaptive walking of a quadruped robot

    Hiroshi Kimura, Yasuhiro Fukuoka, and Avis H Cohen. Biologically inspired adaptive walking of a quadruped robot. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 365(1850): 153--170, 2007

  18. [18]

    Combining backpropagation with Equilibrium Propagation to improve an Actor-Critic reinforcement learning framework

    Yoshimasa Kubo, Eric Chalmers, and Artur Luczak. Combining backpropagation with Equilibrium Propagation to improve an Actor-Critic reinforcement learning framework. Frontiers in Computational Neuroscience, 16: 980613, August 2022. ISSN 1662-5188. doi:10.3389/fncom.2022.980613

  19. [19]

    Rma: Rapid motor adaptation for legged robots

    Ashish Kumar, Zipeng Fu, Deepak Pathak, and Jitendra Malik. Rma: Rapid motor adaptation for legged robots. arXiv preprint arXiv:2107.04034, 2021

  20. [20]

    Holomorphic equilibrium propagation computes exact gradients through finite size oscillations

    Axel Laborieux and Friedemann Zenke. Holomorphic equilibrium propagation computes exact gradients through finite size oscillations. Advances in Neural Information Processing Systems, 35: 12950--12963, 2022

  21. [21]

    Improving equilibrium propagation without weight symmetry through jacobian homeostasis

    Axel Laborieux and Friedemann Zenke. Improving equilibrium propagation without weight symmetry through jacobian homeostasis. arXiv preprint arXiv:2309.02214, 2023

  22. [22]

    Scaling equilibrium propagation to deep convnets by drastically reducing its gradient estimator bias

    Axel Laborieux, Maxence Ernoult, Benjamin Scellier, Yoshua Bengio, Julie Grollier, and Damien Querlioz. Scaling equilibrium propagation to deep convnets by drastically reducing its gradient estimator bias. Frontiers in Neuroscience, 15: 633674, 2021

  23. [23]

    Difference Target Propagation

    Dong-Hyun Lee, Saizheng Zhang, Asja Fischer, and Yoshua Bengio. Difference Target Propagation. In Annalisa Appice, Pedro Pereira Rodrigues, Vítor Santos Costa, Carlos Soares, João Gama, and Alípio Jorge, editors, Machine Learning and Knowledge Discovery in Databases, volume 9284, pages 498--515. Springer International Publishing, Cham, 2015...

  24. [24]

    Learning quadrupedal locomotion over challenging terrain

    Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter. Learning quadrupedal locomotion over challenging terrain. Science Robotics, 5(47): eabc5986, October 2020. ISSN 2470-9476. doi:10.1126/scirobotics.abc5986

  25. [25]

    Random synaptic feedback weights support error backpropagation for deep learning

    Timothy P. Lillicrap, Daniel Cownden, Douglas B. Tweed, and Colin J. Akerman. Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications, 7(1): 13276, November 2016. ISSN 2041-1723. doi:10.1038/ncomms13276

  26. [26]

    Neural control and adaptive neural forward models for insect-like, energy-efficient, and adaptable locomotion of walking machines

    Poramate Manoonpong, Ulrich Parlitz, and Florentin Wörgötter. Neural control and adaptive neural forward models for insect-like, energy-efficient, and adaptable locomotion of walking machines. Frontiers in Neural Circuits, 7: 12, 2013

  27. [27]

    Rapid locomotion via reinforcement learning

    Gabriel B Margolis, Ge Yang, Kartik Paigwar, Tao Chen, and Pulkit Agrawal. Rapid locomotion via reinforcement learning. The International Journal of Robotics Research, 43(4): 572--587, 2024

  28. [28]

    Eqspike: spike-driven equilibrium propagation for neuromorphic implementations

    Erwann Martin, Maxence Ernoult, Jérémie Laydevant, Shuai Li, Damien Querlioz, Teodora Petrisor, and Julie Grollier. Eqspike: spike-driven equilibrium propagation for neuromorphic implementations. iScience, 24(3), 2021

  29. [29]

    Learning robust perceptive locomotion for quadrupedal robots in the wild

    Takahiro Miki, Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter. Learning robust perceptive locomotion for quadrupedal robots in the wild. Science Robotics, 7(62): eabk2822, 2022

  30. [30]

    Adaptive locomotion control of a hexapod robot via bio-inspired learning

    Wenjuan Ouyang, Haozhen Chi, Jiangnan Pang, Wenyu Liang, and Qinyuan Ren. Adaptive locomotion control of a hexapod robot via bio-inspired learning. Frontiers in Neurorobotics, 15: 627157, 2021

  31. [31]

    Pattern generators with sensory feedback for the control of quadruped locomotion

    Ludovic Righetti and Auke Jan Ijspeert. Pattern generators with sensory feedback for the control of quadruped locomotion. In 2008 IEEE International Conference on Robotics and Automation, pages 819--824. IEEE, 2008

  32. [32]

    Equilibrium propagation: Bridging the gap between energy-based models and backpropagation

    Benjamin Scellier and Yoshua Bengio. Equilibrium propagation: Bridging the gap between energy-based models and backpropagation. Frontiers in Computational Neuroscience, 11: 24, 2017

  33. [33]

    Energy-based learning algorithms for analog computing: a comparative study

    Benjamin Scellier, Maxence Ernoult, Jack Kendall, and Suhas Kumar. Energy-based learning algorithms for analog computing: a comparative study. Advances in Neural Information Processing Systems, 36: 52705--52731, 2023

  34. [34]

    High-Dimensional Continuous Control Using Generalized Advantage Estimation

    John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015

  35. [35]

    Proximal Policy Optimization Algorithms

    John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal Policy Optimization Algorithms, 2017

  36. [36]

    Two-stage learning of cpg and postural reflex towards quadruped locomotion on uneven terrain with simple reward

    Ryosei Seto, Guanda Li, Kyo Kutsuzawa, Dai Owaki, and Mitsuhiro Hayashibe. Two-stage learning of cpg and postural reflex towards quadruped locomotion on uneven terrain with simple reward. IEEE Access, 2025

  37. [37]

    A walk in the park: Learning to walk in 20 minutes with model-free reinforcement learning

    Laura Smith, Ilya Kostrikov, and Sergey Levine. A walk in the park: Learning to walk in 20 minutes with model-free reinforcement learning. arXiv preprint arXiv:2208.07860, 2022

  38. [38]

    Foot trajectory as a key factor for diverse gait patterns in quadruped robot locomotion

    Shura Suzuki, Kosuke Matayoshi, Mitsuhiro Hayashibe, and Dai Owaki. Foot trajectory as a key factor for diverse gait patterns in quadruped robot locomotion. Scientific Reports, 15(1): 1861, January 2025. ISSN 2045-2322. doi:10.1038/s41598-024-84060-5

  39. [39]

    Sim-to-Real: Learning Agile Locomotion For Quadruped Robots

    Jie Tan, Tingnan Zhang, Erwin Coumans, Atil Iscen, Yunfei Bai, Danijar Hafner, Steven Bohez, and Vincent Vanhoucke. Sim-to-real: Learning agile locomotion for quadruped robots. arXiv preprint arXiv:1804.10332, 2018

  40. [40]

    MuJoCo: A physics engine for model-based control

    Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026--5033. IEEE, 2012. doi:10.1109/IROS.2012.6386109

  41. [41]

    Backpropagation through time: What it does and how to do it

    P.J. Werbos. Backpropagation through time: What it does and how to do it. Proceedings of the IEEE, 78(10): 1550--1560, October 1990. ISSN 00189219. doi:10.1109/5.58337

  42. [42]

    An Approximation of the Error Backpropagation Algorithm in a Predictive Coding Network with Local Hebbian Synaptic Plasticity

    James C. R. Whittington and Rafal Bogacz. An Approximation of the Error Backpropagation Algorithm in a Predictive Coding Network with Local Hebbian Synaptic Plasticity. Neural Computation, 29(5): 1229--1262, May 2017. ISSN 0899-7667, 1530-888X. doi:10.1162/NECO_a_00949