Neuromorphic Reinforcement Learning for Quadruped Locomotion Control on Uneven Terrain
Pith reviewed 2026-05-12 04:18 UTC · model grok-4.3
The pith
Equilibrium propagation trains a reinforcement learning controller for quadruped locomotion on uneven terrain that matches backpropagation performance while using far less memory.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper introduces an equilibrium-propagation-based proximal policy optimization algorithm for training continuous-control policies in a neuromorphic-compatible way. It combines this with a CPG-based policy for basic locomotion and a residual policy for terrain adaptation, deriving a nudging signal and clipping rule to stabilize the updates. On a 12-DoF A1 quadruped in a two-stage uneven terrain task, the resulting controller matches a backpropagation-trained PPO baseline in success rate, velocity tracking, power use, and stability, while requiring 4.3× less GPU memory than BPTT.
What carries the argument
The EP-compatible PPO output-nudging signal combined with a two-sided ratio clipping mechanism that stabilizes policy updates during the relaxation phase of equilibrium propagation.
If this is right
- The controller achieves stable policy convergence in a two-stage uneven terrain locomotion task.
- Locomotion performance matches a backpropagation-trained PPO baseline in success rate, velocity tracking, actuator power, and body stability.
- GPU memory efficiency improves by 4.3 times compared with backpropagation through time.
- Local equilibrium-based learning can support high-dimensional embodied locomotion and provide an algorithmic foundation for low-power on-robot adaptation.
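The memory claim follows from the structure of EP itself: BPTT must cache activations for every timestep of the unrolled graph, while EP's contrastive update needs only the equilibrium states of a free and a weakly nudged relaxation. A toy one-neuron sketch of that two-phase update (illustrative only: the hard-sigmoid energy, nudging strength `beta`, and learning rates are assumptions, not the paper's formulation):

```python
def rho(s):
    """Hard-sigmoid activation, as commonly used in EP papers."""
    return min(max(s, 0.0), 1.0)

def relax(w, x, target, beta, steps=200, lr=0.05):
    """Relax the state s to an equilibrium of the total energy
    F(s) = 0.5*s^2 - rho(s)*w*x + beta*0.5*(rho(s) - target)^2
    by gradient descent. Only the final scalar state is kept."""
    F = lambda u: 0.5*u*u - rho(u)*w*x + beta*0.5*(rho(u) - target)**2
    s, eps = 0.0, 1e-5
    for _ in range(steps):
        g = (F(s + eps) - F(s - eps)) / (2*eps)  # numerical dF/ds
        s -= lr * g
    return s

def ep_update(w, x, target, beta=0.5, lr_w=0.1):
    s_free = relax(w, x, target, beta=0.0)    # free phase
    s_nudge = relax(w, x, target, beta=beta)  # weakly nudged phase
    # Contrastive local rule: only the two equilibrium states are needed,
    # no matter how many relaxation steps were taken.
    dw = (rho(s_nudge) - rho(s_free)) * x / beta
    return w + lr_w * dw

w = 0.1
for _ in range(100):
    w = ep_update(w, x=1.0, target=0.8)
print(round(rho(relax(w, 1.0, 0.8, beta=0.0)), 2))  # -> 0.8
```

Only `s_free` and `s_nudge` survive the relaxation loops, so memory is independent of the number of relaxation steps; this is the structural reason one would expect savings over BPTT, which must store every intermediate state of the unrolled trajectory.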
Where Pith is reading between the lines
- This approach could allow robots to adapt their locomotion policies on the fly to changes like payload shifts or actuator wear without needing to return to a full simulation environment.
- Integration with actual neuromorphic chips might further reduce power consumption for continuous learning on physical hardware.
- Similar local-learning techniques might extend to other embodied control problems such as manipulation or navigation where backpropagation is costly.
Load-bearing premise
The assumption that the derived EP-compatible signals and clipping mechanism will transfer stably from simulation to real hardware despite sensor noise, actuator delays, and unmodeled dynamics.
What would settle it
A physical experiment on the A1 quadruped robot showing either unstable policy updates, significant performance degradation, or memory efficiency not translating when sensor noise and delays are present.
Original abstract
Reinforcement learning (RL) has enabled robust quadruped locomotion over complex terrain, but most learned controllers are trained offline with backpropagation in massively parallel simulation and deployed as fixed policies, limiting adaptation to terrain variation, payload changes, actuator wear, and other real-world conditions under onboard power constraints. Local learning provides a potential path toward energy-aware on-robot adaptation by replacing global backpropagation graphs with updates driven by local neural states, making the learning rule more compatible with neuromorphic and in-memory computing substrates. This work proposes an equilibrium-propagation (EP)-based proximal policy optimization (PPO) framework for uneven-terrain quadruped locomotion. The controller combines a bio-inspired central pattern generator (CPG) policy with a residual postural adjustment policy, while replacing conventional backpropagation-trained policy and value networks with EP-enabled local learning. To train stochastic continuous-control policies with EP, we derive an EP-compatible PPO output-nudging signal and introduce a two-sided ratio clipping mechanism that stabilizes policy updates during relaxation. Experiments on a 12-DoF A1 quadruped show that the proposed controller achieves stable policy convergence in a two-stage uneven terrain locomotion task. Its locomotion performance is comparable to a backpropagation-trained PPO baseline in success rate, velocity tracking, actuator power, and body stability, while improving GPU memory efficiency by 4.3× compared with backpropagation through time (BPTT). These results suggest that local equilibrium-based learning can support high-dimensional embodied locomotion and provide an algorithmic foundation for low-power on-robot adaptation and fine-tuning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes an equilibrium-propagation (EP)-based proximal policy optimization (PPO) framework for quadruped locomotion control. It combines a bio-inspired central pattern generator (CPG) policy with a residual postural adjustment policy, derives an EP-compatible output-nudging signal and two-sided ratio clipping mechanism to enable local learning for continuous-control policies, and reports simulation results on a 12-DoF A1 quadruped showing stable convergence on a two-stage uneven-terrain task with performance metrics comparable to a backpropagation-trained PPO baseline and 4.3× lower GPU memory usage than BPTT.
Significance. If the local EP updates remain stable, the work could support energy-efficient on-device adaptation for robotic systems by making RL compatible with neuromorphic substrates. Credit is due for the first-principles derivation of the nudging signal and clipping rule that allows EP to handle stochastic policies, as well as the concrete memory-efficiency demonstration in a high-dimensional locomotion task.
Major comments (2)
- [Abstract and Experiments] Abstract and Experiments section: the claim of 'comparable' performance in success rate, velocity tracking, actuator power, and body stability is presented without error bars, standard deviations across random seeds, or any statistical tests. This leaves the quantitative support for equivalence to the BPTT baseline moderate at best and weakens the central empirical claim.
- [Introduction and Conclusion] Introduction and Conclusion: the manuscript states that the results 'provide an algorithmic foundation for low-power on-robot adaptation and fine-tuning.' However, all validation occurs in clean simulation; no analysis or experiments examine whether the derived EP-compatible PPO nudging signal and two-sided clipping remain stable under sensor noise, actuator delays, or unmodeled dynamics. This gap is load-bearing for the neuromorphic and on-robot positioning.
Minor comments (1)
- [Methods] The two-sided ratio clipping mechanism is described in the methods but would benefit from an explicit equation or pseudocode block to improve reproducibility and clarity of the stabilization rule.
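For illustration, one plausible shape such a rule could take (hypothetical: the manuscript's exact equation is not reproduced here, and `eps` and the hard ratio clip are assumptions), contrasted with the standard PPO surrogate:

```python
def ppo_clip(ratio, adv, eps=0.2):
    """Standard PPO surrogate: the pessimistic min() bounds the
    objective on one side only -- for adv < 0 it is unbounded below."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * adv, clipped * adv)

def two_sided_clip(ratio, adv, eps=0.2):
    """Hypothetical two-sided variant: the ratio itself is hard-clipped
    on both sides before forming the surrogate, so any output-nudging
    signal derived from it stays bounded for either advantage sign."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return clipped * adv

# For an over-large ratio with a negative advantage, standard PPO
# leaves the surrogate unbounded while the two-sided form caps it:
print(round(ppo_clip(1.5, -1.0), 2), round(two_sided_clip(1.5, -1.0), 2))  # -> -1.5 -1.2
```

Whatever the paper's precise rule, a block in roughly this form would let readers verify the boundedness property the stabilization argument rests on.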
Simulated Author's Rebuttal
We thank the referee for the constructive and insightful comments. We appreciate the positive assessment of the work's significance and the credit given for the derivation of the EP-compatible nudging signal and clipping mechanism. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
-
Referee: [Abstract and Experiments] Abstract and Experiments section: the claim of 'comparable' performance in success rate, velocity tracking, actuator power, and body stability is presented without error bars, standard deviations across random seeds, or any statistical tests. This leaves the quantitative support for equivalence to the BPTT baseline moderate at best and weakens the central empirical claim.
Authors: We agree that the absence of error bars, standard deviations, and statistical tests weakens the support for the comparability claims. In the revised manuscript, we will conduct additional runs with multiple random seeds (at least 5 per condition), report all metrics as mean ± standard deviation, and include statistical comparisons (e.g., paired t-tests or Mann-Whitney U tests with p-values) between the EP-PPO and BPTT-PPO results to quantify the degree of equivalence. revision: yes
-
Referee: [Introduction and Conclusion] Introduction and Conclusion: the manuscript states that the results 'provide an algorithmic foundation for low-power on-robot adaptation and fine-tuning.' However, all validation occurs in clean simulation; no analysis or experiments examine whether the derived EP-compatible PPO nudging signal and two-sided clipping remain stable under sensor noise, actuator delays, or unmodeled dynamics. This gap is load-bearing for the neuromorphic and on-robot positioning.
Authors: We acknowledge that the experiments are confined to idealized simulation without explicit modeling of sensor noise, actuator delays, or unmodeled dynamics. The manuscript's core contribution is the first-principles derivation of EP-compatible mechanisms for stochastic continuous-control policies and their empirical validation in a 12-DoF locomotion task. While we maintain that these mechanisms provide an algorithmic foundation, we agree the on-robot and neuromorphic positioning would be strengthened by robustness analysis. In revision we will (i) add a Limitations section that explicitly states the simulation-only scope and (ii) moderate the language in the Introduction and Conclusion to frame the results as a necessary first step toward on-robot adaptation rather than a direct enabler. Full noise-robustness studies remain future work. revision: partial
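The seed-aggregation protocol promised in the first response is simple to mechanize. A minimal sketch (the per-seed metric values below are invented placeholders, not results from the paper):

```python
import statistics

def summarize(name, runs):
    """Report a per-seed metric as mean ± sample standard deviation."""
    m = statistics.mean(runs)
    sd = statistics.stdev(runs)  # sample std dev, n-1 denominator
    print(f"{name}: {m:.3f} ± {sd:.3f} (n={len(runs)})")
    return m, sd

# hypothetical success rates over 5 random seeds per method
ep_ppo   = [0.91, 0.88, 0.93, 0.90, 0.89]
bptt_ppo = [0.92, 0.90, 0.91, 0.93, 0.88]
summarize("EP-PPO success rate", ep_ppo)
summarize("BPTT-PPO success rate", bptt_ppo)
```

A paired t-test or Mann-Whitney U test across the seed-wise values, as the authors propose, would then quantify whether the two distributions are statistically indistinguishable.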
Circularity Check
No circularity: EP-PPO nudging signal and clipping derived independently
Full rationale
The paper derives an EP-compatible PPO output-nudging signal and two-sided ratio clipping mechanism as first-principles constructions to adapt PPO for equilibrium propagation in continuous control. No equations or steps reduce the claimed results to fitted parameters, self-citations, or input data by construction. Performance equivalence to BPTT-PPO is shown via simulation experiments on the A1 quadruped, not by re-deriving the same quantities. The sim-to-real robustness concern is a generalization issue, not a circularity in the derivation chain.
Reference graph
Works this paper leans on
- [1] Guillaume Bellegarda and Auke Ijspeert. CPG-RL: Learning central pattern generators for quadruped locomotion. IEEE Robotics and Automation Letters, 7(4): 12547--12554, 2022.
- [2] Guillaume Bellegarda, Milad Shafiee, and Auke Ijspeert. Visual CPG-RL: Learning central pattern generators for visually-guided quadruped locomotion. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 1420--1427. IEEE, 2024.
- [3] Michael Bloesch, Jan Humplik, Viorica Patraucean, Roland Hafner, Tuomas Haarnoja, Arunkumar Byravan, Noah Yamamoto Siegel, Saran Tunyasuvunakool, Federico Casarini, Nathan Batchelor, Francesco Romano, Stefano Saliceti, Martin Riedmiller, S. M. Ali Eslami, and Nicolas Heess. Towards real robot learning in the wild: A case study in bipedal locomotion. In Al..., 2022.
- [4] Shuxiao Chen, Bike Zhang, Mark W. Mueller, Akshara Rai, and Koushil Sreenath. Learning torque control for quadrupedal locomotion, March 2023.
- [5] Antoine Cully, Jeff Clune, Danesh Tarapore, and Jean-Baptiste Mouret. Robots that can adapt like animals. Nature, 521(7553): 503--507, May 2015. doi:10.1038/nature14422.
- [6] Sten Grillner and Abdeljabbar El Manira. Current principles of motor control, with special reference to vertebrate locomotion. Physiological Reviews, 100(1): 271--320, January 2020. doi:10.1152/physrev.00015.2019.
- [7] Sten Grillner and Alexander Kozlov. The CPGs for limbed locomotion -- facts and fiction. International Journal of Molecular Sciences, 22(11): 5882, May 2021. doi:10.3390/ijms22115882.
- [8] Sehoon Ha, Peng Xu, Zhenyu Tan, Sergey Levine, and Jie Tan. Learning to walk in the real world with minimal human effort. arXiv preprint arXiv:2002.08550, 2020.
- [9] Tuomas Haarnoja, Sehoon Ha, Aurick Zhou, Jie Tan, George Tucker, and Sergey Levine. Learning to walk via deep reinforcement learning. arXiv preprint arXiv:1812.11103, 2018.
- [10] Xinyu Han and Mingguo Zhao. Learning quadrupedal high-speed running on uneven terrain. Biomimetics, 9(1): 37, 2024.
- [11] David Hoeller, Nikita Rudin, Dhionis Sako, and Marco Hutter. ANYmal parkour: Learning agile navigation for quadrupedal robots. Science Robotics, 9(88): eadi7566, 2024.
- [12] Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy, Dario Bellicoso, Vassilios Tsounis, Vladlen Koltun, and Marco Hutter. Learning agile and dynamic motor skills for legged robots. Science Robotics, 4(26): eaau5872, 2019.
- [13] Auke Jan Ijspeert. Central pattern generators for locomotion control in animals and robots: a review. Neural Networks, 21(4): 642--653, 2008.
- [14] Ole Kiehn. Locomotor circuits in the mammalian spinal cord. Annual Review of Neuroscience, 29(1): 279--306, July 2006. doi:10.1146/annurev.neuro.29.051605.112910.
- [15] Ole Kiehn. Decoding the organization of spinal circuits that control locomotion. Nature Reviews Neuroscience, 17(4): 224--238, April 2016. doi:10.1038/nrn.2016.9.
- [16] Donghyun Kim, Jared Di Carlo, Benjamin Katz, Gerardo Bledt, and Sangbae Kim. Highly dynamic quadruped locomotion via whole-body impulse control and model predictive control. arXiv preprint arXiv:1909.06586, 2019.
- [17] Hiroshi Kimura, Yasuhiro Fukuoka, and Avis H. Cohen. Biologically inspired adaptive walking of a quadruped robot. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 365(1850): 153--170, 2007.
- [18] Yoshimasa Kubo, Eric Chalmers, and Artur Luczak. Combining backpropagation with equilibrium propagation to improve an actor-critic reinforcement learning framework. Frontiers in Computational Neuroscience, 16: 980613, August 2022. doi:10.3389/fncom.2022.980613.
- [19] Ashish Kumar, Zipeng Fu, Deepak Pathak, and Jitendra Malik. RMA: Rapid motor adaptation for legged robots. arXiv preprint arXiv:2107.04034, 2021.
- [20] Axel Laborieux and Friedemann Zenke. Holomorphic equilibrium propagation computes exact gradients through finite size oscillations. Advances in Neural Information Processing Systems, 35: 12950--12963, 2022.
- [21] Axel Laborieux and Friedemann Zenke. Improving equilibrium propagation without weight symmetry through Jacobian homeostasis. arXiv preprint arXiv:2309.02214, 2023.
- [22] Axel Laborieux, Maxence Ernoult, Benjamin Scellier, Yoshua Bengio, Julie Grollier, and Damien Querlioz. Scaling equilibrium propagation to deep convnets by drastically reducing its gradient estimator bias. Frontiers in Neuroscience, 15: 633674, 2021.
- [23] Dong-Hyun Lee, Saizheng Zhang, Asja Fischer, and Yoshua Bengio. Difference target propagation. In Annalisa Appice, Pedro Pereira Rodrigues, Vítor Santos Costa, Carlos Soares, João Gama, and Alípio Jorge, editors, Machine Learning and Knowledge Discovery in Databases, volume 9284, pages 498--515. Springer International Publishing, Cham, 2015.
- [24] Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter. Learning quadrupedal locomotion over challenging terrain. Science Robotics, 5(47): eabc5986, October 2020. doi:10.1126/scirobotics.abc5986.
- [25] Timothy P. Lillicrap, Daniel Cownden, Douglas B. Tweed, and Colin J. Akerman. Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications, 7(1): 13276, November 2016. doi:10.1038/ncomms13276.
- [26] Poramate Manoonpong, Ulrich Parlitz, and Florentin Wörgötter. Neural control and adaptive neural forward models for insect-like, energy-efficient, and adaptable locomotion of walking machines. Frontiers in Neural Circuits, 7: 12, 2013.
- [27] Gabriel B. Margolis, Ge Yang, Kartik Paigwar, Tao Chen, and Pulkit Agrawal. Rapid locomotion via reinforcement learning. The International Journal of Robotics Research, 43(4): 572--587, 2024.
- [28] Erwann Martin, Maxence Ernoult, Jérémie Laydevant, Shuai Li, Damien Querlioz, Teodora Petrisor, and Julie Grollier. EqSpike: spike-driven equilibrium propagation for neuromorphic implementations. iScience, 24(3), 2021.
- [29] Takahiro Miki, Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter. Learning robust perceptive locomotion for quadrupedal robots in the wild. Science Robotics, 7(62): eabk2822, 2022.
- [30] Wenjuan Ouyang, Haozhen Chi, Jiangnan Pang, Wenyu Liang, and Qinyuan Ren. Adaptive locomotion control of a hexapod robot via bio-inspired learning. Frontiers in Neurorobotics, 15: 627157, 2021.
- [31] Ludovic Righetti and Auke Jan Ijspeert. Pattern generators with sensory feedback for the control of quadruped locomotion. In 2008 IEEE International Conference on Robotics and Automation, pages 819--824. IEEE, 2008.
- [32] Benjamin Scellier and Yoshua Bengio. Equilibrium propagation: Bridging the gap between energy-based models and backpropagation. Frontiers in Computational Neuroscience, 11: 24, 2017.
- [33] Benjamin Scellier, Maxence Ernoult, Jack Kendall, and Suhas Kumar. Energy-based learning algorithms for analog computing: a comparative study. Advances in Neural Information Processing Systems, 36: 52705--52731, 2023.
- [34] John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation, 2015.
- [35] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms, 2017.
- [36] Ryosei Seto, Guanda Li, Kyo Kutsuzawa, Dai Owaki, and Mitsuhiro Hayashibe. Two-stage learning of CPG and postural reflex towards quadruped locomotion on uneven terrain with simple reward. IEEE Access, 2025.
- [37] Laura Smith, Ilya Kostrikov, and Sergey Levine. A walk in the park: Learning to walk in 20 minutes with model-free reinforcement learning. arXiv preprint arXiv:2208.07860, 2022.
- [38] Shura Suzuki, Kosuke Matayoshi, Mitsuhiro Hayashibe, and Dai Owaki. Foot trajectory as a key factor for diverse gait patterns in quadruped robot locomotion. Scientific Reports, 15(1): 1861, January 2025. doi:10.1038/s41598-024-84060-5.
- [39] Jie Tan, Tingnan Zhang, Erwin Coumans, Atil Iscen, Yunfei Bai, Danijar Hafner, Steven Bohez, and Vincent Vanhoucke. Sim-to-real: Learning agile locomotion for quadruped robots. arXiv preprint arXiv:1804.10332, 2018.
- [40] Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026--5033. IEEE, 2012. doi:10.1109/IROS.2012.6386109.
- [41] P. J. Werbos. Backpropagation through time: What it does and how to do it. Proceedings of the IEEE, 78(10): 1550--1560, October 1990. doi:10.1109/5.58337.
- [42] James C. R. Whittington and Rafal Bogacz. An approximation of the error backpropagation algorithm in a predictive coding network with local Hebbian synaptic plasticity. Neural Computation, 29(5): 1229--1262, May 2017. doi:10.1162/NECO_a_00949.