Learning Gait-Aware Quadruped Locomotion with Temporal Logic Specifications

Alfredo Reina Corona; Cagan Bakirci; Jyotirmoy V. Deshmukh; Keyan Azbijari; Merve Atasever

arxiv: 2607.00442 · v1 · pith:4EQAZYJInew · submitted 2026-07-01 · 💻 cs.RO · cs.AI

Learning Gait-Aware Quadruped Locomotion with Temporal Logic Specifications

Merve Atasever , Cagan Bakirci , Alfredo Reina Corona , Keyan Azbijari , Jyotirmoy V. Deshmukh This is my paper

Pith reviewed 2026-07-02 11:56 UTC · model grok-4.3

classification 💻 cs.RO cs.AI

keywords quadruped locomotionreinforcement learningsignal temporal logicreward shapinggait specificationPPOtemporal logic constraints

0 comments

The pith

Signal Temporal Logic specifications shape rewards that improve quadruped velocity tracking and training stability over hand-crafted baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method to specify distinct quadruped gaits through parameterized Signal Temporal Logic constraints that encode safety bounds, gait synchronization, command tracking, and actuation limits. These constraints are turned into dense, continuous rewards via smooth approximations of their robustness measures, which then guide policy training with Proximal Policy Optimization. Parametric templates are defined and calibrated for three speed regimes from reference rollouts, and the resulting rewards are tested on a simulated Barkour quadruped with parallel simulation and domain randomization. The central demonstration is that this approach produces policies with tighter velocity tracking and more stable training than standard hand-crafted reward functions. A sympathetic reader would care because it replaces opaque reward design with explicit, interpretable temporal constraints for controlling complex locomotion behaviors.

Core claim

We introduce a framework where distinct gaits are specified using parameterized constraints expressed in Signal Temporal Logic (STL). These include safety bounds, gait synchronization constraints, command tracking, and actuation bounds. From these specifications, we develop a reward shaping mechanism that provides learning agents a dense, continuous reward landscape that encodes desired behavior. We define parametric STL templates for three speed regimes (walking-trot, trot, bound), calibrate their parameters from reference rollouts, and compute rewards from using smooth approximations of STL robustness over the rollouts. The generated rewards can be used to provide shaped gradients compatib

What carries the argument

Parameterized STL templates for three speed regimes that generate dense rewards from smooth robustness approximations for PPO training.

If this is right

Policies can be trained with explicit control over gait type through adjustments to the STL parameters.
Reward signals derived from temporal logic provide denser gradients that stabilize PPO training.
Velocity tracking accuracy improves when rewards encode synchronization and command constraints from STL.
Domain randomization combines with the shaped rewards to produce policies robust to variations in simulation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same STL specifications used for reward shaping could later verify whether a trained policy satisfies the original gait requirements.
Calibrating templates directly from real-robot data rather than simulation rollouts might reduce the sim-to-real gap.
Similar parametric STL templates could be developed for other periodic robot behaviors such as manipulation or multi-agent coordination.

Load-bearing premise

The parametric STL templates for the three speed regimes, calibrated from reference rollouts, accurately capture desired gait behaviors and produce effective dense rewards.

What would settle it

Training experiments in which STL-shaped rewards produce higher velocity tracking error or less stable learning curves than hand-crafted rewards would falsify the performance advantage.

Figures

Figures reproduced from arXiv: 2607.00442 by Alfredo Reina Corona, Cagan Bakirci, Jyotirmoy V. Deshmukh, Keyan Azbijari, Merve Atasever.

**Figure 1.** Figure 1: Overall Pipeline. havior whose contact patterns remain appropriate for the commanded regime. State-of-the-art deep RL pipelines address the first objective, and partially the second through carefully engineered reward terms, curriculum learning, and domain randomization [14, 11, 13]. However, these reward functions are often difficult to interpret, and they provide only indirect control over contact-seque… view at source ↗

**Figure 2.** Figure 2: Barkour vb robot in MJX. Formal specifications provide an alternative perspective: rather than expressing locomotion objectives through loosely coupled rewards, desired behaviors can be encoded as logical specifications. Signal Temporal Logic (STL) is a framework that has been used in robotics to express bounded-time constraints over real-valued signals. It admits quantitative or robustness semantics, i.… view at source ↗

read the original abstract

Reinforcement learning (RL) for quadruped locomotion commonly depends on fixed, hand-crafted, and Markovian reward functions that limit both interpretability of learned policies and lack explicit control over gait behaviors. We introduce a framework where distinct gaits are specified using parameterized constraints expressed in Signal Temporal Logic (STL). These include safety bounds, gait synchronization constraints, command tracking, and actuation bounds. From these specifications, we develop a reward shaping mechanism that provides learning agents a dense, continuous reward landscape that encodes desired behavior. We define parametric STL templates for three speed regimes (walking-trot, trot, bound), calibrate their parameters from reference rollouts, and compute rewards from using smooth approximations of STL robustness over the rollouts. The generated rewards can be used to provide shaped gradients compatible with Proximal Policy Optimization (PPO). We instantiate the approach on Google's Barkour quadruped robot in MuJoCo XLA (MJX). We use parallelization within the simulator to improve training speeds and use domain randomization to robustify learned policies. We show that compared to a baseline of hand-crafted rewards, the STL-shaped rewards yield tighter velocity tracking and more stable training. Videos can be found on our project website: https://stl-locomotion.github.io/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The STL templates for gait-specific rewards in quadruped RL are a structured step forward, but calibrating them from reference rollouts risks making the gains over hand-crafted baselines partly circular.

read the letter

The paper introduces parametric Signal Temporal Logic templates for three speed regimes in quadruped locomotion and turns them into dense rewards via smooth robustness approximations for PPO. This is applied to the Barkour robot in MuJoCo with domain randomization and parallel simulation.

What is new is the explicit use of STL constraints for gait synchronization, safety, and command tracking across regimes, rather than purely hand-crafted Markovian terms. The approach gives a clearer way to encode desired behaviors and produces rewards that the authors say yield tighter velocity tracking and more stable training than a hand-crafted baseline.

The work is solid on the technical side of mapping STL to RL-compatible rewards and on the simulation setup. It shows honest engagement with making policies more interpretable through logic specs.

The soft spot is the calibration step. Parameters are fit to reference rollouts, which likely already contain good gait behavior. This injects data-driven knowledge into the STL rewards that the hand-crafted baseline does not receive in the same way. Without details on how those rollouts were generated or sensitivity checks on the fitted parameters, the performance edge could trace to the fitting procedure rather than the temporal logic itself. The abstract does not address this directly.

This paper is for robotics researchers focused on RL locomotion who want more explicit gait control. A reader working on reward design or temporal logic in control would find the templates useful.

It deserves peer review. The core idea holds up and the application is concrete, but referees should check the reference data source and whether an ablation isolates the STL contribution from the calibration.

Referee Report

2 major / 2 minor

Summary. The paper introduces a framework for quadruped locomotion in RL that uses parameterized Signal Temporal Logic (STL) templates to encode gait behaviors (safety bounds, synchronization, command tracking) across three speed regimes. Parameters are calibrated from reference rollouts; smooth robustness approximations then produce dense rewards for PPO. The method is instantiated on the Barkour robot in MuJoCo with domain randomization and parallel simulation; the central empirical claim is that STL-shaped rewards produce tighter velocity tracking and more stable training than a hand-crafted reward baseline.

Significance. If the comparison is shown to be non-circular and the quantitative gains are reproducible, the work would demonstrate a practical route to injecting interpretable, temporally structured specifications into reward shaping for legged RL. This could improve policy transparency and gait control without sacrificing sample efficiency, a useful contribution to the intersection of formal methods and robotics.

major comments (2)

[method section on parametric templates] Calibration procedure for STL templates (method section on parametric templates for walking-trot/trot/bound): the paper states that parameters are fitted from reference rollouts, yet provides no description of how those rollouts were generated (e.g., whether they came from policies already optimized for the target velocities or from the same simulator setup used in the experiments). Without an explicit statement that the reference data are independent of both the STL and hand-crafted training loops, the reported advantage in velocity tracking cannot be unambiguously attributed to the temporal-logic structure rather than to the calibration step itself.
[results section] Experimental comparison (results section reporting velocity tracking and training stability): the abstract and claim assert tighter tracking and more stable PPO training, but the provided text supplies no numerical values, standard deviations, number of seeds, or statistical tests. A load-bearing claim of superiority therefore rests on unreported quantitative evidence; the manuscript must include these metrics (e.g., mean tracking error per regime, success rate, or learning curves) to allow verification.

minor comments (2)

[abstract] The abstract mentions "Videos can be found on our project website" but the manuscript does not include a persistent link or DOI; a stable reference should be added.
[reward shaping subsection] Notation for the smooth robustness approximation is introduced without an explicit equation number or reference to the approximation formula used (e.g., the specific sigmoid or log-sum-exp form); adding an equation label would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and will revise the manuscript accordingly to improve clarity and provide supporting evidence.

read point-by-point responses

Referee: [method section on parametric templates] Calibration procedure for STL templates (method section on parametric templates for walking-trot/trot/bound): the paper states that parameters are fitted from reference rollouts, yet provides no description of how those rollouts were generated (e.g., whether they came from policies already optimized for the target velocities or from the same simulator setup used in the experiments). Without an explicit statement that the reference data are independent of both the STL and hand-crafted training loops, the reported advantage in velocity tracking cannot be unambiguously attributed to the temporal-logic structure rather than to the calibration step itself.

Authors: We agree that the manuscript does not explicitly describe how the reference rollouts were generated. We will revise the method section on parametric templates to state that these rollouts were produced by a preliminary policy trained independently using only a basic velocity-tracking reward (without STL or the hand-crafted baseline) in the same MuJoCo Barkour environment with domain randomization. This addition will establish the independence of the calibration data from the reported training loops. revision: yes
Referee: [results section] Experimental comparison (results section reporting velocity tracking and training stability): the abstract and claim assert tighter tracking and more stable PPO training, but the provided text supplies no numerical values, standard deviations, number of seeds, or statistical tests. A load-bearing claim of superiority therefore rests on unreported quantitative evidence; the manuscript must include these metrics (e.g., mean tracking error per regime, success rate, or learning curves) to allow verification.

Authors: We acknowledge that the manuscript reports only qualitative improvements without the requested quantitative details. We will revise the results section to include mean velocity tracking errors per regime with standard deviations, the number of random seeds used, success rates, and learning curve comparisons, along with any applicable statistical tests. This will substantiate the claims of tighter tracking and more stable training. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper explicitly states it defines parametric STL templates for three speed regimes, calibrates their parameters from reference rollouts, and computes rewards via smooth robustness approximations for use with PPO. It then reports an empirical comparison showing STL-shaped rewards yield tighter velocity tracking and more stable training than a hand-crafted baseline. This calibration step is presented as part of the method rather than a hidden fit renamed as a prediction, and the performance claim rests on independent training outcomes rather than reducing by construction to the reference data. No self-citations, uniqueness theorems, or ansatzes smuggled via prior work appear in the provided text. The approach is self-contained against the stated external baseline.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Based on abstract only, the approach depends on calibrated parameters from reference rollouts as free parameters and the domain assumption that STL can express the listed gait constraints.

free parameters (1)

STL template parameters for gaits = calibrated from reference rollouts
Parameters for safety bounds, gait synchronization constraints, command tracking, and actuation bounds are calibrated from reference rollouts for three speed regimes.

axioms (1)

domain assumption Signal Temporal Logic can express gait behaviors using parameterized constraints for safety, synchronization, tracking, and actuation
The framework is built on this assumption to generate rewards from STL specifications.

pith-pipeline@v0.9.1-grok · 5771 in / 1222 out tokens · 41580 ms · 2026-07-02T11:56:37.789792+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

76 extracted references · 14 canonical work pages · 4 internal anchors

[1]

Raibert, K

M. Raibert, K. Blankespoor, G. Nelson, and R. Playter. Bigdog, the rough-terrain quadruped robot.IFAC Proceedings Volumes, 41(2):10822–10825, 2008

2008
[2]

B. Katz, J. Di Carlo, and S. Kim. Mini cheetah: A platform for pushing the limits of dynamic quadruped control. In2019 international conference on robotics and automation (ICRA), pages 6295–6301. IEEE, 2019

2019
[3]

X. Peng, E. Coumans, T. Zhang, T. Lee, J. Tan, and S. Levine. Learning agile robotic locomo- tion skills by imitating animals. arxiv 2020.arXiv preprint arXiv:2004.00784

work page arXiv 2020
[4]

J. Lee, J. Hwangbo, L. Wellhausen, V . Koltun, and M. Hutter. Learning quadrupedal locomo- tion over challenging terrain.Science robotics, 5(47):eabc5986, 2020

2020
[5]

Z. Xie, X. Da, M. Van de Panne, B. Babich, and A. Garg. Dynamics randomization revisited: A case study for quadrupedal locomotion. In2021 IEEE International Conference on Robotics and Automation (ICRA), pages 4955–4961. IEEE, 2021

2021
[6]

Agarwal, A

A. Agarwal, A. Kumar, J. Malik, and D. Pathak. Legged locomotion in challenging terrains using egocentric vision. InConference on robot learning, pages 403–415. PMLR, 2023

2023
[7]

D. Kim, J. Di Carlo, B. Katz, G. Bledt, and S. Kim. Highly dynamic quadruped locomotion via whole-body impulse control and model predictive control.arXiv preprint arXiv:1909.06586, 2019

work page arXiv 1909
[8]

Nguyen, M

Q. Nguyen, M. J. Powell, B. Katz, J. Di Carlo, and S. Kim. Optimized jumping on the mit cheetah 3 robot. In2019 International Conference on Robotics and Automation (ICRA), pages 7448–7454. IEEE, 2019

2019
[9]

Di Carlo, P

J. Di Carlo, P. M. Wensing, B. Katz, G. Bledt, and S. Kim. Dynamic locomotion in the mit cheetah 3 through convex model-predictive control. In2018 IEEE/RSJ international confer- ence on intelligent robots and systems (IROS), pages 1–9. IEEE, 2018

2018
[10]

Todorov, T

E. Todorov, T. Erez, and Y . Tassa. Mujoco: A physics engine for model-based control. In2012 IEEE/RSJ international conference on intelligent robots and systems, pages 5026–5033. IEEE, 2012

2012
[11]

Hutter, C

M. Hutter, C. Gehring, D. Jud, A. Lauber, C. D. Bellicoso, V . Tsounis, J. Hwangbo, K. Bodie, P. Fankhauser, M. Bloesch, et al. Anymal-a highly mobile and dynamic quadrupedal robot. In2016 IEEE/RSJ international conference on intelligent robots and systems (IROS), pages 38–44. IEEE, 2016

2016
[12]

Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning

V . Makoviychuk, L. Wawrzyniak, Y . Guo, M. Lu, K. Storey, M. Macklin, D. Hoeller, N. Rudin, A. Allshire, A. Handa, et al. Isaac gym: High performance gpu-based physics simulation for robot learning.arXiv preprint arXiv:2108.10470, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[13]

Caluwaerts, A

K. Caluwaerts, A. Iscen, J. C. Kew, W. Yu, T. Zhang, D. Freeman, K.-H. Lee, L. Lee, S. Sal- iceti, V . Zhuang, et al. Barkour: Benchmarking animal-level agility with quadruped robots. arXiv preprint arXiv:2305.14654, 2023

work page arXiv 2023
[14]

J. Tan, T. Zhang, E. Coumans, A. Iscen, Y . Bai, D. Hafner, S. Bohez, and V . Vanhoucke. Sim- to-real: Learning agile locomotion for quadruped robots.arXiv preprint arXiv:1804.10332, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[15]

Maler and D

O. Maler and D. Nickovic. Monitoring temporal properties of continuous signals. InInterna- tional symposium on formal techniques in real-time and fault-tolerant systems, pages 152–166. Springer, 2004. 9

2004
[16]

X. Ding, S. L. Smith, C. Belta, and D. Rus. Optimal control of markov decision processes with linear temporal logic constraints.IEEE Transactions on Automatic Control, 59(5):1244–1257, 2014

2014
[18]

Aksaray, A

D. Aksaray, A. Jones, Z. Kong, M. Schwager, and C. Belta. Q-learning for robust satisfaction of signal temporal logic specifications. In2016 IEEE 55th Conference on Decision and Control (CDC), pages 6565–6570. IEEE, 2016

2016
[19]

Kapinski, X

J. Kapinski, X. Jin, J. Deshmukh, A. Donze, T. Yamaguchi, H. Ito, T. Kaga, S. Kobuna, and S. Seshia. St-lib: A library for specifying and classifying model behaviors. Technical report, SAE Technical Paper, 2016

2016
[20]

J. V . Deshmukh, A. Donz ´e, S. Ghosh, X. Jin, G. Juniwal, and S. A. Seshia. Robust online monitoring of signal temporal logic.Formal Methods in System Design, 51(1):5–30, 2017

2017
[21]

Camacho, R

A. Camacho, R. T. Icarte, T. Q. Klassen, R. A. Valenzano, and S. A. McIlraith. Ltl and be- yond: Formal languages for reward function specification in reinforcement learning. InIJCAI, volume 19, pages 6065–6073, 2019

2019
[22]

Balakrishnan and J

A. Balakrishnan and J. V . Deshmukh. Structured reward shaping using signal temporal logic specifications. in 2019 ieee/rsj iros, 3481–3486, 2019

2019
[23]

L. Z. Yuan, M. Hasanbeig, A. Abate, and D. Kroening. Modular deep reinforcement learning with temporal logic specifications.arXiv preprint arXiv:1909.11591, 2019

work page arXiv 1909
[24]

Jiang, S

Y . Jiang, S. Bharadwaj, B. Wu, R. Shah, U. Topcu, and P. Stone. Temporal-logic-based reward shaping for continuing reinforcement learning tasks. InProceedings of the AAAI Conference on artificial Intelligence, volume 35, pages 7995–8003, 2021

2021
[25]

Hasanbeig, D

M. Hasanbeig, D. Kroening, and A. Abate. Deep reinforcement learning with temporal logics. InInternational Conference on Formal Modeling and Analysis of Timed Systems, pages 1–22. Springer, 2020

2020
[26]

M. Wen, R. Ehlers, and U. Topcu. Correct-by-synthesis reinforcement learning with tempo- ral logic constraints. In2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4983–4990. IEEE, 2015

2015
[27]

R. T. Icarte, T. Q. Klassen, R. Valenzano, and S. A. McIlraith. Reward machines: Exploiting reward function structure in reinforcement learning.Journal of Artificial Intelligence Research, 73:173–208, 2022

2022
[28]

Dayan and B

P. Dayan and B. W. Balleine. Reward, motivation, and reinforcement learning.Neuron, 36(2): 285–298, 2002

2002
[29]

Eschmann

J. Eschmann. Reward function design in reinforcement learning.Reinforcement learning algorithms: Analysis and Applications, pages 25–33, 2021

2021
[30]

J. Hare. Dealing with sparse rewards in reinforcement learning.arXiv preprint arXiv:1910.09281, 2019

work page arXiv 1910
[31]

Kim, Y .-H

G. Kim, Y .-H. Lee, and H.-W. Park. A learning framework for diverse legged robot locomotion using barrier-based style rewards. In2025 IEEE International Conference on Robotics and Automation (ICRA), pages 10004–10010. IEEE, 2025

2025
[32]

J. Fu, K. Luo, and S. Levine. Learning robust rewards with adversarial inverse reinforcement learning.arXiv preprint arXiv:1710.11248, 2017. 10

work page internal anchor Pith review Pith/arXiv arXiv 2017
[33]

A. Y . Ng, S. Russell, et al. Algorithms for inverse reinforcement learning. InIcml, volume 1, page 2, 2000

2000
[34]

Arora and P

S. Arora and P. Doshi. A survey of inverse reinforcement learning: Challenges, methods and progress.Artificial Intelligence, 297:103500, 2021

2021
[35]

P. F. Christiano, J. Leike, T. Brown, M. Martic, S. Legg, and D. Amodei. Deep reinforcement learning from human preferences.Advances in neural information processing systems, 30, 2017

2017
[36]

D. Youm, H. Jung, H. Kim, J. Hwangbo, H.-W. Park, and S. Ha. Imitating and finetuning model predictive control for robust and symmetric quadrupedal locomotion.IEEE Robotics and Automation Letters, 8(11):7799–7806, 2023

2023
[37]

T. Li, J. Won, J. Cho, S. Ha, and A. Rai. Fastmimic: Model-based motion imitation for agile, diverse and generalizable quadrupedal locomotion.Robotics, 12(3):90, 2023

2023
[38]

H.-C. Liao. A survey of reinforcement learning with temporal logic rewards. 2020

2020
[39]

Li, C.-I

X. Li, C.-I. Vasile, and C. Belta. Reinforcement learning with temporal logic rewards. In2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3834–
[40]

Kapoor, A

P. Kapoor, A. Balakrishnan, and J. V . Deshmukh. Model-based reinforcement learning from signal temporal logic specifications.arXiv preprint arXiv:2011.04950, 2020

work page arXiv 2011
[41]

Puranic, J

A. Puranic, J. Deshmukh, and S. Nikolaidis. Learning from demonstrations using signal tem- poral logic. InConference on Robot Learning, pages 2228–2242. PMLR, 2021

2021
[42]

S. Feng, X. Xinjilefu, W. Huang, and C. G. Atkeson. 3d walking based on online optimization. In2013 13th IEEE-RAS International Conference on Humanoid Robots (Humanoids), pages 21–27. IEEE, 2013

2013
[43]

Y . Zhao, U. Topcu, and L. Sentis. High-level planner synthesis for whole-body locomotion in unstructured environments. In2016 IEEE 55th Conference on Decision and Control (CDC), pages 6557–6564. IEEE, 2016

2016
[44]

Audren, A

H. Audren, A. Kheddar, and P. Gergondet. Stability polygons reshaping and morphing for smooth multi-contact transitions and force control of humanoid robots. In2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids), pages 1037–1044. IEEE, 2016

2016
[45]

Z. Gu, R. Guo, W. Yates, Y . Chen, Y . Zhao, and Y . Zhao. Walking-by-logic: Signal temporal logic-guided model predictive control for bipedal locomotion resilient to external perturba- tions. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 1121–1127. IEEE, 2024

2024
[46]

Z. Gu, Y . Zhao, Y . Chen, R. Guo, J. K. Leestma, G. S. Sawicki, and Y . Zhao. Robust- locomotion-by-logic: Perturbation-resilient bipedal locomotion via signal temporal logic guided model predictive control.IEEE Transactions on Robotics, 2025

2025
[47]

Humphreys and C

J. Humphreys and C. Zhou. Learning to adapt through bio-inspired gait strategies for versatile quadruped locomotion.Nature Machine Intelligence, 7(7):1141–1153, 2025

2025
[48]

DeFazio, Y

D. DeFazio, Y . Hayamizu, and S. Zhang. Learning quadruped locomotion policies using logi- cal rules. InProceedings of the International Conference on Automated Planning and Schedul- ing, volume 34, pages 142–150, 2024

2024
[49]

Sch ¨oner, W

G. Sch ¨oner, W. Y . Jiang, and J. S. Kelso. A synergetic theory of quadrupedal gaits and gait transitions.Journal of theoretical Biology, 142(3):359–391, 1990. 11

1990
[50]

S. M. Danner, S. D. Wilshin, N. A. Shevtsova, and I. A. Rybak. Central control of interlimb coordination and speed-dependent gait expression in quadrupeds.The Journal of physiology, 594(23):6947–6967, 2016

2016
[51]

Righetti and A

L. Righetti and A. J. Ijspeert. Pattern generators with sensory feedback for the control of quadruped locomotion. In2008 IEEE International Conference on Robotics and Automation, pages 819–824. IEEE, 2008

2008
[52]

C. Liu, Y . Chen, J. Zhang, and Q. Chen. Cpg driven locomotion control of quadruped robot. In2009 IEEE International Conference on Systems, Man and Cybernetics, pages 2368–2373. IEEE, 2009

2009
[53]

Humphreys, J

J. Humphreys, J. Li, Y . Wan, H. Gao, and C. Zhou. Bio-inspired gait transitions for quadruped locomotion.IEEE Robotics and Automation Letters, 8(10):6131–6138, 2023

2023
[54]

Neunert, F

M. Neunert, F. Farshidian, A. W. Winkler, and J. Buchli. Trajectory optimization through contacts and automatic gait discovery for quadrupeds.IEEE Robotics and Automation Letters, 2(3):1502–1509, 2017

2017
[55]

H. Sun, J. Yang, Y . Jia, and C. Wang. Online hierarchical planning for multicontact locomotion control of quadruped robots.IEEE/ASME Transactions on Mechatronics, 30(3):1718–1728, 2024

2024
[56]

K. Liu, L. Dong, X. Tan, W. Zhang, and L. Zhu. Optimization-based flocking control and mpc-based gait synchronization control for multiple quadruped robots.IEEE Robotics and Automation Letters, 9(2):1929–1936, 2024

1929
[57]

Bellegarda, M

G. Bellegarda, M. Shafiee, and A. Ijspeert. Allgaits: Learning all quadruped gaits and tran- sitions. In2025 IEEE International Conference on Robotics and Automation (ICRA), pages 15929–15935. IEEE, 2025

2025
[58]

Y . H. Lee, D. T. Tran, J.-h. Hyun, L. T. Phan, I. M. Koo, S. U. Yang, and H. R. Choi. A gait transition algorithm based on hybrid walking gait for a quadruped walking robot.Intelligent Service Robotics, 8(4):185–200, 2015

2015
[59]

B. Hu, S. Shao, Z. Cao, Q. Xiao, Q. Li, and C. Ma. Learning a faster locomotion gait for a quadruped robot with model-free deep reinforcement learning. In2019 IEEE International Conference on Robotics and Biomimetics (ROBIO), pages 1097–1102. IEEE, 2019

2019
[60]

Tsounis, M

V . Tsounis, M. Alge, J. Lee, F. Farshidian, and M. Hutter. Deepgait: Planning and control of quadrupedal gaits using deep reinforcement learning.IEEE Robotics and Automation Letters, 5(2):3699–3706, 2020

2020
[61]

Y . Shao, Y . Jin, X. Liu, W. He, H. Wang, and W. Yang. Learning free gait transition for quadruped robots via phase-guided controller.IEEE Robotics and Automation Letters, 7(2): 1230–1237, 2021

2021
[62]

S. Xu, L. Zhu, and C. P. Ho. Learning efficient and robust multi-modal quadruped locomotion: A hierarchical approach. In2022 international conference on robotics and automation (ICRA), pages 4649–4655. IEEE, 2022

2022
[63]

L. Wei, Y . Li, Y . Ai, Y . Wu, H. Xu, W. Wang, and G. Hu. Learning multiple-gait quadrupedal locomotion via hierarchical reinforcement learning.International Journal of Precision Engi- neering and Manufacturing, 24(9):1599–1613, 2023

2023
[64]

Y . Yang, T. Zhang, E. Coumans, J. Tan, and B. Boots. Fast and efficient locomotion via learned gait transitions. InConference on robot learning, pages 773–783. PMLR, 2022

2022
[65]

Y . Kim, B. Son, and D. Lee. Learning multiple gaits of quadruped robot using hierarchical reinforcement learning.arXiv preprint arXiv:2112.04741, 2021. 12

work page arXiv 2021
[66]

A. L. Mitchell, W. Merkt, A. Papatheodorou, I. Havoutis, and I. Posner. Gaitor: Learning a unified representation across gaits for real-world quadruped locomotion.arXiv preprint arXiv:2405.19452, 2024

work page arXiv 2024
[67]

Shafiee, G

M. Shafiee, G. Bellegarda, and A. Ijspeert. Manyquadrupeds: Learning a single locomotion policy for diverse quadruped robots. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 3471–3477. IEEE, 2024

2024
[68]

Zakka, B

K. Zakka, B. Tabanpour, Q. Liao, M. Haiderbhai, S. Holt, J. Y . Luo, A. Allshire, E. Frey, K. Sreenath, L. A. Kahrs, et al. Mujoco playground.arXiv preprint arXiv:2502.08844, 2025

work page arXiv 2025
[69]

R. M. Alexander and A. Jayes. A dynamic similarity hypothesis for the gaits of quadrupedal mammals.Journal of zoology, 201(1):135–152, 1983

1983
[70]

Donz ´e and O

A. Donz ´e and O. Maler. Robust satisfaction of temporal logic over real-valued signals. In International conference on formal modeling and analysis of timed systems, pages 92–106. Springer, 2010

2010
[71]

G. E. Fainekos and G. J. Pappas. Robustness of temporal logic specifications for continuous- time signals.Theoretical Computer Science, 410(42):4262–4291, 2009

2009
[72]

Asarin, A

E. Asarin, A. Donz ´e, O. Maler, and D. Nickovic. Parametric identification of temporal proper- ties. InInternational Conference on Runtime Verification, pages 147–160. Springer, 2011

2011
[73]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[74]

Kohl and P

N. Kohl and P. Stone. Policy gradient reinforcement learning for fast quadrupedal locomotion. InIEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA’04. 2004, volume 3, pages 2619–2624. IEEE, 2004

2004
[75]

Schulman, S

J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz. Trust region policy optimization. InInternational conference on machine learning, pages 1889–1897. PMLR, 2015

2015
[76]

V . Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu. Asynchronous methods for deep reinforcement learning. InInternational conference on machine learning, pages 1928–1937. PmLR, 2016

1928
[77]

C. D. Freeman, E. Frey, A. Raichuk, S. Girgin, I. Mordatch, and O. Bachem. Brax–a differen- tiable physics engine for large scale rigid body simulation.arXiv preprint arXiv:2106.13281, 2021. A Appendix A.1 STL Specifications We additionally evaluated the two rules described below; however, given the constraints already established in the main text, they d...

work page arXiv 2021

[1] [1]

Raibert, K

M. Raibert, K. Blankespoor, G. Nelson, and R. Playter. Bigdog, the rough-terrain quadruped robot.IFAC Proceedings Volumes, 41(2):10822–10825, 2008

2008

[2] [2]

B. Katz, J. Di Carlo, and S. Kim. Mini cheetah: A platform for pushing the limits of dynamic quadruped control. In2019 international conference on robotics and automation (ICRA), pages 6295–6301. IEEE, 2019

2019

[3] [3]

X. Peng, E. Coumans, T. Zhang, T. Lee, J. Tan, and S. Levine. Learning agile robotic locomo- tion skills by imitating animals. arxiv 2020.arXiv preprint arXiv:2004.00784

work page arXiv 2020

[4] [4]

J. Lee, J. Hwangbo, L. Wellhausen, V . Koltun, and M. Hutter. Learning quadrupedal locomo- tion over challenging terrain.Science robotics, 5(47):eabc5986, 2020

2020

[5] [5]

Z. Xie, X. Da, M. Van de Panne, B. Babich, and A. Garg. Dynamics randomization revisited: A case study for quadrupedal locomotion. In2021 IEEE International Conference on Robotics and Automation (ICRA), pages 4955–4961. IEEE, 2021

2021

[6] [6]

Agarwal, A

A. Agarwal, A. Kumar, J. Malik, and D. Pathak. Legged locomotion in challenging terrains using egocentric vision. InConference on robot learning, pages 403–415. PMLR, 2023

2023

[7] [7]

D. Kim, J. Di Carlo, B. Katz, G. Bledt, and S. Kim. Highly dynamic quadruped locomotion via whole-body impulse control and model predictive control.arXiv preprint arXiv:1909.06586, 2019

work page arXiv 1909

[8] [8]

Nguyen, M

Q. Nguyen, M. J. Powell, B. Katz, J. Di Carlo, and S. Kim. Optimized jumping on the mit cheetah 3 robot. In2019 International Conference on Robotics and Automation (ICRA), pages 7448–7454. IEEE, 2019

2019

[9] [9]

Di Carlo, P

J. Di Carlo, P. M. Wensing, B. Katz, G. Bledt, and S. Kim. Dynamic locomotion in the mit cheetah 3 through convex model-predictive control. In2018 IEEE/RSJ international confer- ence on intelligent robots and systems (IROS), pages 1–9. IEEE, 2018

2018

[10] [10]

Todorov, T

E. Todorov, T. Erez, and Y . Tassa. Mujoco: A physics engine for model-based control. In2012 IEEE/RSJ international conference on intelligent robots and systems, pages 5026–5033. IEEE, 2012

2012

[11] [11]

Hutter, C

M. Hutter, C. Gehring, D. Jud, A. Lauber, C. D. Bellicoso, V . Tsounis, J. Hwangbo, K. Bodie, P. Fankhauser, M. Bloesch, et al. Anymal-a highly mobile and dynamic quadrupedal robot. In2016 IEEE/RSJ international conference on intelligent robots and systems (IROS), pages 38–44. IEEE, 2016

2016

[12] [12]

Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning

V . Makoviychuk, L. Wawrzyniak, Y . Guo, M. Lu, K. Storey, M. Macklin, D. Hoeller, N. Rudin, A. Allshire, A. Handa, et al. Isaac gym: High performance gpu-based physics simulation for robot learning.arXiv preprint arXiv:2108.10470, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021

[13] [13]

Caluwaerts, A

K. Caluwaerts, A. Iscen, J. C. Kew, W. Yu, T. Zhang, D. Freeman, K.-H. Lee, L. Lee, S. Sal- iceti, V . Zhuang, et al. Barkour: Benchmarking animal-level agility with quadruped robots. arXiv preprint arXiv:2305.14654, 2023

work page arXiv 2023

[14] [14]

J. Tan, T. Zhang, E. Coumans, A. Iscen, Y . Bai, D. Hafner, S. Bohez, and V . Vanhoucke. Sim- to-real: Learning agile locomotion for quadruped robots.arXiv preprint arXiv:1804.10332, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[15] [15]

Maler and D

O. Maler and D. Nickovic. Monitoring temporal properties of continuous signals. InInterna- tional symposium on formal techniques in real-time and fault-tolerant systems, pages 152–166. Springer, 2004. 9

2004

[16] [16]

X. Ding, S. L. Smith, C. Belta, and D. Rus. Optimal control of markov decision processes with linear temporal logic constraints.IEEE Transactions on Automatic Control, 59(5):1244–1257, 2014

2014

[17] [18]

Aksaray, A

D. Aksaray, A. Jones, Z. Kong, M. Schwager, and C. Belta. Q-learning for robust satisfaction of signal temporal logic specifications. In2016 IEEE 55th Conference on Decision and Control (CDC), pages 6565–6570. IEEE, 2016

2016

[18] [19]

Kapinski, X

J. Kapinski, X. Jin, J. Deshmukh, A. Donze, T. Yamaguchi, H. Ito, T. Kaga, S. Kobuna, and S. Seshia. St-lib: A library for specifying and classifying model behaviors. Technical report, SAE Technical Paper, 2016

2016

[19] [20]

J. V . Deshmukh, A. Donz ´e, S. Ghosh, X. Jin, G. Juniwal, and S. A. Seshia. Robust online monitoring of signal temporal logic.Formal Methods in System Design, 51(1):5–30, 2017

2017

[20] [21]

Camacho, R

A. Camacho, R. T. Icarte, T. Q. Klassen, R. A. Valenzano, and S. A. McIlraith. Ltl and be- yond: Formal languages for reward function specification in reinforcement learning. InIJCAI, volume 19, pages 6065–6073, 2019

2019

[21] [22]

Balakrishnan and J

A. Balakrishnan and J. V . Deshmukh. Structured reward shaping using signal temporal logic specifications. in 2019 ieee/rsj iros, 3481–3486, 2019

2019

[22] [23]

L. Z. Yuan, M. Hasanbeig, A. Abate, and D. Kroening. Modular deep reinforcement learning with temporal logic specifications.arXiv preprint arXiv:1909.11591, 2019

work page arXiv 1909

[23] [24]

Jiang, S

Y . Jiang, S. Bharadwaj, B. Wu, R. Shah, U. Topcu, and P. Stone. Temporal-logic-based reward shaping for continuing reinforcement learning tasks. InProceedings of the AAAI Conference on artificial Intelligence, volume 35, pages 7995–8003, 2021

2021

[24] [25]

Hasanbeig, D

M. Hasanbeig, D. Kroening, and A. Abate. Deep reinforcement learning with temporal logics. InInternational Conference on Formal Modeling and Analysis of Timed Systems, pages 1–22. Springer, 2020

2020

[25] [26]

M. Wen, R. Ehlers, and U. Topcu. Correct-by-synthesis reinforcement learning with tempo- ral logic constraints. In2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4983–4990. IEEE, 2015

2015

[26] [27]

R. T. Icarte, T. Q. Klassen, R. Valenzano, and S. A. McIlraith. Reward machines: Exploiting reward function structure in reinforcement learning.Journal of Artificial Intelligence Research, 73:173–208, 2022

2022

[27] [28]

Dayan and B

P. Dayan and B. W. Balleine. Reward, motivation, and reinforcement learning.Neuron, 36(2): 285–298, 2002

2002

[28] [29]

Eschmann

J. Eschmann. Reward function design in reinforcement learning.Reinforcement learning algorithms: Analysis and Applications, pages 25–33, 2021

2021

[29] [30]

J. Hare. Dealing with sparse rewards in reinforcement learning.arXiv preprint arXiv:1910.09281, 2019

work page arXiv 1910

[30] [31]

Kim, Y .-H

G. Kim, Y .-H. Lee, and H.-W. Park. A learning framework for diverse legged robot locomotion using barrier-based style rewards. In2025 IEEE International Conference on Robotics and Automation (ICRA), pages 10004–10010. IEEE, 2025

2025

[31] [32]

J. Fu, K. Luo, and S. Levine. Learning robust rewards with adversarial inverse reinforcement learning.arXiv preprint arXiv:1710.11248, 2017. 10

work page internal anchor Pith review Pith/arXiv arXiv 2017

[32] [33]

A. Y . Ng, S. Russell, et al. Algorithms for inverse reinforcement learning. InIcml, volume 1, page 2, 2000

2000

[33] [34]

Arora and P

S. Arora and P. Doshi. A survey of inverse reinforcement learning: Challenges, methods and progress.Artificial Intelligence, 297:103500, 2021

2021

[34] [35]

P. F. Christiano, J. Leike, T. Brown, M. Martic, S. Legg, and D. Amodei. Deep reinforcement learning from human preferences.Advances in neural information processing systems, 30, 2017

2017

[35] [36]

D. Youm, H. Jung, H. Kim, J. Hwangbo, H.-W. Park, and S. Ha. Imitating and finetuning model predictive control for robust and symmetric quadrupedal locomotion.IEEE Robotics and Automation Letters, 8(11):7799–7806, 2023

2023

[36] [37]

T. Li, J. Won, J. Cho, S. Ha, and A. Rai. Fastmimic: Model-based motion imitation for agile, diverse and generalizable quadrupedal locomotion.Robotics, 12(3):90, 2023

2023

[37] [38]

H.-C. Liao. A survey of reinforcement learning with temporal logic rewards. 2020

2020

[38] [39]

Li, C.-I

X. Li, C.-I. Vasile, and C. Belta. Reinforcement learning with temporal logic rewards. In2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3834–

[39] [40]

Kapoor, A

P. Kapoor, A. Balakrishnan, and J. V . Deshmukh. Model-based reinforcement learning from signal temporal logic specifications.arXiv preprint arXiv:2011.04950, 2020

work page arXiv 2011

[40] [41]

Puranic, J

A. Puranic, J. Deshmukh, and S. Nikolaidis. Learning from demonstrations using signal tem- poral logic. InConference on Robot Learning, pages 2228–2242. PMLR, 2021

2021

[41] [42]

S. Feng, X. Xinjilefu, W. Huang, and C. G. Atkeson. 3d walking based on online optimization. In2013 13th IEEE-RAS International Conference on Humanoid Robots (Humanoids), pages 21–27. IEEE, 2013

2013

[42] [43]

Y . Zhao, U. Topcu, and L. Sentis. High-level planner synthesis for whole-body locomotion in unstructured environments. In2016 IEEE 55th Conference on Decision and Control (CDC), pages 6557–6564. IEEE, 2016

2016

[43] [44]

Audren, A

H. Audren, A. Kheddar, and P. Gergondet. Stability polygons reshaping and morphing for smooth multi-contact transitions and force control of humanoid robots. In2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids), pages 1037–1044. IEEE, 2016

2016

[44] [45]

Z. Gu, R. Guo, W. Yates, Y . Chen, Y . Zhao, and Y . Zhao. Walking-by-logic: Signal temporal logic-guided model predictive control for bipedal locomotion resilient to external perturba- tions. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 1121–1127. IEEE, 2024

2024

[45] [46]

Z. Gu, Y . Zhao, Y . Chen, R. Guo, J. K. Leestma, G. S. Sawicki, and Y . Zhao. Robust- locomotion-by-logic: Perturbation-resilient bipedal locomotion via signal temporal logic guided model predictive control.IEEE Transactions on Robotics, 2025

2025

[46] [47]

Humphreys and C

J. Humphreys and C. Zhou. Learning to adapt through bio-inspired gait strategies for versatile quadruped locomotion.Nature Machine Intelligence, 7(7):1141–1153, 2025

2025

[47] [48]

DeFazio, Y

D. DeFazio, Y . Hayamizu, and S. Zhang. Learning quadruped locomotion policies using logi- cal rules. InProceedings of the International Conference on Automated Planning and Schedul- ing, volume 34, pages 142–150, 2024

2024

[48] [49]

Sch ¨oner, W

G. Sch ¨oner, W. Y . Jiang, and J. S. Kelso. A synergetic theory of quadrupedal gaits and gait transitions.Journal of theoretical Biology, 142(3):359–391, 1990. 11

1990

[49] [50]

S. M. Danner, S. D. Wilshin, N. A. Shevtsova, and I. A. Rybak. Central control of interlimb coordination and speed-dependent gait expression in quadrupeds.The Journal of physiology, 594(23):6947–6967, 2016

2016

[50] [51]

Righetti and A

L. Righetti and A. J. Ijspeert. Pattern generators with sensory feedback for the control of quadruped locomotion. In2008 IEEE International Conference on Robotics and Automation, pages 819–824. IEEE, 2008

2008

[51] [52]

C. Liu, Y . Chen, J. Zhang, and Q. Chen. Cpg driven locomotion control of quadruped robot. In2009 IEEE International Conference on Systems, Man and Cybernetics, pages 2368–2373. IEEE, 2009

2009

[52] [53]

Humphreys, J

J. Humphreys, J. Li, Y . Wan, H. Gao, and C. Zhou. Bio-inspired gait transitions for quadruped locomotion.IEEE Robotics and Automation Letters, 8(10):6131–6138, 2023

2023

[53] [54]

Neunert, F

M. Neunert, F. Farshidian, A. W. Winkler, and J. Buchli. Trajectory optimization through contacts and automatic gait discovery for quadrupeds.IEEE Robotics and Automation Letters, 2(3):1502–1509, 2017

2017

[54] [55]

H. Sun, J. Yang, Y . Jia, and C. Wang. Online hierarchical planning for multicontact locomotion control of quadruped robots.IEEE/ASME Transactions on Mechatronics, 30(3):1718–1728, 2024

2024

[55] [56]

K. Liu, L. Dong, X. Tan, W. Zhang, and L. Zhu. Optimization-based flocking control and mpc-based gait synchronization control for multiple quadruped robots.IEEE Robotics and Automation Letters, 9(2):1929–1936, 2024

1929

[56] [57]

Bellegarda, M

G. Bellegarda, M. Shafiee, and A. Ijspeert. Allgaits: Learning all quadruped gaits and tran- sitions. In2025 IEEE International Conference on Robotics and Automation (ICRA), pages 15929–15935. IEEE, 2025

2025

[57] [58]

Y . H. Lee, D. T. Tran, J.-h. Hyun, L. T. Phan, I. M. Koo, S. U. Yang, and H. R. Choi. A gait transition algorithm based on hybrid walking gait for a quadruped walking robot.Intelligent Service Robotics, 8(4):185–200, 2015

2015

[58] [59]

B. Hu, S. Shao, Z. Cao, Q. Xiao, Q. Li, and C. Ma. Learning a faster locomotion gait for a quadruped robot with model-free deep reinforcement learning. In2019 IEEE International Conference on Robotics and Biomimetics (ROBIO), pages 1097–1102. IEEE, 2019

2019

[59] [60]

Tsounis, M

V . Tsounis, M. Alge, J. Lee, F. Farshidian, and M. Hutter. Deepgait: Planning and control of quadrupedal gaits using deep reinforcement learning.IEEE Robotics and Automation Letters, 5(2):3699–3706, 2020

2020

[60] [61]

Y . Shao, Y . Jin, X. Liu, W. He, H. Wang, and W. Yang. Learning free gait transition for quadruped robots via phase-guided controller.IEEE Robotics and Automation Letters, 7(2): 1230–1237, 2021

2021

[61] [62]

S. Xu, L. Zhu, and C. P. Ho. Learning efficient and robust multi-modal quadruped locomotion: A hierarchical approach. In2022 international conference on robotics and automation (ICRA), pages 4649–4655. IEEE, 2022

2022

[62] [63]

L. Wei, Y . Li, Y . Ai, Y . Wu, H. Xu, W. Wang, and G. Hu. Learning multiple-gait quadrupedal locomotion via hierarchical reinforcement learning.International Journal of Precision Engi- neering and Manufacturing, 24(9):1599–1613, 2023

2023

[63] [64]

Y . Yang, T. Zhang, E. Coumans, J. Tan, and B. Boots. Fast and efficient locomotion via learned gait transitions. InConference on robot learning, pages 773–783. PMLR, 2022

2022

[64] [65]

Y . Kim, B. Son, and D. Lee. Learning multiple gaits of quadruped robot using hierarchical reinforcement learning.arXiv preprint arXiv:2112.04741, 2021. 12

work page arXiv 2021

[65] [66]

A. L. Mitchell, W. Merkt, A. Papatheodorou, I. Havoutis, and I. Posner. Gaitor: Learning a unified representation across gaits for real-world quadruped locomotion.arXiv preprint arXiv:2405.19452, 2024

work page arXiv 2024

[66] [67]

Shafiee, G

M. Shafiee, G. Bellegarda, and A. Ijspeert. Manyquadrupeds: Learning a single locomotion policy for diverse quadruped robots. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 3471–3477. IEEE, 2024

2024

[67] [68]

Zakka, B

K. Zakka, B. Tabanpour, Q. Liao, M. Haiderbhai, S. Holt, J. Y . Luo, A. Allshire, E. Frey, K. Sreenath, L. A. Kahrs, et al. Mujoco playground.arXiv preprint arXiv:2502.08844, 2025

work page arXiv 2025

[68] [69]

R. M. Alexander and A. Jayes. A dynamic similarity hypothesis for the gaits of quadrupedal mammals.Journal of zoology, 201(1):135–152, 1983

1983

[69] [70]

Donz ´e and O

A. Donz ´e and O. Maler. Robust satisfaction of temporal logic over real-valued signals. In International conference on formal modeling and analysis of timed systems, pages 92–106. Springer, 2010

2010

[70] [71]

G. E. Fainekos and G. J. Pappas. Robustness of temporal logic specifications for continuous- time signals.Theoretical Computer Science, 410(42):4262–4291, 2009

2009

[71] [72]

Asarin, A

E. Asarin, A. Donz ´e, O. Maler, and D. Nickovic. Parametric identification of temporal proper- ties. InInternational Conference on Runtime Verification, pages 147–160. Springer, 2011

2011

[72] [73]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[73] [74]

Kohl and P

N. Kohl and P. Stone. Policy gradient reinforcement learning for fast quadrupedal locomotion. InIEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA’04. 2004, volume 3, pages 2619–2624. IEEE, 2004

2004

[74] [75]

Schulman, S

J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz. Trust region policy optimization. InInternational conference on machine learning, pages 1889–1897. PMLR, 2015

2015

[75] [76]

V . Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu. Asynchronous methods for deep reinforcement learning. InInternational conference on machine learning, pages 1928–1937. PmLR, 2016

1928

[76] [77]

C. D. Freeman, E. Frey, A. Raichuk, S. Girgin, I. Mordatch, and O. Bachem. Brax–a differen- tiable physics engine for large scale rigid body simulation.arXiv preprint arXiv:2106.13281, 2021. A Appendix A.1 STL Specifications We additionally evaluated the two rules described below; however, given the constraints already established in the main text, they d...

work page arXiv 2021