Asymmetric physics enables efficient learning in quadrupedal robot swarms
Pith reviewed 2026-06-26 08:29 UTC · model grok-4.3
The pith
Asymmetric physics allows efficient end-to-end learning of decentralized vision-based control for large quadrupedal robot swarms.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Asymmetric physics enables efficient end-to-end learning of vision-based, decentralized control in large swarms of quadrupedal robots. During training, quadrupeds interact in shared environments, where a high-fidelity, non-differentiable simulator generates realistic motion and contact dynamics, and differentiable surrogate models provide gradients for navigation and locomotion policies. This separation enables up to 512 quadrupeds to learn coordinated navigation policies in obstacle-rich environments. At deployment, each robot acts from a single forward-facing depth camera, without explicit communication, centralized planning, or global maps. The policies generalize across forests, bridges, enclosures, narrow passages, and mazes.
What carries the argument
Asymmetric physics: the separation of a high-fidelity, non-differentiable simulator, which generates realistic motion and contact dynamics, from differentiable surrogate models, which supply gradients for the navigation and locomotion policies.
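The separation described above can be illustrated with a toy sketch. This is not the paper's implementation: the 1-D point-mass dynamics, the proportional policy, and every name below (`simulator_step`, `surrogate_jacobians`, `rollout_and_grad`, `train`) are assumptions made for illustration. The forward rollout uses a black-box step with a hard clamp and state quantization, while the backward pass applies the chain rule with Jacobians from a smooth surrogate, evaluated at the states the simulator actually visited.

```python
def simulator_step(x, u, dt=0.1, wall=2.0):
    """Hypothetical stand-in for a black-box, non-differentiable simulator."""
    # Hard wall contact and state quantization: neither yields useful gradients.
    x_next = min(x + dt * u, wall)
    return round(x_next, 4)


def surrogate_jacobians(x, u, dt=0.1):
    """Jacobians (dx'/dx, dx'/du) of the smooth surrogate x' = x + dt * u."""
    return 1.0, dt


def policy(theta, x, goal):
    """Toy proportional policy with a single learnable gain theta."""
    return theta * (goal - x)


def rollout_and_grad(theta, x0=0.0, goal=1.0, steps=20):
    # Forward pass: states come from the non-differentiable simulator.
    xs = [x0]
    for _ in range(steps):
        u = policy(theta, xs[-1], goal)
        xs.append(simulator_step(xs[-1], u))
    loss = (xs[-1] - goal) ** 2

    # Backward pass: chain rule through surrogate Jacobians,
    # evaluated at the simulator's states.
    dL_dx = 2.0 * (xs[-1] - goal)  # dL/dx_T
    dL_dtheta = 0.0
    for t in reversed(range(steps)):
        x = xs[t]
        u = policy(theta, x, goal)
        dxn_dx, dxn_du = surrogate_jacobians(x, u)
        du_dtheta = goal - x
        du_dx = -theta
        dL_dtheta += dL_dx * dxn_du * du_dtheta
        dL_dx = dL_dx * (dxn_dx + dxn_du * du_dx)
    return loss, dL_dtheta


def train(theta=0.1, lr=0.5, iters=50):
    """Plain gradient descent on theta using the surrogate gradient."""
    for _ in range(iters):
        _, grad = rollout_and_grad(theta)
        theta -= lr * grad
    final_loss, _ = rollout_and_grad(theta)
    return theta, final_loss


if __name__ == "__main__":
    print("initial loss:", rollout_and_grad(0.1)[0])
    print("trained (theta, loss):", train())
```

In this toy the surrogate matches the simulator away from the wall, so the gradient is nearly exact. In contact-rich regimes the two can disagree, which is the concern raised under the load-bearing premise.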
Load-bearing premise
The differentiable surrogate models must supply gradients that remain sufficiently accurate and unbiased relative to the high-fidelity non-differentiable simulator across the full range of contact-rich interactions and visual observations encountered during training.
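One way to probe this premise is to compare surrogate gradients against central finite differences of the black-box simulator, then report mean squared error and mean bias. The sketch below is an illustration under stated assumptions, not the paper's procedure: the hard-contact force model, the softplus-style surrogate, and all names and constants (`sim_contact_force`, `surrogate_force_grad`, `stiffness`, `sharpness`) are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sim_contact_force(depth, stiffness=100.0):
    """Hypothetical black-box simulator: force only when penetrating (depth > 0)."""
    return stiffness * max(0.0, depth)


def surrogate_force_grad(depth, stiffness=100.0, sharpness=50.0):
    """Gradient of the smooth surrogate
    stiffness * log(1 + exp(sharpness * depth)) / sharpness
    with respect to depth."""
    return stiffness * sigmoid(sharpness * depth)


def finite_difference(f, x, eps=1e-4):
    """Central finite-difference estimate of df/dx for a black-box f."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


def gradient_fidelity(depths, eps=1e-4):
    """MSE and mean bias of surrogate gradients relative to finite differences."""
    errors = [
        surrogate_force_grad(d) - finite_difference(sim_contact_force, d, eps)
        for d in depths
    ]
    n = len(errors)
    return {
        "mse": sum(e * e for e in errors) / n,
        "bias": sum(errors) / n,
    }


if __name__ == "__main__":
    symmetric = [0.01 * i for i in range(-10, 11)]  # samples on both sides of contact
    penetrating = [0.01, 0.02]                      # samples only in contact
    print("symmetric:", gradient_fidelity(symmetric))
    print("penetrating:", gradient_fidelity(penetrating))
```

In this toy the errors cancel when samples are spread evenly around the contact boundary, so the bias looks small even though the MSE is not. Sampling only penetrating states exposes a systematic underestimate. A fidelity report would therefore need to state the state distribution it was measured on.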
What would settle it
Training runs that compare the asymmetric setup against symmetric ones (fully non-differentiable or fully differentiable) would support the claim if only the asymmetric version scaled to 512 robots and transferred zero-shot to physical robots. Failure of the asymmetric method to produce scalable or transferable policies would falsify it.
Original abstract
Animal collectives navigate cluttered environments through local coordination, yet robot swarms still struggle to reproduce this capability in the physical world. End-to-end learning offers a route to such coordination, but scaling it to embodied swarms remains difficult: standard sampling-based reinforcement learning becomes inefficient when visual perception, dense robot-robot interaction, and contact-rich locomotion must be learned together. Here we show that asymmetric physics enables efficient end-to-end learning of vision-based, decentralized control in large swarms of quadrupedal robots. During training, quadrupeds interact in shared environments, where a high-fidelity, non-differentiable simulator generates realistic motion and contact dynamics, and differentiable surrogate models provide gradients for navigation and locomotion policies. This separation enables up to 512 quadrupeds to learn coordinated navigation policies in obstacle-rich environments. At deployment, each robot acts from a single forward-facing depth camera, without explicit communication, centralized planning, or global maps. The policies generalize across forests, bridges, enclosures, narrow passages, and mazes, and zero-shot transfer to six physical quadrupeds across five real-world scenarios. The resulting swarms exhibit predictive avoidance, right-side yielding, pausing before bottlenecks, and wall following, showing that asymmetric physics enables efficient training of scalable decentralized control policies for quadrupedal robot swarms.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that an asymmetric physics setup—pairing a high-fidelity non-differentiable simulator for motion and contact dynamics with differentiable surrogate models for policy gradients—enables efficient end-to-end learning of vision-based decentralized control for swarms of up to 512 quadrupedal robots. The resulting policies generalize across multiple simulated environments and achieve zero-shot transfer to six physical robots in five real-world scenarios, producing emergent behaviors such as predictive avoidance and right-side yielding without explicit communication or centralized planning.
Significance. If the central empirical claim is supported by rigorous validation, the result would be significant for scalable embodied swarm learning: it offers a concrete route to training contact-rich, vision-based multi-agent policies that standard sampling-based RL cannot handle efficiently at this scale. The reported zero-shot sim-to-real transfer and emergence of coordinated local behaviors would constitute a notable demonstration in the field.
major comments (2)
- [Methods / Training Setup] The description of the differentiable surrogate models supplies no quantitative error metrics, bias analysis, or ablation studies on gradient fidelity relative to the high-fidelity simulator for contact-rich regimes (multi-robot collisions, terrain contacts, depth observations). This assumption is load-bearing for the scalability claim to 512 agents and the reported emergence of yielding behaviors.
- [Results] The results section reports generalization across forests, bridges, enclosures, narrow passages, and mazes together with zero-shot transfer to physical robots, yet provides no quantitative performance metrics, success rates, or statistical comparisons against baselines that would allow assessment of the strength of these claims.
minor comments (2)
- Notation for the surrogate models and the precise interface between the non-differentiable simulator and the gradient-providing surrogates should be defined more explicitly, ideally with a diagram or pseudocode.
- The abstract states specific numbers (512 quadrupeds, six physical robots, five scenarios); the main text should include corresponding tables or figures with error bars or confidence intervals.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important gaps in validation and quantification. We address each major point below and commit to revisions that strengthen the manuscript without altering its core claims.
Point-by-point responses
- Referee: [Methods / Training Setup] The description of the differentiable surrogate models supplies no quantitative error metrics, bias analysis, or ablation studies on gradient fidelity relative to the high-fidelity simulator for contact-rich regimes (multi-robot collisions, terrain contacts, depth observations). This assumption is load-bearing for the scalability claim to 512 agents and the reported emergence of yielding behaviors.
  Authors: We agree that the manuscript currently lacks explicit quantitative validation of surrogate gradient fidelity in contact-rich regimes. In revision we will add error metrics (MSE between surrogate and finite-difference gradients on contact forces and depth), bias analysis, and ablations showing policy performance sensitivity to gradient accuracy. These will appear in a dedicated Methods subsection with supporting figures.
  Revision: yes
- Referee: [Results] The results section reports generalization across forests, bridges, enclosures, narrow passages, and mazes together with zero-shot transfer to physical robots, yet provides no quantitative performance metrics, success rates, or statistical comparisons against baselines that would allow assessment of the strength of these claims.
  Authors: The referee is correct that aggregated quantitative metrics and baseline comparisons are absent. We will revise the Results section to report success rates, collision statistics, and navigation efficiency across environments, together with statistical comparisons (multiple seeds) against independent RL and non-surrogate baselines. Real-world transfer will include success counts over the five scenarios.
  Revision: yes
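Success counts over a handful of scenarios carry wide uncertainty, so an interval is more informative than a raw rate. A minimal sketch using the Wilson score interval follows. The function name and the example counts are hypothetical, and no numbers here come from the paper.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate (z=1.96 for about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie in [0, trials]")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Hypothetical example: 5 successes in 5 real-world trials.
    print(wilson_interval(5, 5))
```

With five successes in five trials, the lower bound is still only about 0.57, which is why repeated trials per scenario matter for the real-world transfer claim.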
Circularity Check
No circularity; empirical demonstration of asymmetric simulator setup
full rationale
The paper presents an empirical training method that separates a high-fidelity non-differentiable simulator (for motion and contacts) from differentiable surrogate models (for policy gradients). Reported outcomes consist of policy generalization across simulated environments and zero-shot transfer to physical robots. No equations, fitted parameters renamed as predictions, or self-citation chains are shown that reduce the central claim to its own inputs by construction. The result is a standard empirical robotics demonstration whose validity rests on external benchmarks (sim-to-real transfer) rather than internal definitional equivalence.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Standard RL assumptions (Markov property, reward shaping, policy gradient validity) hold for the multi-agent vision-based setting.
- Domain assumption: Local forward-facing depth observations contain sufficient information for decentralized coordination in the tested environments.