Asymmetric physics enables efficient learning in quadrupedal robot swarms

Danping Zou; Feng Yu; Kangyu Wang; Tianchi Liu; Weiyao Lin; Yuang Zhang; Yu Hu; Yunlong Song; Zelin Ni; Zhihao He

arxiv: 2606.23153 · v1 · pith:OM3UZ4LR · submitted 2026-06-22 · 💻 cs.RO


Pith reviewed 2026-06-26 08:29 UTC · model grok-4.3

Classification: 💻 cs.RO

Keywords: asymmetric physics, quadrupedal robot swarms, decentralized control, end-to-end learning, vision-based navigation, differentiable surrogate models, multi-robot coordination, contact-rich locomotion


The pith

Asymmetric physics allows efficient end-to-end learning of decentralized vision-based control for large quadrupedal robot swarms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that asymmetric physics—pairing a high-fidelity non-differentiable simulator for realistic robot motion and contacts with differentiable surrogate models for policy gradients—makes it possible to train vision-based decentralized controllers for swarms of up to 512 quadrupeds. This separation addresses the inefficiency of standard reinforcement learning when visual perception, dense robot-robot interactions, and contact-rich locomotion must be learned together. The resulting policies enable coordinated navigation in cluttered environments without explicit communication or central planning. The policies generalize across multiple environment types and transfer zero-shot to physical robots in real-world scenarios.

Core claim

Asymmetric physics enables efficient end-to-end learning of vision-based, decentralized control in large swarms of quadrupedal robots. During training, quadrupeds interact in shared environments. A high-fidelity, non-differentiable simulator generates realistic motion and contact dynamics, while differentiable surrogate models provide gradients for the navigation and locomotion policies. This separation enables up to 512 quadrupeds to learn coordinated navigation policies in obstacle-rich environments. At deployment, each robot acts from a single forward-facing depth camera, without explicit communication, centralized planning, or global maps. The policies generalize across forests, bridges, enclosures, narrow passages, and mazes, and they transfer zero-shot to six physical quadrupeds across five real-world scenarios.

What carries the argument

Asymmetric physics: a high-fidelity, non-differentiable simulator generates realistic motion and contact dynamics, and separate differentiable surrogate models supply the gradients for the navigation and locomotion policies.
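To make the split concrete, here is a toy sketch in Python. It is not the authors' implementation. The following are all assumptions chosen only for illustration:

- the 1-D point robot,
- the rigid wall standing in for contact,
- the contact-free linear surrogate,
- the one-parameter policy,
- every name (`sim_step`, `rollout`, `DAMPING`).

States and loss values come from the non-differentiable step. The derivative of the loss with respect to the policy gain is carried through the surrogate's Jacobians along that simulated trajectory.

```python
# Toy sketch of "asymmetric physics" for a single 1-D point robot.
# Assumptions (not from the paper): a double-integrator "simulator" with a hard
# wall, a contact-free linear surrogate, and a one-parameter feedback policy.
# All names below are hypothetical.

DT = 0.1        # integration step
DAMPING = 1.0   # fixed velocity-damping gain of the toy policy


def sim_step(x, v, u, wall):
    """Non-differentiable stand-in for the high-fidelity simulator."""
    v_next = v + DT * u
    x_next = x + DT * v_next
    if x_next > wall:  # hard inelastic contact: clamp position, zero velocity
        x_next, v_next = wall, 0.0
    return x_next, v_next


def rollout(k, goal=0.8, wall=1.0, steps=50):
    """Return (loss, dloss/dk) for the policy u = k * (goal - x) - DAMPING * v.

    The loss is measured on the simulator trajectory. Its derivative is carried
    with forward-mode sensitivities through the surrogate
        x' = x + DT * v + DT**2 * u,   v' = v + DT * u,
    i.e. the same integrator without the wall.
    """
    x, v = 0.0, 0.0
    dx, dv = 0.0, 0.0            # sensitivities dx/dk and dv/dk
    loss, dloss = 0.0, 0.0
    for _ in range(steps):
        u = k * (goal - x) - DAMPING * v
        # Total derivative of the action w.r.t. k, using the simulator's state.
        du = (goal - x) - k * dx - DAMPING * dv
        # Derivative information comes from the surrogate's (constant) Jacobians...
        dx, dv = dx + DT * dv + DT * DT * du, dv + DT * du
        # ...while state values come from the non-differentiable simulator.
        x, v = sim_step(x, v, u, wall)
        loss += (x - goal) ** 2
        dloss += 2.0 * (x - goal) * dx
    return loss, dloss


if __name__ == "__main__":
    k = 0.5
    for _ in range(20):
        _, grad = rollout(k)
        k -= 1e-3 * grad          # plain gradient descent on the policy gain
    print(f"gain k = {k:.3f}, loss = {rollout(k)[0]:.3f}")
```

When the wall is never reached, the surrogate coincides with the simulator, and the propagated derivative matches a finite-difference estimate. Once contact occurs, the surrogate ignores the clamp. That gap is exactly the gradient mismatch the load-bearing premise has to rule out.

Forward-mode sensitivities are used here only because there is a single scalar parameter. A system with neural-network policies would more likely use reverse-mode automatic differentiation.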

Load-bearing premise

The differentiable surrogate models must supply gradients that remain sufficiently accurate and unbiased relative to the high-fidelity non-differentiable simulator across the full range of contact-rich interactions and visual observations encountered during training.

What would settle it

The claim could be settled by ablations that compare symmetric physics setups (fully non-differentiable or fully differentiable) against the asymmetric setup:

- The claim would be supported if only the asymmetric version scales to 512 robots and transfers zero-shot to real hardware.
- The claim would be falsified if the asymmetric method fails to produce scalable or transferable policies.

Original abstract

Animal collectives navigate cluttered environments through local coordination, yet robot swarms still struggle to reproduce this capability in the physical world. End-to-end learning offers a route to such coordination, but scaling it to embodied swarms remains difficult: standard sampling-based reinforcement learning becomes inefficient when visual perception, dense robot-robot interaction, and contact-rich locomotion must be learned together. Here we show that asymmetric physics enables efficient end-to-end learning of vision-based, decentralized control in large swarms of quadrupedal robots. During training, quadrupeds interact in shared environments, where a high-fidelity, non-differentiable simulator generates realistic motion and contact dynamics, and differentiable surrogate models provide gradients for navigation and locomotion policies. This separation enables up to 512 quadrupeds to learn coordinated navigation policies in obstacle-rich environments. At deployment, each robot acts from a single forward-facing depth camera, without explicit communication, centralized planning, or global maps. The policies generalize across forests, bridges, enclosures, narrow passages, and mazes, and zero-shot transfer to six physical quadrupeds across five real-world scenarios. The resulting swarms exhibit predictive avoidance, right-side yielding, pausing before bottlenecks, and wall following, showing that asymmetric physics enables efficient training of scalable decentralized control policies for quadrupedal robot swarms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Asymmetric physics lets the authors scale vision-based quadruped swarm training to 512 agents with zero-shot hardware transfer. However, the accuracy of the surrogate gradients in contact-rich cases is an unverified premise.


The paper shows that pairing a high-fidelity non-differentiable simulator for motion and contacts with differentiable surrogate models for gradients allows end-to-end training of decentralized vision policies across swarms of up to 512 quadrupeds. Each robot's policy runs on a single forward-facing depth camera and requires no communication or maps. The policies transfer zero-shot to physical robots and produce behaviors such as predictive avoidance and right-side yielding.

This is a practical extension of existing RL ideas to embodied swarm scale. The separation of physics modeling addresses a real bottleneck: standard sampling-based methods slow down when vision, dense interactions, and contact-rich locomotion are learned together. The reported generalization across forests, bridges, mazes, and five real scenarios gives concrete evidence that the approach can produce usable coordination without centralized planning.

The load-bearing assumption is that the surrogates supply gradients accurate enough across the full range of contacts and visual observations. The abstract describes the split but supplies no error metrics, construction details, or ablations on gradient mismatch for multi-robot collisions or terrain interactions. If the full paper contains those checks, the scalability claim strengthens; if the mismatch grows in those regimes, the reported behaviors and 512-agent results would not necessarily follow from the asymmetric setup.

The work is aimed at researchers in multi-robot learning and embodied swarm systems who need methods that scale under realistic physics constraints. Readers focused on practical deployment of decentralized teams will find the empirical demonstration relevant.

It deserves peer review because the outcome, if the methods hold, has direct implications for larger robot collectives. I would send it out.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that an asymmetric physics setup—pairing a high-fidelity non-differentiable simulator for motion and contact dynamics with differentiable surrogate models for policy gradients—enables efficient end-to-end learning of vision-based decentralized control for swarms of up to 512 quadrupedal robots. The resulting policies generalize across multiple simulated environments and achieve zero-shot transfer to six physical robots in five real-world scenarios, producing emergent behaviors such as predictive avoidance and right-side yielding without explicit communication or centralized planning.

Significance. If the central empirical claim is supported by rigorous validation, the result would be significant for scalable embodied swarm learning: it offers a concrete route to training contact-rich, vision-based multi-agent policies that standard sampling-based RL cannot handle efficiently at this scale. The reported zero-shot sim-to-real transfer and emergence of coordinated local behaviors would constitute a notable demonstration in the field.

major comments (2)

[Methods / Training Setup] The description of the differentiable surrogate models supplies no quantitative error metrics, bias analysis, or ablation studies on gradient fidelity relative to the high-fidelity simulator for contact-rich regimes (multi-robot collisions, terrain contacts, depth observations). This assumption is load-bearing for the scalability claim to 512 agents and the reported emergence of yielding behaviors.
[Results] The results section reports generalization across forests, bridges, enclosures, narrow passages, and mazes together with zero-shot transfer to physical robots, yet provides no quantitative performance metrics, success rates, or statistical comparisons against baselines that would allow assessment of the strength of these claims.

minor comments (2)

Notation for the surrogate models and the precise interface between the non-differentiable simulator and the gradient-providing surrogates should be defined more explicitly, ideally with a diagram or pseudocode.
The abstract states specific numbers (512 quadrupeds, six physical robots, five scenarios); the main text should include corresponding tables or figures with error bars or confidence intervals.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important gaps in validation and quantification. We address each major point below and commit to revisions that strengthen the manuscript without altering its core claims.

Point-by-point responses

Referee: [Methods / Training Setup] The description of the differentiable surrogate models supplies no quantitative error metrics, bias analysis, or ablation studies on gradient fidelity relative to the high-fidelity simulator for contact-rich regimes (multi-robot collisions, terrain contacts, depth observations). This assumption is load-bearing for the scalability claim to 512 agents and the reported emergence of yielding behaviors.

Authors: We agree that the manuscript currently lacks explicit quantitative validation of surrogate gradient fidelity in contact-rich regimes. In revision we will add error metrics (MSE between surrogate and finite-difference gradients on contact forces and depth), bias analysis, and ablations showing policy performance sensitivity to gradient accuracy. These will appear in a dedicated Methods subsection with supporting figures. revision: yes
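The proposed error metric, MSE between surrogate and finite-difference gradients, can be sketched as follows. This is a hypothetical illustration, not the authors' protocol:

- A toy one-step model stands in for the simulator: each robot advances by `DT * v` but cannot pass a wall.
- A contact-free linear model stands in for the surrogate.
- All function names and constants are assumptions.

The check takes central finite differences of the black-box simulator and reports their MSE against the surrogate's analytic gradient.

```python
# Hypothetical gradient-fidelity check: surrogate gradient vs. central finite
# differences of a black-box simulator. The toy contact model is an assumption.

DT = 0.1     # step length of the toy model
WALL = 1.0   # robots cannot move past this position


def sim_output(start, velocities):
    """Black-box 'simulator': sum of next positions, with a hard wall clamp."""
    return sum(min(p + DT * v, WALL) for p, v in zip(start, velocities))


def surrogate_grad(start, velocities):
    """Contact-free surrogate: d/dv_i of sum(p_i + DT * v_i) is DT for every robot.

    `start` is unused because this surrogate ignores contact entirely.
    """
    return [DT for _ in velocities]


def finite_difference_grad(f, params, eps=1e-6):
    """Central finite-difference gradient of a scalar black-box function f."""
    grad = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


def gradient_mse(g_surrogate, g_reference):
    """Mean squared error between two gradient vectors of equal length."""
    if len(g_surrogate) != len(g_reference):
        raise ValueError("gradient vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(g_surrogate, g_reference)) / len(g_reference)


def fidelity(start, velocities):
    """MSE between surrogate and finite-difference gradients at one state."""
    g_ref = finite_difference_grad(lambda v: sim_output(start, v), velocities)
    return gradient_mse(surrogate_grad(start, velocities), g_ref)


if __name__ == "__main__":
    print("free space:", fidelity([0.0, 0.2], [1.0, 1.0]))    # near 0
    print("in contact:", fidelity([0.95, 0.2], [1.0, 1.0]))   # about 0.005
```

In free space the two gradients agree. For a robot already pressed against the wall, the simulator's true sensitivity is zero while the surrogate still reports `DT`, so the MSE rises to about 0.005 in this example.

Finite differences of a simulator with hard contacts are themselves unreliable when a perturbation straddles a contact event. Both the step size `eps` and the sampled states need care.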
Referee: [Results] The results section reports generalization across forests, bridges, enclosures, narrow passages, and mazes together with zero-shot transfer to physical robots, yet provides no quantitative performance metrics, success rates, or statistical comparisons against baselines that would allow assessment of the strength of these claims.

Authors: The referee is correct that aggregated quantitative metrics and baseline comparisons are absent. We will revise the Results section to report success rates, collision statistics, and navigation efficiency across environments, together with statistical comparisons (multiple seeds) against independent RL and non-surrogate baselines. Real-world transfer will include success counts over the five scenarios. revision: yes
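The promised success counts and the confidence intervals requested in the minor comments need an interval method. For small numbers of trials, such as a handful of real-world runs per scenario, the Wilson score interval is a common choice. The sketch below is a generic statistics helper, not something taken from the paper. The function name and the default z = 1.96 (roughly 95% coverage) are assumptions.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate.

    z = 1.96 gives roughly 95% coverage. Returns (lower, upper), clipped to [0, 1].
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2.0 * trials)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1.0 - p_hat) / trials + z2 / (4.0 * trials * trials))
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Hypothetical counts: successful runs out of attempts in one scenario.
    print(wilson_interval(10, 10))   # roughly (0.72, 1.0)
```

For example, 10 successes in 10 trials gives an interval of roughly [0.72, 1.00]. A perfect record over a few physical runs still leaves wide uncertainty, which is why per-scenario counts matter.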

Circularity Check

0 steps flagged

No circularity; empirical demonstration of asymmetric simulator setup

Full rationale

The paper presents an empirical training method that separates a high-fidelity non-differentiable simulator (for motion and contacts) from differentiable surrogate models (for policy gradients). Reported outcomes consist of policy generalization across simulated environments and zero-shot transfer to physical robots. No equations, fitted parameters renamed as predictions, or self-citation chains are shown that reduce the central claim to its own inputs by construction. The result is a standard empirical robotics demonstration whose validity rests on external benchmarks (sim-to-real transfer) rather than internal definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard assumptions of reinforcement learning and sim-to-real transfer plus the domain assumption that local depth images suffice for coordination without communication.

axioms (2)

Domain assumption: Standard RL assumptions (Markov property, reward shaping, policy gradient validity) hold for the multi-agent vision-based setting.
Invoked implicitly by the use of end-to-end learning for navigation and locomotion policies.
Domain assumption: Local forward-facing depth observations contain sufficient information for decentralized coordination in the tested environments.
Stated in the deployment description: each robot acts from a single forward-facing depth camera without explicit communication.

pith-pipeline@v0.9.1-grok · 5781 in / 1378 out tokens · 17148 ms · 2026-06-26T08:29:03.661458+00:00 · methodology


Reference graph

Works this paper leans on

64 extracted references · 3 canonical work pages

[1]

Adaptive behavior2(2), 189–218 (1993)

Kube, C.R., Zhang, H.: Collective robotics: From social insects to robots. Adaptive behavior2(2), 189–218 (1993)

1993
[2]

In: International Workshop on Swarm Robotics, pp

Beni, G.: From swarm intelligence to swarm robotics. In: International Workshop on Swarm Robotics, pp. 1–9 (2004). Springer

2004
[3]

In: International Workshop on Swarm Robotics, pp

Şahin, E.: Swarm robotics: From sources of inspiration to domains of application. In: International Workshop on Swarm Robotics, pp. 10–20 (2004). Springer

2004
[4]

Technical report (2019)

Kang, C.-k., Fahimi, F., Griffin, R., Landrum, D.B., Mesmer, B., Zhang, G., Lee, T., Aono, H., Pohly, J., McCain, J., et al.: Marsbee-swarm of flapping wing flyers for enhanced mars exploration: Nasa innovative advanced concepts (niac)-phase i. Technical report (2019)

2019
[5]

Science robotics8(80), 9548 (2023)

Arm, P., Waibel, G., Preisig, J., Tuna, T., Zhou, R., Bickel, V., Ligeza, G., Miki, T., Kehl, F., Kolvenbach, H.,et al.: Scientific exploration of challenging planetary analog environments with a team of legged robots. Science robotics8(80), 9548 (2023)

2023
[6]

Current opinion in biotechnology45, 76–84 (2017)

Bayat, B., Crasta, N., Crespi, A., Pascoal, A.M., Ijspeert, A.: Environmental monitoring using autonomous vehicles: a survey of recent searching techniques. Current opinion in biotechnology45, 76–84 (2017)

2017
[7]

IEEE Robotics & Automation Magazine19(1), 24–39 (2012)

Dunbabin, M., Marques, L.: Robots for environmental monitoring: Significant advancements and applications. IEEE Robotics & Automation Magazine19(1), 24–39 (2012)

2012
[8]

Autonomous Robots47(1), 77–93 (2023)

Horyna, J., Baca, T., Walter, V., Albani, D., Hert, D., Ferrante, E., Saska, M.: Decentralized swarms of unmanned aerial vehicles for search and rescue operations without explicit communication. Autonomous Robots47(1), 77–93 (2023)

2023
[9]

Science robotics3(14), 7650 (2018)

Yang, G.-Z., Bellingham, J., Dupont, P.E., Fischer, P., Floridi, L., Full, R., Jacob- stein, N., Kumar, V., McNutt, M., Merrifield, R.,et al.: The grand challenges of science robotics. Science robotics3(14), 7650 (2018)

2018
[10]

Science robotics5(49), 4385 (2020)

Dorigo, M., Theraulaz, G., Trianni, V.: Reflections on the future of swarm robotics. Science robotics5(49), 4385 (2020)

2020
[11]

In: Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, pp

Reynolds, C.W.: Flocks, herds and schools: A distributed behavioral model. In: Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, pp. 25–34 (1987)

1987
[12]

Physical review letters75(6), 1226 29 (1995)

Vicsek, T., Czirók, A., Ben-Jacob, E., Cohen, I., Shochet, O.: Novel type of phase transition in a system of self-driven particles. Physical review letters75(6), 1226 29 (1995)

1995
[13]

Nature433(7025), 513–516 (2005)

Couzin, I.D., Krause, J., Franks, N.R., Levin, S.A.: Effective leadership and decision-making in animal groups on the move. Nature433(7025), 513–516 (2005)

2005
[14]

In: Proceedings of the 9th Conference on Autonomous Robot Systems and Competitions, vol

Mondada, F., Bonani, M., Raemy, X., Pugh, J., Cianci, C., Klaptocz, A., Mag- nenat, S., Zufferey, J.-C., Floreano, D., Martinoli, A.,et al.: The e-puck, a robot designed for education in engineering. In: Proceedings of the 9th Conference on Autonomous Robot Systems and Competitions, vol. 1, pp. 59–65 (2009). Castelo Branco: IPCB, Instituto Politécnico de ...

2009
[15]

Science345(6198), 795–799 (2014)

Rubenstein, M., Cornejo, A., Nagpal, R.: Programmable self-assembly in a thousand-robot swarm. Science345(6198), 795–799 (2014)

2014
[16]

Foundations and Trends®in Robotics7(1-2), 1–179 (2018)

Osa, T., Pajarinen, J., Neumann, G., Bagnell, J.A., Abbeel, P., Peters, J.,et al.: An algorithmic perspective on imitation learning. Foundations and Trends®in Robotics7(1-2), 1–179 (2018)

2018
[17]

Adaptive computation and machine learning

Sutton, R.S., Barto, A.G.: Reinforcement learning: an introduction, 2nd edn. Adaptive computation and machine learning. The MIT Press, Cambridge (2018)

2018
[18]

Frontiers in Robotics and AI10, 1134841 (2023)

Kuckling, J.: Recent trends in robot learning and evolution for swarm robotics. Frontiers in Robotics and AI10, 1134841 (2023)

2023
[19]

Science Robotics4(26), 5872 (2019)

Hwangbo, J., Lee, J., Dosovitskiy, A., Bellicoso, D., Tsounis, V., Koltun, V., Hutter, M.: Learning agile and dynamic motor skills for legged robots. Science Robotics4(26), 5872 (2019)

2019
[20]

Science Robotics7(62), 2822 (2022)

Miki, T., Lee, J., Hwangbo, J., Wellhausen, L., Koltun, V., Hutter, M.: Learn- ing robust perceptive locomotion for quadrupedal robots in the wild. Science Robotics7(62), 2822 (2022)

2022
[21]

IEEE Robotics and Automation Letters6(3), 4257–4264 (2021)

Fuchs, F., Song, Y., Kaufmann, E., Scaramuzza, D., Dürr, P.: Super-human per- formance in gran turismo sport using deep reinforcement learning. IEEE Robotics and Automation Letters6(3), 4257–4264 (2021)

2021
[22]

Nature 602(7896), 223–228 (2022)

Wurman, P.R., Barrett, S., Kawamoto, K., MacGlashan, J., Subramanian, K., Walsh, T.J., Capobianco, R., Devlic, A., Eckert, F., Fuchs, F.,et al.: Outrac- ing champion gran turismo drivers with deep reinforcement learning. Nature 602(7896), 223–228 (2022)

2022
[23]

Nature 620(7976), 982–987 (2023)

Kaufmann,E.,Bauersfeld,L.,Loquercio,A.,Müller,M.,Koltun,V.,Scaramuzza, D.: Champion-level drone racing using deep reinforcement learning. Nature 620(7976), 982–987 (2023)

2023
[24]

Science Robotics8(82), 1462 (2023)

Song, Y., Romero, A., Müller, M., Koltun, V., Scaramuzza, D.: Reaching the limit 30 in autonomous racing: Optimal control versus reinforcement learning. Science Robotics8(82), 1462 (2023)

2023
[25]

Huh, D., Mohapatra, P.: Multi-agent Reinforcement Learning: A Comprehensive Survey. arXiv. arXiv:2312.10256 [cs] (2023). http://arxiv.org/abs/2312.10256 Accessed 2024-03-10

arXiv 2023
[26]

Sensors23(7), 3625 (2023) https://doi.org/10.3390/ s23073625

Orr, J., Dutta, A.: Multi-Agent Deep Reinforcement Learning for Multi-Robot Applications: A Survey. Sensors23(7), 3625 (2023) https://doi.org/10.3390/ s23073625 . Accessed 2024-10-08

2023
[27]

Artificial Intelligence Review57(2), 41 (2024) https://doi.org/10

Chung, J., Fayyad, J., Younes, Y.A., Najjaran, H.: Learning team-based nav- igation: a review of deep reinforcement learning techniques for multi-agent pathfinding. Artificial Intelligence Review57(2), 41 (2024) https://doi.org/10. 1007/s10462-023-10670-6 . Accessed 2024-10-08

2024
[28]

Current Robotics Reports 3(4), 239–257 (2022) https://doi.org/10.1007/s43154-022-00091-8

Wang, Y., Damani, M., Wang, P., Cao, Y., Sartoretti, G.: Distributed Rein- forcement Learning for Robot Teams: a Review. Current Robotics Reports 3(4), 239–257 (2022) https://doi.org/10.1007/s43154-022-00091-8 . Accessed 2024-03-14

work page doi:10.1007/s43154-022-00091-8 2022
[29]

Basic Books, New York (1954)

Piaget, J.: The Construction of Reality in the Child. Basic Books, New York (1954)

1954
[30]

Perspectives on Psychological Science3(1), 2–13 (2008) https: //doi.org/10.1111/j.1745-6916.2008.00056.x

Baillargeon, R.: Innate ideas revisited: For a principle of persistence in infants’ physical reasoning. Perspectives on Psychological Science3(1), 2–13 (2008) https: //doi.org/10.1111/j.1745-6916.2008.00056.x

work page doi:10.1111/j.1745-6916.2008.00056.x 2008
[31]

arXiv preprint arXiv:2502.11831 (2025)

Garrido,Q.,Ballas,N.,Assran,M.,Bardes,A.,Najman,L.,Rabbat,M.,Dupoux, E., LeCun, Y.: Intuitive physics understanding emerges from self-supervised pretraining on natural videos. arXiv preprint arXiv:2502.11831 (2025)

arXiv 2025
[32]

Trends in cognitive sciences21(9), 649–665 (2017)

Ullman, T.D., Spelke, E., Battaglia, P., Tenenbaum, J.B.: Mind games: Game engines as an architecture for intuitive physics. Trends in cognitive sciences21(9), 649–665 (2017)

2017
[33]

20668–20696 (2022)

Suh, H.J., Simchowitz, M., Zhang, K., Tedrake, R.: Do differentiable simulators give better policy gradients? In: International Conference on Machine Learning, pp. 20668–20696 (2022). PMLR

2022
[34]

In: International Conference on Learning Representations (2021)

Xu, J., Makoviychuk, V., Narang, Y., Ramos, F., Matusik, W., Garg, A., Mack- lin, M.: Accelerated policy learning with parallel differentiable simulation. In: International Conference on Learning Representations (2021)

2021
[35]

Science Robotics3(20), 3536 (2018) 31

Vásárhelyi, G., Virágh, C., Somorjai, G., Nepusz, T., Eiben, A.E., Vicsek, T.: Optimized flocking of autonomous drones in confined environments. Science Robotics3(20), 3536 (2018) 31

2018
[36]

Science Robotics 7(66), 5954 (2022)

Zhou, X., Wen, X., Wang, Z., Gao, Y., Li, H., Wang, Q., Yang, T., Lu, H., Cao, Y., Xu, C.,et al.: Swarm of micro flying robots in the wild. Science Robotics 7(66), 5954 (2022)

2022
[37]

Springer, Berlin, Heidelberg (2008)

Trianni, V.: Evolutionary Swarm Robotics: Evolving Self-organising Behaviours in Groups of Autonomous Robots. Springer, Berlin, Heidelberg (2008)

2008
[38]

Swarm Intelligence8(2), 89–112 (2014)

Francesca, G., Brambilla, M., Brutschy, A., Trianni, V., Birattari, M.: Automode: A novel approach to the automatic design of control software for robot swarms. Swarm Intelligence8(2), 89–112 (2014)

2014
[39]

SN Computer Science3(2), 136 (2022)

Kuckling, J., Van Pelt, V., Birattari, M.: Automode-cedrata: automatic design of behavior trees for controlling a swarm of robots with communication capabilities. SN Computer Science3(2), 136 (2022)

2022
[40]

In: Exper- iments with the Mini-Robot Khepera, Proceedings of the First International Khepera Workshop, vol

Mondada, F., Franzi, E., Guignard, A.: The development of khepera. In: Exper- iments with the Mini-Robot Khepera, Proceedings of the First International Khepera Workshop, vol. 1, pp. 7–14 (1999). sn

1999
[41]

The International Journal of Robotics Research33(8), 1145–1161 (2014)

Gauci, M., Chen, J., Li, W., Dodd, T.J., Groß, R.: Self-organized aggregation without computation. The International Journal of Robotics Research33(8), 1145–1161 (2014)

2014
[42]

In: European Conference on Artificial Life, pp

Garnier, S., Jost, C., Jeanson, R., Gautrais, J., Asadpour, M., Caprari, G., Ther- aulaz, G.: Aggregation behaviour as a source of collective decision in a group of cockroach-like-robots. In: European Conference on Artificial Life, pp. 169–178 (2005). Springer

2005
[43]

PloS one11(3), 0151834 (2016)

Duarte, M., Costa, V., Gomes, J., Rodrigues, T., Silva, F., Oliveira, S.M., Chris- tensen, A.L.: Evolution of collective behaviors for a real swarm of aquatic surface robots. PloS one11(3), 0151834 (2016)

2016
[44]

In: OCEANS 2018 MTS/IEEE Charleston, pp

Vallegra, F., Mateo, D., Tokić, G., Bouffanais, R., Yue, D.K.: Gradual collective upgrade of a swarm of autonomous buoys for dynamic ocean monitoring. In: OCEANS 2018 MTS/IEEE Charleston, pp. 1–7 (2018). IEEE

2018
[45]

In: International Conference on Learning Representations (2019)

Baker, B., Kanitscheider, I., Markov, T., Wu, Y., Powell, G., McGrew, B., Mor- datch, I.: Emergent tool use from multi-agent autocurricula. In: International Conference on Learning Representations (2019)

2019
[46]

OpenAI, Berner, C., Brockman, G., Chan, B., Cheung, V., Dębiak, P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., Hesse, C., Józefowicz, R., Gray, S., Olsson, C., Pachocki, J., Petrov, M., Oliveira Pinto, H.P., Raiman, J., Salimans, T., Schlatter, J., Schneider, J., Sidor, S., Sutskever, I., Tang, J., Wolski, F., Zhang, S.: Dota 2 with large scale dee...

Pith/arXiv arXiv 2019
[47]

Science Robotics7(69), 0235 (2022)

Liu, S., Lever, G., Wang, Z., Merel, J., Eslami, S.A., Hennes, D., Czarnecki, 32 W.M., Tassa, Y., Omidshafiei, S., Abdolmaleki, A.,et al.: From motor control to team play in simulated humanoid football. Science Robotics7(69), 0235 (2022)

2022
[48]

IEEE Robotics and Automation Letters2(2), 656–663 (2017)

Long, P., Liu, W., Pan, J.: Deep-learned collision avoidance policy for distributed multiagent navigation. IEEE Robotics and Automation Letters2(2), 656–663 (2017)

2017
[49]

Robot motion planning in learned latent spaces,

Sartoretti, G., Kerr, J., Shi, Y., Wagner, G., Kumar, T.K.S., Koenig, S., Choset, H.: PRIMAL: Pathfinding via Reinforcement and Imitation Multi-Agent Learn- ing. IEEE Robotics and Automation Letters4(3), 2378–2385 (2019) https: //doi.org/10.1109/LRA.2019.2903261 . Accessed 2024-03-14

work page doi:10.1109/lra.2019.2903261 2019
[50]

In: Proceedings of the 6th International Conference on Learning Representations

Bansal, T., Pachocki, J., Sidor, S., Sutskever, I., Mordatch, I.: EMERGENT COMPLEXITY VIA MULTI-AGENT COMPETITION. In: Proceedings of the 6th International Conference on Learning Representations. Conference Track Proceedings, Vancouver, BC, Canada (2018)

2018
[51]

In: Thirty- fifth Conference on Neural Information Processing Systems Datasets and Bench- marks Track (Round 1) (2021).https://openreview.net/forum?id=VdvDlnnjzIN

Freeman, C.D., Frey, E., Raichuk, A., Girgin, S., Mordatch, I., Bachem, O.: Brax - a differentiable physics engine for large scale rigid body simulation. In: Thirty- fifth Conference on Neural Information Processing Systems Datasets and Bench- marks Track (Round 1) (2021).https://openreview.net/forum?id=VdvDlnnjzIN

2021
[52]

arXiv preprint arXiv:2203.00806 (2022)

Howell, T., Le Cleac’h, S., Bruedigam, J., Kolter, Z., Schwager, M., Manchester, Z.: Dojo: A differentiable simulator for robotics. arXiv preprint arXiv:2203.00806 (2022)

arXiv 2022
[53]

ICLR (2022)

Ren, J., Yu, C., Chen, S., Ma, X., Pan, L., Liu, Z.: Diffmimic: Efficient motion mimicking with differentiable physics. ICLR (2022)

2022
[54]

Nature (2025) https://doi.org/10.1038/ s41586-025-08744-2

Hafner, D., Pasukonis, J., Ba, J., Lillicrap, T.: Mastering diverse con- trol tasks through world models. Nature (2025) https://doi.org/10.1038/ s41586-025-08744-2

2025
[55]

In: Conference on Robot Learning, pp

Wu, P., Escontrela, A., Hafner, D., Abbeel, P., Goldberg, K.: Daydreamer: World models for physical robot learning. In: Conference on Robot Learning, pp. 2226– 2240 (2023). PMLR

2023
[56]

In: ICML (2022)

Hansen, N., Wang, X., Su, H.: Temporal difference learning for model predictive control. In: ICML (2022)

2022
[57]

arXiv preprint arXiv:2501.10100 (2025)

Li, C., Krause, A., Hutter, M.: Robotic world model: A neural network simula- tor for robust policy optimization in robotics. arXiv preprint arXiv:2501.10100 (2025)

arXiv 2025
[58]

In: 8th Annual Conference on Robot Learning (2024)

Song, Y., Kim, S., Scaramuzza, D.: Learning quadruped locomotion using dif- ferentiable simulation. In: 8th Annual Conference on Robot Learning (2024). https://openreview.net/forum?id=XopATjibyz 33

2024
[59]

In: 2023 IEEE Inter- national Conference on Robotics and Automation (ICRA), pp

Wiedemann, N., Wüest, V., Loquercio, A., Müller, M., Floreano, D., Scaramuzza, D.: Training efficient controllers via analytic policy gradient. In: 2023 IEEE Inter- national Conference on Robotics and Automation (ICRA), pp. 1349–1356 (2023). IEEE

2023
[60]

Nature Machine Intelligence, 1–13 (2025)

Zhang, Y., Hu, Y., Song, Y., Zou, D., Lin, W.: Learning vision-based agile flight via differentiable physics. Nature Machine Intelligence, 1–13 (2025)

2025
[61]

arXiv preprint arXiv:1707.06347 (2017)

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)

Pith/arXiv arXiv 2017
[62]

Makoviychuk, V., Wawrzyniak, L., Guo, Y., Lu, M., Storey, K., Macklin, M., Hoeller, D., Rudin, N., Allshire, A., Handa, A., State, G.: Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning (2021)

2021
[63]

In: Bengio, Y., LeCun, Y

Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015).http://arxiv.org/abs/1412.6980

Pith/arXiv arXiv 2015
[64]

MIT Press, Cambridge, Mass

Raibert, M.H.: Legged Robots that Balance. MIT Press, Cambridge, Mass. (1986) 34 Supplementary Materials Table of Contents The supplementary information in this document includes: Supplementary Sections 1-3 Supplementary Tables 1-2 Other supplementary information includes: Supplementary Video 1 Supplementary Tables Table 1: Training Hyperparameters for Lo...

1986

[1] [1]

Adaptive behavior2(2), 189–218 (1993)

Kube, C.R., Zhang, H.: Collective robotics: From social insects to robots. Adaptive behavior2(2), 189–218 (1993)

1993

[2] [2]

In: International Workshop on Swarm Robotics, pp

Beni, G.: From swarm intelligence to swarm robotics. In: International Workshop on Swarm Robotics, pp. 1–9 (2004). Springer

2004

[3] [3]

In: International Workshop on Swarm Robotics, pp

Şahin, E.: Swarm robotics: From sources of inspiration to domains of application. In: International Workshop on Swarm Robotics, pp. 10–20 (2004). Springer

2004

[4] [4]

Technical report (2019)

Kang, C.-k., Fahimi, F., Griffin, R., Landrum, D.B., Mesmer, B., Zhang, G., Lee, T., Aono, H., Pohly, J., McCain, J., et al.: Marsbee-swarm of flapping wing flyers for enhanced mars exploration: Nasa innovative advanced concepts (niac)-phase i. Technical report (2019)

2019

[5] [5]

Science robotics8(80), 9548 (2023)

Arm, P., Waibel, G., Preisig, J., Tuna, T., Zhou, R., Bickel, V., Ligeza, G., Miki, T., Kehl, F., Kolvenbach, H.,et al.: Scientific exploration of challenging planetary analog environments with a team of legged robots. Science robotics8(80), 9548 (2023)

2023

[6] [6]

Current opinion in biotechnology45, 76–84 (2017)

Bayat, B., Crasta, N., Crespi, A., Pascoal, A.M., Ijspeert, A.: Environmental monitoring using autonomous vehicles: a survey of recent searching techniques. Current opinion in biotechnology45, 76–84 (2017)

2017

[7] Dunbabin, M., Marques, L.: Robots for environmental monitoring: Significant advancements and applications. IEEE Robotics & Automation Magazine 19(1), 24–39 (2012)

[8] Horyna, J., Baca, T., Walter, V., Albani, D., Hert, D., Ferrante, E., Saska, M.: Decentralized swarms of unmanned aerial vehicles for search and rescue operations without explicit communication. Autonomous Robots 47(1), 77–93 (2023)

[9] Yang, G.-Z., Bellingham, J., Dupont, P.E., Fischer, P., Floridi, L., Full, R., Jacobstein, N., Kumar, V., McNutt, M., Merrifield, R., et al.: The grand challenges of Science Robotics. Science Robotics 3(14), 7650 (2018)

[10] Dorigo, M., Theraulaz, G., Trianni, V.: Reflections on the future of swarm robotics. Science Robotics 5(49), 4385 (2020)

[11] Reynolds, C.W.: Flocks, herds and schools: A distributed behavioral model. In: Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, pp. 25–34 (1987)

[12] Vicsek, T., Czirók, A., Ben-Jacob, E., Cohen, I., Shochet, O.: Novel type of phase transition in a system of self-driven particles. Physical Review Letters 75(6), 1226 (1995)

[13] Couzin, I.D., Krause, J., Franks, N.R., Levin, S.A.: Effective leadership and decision-making in animal groups on the move. Nature 433(7025), 513–516 (2005)

[14] Mondada, F., Bonani, M., Raemy, X., Pugh, J., Cianci, C., Klaptocz, A., Magnenat, S., Zufferey, J.-C., Floreano, D., Martinoli, A., et al.: The e-puck, a robot designed for education in engineering. In: Proceedings of the 9th Conference on Autonomous Robot Systems and Competitions, vol. 1, pp. 59–65 (2009). Castelo Branco: IPCB, Instituto Politécnico de ...

[15] Rubenstein, M., Cornejo, A., Nagpal, R.: Programmable self-assembly in a thousand-robot swarm. Science 345(6198), 795–799 (2014)

[16] Osa, T., Pajarinen, J., Neumann, G., Bagnell, J.A., Abbeel, P., Peters, J., et al.: An algorithmic perspective on imitation learning. Foundations and Trends® in Robotics 7(1-2), 1–179 (2018)

[17] Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. Adaptive Computation and Machine Learning. The MIT Press, Cambridge (2018)

[18] Kuckling, J.: Recent trends in robot learning and evolution for swarm robotics. Frontiers in Robotics and AI 10, 1134841 (2023)

[19] Hwangbo, J., Lee, J., Dosovitskiy, A., Bellicoso, D., Tsounis, V., Koltun, V., Hutter, M.: Learning agile and dynamic motor skills for legged robots. Science Robotics 4(26), 5872 (2019)

[20] Miki, T., Lee, J., Hwangbo, J., Wellhausen, L., Koltun, V., Hutter, M.: Learning robust perceptive locomotion for quadrupedal robots in the wild. Science Robotics 7(62), 2822 (2022)

[21] Fuchs, F., Song, Y., Kaufmann, E., Scaramuzza, D., Dürr, P.: Super-human performance in Gran Turismo Sport using deep reinforcement learning. IEEE Robotics and Automation Letters 6(3), 4257–4264 (2021)

[22] Wurman, P.R., Barrett, S., Kawamoto, K., MacGlashan, J., Subramanian, K., Walsh, T.J., Capobianco, R., Devlic, A., Eckert, F., Fuchs, F., et al.: Outracing champion Gran Turismo drivers with deep reinforcement learning. Nature 602(7896), 223–228 (2022)

[23] Kaufmann, E., Bauersfeld, L., Loquercio, A., Müller, M., Koltun, V., Scaramuzza, D.: Champion-level drone racing using deep reinforcement learning. Nature 620(7976), 982–987 (2023)

[24] Song, Y., Romero, A., Müller, M., Koltun, V., Scaramuzza, D.: Reaching the limit in autonomous racing: Optimal control versus reinforcement learning. Science Robotics 8(82), 1462 (2023)

[25] Huh, D., Mohapatra, P.: Multi-agent Reinforcement Learning: A Comprehensive Survey. arXiv:2312.10256 [cs] (2023). http://arxiv.org/abs/2312.10256. Accessed 2024-03-10

[26] Orr, J., Dutta, A.: Multi-Agent Deep Reinforcement Learning for Multi-Robot Applications: A Survey. Sensors 23(7), 3625 (2023). https://doi.org/10.3390/s23073625. Accessed 2024-10-08

[27] Chung, J., Fayyad, J., Younes, Y.A., Najjaran, H.: Learning team-based navigation: a review of deep reinforcement learning techniques for multi-agent pathfinding. Artificial Intelligence Review 57(2), 41 (2024). https://doi.org/10.1007/s10462-023-10670-6. Accessed 2024-10-08

[28] Wang, Y., Damani, M., Wang, P., Cao, Y., Sartoretti, G.: Distributed Reinforcement Learning for Robot Teams: a Review. Current Robotics Reports 3(4), 239–257 (2022). https://doi.org/10.1007/s43154-022-00091-8. Accessed 2024-03-14

[29] Piaget, J.: The Construction of Reality in the Child. Basic Books, New York (1954)

[30] Baillargeon, R.: Innate ideas revisited: For a principle of persistence in infants’ physical reasoning. Perspectives on Psychological Science 3(1), 2–13 (2008). https://doi.org/10.1111/j.1745-6916.2008.00056.x

[31] Garrido, Q., Ballas, N., Assran, M., Bardes, A., Najman, L., Rabbat, M., Dupoux, E., LeCun, Y.: Intuitive physics understanding emerges from self-supervised pretraining on natural videos. arXiv preprint arXiv:2502.11831 (2025)

[32] Ullman, T.D., Spelke, E., Battaglia, P., Tenenbaum, J.B.: Mind games: Game engines as an architecture for intuitive physics. Trends in Cognitive Sciences 21(9), 649–665 (2017)

[33] Suh, H.J., Simchowitz, M., Zhang, K., Tedrake, R.: Do differentiable simulators give better policy gradients? In: International Conference on Machine Learning, pp. 20668–20696 (2022). PMLR

[34] Xu, J., Makoviychuk, V., Narang, Y., Ramos, F., Matusik, W., Garg, A., Macklin, M.: Accelerated policy learning with parallel differentiable simulation. In: International Conference on Learning Representations (2021)

[35] Vásárhelyi, G., Virágh, C., Somorjai, G., Nepusz, T., Eiben, A.E., Vicsek, T.: Optimized flocking of autonomous drones in confined environments. Science Robotics 3(20), 3536 (2018)

[36] Zhou, X., Wen, X., Wang, Z., Gao, Y., Li, H., Wang, Q., Yang, T., Lu, H., Cao, Y., Xu, C., et al.: Swarm of micro flying robots in the wild. Science Robotics 7(66), 5954 (2022)

[37] Trianni, V.: Evolutionary Swarm Robotics: Evolving Self-organising Behaviours in Groups of Autonomous Robots. Springer, Berlin, Heidelberg (2008)

[38] Francesca, G., Brambilla, M., Brutschy, A., Trianni, V., Birattari, M.: AutoMoDe: A novel approach to the automatic design of control software for robot swarms. Swarm Intelligence 8(2), 89–112 (2014)

[39] Kuckling, J., Van Pelt, V., Birattari, M.: AutoMoDe-Cedrata: automatic design of behavior trees for controlling a swarm of robots with communication capabilities. SN Computer Science 3(2), 136 (2022)

[40] Mondada, F., Franzi, E., Guignard, A.: The development of Khepera. In: Experiments with the Mini-Robot Khepera, Proceedings of the First International Khepera Workshop, vol. 1, pp. 7–14 (1999)

[41] Gauci, M., Chen, J., Li, W., Dodd, T.J., Groß, R.: Self-organized aggregation without computation. The International Journal of Robotics Research 33(8), 1145–1161 (2014)

[42] Garnier, S., Jost, C., Jeanson, R., Gautrais, J., Asadpour, M., Caprari, G., Theraulaz, G.: Aggregation behaviour as a source of collective decision in a group of cockroach-like-robots. In: European Conference on Artificial Life, pp. 169–178 (2005). Springer

[43] Duarte, M., Costa, V., Gomes, J., Rodrigues, T., Silva, F., Oliveira, S.M., Christensen, A.L.: Evolution of collective behaviors for a real swarm of aquatic surface robots. PLoS ONE 11(3), 0151834 (2016)

[44] Vallegra, F., Mateo, D., Tokić, G., Bouffanais, R., Yue, D.K.: Gradual collective upgrade of a swarm of autonomous buoys for dynamic ocean monitoring. In: OCEANS 2018 MTS/IEEE Charleston, pp. 1–7 (2018). IEEE

[45] Baker, B., Kanitscheider, I., Markov, T., Wu, Y., Powell, G., McGrew, B., Mordatch, I.: Emergent tool use from multi-agent autocurricula. In: International Conference on Learning Representations (2019)

[46] OpenAI, Berner, C., Brockman, G., Chan, B., Cheung, V., Dębiak, P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., Hesse, C., Józefowicz, R., Gray, S., Olsson, C., Pachocki, J., Petrov, M., Oliveira Pinto, H.P., Raiman, J., Salimans, T., Schlatter, J., Schneider, J., Sidor, S., Sutskever, I., Tang, J., Wolski, F., Zhang, S.: Dota 2 with large scale dee...

[47] Liu, S., Lever, G., Wang, Z., Merel, J., Eslami, S.A., Hennes, D., Czarnecki, W.M., Tassa, Y., Omidshafiei, S., Abdolmaleki, A., et al.: From motor control to team play in simulated humanoid football. Science Robotics 7(69), 0235 (2022)

[48] Long, P., Liu, W., Pan, J.: Deep-learned collision avoidance policy for distributed multiagent navigation. IEEE Robotics and Automation Letters 2(2), 656–663 (2017)

[49] Sartoretti, G., Kerr, J., Shi, Y., Wagner, G., Kumar, T.K.S., Koenig, S., Choset, H.: PRIMAL: Pathfinding via Reinforcement and Imitation Multi-Agent Learning. IEEE Robotics and Automation Letters 4(3), 2378–2385 (2019). https://doi.org/10.1109/LRA.2019.2903261. Accessed 2024-03-14

[50] Bansal, T., Pachocki, J., Sidor, S., Sutskever, I., Mordatch, I.: Emergent complexity via multi-agent competition. In: Proceedings of the 6th International Conference on Learning Representations, Conference Track Proceedings, Vancouver, BC, Canada (2018)

[51] Freeman, C.D., Frey, E., Raichuk, A., Girgin, S., Mordatch, I., Bachem, O.: Brax - a differentiable physics engine for large scale rigid body simulation. In: Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1) (2021). https://openreview.net/forum?id=VdvDlnnjzIN

[52] Howell, T., Le Cleac’h, S., Bruedigam, J., Kolter, Z., Schwager, M., Manchester, Z.: Dojo: A differentiable simulator for robotics. arXiv preprint arXiv:2203.00806 (2022)

[53] Ren, J., Yu, C., Chen, S., Ma, X., Pan, L., Liu, Z.: DiffMimic: Efficient motion mimicking with differentiable physics. ICLR (2022)

[54] Hafner, D., Pasukonis, J., Ba, J., Lillicrap, T.: Mastering diverse control tasks through world models. Nature (2025). https://doi.org/10.1038/s41586-025-08744-2

[55] Wu, P., Escontrela, A., Hafner, D., Abbeel, P., Goldberg, K.: DayDreamer: World models for physical robot learning. In: Conference on Robot Learning, pp. 2226–2240 (2023). PMLR

[56] Hansen, N., Wang, X., Su, H.: Temporal difference learning for model predictive control. In: ICML (2022)

[57] Li, C., Krause, A., Hutter, M.: Robotic world model: A neural network simulator for robust policy optimization in robotics. arXiv preprint arXiv:2501.10100 (2025)

[58] Song, Y., Kim, S., Scaramuzza, D.: Learning quadruped locomotion using differentiable simulation. In: 8th Annual Conference on Robot Learning (2024). https://openreview.net/forum?id=XopATjibyz

[59] Wiedemann, N., Wüest, V., Loquercio, A., Müller, M., Floreano, D., Scaramuzza, D.: Training efficient controllers via analytic policy gradient. In: 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 1349–1356 (2023). IEEE

[60] Zhang, Y., Hu, Y., Song, Y., Zou, D., Lin, W.: Learning vision-based agile flight via differentiable physics. Nature Machine Intelligence, 1–13 (2025)

[61] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)

[62] Makoviychuk, V., Wawrzyniak, L., Guo, Y., Lu, M., Storey, K., Macklin, M., Hoeller, D., Rudin, N., Allshire, A., Handa, A., State, G.: Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning (2021)

[63] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980

[64] Raibert, M.H.: Legged Robots that Balance. MIT Press, Cambridge, Mass. (1986)

Supplementary Materials

Table of Contents

The supplementary information in this document includes: Supplementary Sections 1-3; Supplementary Tables 1-2.

Other supplementary information includes: Supplementary Video 1.

Supplementary Tables

Table 1: Training Hyperparameters for Lo...