
arxiv: 2606.26154 · v1 · pith:5LQBKKALnew · submitted 2026-06-23 · 💻 cs.RO · cs.LG · physics.bio-ph

Reinforcement Learning Enables Autonomous Microrobot Navigation and Intervention in Simulated Blood Capillaries

Pith reviewed 2026-06-26 01:25 UTC · model grok-4.3

classification 💻 cs.RO · cs.LG · physics.bio-ph
keywords reinforcement learning · microrobot navigation · blood capillaries · chemotaxis · flow intervention · simulation · red blood cells

The pith

Reinforcement learning agents discover universal strategies to navigate and intervene in simulated blood capillary networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a detailed simulation of blood capillary networks that includes realistic flow patterns, moving red blood cells, and natural branching shapes. It then uses deep reinforcement learning to train agents that steer microrobots toward chemical signals inside this environment. The trained agents develop the same movement patterns across different robot sizes and speeds, such as alternating runs with turns or staying still to conserve energy. These agents can then block or unblock specific capillary sections to adjust blood flow back to normal levels without any extra training. The work demonstrates that reinforcement learning can produce control policies for microrobots in environments that match real biological complexity more closely than earlier simplified models.

Core claim

We develop a physically grounded simulation of a blood capillary network, incorporating realistic hydrodynamic flow fields, explicit red blood cell dynamics, and anatomically derived branching geometry, and train deep RL agents to navigate it via chemotaxis. Successful agents independently discover multiple universal strategy types, including run-and-rotate and energy-efficient search-and-sit policies, regardless of robot parameters. Without retraining, these agents perform targeted blocking and unblocking of capillary flow, restoring throughput to healthy baseline levels.

What carries the argument

Deep RL agents trained via chemotaxis inside a simulation that includes hydrodynamic flow fields, explicit red blood cell dynamics, and branching geometry.

If this is right

  • Navigation succeeds only outside a forbidden regime where Brownian motion and flow overpower the robot's propulsion.
  • The same agents achieve targeted flow control that returns capillary throughput to baseline levels.
  • Strategy types emerge independently of specific robot size and swimming speed.
  • Reinforcement learning supplies a workable method for autonomous microrobotic intervention in complex biological settings.
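
The forbidden regime in the first point above can be made concrete with a Péclet-number estimate, the ratio of self-propulsion to thermal diffusion, which the Figure 5 caption uses to mark the boundary. The sketch below is illustrative and not taken from the paper. It assumes a spherical robot, Stokes-Einstein diffusion coefficients, a temperature of 310 K, and one common convention for the translational and rotational Péclet numbers. The function names and the example radius, speed, and viscosity are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def translational_diffusion(radius_m, viscosity_pa_s, temperature_k=310.0):
    """Stokes-Einstein translational diffusion coefficient of a sphere (m^2/s)."""
    return K_B * temperature_k / (6.0 * math.pi * viscosity_pa_s * radius_m)


def rotational_diffusion(radius_m, viscosity_pa_s, temperature_k=310.0):
    """Stokes-Einstein-Debye rotational diffusion coefficient of a sphere (1/s)."""
    return K_B * temperature_k / (8.0 * math.pi * viscosity_pa_s * radius_m ** 3)


def peclet_translational(speed_m_s, radius_m, viscosity_pa_s, temperature_k=310.0):
    """Assumed convention: Pe_t = v * a / D_t (a is the robot radius)."""
    d_t = translational_diffusion(radius_m, viscosity_pa_s, temperature_k)
    return speed_m_s * radius_m / d_t


def peclet_rotational(speed_m_s, radius_m, viscosity_pa_s, temperature_k=310.0):
    """Assumed convention: Pe_r = v / (a * D_r)."""
    d_r = rotational_diffusion(radius_m, viscosity_pa_s, temperature_k)
    return speed_m_s / (radius_m * d_r)


if __name__ == "__main__":
    # Hypothetical robot: 1 um radius, 10 um/s, fluid viscosity 1e-3 Pa*s.
    radius, speed, viscosity = 1e-6, 1e-5, 1e-3
    print("Pe_t =", peclet_translational(speed, radius, viscosity))
    print("Pe_r =", peclet_rotational(speed, radius, viscosity))
```

Under these conventions both numbers grow with the square of the radius and linearly with speed and viscosity, so small, slow robots fall into the diffusion-dominated regime first. The paper notes that the capillary flow field shifts the boundary away from this pure Brownian estimate, which the sketch does not model.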

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the simulation matches real vessels, the same training process could produce policies for physical microrobots performing drug delivery or clot removal.
  • The appearance of parameter-independent strategies suggests reinforcement learning can locate robust behaviors in other variable fluid systems with obstacles.
  • One set of trained agents handling both navigation and flow intervention implies potential for multi-task microrobotic control in a single training run.

Load-bearing premise

The simulation accurately reproduces the hydrodynamic flow fields, red blood cell dynamics, and branching geometry of real in vivo blood capillaries such that navigation and intervention results will carry over to physical systems.

What would settle it

Testing the trained agents inside a physical microfluidic device that reproduces capillary flow, cell motion, and branches, then measuring whether navigation success rates and flow restoration match the simulation predictions.

Figures

Figures reproduced from arXiv: 2606.26154 by Christian Holm, Christoph Lohrmann, Jannik Drotleff, Julian Hoßbach, Konstantin Nikolaou, Paul Hohenberger, Samuel Tovey.

Figure 1
Simulated capillary environment. a) Capillary geometry and Lattice-Boltzmann flow field, derived from an anatomical capillary illustration [30]. b) Static concentration field for a chemical source at $\vec{r}_s = (119\,\mu\mathrm{m},\ 175\,\mu\mathrm{m})^{T}$, constrained by capillary walls.
Figure 2
Navigation performance across robot parameters. a) Probability of successful chemotaxis (opacity scales with success rate); success requires ≥ 8 of 10 robots reaching within 20 µm of the source. b) Cumulative reward across all 30 training runs per parameter combination. c) Mean equilibrium distance from the source. d) Mean time to reach the equilibrium distance.
Figure 3
Strategy analysis. a) t-SNE embedding of learned policies colored by robot radius. b) Same embedding colored by k-means cluster assignment (k = 11, selected by silhouette analysis [36]). c–h) Mean action probability distributions (bold lines) and standard deviation (shaded) for the six largest clusters as a function of sensed concentration change. Positive values indicate motion toward the source; amplitud…
Figure 4
Targeted capillary intervention. a) Blocking task setup: chemical sources (red crosses) placed at y = 90 µm in each vessel branch. b) Measured leak rate versus number of agents for varying robot radii; larger agents achieve full occlusion with fewer robots. c,d) Vertical unblocking setup and normalized leak rates showing near-complete flow restoration (green) relative to free flow (blue) and blocked basel…
Figure 5
Péclet analysis of the navigation phase space. Translational and rotational Péclet boundaries (blue: water viscosity, orange: blood plasma viscosity) overlaid on the training success data. The forbidden regime broadly aligns with the diffusion-dominated Péclet regime, though the capillary flow field shifts the boundary relative to the pure Brownian case.
Figure 6
Cross-parameter model transfer. A single successfully trained policy was deployed across all robot sizes and speeds (30 simulations per combination). Color indicates the probability of successful chemotaxis (≥ 8 of 10 agents within 20 µm of the source). The persistent forbidden region confirms a fundamental physical boundary rather than an algorithmic limitation.
Figure 7
Remaining strategy clusters. Mean action probability distributions (bold lines) and standard deviation (shaded) for clusters 7–11, complementing the six largest clusters shown in the main text.
Figure 8
t-SNE embedding colored by swimming speed. The color indicates swim speed in body lengths per second. Robots of all speeds appear across clusters, confirming that emergent strategies are velocity-independent.
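
The success criterion quoted in the Figure 2 and Figure 6 captions (at least 8 of 10 robots within 20 µm of the source) can be written down in a few lines. The sketch below is a hypothetical reading of that criterion, not the authors' code: it assumes the check is made on the robots' final 2D positions, treats "within" as less than or equal to, and uses the source position from the Figure 1 caption as an example. All function and parameter names are invented for illustration.

```python
import math


def episode_success(final_positions_um, source_um, radius_um=20.0, min_robots=8):
    """Return True if at least `min_robots` robots end within `radius_um` of the source.

    Assumption: success is evaluated on final positions, with distance <= radius.
    """
    close = sum(
        1 for position in final_positions_um
        if math.dist(position, source_um) <= radius_um
    )
    return close >= min_robots


def success_probability(episodes, source_um, radius_um=20.0, min_robots=8):
    """Fraction of episodes (each a list of final positions) counted as successful."""
    if not episodes:
        raise ValueError("need at least one episode")
    outcomes = [
        episode_success(positions, source_um, radius_um, min_robots)
        for positions in episodes
    ]
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    source = (119.0, 175.0)  # source position in um, from the Figure 1 caption
    # Placeholder positions for illustration only, not simulation output.
    near = [(120.0, 176.0)] * 8
    far = [(10.0, 10.0)] * 2
    print(episode_success(near + far, source))
```

Averaging `episode_success` over the 30 runs per parameter combination mentioned in the captions would give one cell of the success-probability maps.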
Original abstract

Autonomous microrobots navigating biological vasculature could enable targeted drug delivery and thrombolysis, yet training control policies for realistic environments remains an open challenge. Prior reinforcement learning (RL) studies of microrobotic navigation have been limited to idealized geometries that omit complex hydrodynamic flow fields, confined branching structures, and dense cellular obstacles found in vivo. Here, we develop a physically grounded simulation of a blood capillary network, incorporating realistic hydrodynamic flow fields, explicit red blood cell dynamics, and anatomically derived branching geometry, and train deep RL agents to navigate it via chemotaxis. We systematically map the physical limits of navigation across robot size and swimming speed, revealing a forbidden regime where Brownian motion and flow overcome propulsion. Successful agents independently discover multiple universal strategy types, including run-and-rotate and energy-efficient search-and-sit policies, regardless of robot parameters. Without retraining, these agents perform targeted blocking and unblocking of capillary flow, restoring throughput to healthy baseline levels. These results establish RL as a viable framework for developing autonomous microrobotic intervention strategies in complex biological environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript develops a physically grounded simulation of blood capillary networks incorporating realistic hydrodynamic flow fields, explicit red blood cell dynamics, and anatomically derived branching geometry. Deep RL agents are trained via chemotaxis to navigate this environment. The work systematically maps navigation limits across robot size and swimming speed, identifies a forbidden regime dominated by Brownian motion and flow, reports that successful agents discover multiple universal strategy types (run-and-rotate, energy-efficient search-and-sit) independent of robot parameters, and shows that these agents can perform targeted blocking/unblocking of capillary flow to restore throughput to healthy baseline levels without retraining.

Significance. If the simulation fidelity holds, the results would establish RL as a viable approach for autonomous microrobotic control in complex biological environments, addressing gaps in prior work limited to idealized geometries. The discovery of parameter-independent strategies and their direct application to flow restoration without retraining is a notable strength, with potential implications for targeted drug delivery and thrombolysis. The explicit inclusion of cellular obstacles and flow fields strengthens the environmental realism relative to earlier studies.

major comments (2)
  1. [Abstract and simulation setup] The central claims of universal strategy discovery and flow restoration to healthy baseline levels rest on the assumption that the simulation reproduces in vivo capillary hydrodynamics, RBC dynamics, and branching geometry with sufficient accuracy for navigation limits and intervention outcomes to translate. The manuscript states that it incorporates realistic hydrodynamics and explicit RBCs with anatomically derived geometry but supplies no quantitative benchmarking (e.g., matching measured capillary velocities, pressure drops, or cell deformation statistics from in vivo studies). This validation gap is load-bearing for the strongest claims.
  2. [Results on navigation and intervention] Navigation success rates and throughput restoration metrics are reported without error bars, confidence intervals, or details on the number of independent simulation runs or statistical tests, undermining assessment of whether the discovered strategies reliably outperform baselines or generalize beyond the specific training conditions.
minor comments (3)
  1. [Abstract] The abstract refers to 'anatomically derived branching geometry' without citing the specific anatomical data source or describing the implementation details (e.g., vessel diameters, bifurcation angles).
  2. [Methods] Clarify how the chemotaxis signal is modeled in the observation space and reward function, including any assumptions about chemical gradient sensing in the presence of flow and RBCs.
  3. [Results] The description of the 'forbidden regime' would benefit from an explicit parameter map or phase diagram showing the boundary in robot size vs. swimming speed space.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address the two major comments below and will revise the manuscript accordingly to improve clarity and rigor.

Point-by-point responses
  1. Referee: [Abstract and simulation setup] The central claims of universal strategy discovery and flow restoration to healthy baseline levels rest on the assumption that the simulation reproduces in vivo capillary hydrodynamics, RBC dynamics, and branching geometry with sufficient accuracy for navigation limits and intervention outcomes to translate. The manuscript states that it incorporates realistic hydrodynamics and explicit RBCs with anatomically derived geometry but supplies no quantitative benchmarking (e.g., matching measured capillary velocities, pressure drops, or cell deformation statistics from in vivo studies). This validation gap is load-bearing for the strongest claims.

    Authors: We agree that explicit quantitative benchmarking against in vivo measurements would strengthen the claims. The simulation parameters were selected from established literature values for capillary flow, RBC properties, and vessel geometry, but direct side-by-side comparisons to specific experimental datasets were not included. In the revised manuscript we will add a new subsection in Methods that tabulates all parameter sources with citations and provides available comparisons to in vivo data (e.g., mean flow velocities and pressure drops). We will also expand the Discussion to explicitly state the model limitations and the conditions under which the navigation and intervention results are expected to translate. revision: yes

  2. Referee: [Results on navigation and intervention] Navigation success rates and throughput restoration metrics are reported without error bars, confidence intervals, or details on the number of independent simulation runs or statistical tests, undermining assessment of whether the discovered strategies reliably outperform baselines or generalize beyond the specific training conditions.

    Authors: We acknowledge the omission of statistical reporting. The original results were obtained from multiple independent training runs, but these details and variability measures were not reported. In the revision we will (i) state the exact number of independent runs (N=10) used for all metrics, (ii) add error bars (standard deviation) and 95% confidence intervals to all success-rate and throughput plots, and (iii) include appropriate statistical tests (e.g., Welch’s t-test) when comparing agent performance against baselines. revision: yes
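
The statistical reporting promised in the second response (standard deviations, 95% confidence intervals, Welch's t-test) could be computed along the following lines. This is a minimal sketch under stated assumptions, not the authors' analysis: the Wilson score interval is one reasonable choice for a success proportion that the rebuttal does not name, the function names are hypothetical, and the numbers in the usage example are placeholders rather than results from the paper.

```python
import math
import statistics


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials * trials)) / denom
    return center - half, center + half


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each sample needs at least two values with non-zero variance overall.
    """
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Squared standard errors, using the unbiased sample variance.
    se_a = statistics.variance(sample_a) / len(sample_a)
    se_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(wilson_interval(successes=24, trials=30))
    agent_throughput = [0.95, 0.97, 0.93, 0.96]
    baseline_throughput = [0.60, 0.66, 0.58, 0.63]
    print(statistics.stdev(agent_throughput))
    print(welch_t(agent_throughput, baseline_throughput))
```

Turning the t statistic into a p-value needs the Student t distribution, which the standard library does not provide; `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test and returns the p-value directly.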

Circularity Check

0 steps flagged

No circularity: results are simulation outcomes, not reductions to inputs

full rationale

The paper trains RL agents in a custom hydrodynamic simulation and reports their discovered behaviors and intervention performance as direct outputs of those training runs. No equations, fitted parameters, or self-citations are invoked such that any reported success metric or strategy type reduces by construction to the training data or prior author work. The central claims rest on independent simulation executions rather than self-definitional mappings or renamed fits. External validity of the simulation is an assumption but does not create internal circularity in the reported derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated. The central claims rest on the unstated premise that the simulation physics are sufficiently accurate, but this cannot be audited from the provided text.

pith-pipeline@v0.9.1-grok · 5747 in / 1150 out tokens · 20240 ms · 2026-06-26T01:25:21.415273+00:00 · methodology


Reference graph

Works this paper leans on

60 extracted references · 3 linked inside Pith

  1. [1] An, Y., He, B., Ma, Z., Guo, Y. & Yang, G.-Z. Microassembly: A review on fundamentals, applications and recent developments. Engineering 48, 323–346 (2025)

  2. [2] Liao, C.-T. et al. Propulsion of a three-sphere microrobot in a porous medium. Physical Review E 109, 065106 (2024)

  3. [3] Lohrmann, C. & Holm, C. Optimal motility strategies for self-propelled agents to explore porous media. Physical Review E 108, 054401 (2023)

  4. [4] Lee, J. G., Raj, R. R., Day, N. B. & Shields, C. W. I. Microrobots for biomedicine: Unsolved challenges and opportunities for translation. ACS Nano 17, 14196–14204 (2023)

  5. [5] Lv, S., Tang, L. V. & Hu, Y. Application of nanotechnology and micro/nanorobots in thrombotic diseases. EngMedicine 2, 100061 (2025)

  6. [6] Raskob, G. et al. Thrombosis: A major contributor to the global disease burden. Journal of Thrombosis and Haemostasis 12, 1580–1590 (2014)

  7. [7] Schuerle, S. et al. Synthetic and living micropropellers for convection-enhanced nanoparticle transport. Science Advances 5, eaav4803 (2019)

  8. [8] Yu, J., Yang, L. & Zhang, L. Pattern generation and motion control of a vortex-like paramagnetic nanoparticle swarm. The International Journal of Robotics Research 37, 912–930 (2018)

  9. [9] Jiang, J., Yang, L. & Zhang, L. DQN-based on-line path planning method for automatic navigation of miniature robots. In 2023 IEEE International Conference on Robotics and Automation (ICRA), 5407–5413 (2023)

  10. [10] Landers, F. C. et al. Clinically ready magnetic microrobots for targeted therapies. Science 390, 710–715 (2025)

  11. [11] Dreyfus, R. et al. Microscopic artificial swimmers. Nature 437, 862–865 (2005)

  12. [12] Yamazaki, A. et al. Wireless micro swimming machine with magnetic thin film. Journal of Magnetism and Magnetic Materials 272–276, E1741–E1742 (2004)

  13. [13] Zhang, L. et al. Characterizing the swimming properties of artificial bacterial flagella. Nano Letters 9, 3663–3667 (2009)

  14. [14] Su, H., Li, S., Yang, G.-Z. & Qian, K. Janus micro/nanorobots in biomedical applications. Advanced Healthcare Materials 12, 2202391 (2023)

  15. [15] Jiang, H.-R., Yoshinaga, N. & Sano, M. Active motion of a Janus particle by self-thermophoresis in a defocused laser beam. Physical Review Letters 105, 268302 (2010)

  16. [16] Lassiter, M. M. et al. Microscopic robots that sense, think, act, and compute. Science Robotics 10, eadu8009 (2025)

  17. [17] Cai, W., Wang, G., Zhang, Y., Qu, X. & Huang, Z. Reinforcement learning for active matter. Biophysics Reviews 6, 031302 (2025)

  18. [18] Colabrese, S., Gustavsson, K., Celani, A. & Biferale, L. Flow navigation by smart microswimmers via reinforcement learning. Physical Review Letters 118, 158004 (2017)

  19. [19] Nasiri, M. & Liebchen, B. Reinforcement learning of optimal active particle navigation. New Journal of Physics 24, 073042 (2022)

  20. [20] Muiños-Landin, S., Fischer, A., Holubec, V. & Cichos, F. Reinforcement learning with artificial microswimmers. Science Robotics 6, eabd9285 (2021)

  21. [21] Xiong, T., Liu, Z., Wang, Y., Ong, C. J. & Zhu, L. Chemotactic navigation in robotic swimmers via reset-free hierarchical reinforcement learning. Nature Communications 16, 5441 (2025)

  22. [22] Tovey, S. et al. Environmental effects on emergent strategy in micro-scale multi-agent reinforcement learning (2023). 2307.00994

  23. [23] Berg, H. C. & Brown, D. A. Chemotaxis in escherichia coli analysed by three-dimensional tracking. Nature 239, 500–504 (1972)

  24. [24] Watari, N. & Larson, R. G. The hydrodynamics of a run-and-tumble bacterium propelled by polymorphic helical flagella. Biophysical journal 98, 12–17 (2010)

  25. [25] Darnton, N. C., Turner, L., Rojevsky, S. & Berg, H. C. On torque and tumbling in swimming escherichia coli. Journal of Bacteriology 189, 1756–1764 (2007)

  26. [26] Tovey, S., Lohrmann, C. & Holm, C. Emergence of chemotactic strategies with multi-agent reinforcement learning. Machine Learning: Science and Technology 5, 035054 (2024)

  27. [27] Tovey, S. et al. SwarmRL: Building the future of smart active systems. The European Physical Journal E 48, 16 (2025)

  28. [28] Freund, J. B. Numerical simulation of flowing blood cells. Annual Review of Fluid Mechanics 46, 67–95 (2014)

  29. [29] Marsden, A. L. & Esmaily-Moghadam, M. Multiscale modeling of cardiovascular flows for clinical decision support. Applied Mechanics Reviews 67 (2015)

  30. [30] Britannica, E. Blood vessel | definition, anatomy, function, & types | britannica. https://www.britannica.com/science/blood-vessel (2026)

  31. [31] Betts, J. G. et al. Ch. 1 introduction - anatomy and physiology | OpenStax. https://assets.openstax.org/oscms-prodcms/media/documents/anatomy-and-physiology-2e_-_WEB.pdf (2013)

  32. [32] Bird, R., Stewart, W. & Lightfoot, E. Transport Phenomena (John Wiley and Sons, New York, 2002), 2 edn

  33. [33] van der Maaten, L. & Hinton, G. Visualizing Data using t-SNE. Journal of Machine Learning Research 9, 2579–2605 (2008)

  34. [34] Pedregosa, F. et al. Scikit-learn: Machine learning in python. Journal of Machine Learning Research 12, 2825–2830 (2011)

  35. [35] Lloyd, S. Least squares quantization in PCM. IEEE Transactions on Information Theory 28, 129–137 (1982)

  36. [36] Rousseeuw, P. J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20, 53–65 (1987)

  37. [37] Phillips, R., Kondev, J., Theriot, J. & Garcia, H. What and where: Construction plans for cells and organisms. In Physical Biology of the Cell, 2, 68 (Garland Science, New York, 2012), 2 edn

  38. [38] Kannojiya, V., Das, A. K. & Das, P. K. Simulation of blood as fluid: A review from rheological aspects. IEEE Reviews in Biomedical Engineering 14, 327–341 (2021)

  39. [39] Ascolese, M., Farina, A. & Fasano, A. The Fåhræus-Lindqvist effect in small blood vessels: How does it help the heart? Journal of Biological Physics 45, 379–394 (2019)

  40. [40] Bollinger, A., Butti, P., Barras, J. P., Trachsler, H. & Siegenthaler, W. Red blood cell velocity in nailfold capillaries of man measured by a television microscopy technique. Microvascular Research 7, 61–72 (1974)

  41. [41] Hudetz, A. G. Blood flow in the cerebral capillary network: A review emphasizing observations with intravital microscopy. Microcirculation 4, 233–252 (1997)

  42. [42] Jarolímová, A. et al. In vivo evidence of blood flow slippage: Failure of the no-slip boundary condition assumption (2025). 2510.18107

  43. [43] Koutsiaris, A. G. & Pogiatzi, A. Velocity pulse measurements in the mesenteric arterioles of rabbits. Physiological Measurement 25, 15 (2003)

  44. [44] Nader, E. et al. Blood rheology: Key parameters, impact on blood flow, role in sickle cell disease and effects of exercise. Frontiers in Physiology 10 (2019)

  45. [45] Bradbury, J. et al. JAX: composable transformations of Python+NumPy programs (2018). URL http://github.com/google/jax

  46. [46] Heek, J. et al. Flax: A neural network library and ecosystem for JAX (2023). URL http://github.com/google/flax

  47. [47] Weik, F. et al. ESPResSo 4.0 – an extensible software package for simulating soft matter systems. The European Physical Journal Special Topics 227, 1789–1816 (2019)

  48. [48] Bauer, M. et al. waLBerla: A block-structured high-performance framework for multiphysics simulations. Computers & Mathematics with Applications 81, 478–501 (2021)

  49. [49] Liu, M., Nicholson, J. K., Parkinson, J. A. & Lindon, J. C. Measurement of biomolecular diffusion coefficients in blood plasma using two-dimensional ¹H-¹H diffusion-edited total-correlation NMR spectroscopy. Analytical Chemistry 69, 1504–1509 (1997)

  50. [50] Virtanen, P. et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods 17, 261–272 (2020)

  51. [51] Weeks, J. D., Chandler, D. & Andersen, H. C. Role of repulsive forces in determining the equilibrium structure of simple liquids. The Journal of Chemical Physics 54, 5237–5247 (1971)

  52. [52] Barto, A. G., Sutton, R. S. & Anderson, C. W. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics SMC-13, 834–846 (1983)

  53. [53] Grondman, I., Busoniu, L., Lopes, G. A. D. & Babuska, R. A survey of actor-critic reinforcement learning: Standard and natural policy gradients. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42, 1291–1307 (2012)

  54. [54] Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms (2017). 1707.06347

  55. [55] Schulman, J., Moritz, P., Levine, S., Jordan, M. I. & Abbeel, P. High-dimensional continuous control using generalized advantage estimation (2018). 1506.02438

  56. [56] Huber, P. J. Robust estimation of a location parameter. The Annals of Mathematical Statistics 35, 73–101 (1964)

  57. [57] Gronauer, S. & Diepold, K. Multi-agent deep reinforcement learning: A survey. Artificial Intelligence Review 55, 895–943 (2022)

  58. [58] Oliehoek, F. A. & Amato, C. A Concise Introduction to Decentralized POMDPs. SpringerBriefs in Intelligent Systems (Springer International Publishing, Cham, 2016)

  59. [59] Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization. arXiv (2017). 1412.6980

  60. [60] Murray, A. G. & Jackson, G. A. Viral dynamics: A model of the effects of size, shape, motion and abundance of single-celled planktonic organisms and other particles. Marine Ecology Progress Series 89, 103–116 (1992)

Acknowledgments: This study was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through Compute Cluster gr...