Reinforcement Learning Enables Autonomous Microrobot Navigation and Intervention in Simulated Blood Capillaries
Pith reviewed 2026-06-26 01:25 UTC · model grok-4.3
The pith
Reinforcement learning agents discover universal strategies to navigate and intervene in simulated blood capillary networks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We develop a physically grounded simulation of a blood capillary network, incorporating realistic hydrodynamic flow fields, explicit red blood cell dynamics, and anatomically derived branching geometry, and train deep RL agents to navigate it via chemotaxis. Successful agents independently discover multiple universal strategy types, including run-and-rotate and energy-efficient search-and-sit policies, regardless of robot parameters. Without retraining, these agents perform targeted blocking and unblocking of capillary flow, restoring throughput to healthy baseline levels.
What carries the argument
Deep RL agents trained via chemotaxis inside a simulation that includes hydrodynamic flow fields, explicit red blood cell dynamics, and branching geometry.
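To make the "run-and-rotate" strategy type concrete, the following is a minimal hand-written sketch, not the paper's learned policy or its simulation. It assumes a hypothetical 2D chemoattractant field with no flow, walls, or red blood cells; the function names, field shape, and all parameter values are illustrative assumptions.

```python
import math
import random


def concentration(x, y, source=(0.0, 0.0)):
    """Hypothetical chemoattractant field that decays with distance from the source."""
    return 1.0 / (1.0 + math.hypot(x - source[0], y - source[1]))


def run_and_rotate(start=(10.0, 0.0), steps=2000, speed=0.1,
                   turn=math.pi / 3, noise=0.05, seed=0):
    """Hand-coded rule: run while the signal does not drop, rotate in place otherwise."""
    rng = random.Random(seed)
    x, y = start
    heading = rng.uniform(0.0, 2.0 * math.pi)
    previous = concentration(x, y)
    for _ in range(steps):
        current = concentration(x, y)
        if current >= previous:
            # Signal did not drop since the last step: keep running.
            x += speed * math.cos(heading)
            y += speed * math.sin(heading)
        else:
            # Signal dropped: rotate by a fixed angle in a random direction.
            heading += rng.choice((-1.0, 1.0)) * turn
        # Small Gaussian kick standing in for Brownian reorientation (assumed magnitude).
        heading += rng.gauss(0.0, noise)
        previous = current
    return x, y


if __name__ == "__main__":
    final_x, final_y = run_and_rotate()
    print("final distance to source:", math.hypot(final_x, final_y))
```

The rule only compares the current reading with the previous one, so it needs no gradient sensing. Whether the trained agents use a comparable memory is not stated in this summary.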
If this is right
- Navigation succeeds only outside a forbidden regime where Brownian motion and flow overpower the robot's propulsion.
- The same agents achieve targeted flow control that returns capillary throughput to baseline levels.
- Strategy types emerge independently of specific robot size and swimming speed.
- Reinforcement learning supplies a workable method for autonomous microrobotic intervention in complex biological settings.
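The forbidden regime in the first point can be illustrated with textbook scaling estimates. The sketch below compares propulsion with background flow and with Brownian motion for a spherical robot, using the Stokes-Einstein relations. It is an order-of-magnitude illustration under stated assumptions, not the criterion used in the paper: the function name, the plasma-like viscosity, the body temperature, and the example speeds are all assumed values.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def regime_estimate(radius_m, swim_speed_m_s, flow_speed_m_s,
                    viscosity_pa_s=1.2e-3, temperature_k=310.0):
    """Return dimensionless ratios comparing propulsion with flow and Brownian motion."""
    thermal = K_B * temperature_k
    # Stokes-Einstein translational diffusion coefficient of a sphere.
    d_trans = thermal / (6.0 * math.pi * viscosity_pa_s * radius_m)
    # Rotational diffusion coefficient of a sphere.
    d_rot = thermal / (8.0 * math.pi * viscosity_pa_s * radius_m ** 3)
    return {
        # Above 1, the robot swims faster than the background flow.
        "speed_ratio": swim_speed_m_s / flow_speed_m_s,
        # Peclet number: swimming over one radius versus translational diffusion.
        "peclet": swim_speed_m_s * radius_m / d_trans,
        # Distance covered before the heading decorrelates, in units of the radius.
        "persistence_radii": swim_speed_m_s / (d_rot * radius_m),
    }


if __name__ == "__main__":
    # Illustrative inputs only: radii of 0.1 and 1 micrometre, 10 um/s swimming, 1 mm/s flow.
    for radius in (0.1e-6, 1.0e-6):
        print(radius, regime_estimate(radius, 10e-6, 1e-3))
```

Small values of all three ratios indicate that thermal noise and flow dominate, which is the qualitative situation the forbidden regime describes. Where the actual boundary lies in the paper's size and speed map cannot be read off from this estimate.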
Where Pith is reading between the lines
- If the simulation matches real vessels, the same training process could produce policies for physical microrobots performing drug delivery or clot removal.
- The appearance of parameter-independent strategies suggests reinforcement learning can locate robust behaviors in other variable fluid systems with obstacles.
- One set of trained agents handling both navigation and flow intervention implies potential for multi-task microrobotic control in a single training run.
Load-bearing premise
The simulation accurately reproduces the hydrodynamic flow fields, red blood cell dynamics, and branching geometry of real in vivo blood capillaries such that navigation and intervention results will carry over to physical systems.
What would settle it
Testing the trained agents inside a physical microfluidic device that reproduces capillary flow, cell motion, and branches, then measuring whether navigation success rates and flow restoration match the simulation predictions.
Original abstract
Autonomous microrobots navigating biological vasculature could enable targeted drug delivery and thrombolysis, yet training control policies for realistic environments remains an open challenge. Prior reinforcement learning (RL) studies of microrobotic navigation have been limited to idealized geometries that omit complex hydrodynamic flow fields, confined branching structures, and dense cellular obstacles found in vivo. Here, we develop a physically grounded simulation of a blood capillary network, incorporating realistic hydrodynamic flow fields, explicit red blood cell dynamics, and anatomically derived branching geometry, and train deep RL agents to navigate it via chemotaxis. We systematically map the physical limits of navigation across robot size and swimming speed, revealing a forbidden regime where Brownian motion and flow overcome propulsion. Successful agents independently discover multiple universal strategy types, including run-and-rotate and energy-efficient search-and-sit policies, regardless of robot parameters. Without retraining, these agents perform targeted blocking and unblocking of capillary flow, restoring throughput to healthy baseline levels. These results establish RL as a viable framework for developing autonomous microrobotic intervention strategies in complex biological environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a physically grounded simulation of blood capillary networks incorporating realistic hydrodynamic flow fields, explicit red blood cell dynamics, and anatomically derived branching geometry. Deep RL agents are trained via chemotaxis to navigate this environment. The work systematically maps navigation limits across robot size and swimming speed, identifies a forbidden regime dominated by Brownian motion and flow, reports that successful agents discover multiple universal strategy types (run-and-rotate, energy-efficient search-and-sit) independent of robot parameters, and shows that these agents can perform targeted blocking/unblocking of capillary flow to restore throughput to healthy baseline levels without retraining.
Significance. If the simulation fidelity holds, the results would establish RL as a viable approach for autonomous microrobotic control in complex biological environments, addressing gaps in prior work limited to idealized geometries. The discovery of parameter-independent strategies and their direct application to flow restoration without retraining is a notable strength, with potential implications for targeted drug delivery and thrombolysis. The explicit inclusion of cellular obstacles and flow fields strengthens the environmental realism relative to earlier studies.
Major comments (2)
- [Abstract and simulation setup] The central claims of universal strategy discovery and flow restoration to healthy baseline levels rest on the assumption that the simulation reproduces in vivo capillary hydrodynamics, RBC dynamics, and branching geometry with sufficient accuracy for navigation limits and intervention outcomes to translate. The manuscript states that it incorporates realistic hydrodynamics and explicit RBCs with anatomically derived geometry but supplies no quantitative benchmarking (e.g., matching measured capillary velocities, pressure drops, or cell deformation statistics from in vivo studies). This validation gap is load-bearing for the strongest claims.
- [Results on navigation and intervention] Navigation success rates and throughput restoration metrics are reported without error bars, confidence intervals, or details on the number of independent simulation runs or statistical tests, undermining assessment of whether the discovered strategies reliably outperform baselines or generalize beyond the specific training conditions.
Minor comments (3)
- [Abstract] The abstract refers to 'anatomically derived branching geometry' without citing the specific anatomical data source or describing the implementation details (e.g., vessel diameters, bifurcation angles).
- [Methods] Clarify how the chemotaxis signal is modeled in the observation space and reward function, including any assumptions about chemical gradient sensing in the presence of flow and RBCs.
- [Results] The description of the 'forbidden regime' would benefit from an explicit parameter map or phase diagram showing the boundary in robot size vs. swimming speed space.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. We address the two major comments below and will revise the manuscript accordingly to improve clarity and rigor.
- Referee: [Abstract and simulation setup] The central claims of universal strategy discovery and flow restoration to healthy baseline levels rest on the assumption that the simulation reproduces in vivo capillary hydrodynamics, RBC dynamics, and branching geometry with sufficient accuracy for navigation limits and intervention outcomes to translate. The manuscript states that it incorporates realistic hydrodynamics and explicit RBCs with anatomically derived geometry but supplies no quantitative benchmarking (e.g., matching measured capillary velocities, pressure drops, or cell deformation statistics from in vivo studies). This validation gap is load-bearing for the strongest claims.
  Authors: We agree that explicit quantitative benchmarking against in vivo measurements would strengthen the claims. The simulation parameters were selected from established literature values for capillary flow, RBC properties, and vessel geometry, but direct side-by-side comparisons to specific experimental datasets were not included. In the revised manuscript we will add a new subsection in Methods that tabulates all parameter sources with citations and provides available comparisons to in vivo data (e.g., mean flow velocities and pressure drops). We will also expand the Discussion to explicitly state the model limitations and the conditions under which the navigation and intervention results are expected to translate.
  Revision: yes
- Referee: [Results on navigation and intervention] Navigation success rates and throughput restoration metrics are reported without error bars, confidence intervals, or details on the number of independent simulation runs or statistical tests, undermining assessment of whether the discovered strategies reliably outperform baselines or generalize beyond the specific training conditions.
  Authors: We acknowledge the omission of statistical reporting. The original results were obtained from multiple independent training runs, but these details and variability measures were not reported. In the revision we will (i) state the exact number of independent runs (N=10) used for all metrics, (ii) add error bars (standard deviation) and 95% confidence intervals to all success-rate and throughput plots, and (iii) include appropriate statistical tests (e.g., Welch's t-test) when comparing agent performance against baselines.
  Revision: yes
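The statistical reporting promised in the second response (standard deviation, 95% confidence intervals, Welch's t-test over N=10 runs) can be sketched as follows. This is a minimal illustration using only the Python standard library. The helper names are hypothetical, and the per-run success rates are made-up placeholder numbers, not results from the paper.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 9 degrees of freedom (N = 10 runs).
T_CRIT_95_DF9 = 2.262


def mean_sd_ci(samples, t_crit=T_CRIT_95_DF9):
    """Return the mean, sample standard deviation, and a t-based confidence interval."""
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)
    half_width = t_crit * sd / math.sqrt(len(samples))
    return mean, sd, (mean - half_width, mean + half_width)


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    t_stat = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_a + var_b)
    dof = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t_stat, dof


if __name__ == "__main__":
    # Placeholder per-run success rates, invented purely to exercise the functions.
    agent = [0.81, 0.78, 0.85, 0.80, 0.79, 0.83, 0.82, 0.77, 0.84, 0.80]
    baseline = [0.52, 0.61, 0.48, 0.55, 0.58, 0.50, 0.63, 0.47, 0.56, 0.54]
    print(mean_sd_ci(agent))
    print(welch_t(agent, baseline))
```

The sketch stops at the t statistic and degrees of freedom. Where SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test and also returns a p-value.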
Circularity Check
No circularity: results are simulation outcomes, not reductions to inputs
The paper trains RL agents in a custom hydrodynamic simulation and reports their discovered behaviors and intervention performance as direct outputs of those training runs. No equations, fitted parameters, or self-citations are invoked such that any reported success metric or strategy type reduces by construction to the training data or prior author work. The central claims rest on independent simulation executions rather than self-definitional mappings or renamed fits. External validity of the simulation is an assumption but does not create internal circularity in the reported derivation.
Acknowledgments
This study was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through Compute Cluster gr...