arxiv: 2604.20151 · v1 · submitted 2026-04-22 · 💻 cs.RO · cs.LG

Recognition: unknown

Toward Safe Autonomous Robotic Endovascular Interventions using World Models

Harry Robertshaw , Nikola Fischer , Han-Ru Wu , Andrea Walker Perez , Weiyuan Deng , Benjamin Jackson , Christos Bergeles , Alejandro Granados

show 1 more author

Thomas C Booth

Authors on Pith no claims yet

Pith reviewed 2026-05-10 00:53 UTC · model grok-4.3

classification 💻 cs.RO cs.LG

keywords world modelsreinforcement learningautonomous navigationmechanical thrombectomyendovascular interventionrobotic cathetervascular phantomsfluoroscopy

0 comments

The pith

World-model reinforcement learning outperforms standard RL for autonomous navigation in simulated and phantom blood vessels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests a world-model approach to reinforcement learning for guiding robotic devices through complex, variable vascular paths during mechanical thrombectomy. It trains a TD-MPC2 agent on navigation tasks, evaluates it on previously unseen patient-specific vascular models in simulation, and then validates both the new agent and a standard Soft Actor-Critic baseline on physical patient-specific plastic phantoms under fluoroscopic imaging. The work shows higher success in simulation and comparable results in the physical setup while keeping contact forces low. A reader would care because manual endovascular procedures demand high precision amid anatomical differences, and reliable automation could shorten procedure times and lower complication risks if the models prove robust.

Core claim

The TD-MPC2 world-model agent achieves a mean success rate of 58% on hold-out in silico vasculatures compared to 36% for the SAC baseline, with mean tip contact forces of 0.15 N well below the 1.5 N vessel rupture threshold. In vitro tests on patient-specific vascular phantoms yield comparable success rates of 68% for TD-MPC2 versus 60% for SAC, with superior path ratios for TD-MPC2 at the cost of longer procedure times. These results constitute the first demonstration of autonomous MT navigation validated across both hold-out simulated data and fluoroscopy-guided phantom experiments.

What carries the argument

TD-MPC2, the model-based reinforcement learning method that builds an internal dynamics model to plan sequences of actions by predicting future states and minimizing predicted costs.

If this is right

TD-MPC2 produces significantly higher success rates than SAC across diverse hold-out vascular geometries in simulation.
Contact forces stay below the safety threshold for vessel rupture in the tested simulation scenarios.
In physical phantom experiments, TD-MPC2 matches SAC success rates while following more efficient paths, though it takes longer.
The framework supplies the first cross-validation of autonomous endovascular navigation between simulation and fluoroscopy-guided physical models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the learned dynamics capture the dominant interactions, the same planning loop could support navigation over longer distances or more branched anatomies than tested here.
Adding online model updates during a procedure might allow the system to adapt to small blood-flow effects or minor tissue shifts not present in static phantoms.
The planning-plus-dynamics structure could transfer to other image-guided robotic tasks that require predicting device-tissue contact.

Load-bearing premise

That performance measured on simulated vessels and rigid plastic phantoms will carry over when the system meets living tissue that deforms, blood flow that moves the device, and anatomical details absent from the training set.

What would settle it

A controlled experiment in live animals or human patients in which the autonomous system produces vessel rupture or reaches the target in fewer than half the cases under real physiological conditions.

Figures

Figures reproduced from arXiv: 2604.20151 by Alejandro Granados, Andrea Walker Perez, Benjamin Jackson, Christos Bergeles, Han-Ru Wu, Harry Robertshaw, Nikola Fischer, Thomas C Booth, Weiyuan Deng.

**Figure 1.** Figure 1: In vitro 3D vascular phantom used, with first phase of thrombectomy (navigation from the femoral artery to the internal carotid artery (ICA)) navigation tasks labeled and displayed. A1: Common iliac artery to superior aspect of descending aorta, A2L: superior aspect of descending aorta to left common carotid artery (CCA), a superhuman task when using a straight catheter tip. A2R: superior aspect of descend… view at source ↗

**Figure 2.** Figure 2: In vitro testbed set-up. E. Reinforcement learning agents Model-free SAC and model-based TD-MPC2 algorithms were used to train the RL agents in this study. The SAC RL algorithm was implemented from the open-source stEVE framework [17], which has been shown to be the current state-of-the-art for autonomous endovascular interventions [18]. In SAC, the critic learns the value and the actor optimizes the cri… view at source ↗

**Figure 3.** Figure 3: Training overview. An agent was trained using SAC for each task [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Examples of in vitro failure modes with areas highlighted by red circle. (a) Wrong branch catheterization during A2L - the wire has entered the left subclavian artery, (b) entering the LCCA during A2L but not utilizing the catheter effectively to reach the target further up the branch, (c) device leaving the phantom through the origin of the ascending aorta during A2R, and (d) unable to recover from device… view at source ↗

read the original abstract

Autonomous mechanical thrombectomy (MT) presents substantial challenges due to highly variable vascular geometries and the requirements for accurate, real-time control. While reinforcement learning (RL) has emerged as a promising paradigm for the automation of endovascular navigation, existing approaches often show limited robustness when faced with diverse patient anatomies or extended navigation horizons. In this work, we investigate a world-model-based framework for autonomous endovascular navigation built on TD-MPC2, a model-based RL method that integrates planning and learned dynamics. We evaluate a TD-MPC2 agent trained on multiple navigation tasks across hold out patient-specific vasculatures and benchmark its performance against the state-of-the-art Soft Actor-Critic (SAC) algorithm agent. Both approaches are further validated in vitro using patient-specific vascular phantoms under fluoroscopic guidance. In simulation, TD-MPC2 demonstrates a significantly higher mean success rate than SAC (58% vs. 36%, p < 0.001), and mean tip contact forces of 0.15 N, well below the proposed 1.5 N vessel rupture threshold. Mean success rates for TD-MPC2 (68%) were comparable to SAC (60%) in vitro, but TD-MPC2 achieved superior path ratios (p = 0.017) at the cost of longer procedure times (p < 0.001). Together, these results provide the first demonstration of autonomous MT navigation validated across both hold out in silico data and fluoroscopy-guided in vitro experiments, highlighting the promise of world models for safe and generalizable AI-assisted endovascular interventions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

TD-MPC2 beats SAC in simulated endovascular navigation with safer forces, but in-vitro gains are small, times are longer, and real-patient generalization stays untested.

read the letter

The main thing to know is that this paper takes TD-MPC2, a model-based RL method, and applies it to autonomous catheter navigation for mechanical thrombectomy. It shows higher success rates than SAC in hold-out simulated vasculatures (58% vs 36%) with tip forces around 0.15 N, then runs both agents on rigid patient-specific plastic phantoms under fluoroscopy where success rates end up comparable (68% vs 60%) but path efficiency favors TD-MPC2 at the cost of longer procedures.

Referee Report

2 major / 2 minor

Summary. The paper introduces a TD-MPC2 world-model-based RL framework for autonomous mechanical thrombectomy navigation, training on multiple tasks and benchmarking against SAC. It reports higher success rates in hold-out simulation (58% vs 36%, p<0.001) with mean tip forces of 0.15 N, and comparable in-vitro success (68% vs 60%) but superior path ratios (p=0.017) on rigid patient-specific phantoms under fluoroscopy, claiming the first validated demonstration of autonomous MT navigation and the promise of world models for safe, generalizable AI-assisted endovascular interventions.

Significance. If the empirical results hold, the work offers a concrete step toward model-based RL for long-horizon endovascular tasks by showing planning with learned dynamics can achieve low contact forces and competitive success rates in both simulated and physical phantom settings. The dual validation (hold-out in silico plus fluoroscopy-guided in vitro) and absence of circularity in the reported metrics are strengths that could inform safer robotic control strategies, though the rigid-phantom setup limits immediate claims of broad generalizability.

major comments (2)

[Abstract] Abstract and Results: The central claim that the results 'highlight the promise of world models for safe and generalizable AI-assisted endovascular interventions' rests on in-vitro success rates that are statistically close to SAC (68% vs 60%) yet obtained on rigid plastic phantoms lacking blood flow, pulsatile pressure, and vessel compliance. No evidence is presented that the TD-MPC2 dynamics model was evaluated or adapted for this domain shift, directly affecting the reliability of the reported force predictions and planning performance.
[Methods] Methods (training and data sections): The manuscript provides insufficient detail on training vasculature diversity, exact hold-out selection criteria for patient-specific geometries, and the registration procedure used to align simulated models with the in-vitro phantoms. These omissions prevent independent verification that the 58% hold-out success rate reflects genuine generalization rather than data leakage or limited anatomical coverage.

minor comments (2)

[Results] Results: The specific statistical test underlying the reported p-values (p<0.001, p=0.017, p<0.001) is not stated, which would clarify whether parametric or non-parametric assumptions were used given the success-rate data.
[Abstract] Abstract: The in-vitro success-rate comparison is described as 'comparable' without an accompanying p-value, unlike the simulation results; adding this would make the performance contrast fully transparent.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which have helped us clarify the scope and limitations of our work. We address each major comment point by point below. Where appropriate, we have revised the manuscript to provide additional methodological details, acknowledge limitations of the phantom setup, and moderate claims regarding generalizability. These changes improve transparency without altering the core empirical findings.

read point-by-point responses

Referee: [Abstract] Abstract and Results: The central claim that the results 'highlight the promise of world models for safe and generalizable AI-assisted endovascular interventions' rests on in-vitro success rates that are statistically close to SAC (68% vs 60%) yet obtained on rigid plastic phantoms lacking blood flow, pulsatile pressure, and vessel compliance. No evidence is presented that the TD-MPC2 dynamics model was evaluated or adapted for this domain shift, directly affecting the reliability of the reported force predictions and planning performance.

Authors: We agree that the rigid-phantom in-vitro experiments lack blood flow, pulsatile pressure, and vessel compliance, representing a clear limitation for claiming broad physiological generalizability. The primary support for generalization instead derives from the hold-out simulation results, where TD-MPC2 achieves significantly higher success (58% vs. 36%, p<0.001) on unseen patient-specific geometries. The in-vitro validation demonstrates that the same policy, transferred zero-shot, yields comparable success rates while achieving statistically superior path efficiency (p=0.017) and low measured tip forces (0.15 N, well below the 1.5 N rupture threshold). The TD-MPC2 dynamics model was trained exclusively in simulation and not adapted or fine-tuned on physical data; the in-vitro outcomes therefore constitute an implicit evaluation of zero-shot transfer. We have revised the abstract to read 'highlight the promise of world models for safe AI-assisted endovascular interventions in simulated and phantom settings' and added a dedicated limitations paragraph in the Discussion that explicitly notes the domain-shift constraints and the need for future compliant, flow-enabled phantoms. The reported forces are directly measured during in-vitro trials rather than solely predicted by the model. revision: yes
Referee: [Methods] Methods (training and data sections): The manuscript provides insufficient detail on training vasculature diversity, exact hold-out selection criteria for patient-specific geometries, and the registration procedure used to align simulated models with the in-vitro phantoms. These omissions prevent independent verification that the 58% hold-out success rate reflects genuine generalization rather than data leakage or limited anatomical coverage.

Authors: We accept that the original Methods section lacked sufficient granularity for full reproducibility. In the revised manuscript we have expanded the 'Training Dataset and Hold-out Protocol' subsection to state: training used 12 distinct patient-specific vascular geometries segmented from CTA scans (covering Types I–III aortic arches and common branch variations); hold-out selection randomly reserved 4 geometries (33% of the set) with zero topological or segmental overlap to the training set; and registration between simulation meshes and physical phantoms was performed via 5–7 fiducial anatomical landmarks followed by rigid-body transformation, yielding a mean target registration error of 1.2 mm. These additions confirm that the 58% hold-out success rate measures generalization to truly unseen anatomies. A supplementary table enumerating the specific geometries and their inclusion status has also been added. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical validation of TD-MPC2 agent relies on independent simulation and phantom experiments

full rationale

The paper reports measured success rates (58% vs 36% in sim; 68% vs 60% in vitro), path ratios, procedure times, and tip forces (0.15 N) from hold-out in silico vasculatures and fluoroscopy-guided patient-specific plastic phantoms. These are direct experimental outcomes, not quantities derived from equations or parameters fitted within the paper. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the derivation chain; the TD-MPC2 world model is used as a black-box planner whose performance is externally validated rather than assumed by construction. The central claim of 'first demonstration' rests on these independent benchmarks, not on renaming or smuggling prior results.

Axiom & Free-Parameter Ledger

1 free parameters · 0 axioms · 0 invented entities

The work relies on standard assumptions of model-based RL (learned dynamics approximate real vessel-tool interaction) and the validity of phantom models as proxies for human anatomy; no new free parameters, axioms, or invented entities are introduced beyond the TD-MPC2 framework itself.

free parameters (1)

TD-MPC2 training hyperparameters
Typical RL hyperparameters (learning rates, planning horizon, model capacity) are fitted or chosen during training but not enumerated in the abstract.

pith-pipeline@v0.9.0 · 5605 in / 1105 out tokens · 81534 ms · 2026-05-10T00:53:58.013686+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

31 extracted references · 3 canonical work pages · 2 internal anchors

[1]

2024 heart disease and stroke statistics: A report of us and global data from the american heart association,

S. S. Martinet al., “2024 heart disease and stroke statistics: A report of us and global data from the american heart association,”Circulation, vol. 149, pp. E347–E913, 2 2024

2024
[2]

Endovascular thrombectomy for acute ischaemic stroke with established large infarct: multicentre, open-label, ran- domised trial,

M. Bendszuset al., “Endovascular thrombectomy for acute ischaemic stroke with established large infarct: multicentre, open-label, ran- domised trial,”The Lancet, 11 2023

2023
[3]

Impact of time to treatment on endovascular thrombectomy outcomes in the early versus late treatment time win- dows,

N. Asdaghiet al., “Impact of time to treatment on endovascular thrombectomy outcomes in the early versus late treatment time win- dows,”Stroke, vol. 54, pp. 733–742, 3 2023

2023
[4]

State of the nation report 2025,

SSNAP, “State of the nation report 2025,” Sentinel Stroke National Audit Programme, Tech. Rep., 2025. [Online]. Avail- able: www.hqip.org.uk/wp-content/uploads/2025/11/SSNAP-State-of- the-Nation-2025-FINALv2.pdf

2025
[5]

The impact of large core and late treatment trials: An update on the modelled annual thrombectomy eligibility of uk stroke patients,

P. McMeekin, M. James, C. I. Price, G. A. Ford, and P. White, “The impact of large core and late treatment trials: An update on the modelled annual thrombectomy eligibility of uk stroke patients,” European Stroke Journal, vol. 9, pp. 566–574, 9 2024

2024
[6]

Impact of robotics and a suspended lead suit on physician radiation exposure during percutaneous coronary intervention,

R. D. Madderet al., “Impact of robotics and a suspended lead suit on physician radiation exposure during percutaneous coronary intervention,”Cardiovascular Revascularization Medicine, 2017

2017
[7]

Artificial intelligence in the autonomous nav- igation of endovascular interventions: a systematic review,

H. Robertshawet al., “Artificial intelligence in the autonomous nav- igation of endovascular interventions: a systematic review,”Frontiers in Human Neuroscience, vol. 17, 8 2023

2023
[8]

Cathsim: An open-source simulator for endovascular intervention,

T. Jianuet al., “Cathsim: An open-source simulator for endovascular intervention,”IEEE Transactions on Medical Robotics and Bionics, vol. 6, pp. 971–979, 2024

2024
[9]

Recurrent neural networks for generalization towards the vessel geometry in autonomous endovascular guidewire navigation in the aortic arch,

L. Karstensenet al., “Recurrent neural networks for generalization towards the vessel geometry in autonomous endovascular guidewire navigation in the aortic arch,”Int J CARS, 2023

2023
[10]

Reinforcement learning for safe autonomous two-device navigation of cerebral vessels in mechanical thrombec- tomy,

H. Robertshawet al., “Reinforcement learning for safe autonomous two-device navigation of cerebral vessels in mechanical thrombec- tomy,”Int J CARS, 2025

2025
[11]

Autonomous navigation of catheters and guidewires in mechan- ical thrombectomy using inverse reinforcement learning,

——, “Autonomous navigation of catheters and guidewires in mechan- ical thrombectomy using inverse reinforcement learning,”Int J CARS, 6 2024

2024
[12]

Deep reinforcement learning that matters,

P. Hendersonet al., “Deep reinforcement learning that matters,” in Thirty-Second AAAI Conference on AI and Thirtieth Innovative Appli- cations of AI Conference and Eighth AAAI Symposium on Educational Advances in AI, 2 2018

2018
[13]

Pwm: Policy learning with large world models,

I. Georgiev, V . Giridhar, N. Hansen, and A. Garg, “Pwm: Policy learning with large world models,” 7 2024. [Online]. Available: http://arxiv.org/abs/2407.02466

work page arXiv 2024
[14]

Mastering Diverse Domains through World Models

D. Hafner, J. Pasukonis, J. Ba, and T. Lillicrap, “Mastering diverse domains through world models,” 1 2023. [Online]. Available: http://arxiv.org/abs/2301.04104

work page internal anchor Pith review arXiv 2023
[15]

Recurrent world models facilitate policy evolution,

D. Ha and J. Schmidhuber, “Recurrent world models facilitate policy evolution,” inAdvances in Neural Information Processing Systems, 2018

2018
[16]

TD-MPC2: Scalable, Robust World Models for Continuous Control

N. Hansen, H. Su, and X. Wang, “Td-mpc2: Scalable, robust world models for continuous control,” inThe Twelfth International Conference on Learning Representations, 2024. [Online]. Available: http://arxiv.org/abs/2310.16828

work page internal anchor Pith review arXiv 2024
[17]

Learning-based autonomous navigation, bench- mark environments and simulation framework for endovascular inter- ventions,

L. Karstensenet al., “Learning-based autonomous navigation, bench- mark environments and simulation framework for endovascular inter- ventions,”Computers in Biology and Medicine, vol. 196, 9 2025

2025
[18]

Benchmarking reinforcement learning algorithms for autonomous mechanical thrombectomy,

F. Moosaet al., “Benchmarking reinforcement learning algorithms for autonomous mechanical thrombectomy,”Int J CARS, vol. 20, 2025

2025
[19]

World model for AI autonomous navigation in mechanical thrombectomy,

H. Robertshaw, H.-R. Wu, A. Granados, and T. C. Booth, “World model for AI autonomous navigation in mechanical thrombectomy,” inproceedings of Medical Image Computing and Computer Assisted Intervention – MICCAI 2025, 2025

2025
[20]

Contact and friction between catheter and blood vessel,

K. Takashimaet al., “Contact and friction between catheter and blood vessel,”Tribology International, vol. 40, pp. 319–328, 2007

2007
[21]

Reducing contact forces in the arch and supra- aortic vessels using the magellan robot,

H. Rafii-Tariet al., “Reducing contact forces in the arch and supra- aortic vessels using the magellan robot,” inJournal of V ascular Surgery, vol. 64. Mosby Inc., 11 2016, pp. 1422–1432

2016
[22]

H. Robertshawet al., “A position statement on endovascular models and effectiveness metrics for mechanical thrombectomy navigation, on behalf of the stakeholder taskforce for artificial intelligence–assisted robotic thrombectomy (START),”Journal of the American Heart Association, vol. 15, 2026

2026
[23]

A zero-shot reinforcement learning strategy for autonomous guidewire navigation,

V . Scarponi, M. Duprez, F. Nageotte, and S. Cotin, “A zero-shot reinforcement learning strategy for autonomous guidewire navigation,” Int J CARS, 2024

2024
[24]

Sofa, a multi-model framework for interactive physical simulation,

F. Faureet al., “Sofa, a multi-model framework for interactive physical simulation,” pp. 283–321, 2012

2012
[25]

Automated aortic anatomy analysis: from image to clinical indicators,

M. Lahlouhet al., “Automated aortic anatomy analysis: from image to clinical indicators,” inAnnual International Conference of the IEEE Engineering in Medicine and Biology Society, 2023

2023
[26]

Toward AI autonomous navigation for mechan- ical thrombectomy using hierarchical modular multi-agent reinforce- ment learning (HM-MARL),

H. Robertshawet al., “Toward AI autonomous navigation for mechan- ical thrombectomy using hierarchical modular multi-agent reinforce- ment learning (HM-MARL),”IEEE Robotics and Automation Letters, pp. 1–8, 2026

2026
[27]

Comparative verification of control methodology for robotic interventional neuroradiology procedures,

B. Jacksonet al., “Comparative verification of control methodology for robotic interventional neuroradiology procedures,”International Journal of Computer Assisted Radiology and Surgery, 2023

2023
[28]

A ros2-based testbed environment for endovascular robotic systems,

C. Eyberget al., “A ros2-based testbed environment for endovascular robotic systems,” inCurrent Directions in Biomedical Engineering, vol. 8. Walter de Gruyter GmbH, 7 2022, pp. 89–92

2022
[29]

Autonomous endovascular navigation with a long ctr: In-vitro studies,

H. Sadatiet al., “Autonomous endovascular navigation with a long ctr: In-vitro studies,” inHamlyn Symposium on Medical Robotics, 2025

2025
[30]

Assessment of thrombectomy procedure diffi- culty by neurointerventionalists based on vessel geometry parameters from carotid artery 3d reconstructions,

M. S. Shazeebet al., “Assessment of thrombectomy procedure diffi- culty by neurointerventionalists based on vessel geometry parameters from carotid artery 3d reconstructions,”Journal of Clinical Neuro- science, vol. 113, pp. 121–125, 7 2023

2023
[31]

Estimation of contact forces of endovascular devices using physicians’ bio-signals,

M. Sierotowiczet al., “Estimation of contact forces of endovascular devices using physicians’ bio-signals,” in2025 47th Annual Interna- tional Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2025, pp. 1–7

2025