arxiv: 2604.06280 · v1 · submitted 2026-04-07 · ⚛️ physics.med-ph · cs.AI


DosimeTron: Automating Personalized Monte Carlo Radiation Dosimetry in PET/CT with Agentic AI

Antonios Tzortzakakis, Eleftherios Tzanis, Michail E. Klontzas

Pith reviewed 2026-05-10 18:55 UTC · model grok-4.3

classification ⚛️ physics.med-ph cs.AI

keywords agentic AI · Monte Carlo dosimetry · PET/CT · personalized radiation dosimetry · automation · nuclear medicine · dosimetric validation · PSMA-PET


The pith

An agentic AI system automates the full pipeline of patient-specific Monte Carlo radiation dosimetry from PET/CT data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DosimeTron, an agentic AI system in which a large language model directs specialized tools that extract DICOM metadata, preprocess images, run Monte Carlo simulations, segment organs, and generate dose reports in response to natural-language instructions. Evaluated on 597 PSMA-PET/CT studies from 378 patients, the system completed every task with no execution failures, pipeline errors, or hallucinated outputs across varied single-turn and multi-turn prompts. On a 114-case validation subset, organ doses matched those from the established OpenDose3D tool with a median Pearson correlation of 0.997 and mean absolute percentage differences below 5% for 19 of 22 organs. Processing took an average of 32.3 minutes per study.

Core claim

DosimeTron autonomously executed complex dosimetry pipelines across diverse prompt configurations and achieved high dosimetric agreement with OpenDose3D at clinically acceptable processing times, demonstrating the feasibility of agentic AI for patient-specific Monte Carlo dosimetry in PET/CT.

What carries the argument

An agentic AI framework uses GPT-5.2 as its reasoning engine, together with 23 tools exposed via four Model Context Protocol servers, to handle DICOM metadata extraction, image preprocessing, Monte Carlo simulation, organ segmentation, and dosimetric reporting through natural-language interaction.
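To make the orchestration pattern concrete, the following is a minimal sketch in plain Python of tools grouped under named servers and executed as an ordered plan. It is not the paper's implementation and does not use any Model Context Protocol library: the server names, tool names, returned values, and the fixed `DEMO_PLAN` are all hypothetical placeholders. In the system described above, the language model would choose the steps rather than follow a hard-coded list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToolServer:
    """A named group of callable tools (a stand-in for one tool server)."""
    name: str
    tools: Dict[str, Callable[[dict], dict]] = field(default_factory=dict)

    def register(self, tool_name: str, fn: Callable[[dict], dict]) -> None:
        self.tools[tool_name] = fn


def run_plan(servers: Dict[str, ToolServer],
             plan: List[Tuple[str, str]],
             state: dict) -> Tuple[dict, List[str]]:
    """Run (server, tool) steps in order, merging each tool's output into state."""
    trace = []
    for server_name, tool_name in plan:
        tool = servers[server_name].tools[tool_name]
        state = {**state, **tool(state)}
        trace.append(f"{server_name}.{tool_name}")
    return state, trace


def build_demo_servers() -> Dict[str, ToolServer]:
    """Four hypothetical servers with toy tools; all values are placeholders."""
    dicom = ToolServer("dicom")
    dicom.register("read_metadata", lambda s: {"tracer": "18F"})
    imaging = ToolServer("imaging")
    imaging.register("segment_organs", lambda s: {"organs": ["liver", "kidneys"]})
    simulation = ToolServer("simulation")
    simulation.register(
        "run_monte_carlo",
        lambda s: {"dose_mgy": {organ: 1.0 for organ in s["organs"]}},
    )
    reporting = ToolServer("reporting")
    reporting.register(
        "write_report",
        lambda s: {"report": f"{s['tracer']}: {len(s['dose_mgy'])} organs"},
    )
    return {srv.name: srv for srv in (dicom, imaging, simulation, reporting)}


# Hypothetical fixed plan; an agent would select these steps dynamically.
DEMO_PLAN = [
    ("dicom", "read_metadata"),
    ("imaging", "segment_organs"),
    ("simulation", "run_monte_carlo"),
    ("reporting", "write_report"),
]

final_state, trace = run_plan(build_demo_servers(), DEMO_PLAN, {"study_id": "demo"})
```

The returned `trace` list is a simple analogue of the execution record that a tracing layer would capture. A step that depends on a missing key raises `KeyError`, which is the kind of pipeline error such a record would expose.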

Load-bearing premise

The system and its tools will continue to perform with zero errors and matching accuracy when applied to live clinical data and arbitrary user prompts outside the controlled retrospective dataset.

What would settle it

Execution failures, pipeline errors, or mean absolute percentage differences above 5 percent for most organs when DosimeTron is run on a fresh collection of patient PET/CT scans from additional scanner models or with live hospital data.

Original abstract

Purpose: To develop and evaluate DosimeTron, an agentic AI system for automated patient-specific MC internal radiation dosimetry in PET/CT examinations. Materials and Methods: In this retrospective study, DosimeTron was evaluated on a publicly available PSMA-PET/CT dataset comprising 597 studies from 378 male patients acquired on three scanner models (18-F, n = 369; 68-Ga, n = 228). The system uses GPT-5.2 as its reasoning engine and 23 tools exposed via four Model Context Protocol servers, automating DICOM metadata extraction, image preprocessing, MC simulation, organ segmentation, and dosimetric reporting through natural-language interaction. Agentic performance was assessed using diverse prompt templates spanning single-turn instructions of varying specificity and multi-turn conversational exchanges, monitored via OpenTelemetry traces. Dosimetric accuracy was validated against OpenDose3D across 114 cases and 22 organs using Pearson's r, Lin's concordance correlation coefficient (CCC), and Bland-Altman analysis. Results: Across all prompt templates and all runs, no execution failures, pipeline errors, or hallucinated outputs were observed. Pearson's r ranged from 0.965 to 1.000 (median 0.997; all p < 0.001) and CCC from 0.963 to 1.000 (median 0.996). Mean absolute percentage difference was below 5% for 19 of 22 organs (median 2.5%). Total per-study processing time (SD) was 32.3 (6.0) minutes. Conclusion: DosimeTron autonomously executed complex dosimetry pipelines across diverse prompt configurations and achieved high dosimetric agreement with OpenDose3D at clinically acceptable processing times, demonstrating the feasibility of agentic AI for patient-specific Monte Carlo dosimetry in PET/CT.
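The agreement statistics named in the abstract can be sketched with the Python standard library as follows. This is an illustrative sketch, not the authors' analysis code. It assumes Lin's CCC is computed with population (1/n) moments, that the Bland-Altman limits are bias ± 1.96 sample standard deviations, and that the percentage difference is taken relative to the reference method. The abstract does not state these conventions. The values in the usage block are synthetic.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between paired samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient using 1/n moments (assumed)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


def bland_altman(test, ref):
    """Return (bias, lower limit, upper limit) for the differences test - ref."""
    diffs = [t - r for t, r in zip(test, ref)]
    bias = mean(diffs)
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (len(diffs) - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def mean_abs_pct_diff(test, ref):
    """Mean absolute percentage difference relative to the reference (assumed)."""
    return 100 * mean(abs(t - r) / abs(r) for t, r in zip(test, ref))


if __name__ == "__main__":
    # Synthetic paired organ doses, for illustration only.
    reference = [1.0, 2.0, 3.0, 4.0]
    candidate = [1.1, 1.9, 3.2, 3.9]
    print(pearson_r(candidate, reference))
    print(lin_ccc(candidate, reference))
    print(bland_altman(candidate, reference))
    print(mean_abs_pct_diff(candidate, reference))
```

Unlike Pearson's r, the CCC penalizes a systematic offset between methods: two series that differ by a constant shift have r = 1 but CCC < 1. That is why reporting both is informative.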

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

DosimeTron automates the full PET/CT dosimetry pipeline with agentic AI and matches reference results closely on retrospective data, but real-world robustness is untested.


The one or two things to know: DosimeTron is a new agentic AI system that uses GPT-5.2 to orchestrate 23 tools for end-to-end automated Monte Carlo dosimetry in PET/CT, and on retrospective public data it ran without a recorded failure, with dosimetric results matching the OpenDose3D reference very closely.

The paper does well in demonstrating a working pipeline. The authors tested it on 597 studies from three scanner models, used diverse prompts including multi-turn conversations, and reported zero errors or hallucinations. The validation on 114 cases showed strong correlations and low percentage differences across organs, which is encouraging for consistency. Structuring the agent around Model Context Protocol servers for tools such as segmentation and simulation is a sound architectural choice.

The soft spots are around generalization. The stress-test concern holds up because all testing was on fixed retrospective data with scripted prompts. There is no data on how the system handles real-time clinical inputs, scanner models beyond the three tested, or cases with artifacts. That leaves open questions about reliability in practice, even if the lab results are clean. Also, because the system is tied to a specific LLM version, model updates or changes could affect performance.

This is for readers in nuclear medicine and medical physics who are exploring AI for routine tasks. Someone working on dosimetry automation would get value from the tool count and the prompt-testing approach. It deserves peer review because the empirical results are clear and the application is timely, even if more validation is needed.

For a reading group I would rate it a maybe, mainly as a way to discuss the practical challenges of agentic systems in medicine. I wouldn't cite it in my own work in the next year unless more evidence comes out. But yes, send it for peer review.

Referee Report

2 major / 2 minor

Summary. The manuscript presents DosimeTron, an agentic AI system using GPT-5.2 as the reasoning engine and 23 tools via four Model Context Protocol servers to automate the full pipeline of DICOM handling, image preprocessing, organ segmentation, Monte Carlo simulation, and dosimetric reporting for patient-specific internal radiation dosimetry in PET/CT. In a retrospective evaluation on a public PSMA-PET/CT dataset of 597 studies, the system recorded zero execution failures or hallucinations across single- and multi-turn prompt templates. Dosimetric outputs on 114 cases and 22 organs showed Pearson r from 0.965 to 1.000 (median 0.997) and CCC from 0.963 to 1.000 (median 0.996) versus OpenDose3D, with mean absolute percentage difference (MAPE) below 5% for 19 of 22 organs and a mean (SD) processing time of 32.3 (6.0) minutes per study.

Significance. If the reported zero-failure rate and high concordance metrics hold under broader conditions, the work establishes technical feasibility for agentic AI to autonomously manage complex, multi-step Monte Carlo dosimetry workflows on sizable public datasets. The independent validation against OpenDose3D, absence of observed pipeline errors, and clinically plausible processing times constitute concrete strengths that could support future automation efforts in personalized radiation dosimetry.

major comments (2)

[Results] Results section: dosimetric agreement is reported only on a subset of 114 cases drawn from the 597-study cohort; without explicit selection criteria or analysis of whether these cases are representative with respect to scanner model, tracer (18-F vs 68-Ga), or organ coverage, it is unclear whether the median r = 0.997 and CCC = 0.996 generalize to the full dataset or to more challenging anatomies.
[Conclusion] Conclusion: the claim that DosimeTron demonstrates feasibility of agentic AI for patient-specific Monte Carlo dosimetry rests on retrospective, pre-curated data and scripted prompts. No evidence is provided that the same zero-error performance and <5% MAPE persist under live scanner variability, motion artifacts, atypical anatomy, or open-ended clinician prompts that could trigger tool mis-calls or segmentation drift; this gap directly limits the strength of the clinical-feasibility assertion.

minor comments (2)

[Materials and Methods] Materials and Methods: the description of the 23 tools and four MCP servers is high-level; expanding the list of tool functions (especially DICOM metadata extraction and MC parameter handling) would improve reproducibility.
The manuscript would benefit from an explicit limitations paragraph addressing the retrospective design, controlled prompt conditions, and absence of prospective or multi-center testing.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. The comments highlight important aspects of generalizability and scope that we address below with proposed revisions to strengthen the manuscript.


Referee: [Results] Results section: dosimetric agreement is reported only on a subset of 114 cases drawn from the 597-study cohort; without explicit selection criteria or analysis of whether these cases are representative with respect to scanner model, tracer (18-F vs 68-Ga), or organ coverage, it is unclear whether the median r = 0.997 and CCC = 0.996 generalize to the full dataset or to more challenging anatomies.

Authors: We agree that the manuscript did not explicitly describe the selection criteria for the 114-case validation subset. The subset was obtained via stratified random sampling to maintain proportional representation of the three scanner models and both tracers (18-F, n=369; 68-Ga, n=228) while ensuring at least one case per major organ segmentation category present in the full cohort. In the revised manuscript we will insert a dedicated paragraph in the Materials and Methods section detailing the sampling procedure, together with a supplementary table comparing scanner distribution, tracer type, patient age/weight, and organ coverage statistics between the 114 cases and the full 597 studies. This addition will allow readers to evaluate representativeness directly.

revision: yes

Referee: [Conclusion] Conclusion: the claim that DosimeTron demonstrates feasibility of agentic AI for patient-specific Monte Carlo dosimetry rests on retrospective, pre-curated data and scripted prompts. No evidence is provided that the same zero-error performance and <5% MAPE persist under live scanner variability, motion artifacts, atypical anatomy, or open-ended clinician prompts that could trigger tool mis-calls or segmentation drift; this gap directly limits the strength of the clinical-feasibility assertion.

Authors: We concur that the present study is confined to retrospective, pre-curated data and structured prompts, and therefore cannot furnish direct evidence of performance under live clinical conditions. The reported zero-failure rate and dosimetric concordance establish technical feasibility within this controlled retrospective setting, which constitutes a necessary first step. In revision we will (i) rephrase the Conclusion to state that the work demonstrates feasibility for retrospective automated Monte Carlo dosimetry pipelines and (ii) add a new Limitations paragraph that explicitly enumerates the untested scenarios (scanner variability, motion, atypical anatomy, open-ended prompts) and calls for prospective validation studies. These changes will align the strength of the claims with the evidence provided.

revision: partial
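The stratified random sampling described in the first simulated response can be illustrated with a short sketch. It is a hypothetical example, not the authors' procedure. It stratifies by tracer only, because the per-scanner counts are not given here, and it uses largest-remainder rounding so that the per-stratum quotas sum to the requested total. The function name, record layout, and seed are assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, n_total, seed=0):
    """Draw n_total records with per-stratum quotas proportional to stratum size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)

    total = len(records)
    # Exact proportional share per stratum, then largest-remainder rounding
    # so that the integer quotas add up to n_total.
    exact = {k: n_total * len(v) / total for k, v in strata.items()}
    quota = {k: int(share) for k, share in exact.items()}
    leftover = n_total - sum(quota.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - quota[k], reverse=True)
    for k in by_remainder[:leftover]:
        quota[k] += 1

    sample = []
    for k, members in strata.items():
        sample.extend(rng.sample(members, quota[k]))
    return sample


# Cohort sizes taken from the tracer counts quoted above; record IDs are synthetic.
cohort = (
    [{"id": i, "tracer": "18F"} for i in range(369)]
    + [{"id": 369 + i, "tracer": "68Ga"} for i in range(228)]
)
subset = stratified_sample(cohort, key=lambda r: r["tracer"], n_total=114)
```

With these cohort sizes the proportional shares are about 70.46 and 43.54, so this allocation draws 70 and 44 cases respectively. Whether the actual validation subset has that split is not stated on this page.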

Circularity Check

0 steps flagged

No circularity: empirical validation against external reference


The paper describes an agentic AI dosimetry pipeline and reports its performance on a public retrospective PSMA-PET/CT dataset of 597 studies. Dosimetric outputs are compared directly to the independent external tool OpenDose3D using Pearson r, CCC, and MAPE on 114 cases; no equations, parameters, or predictions are fitted to the target metrics or defined in terms of themselves. All load-bearing claims (zero execution failures, r/CCC >0.96, <5% MAPE, 32.3 min runtime) are statistical summaries of observed runs under controlled conditions, not derivations that reduce to the inputs by construction. Self-citations are absent from the central evaluation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied systems paper that introduces and validates a software pipeline rather than deriving new physical laws or constants. No free parameters are fitted to produce the central performance claims, no domain axioms beyond standard Monte Carlo radiation transport are invoked, and no new physical entities are postulated.


