pith. machine review for the scientific record.

arXiv: 2604.06280 · v1 · submitted 2026-04-07 · physics.med-ph · cs.AI

DosimeTron: Automating Personalized Monte Carlo Radiation Dosimetry in PET/CT with Agentic AI

Antonios Tzortzakakis, Eleftherios Tzanis, Michail E. Klontzas

Pith reviewed 2026-05-10 18:55 UTC · model grok-4.3

classification: physics.med-ph, cs.AI
keywords: agentic AI, Monte Carlo dosimetry, PET/CT, personalized radiation dosimetry, automation, nuclear medicine, dosimetric validation, PSMA-PET

The pith

An agentic AI system automates the full pipeline of patient-specific Monte Carlo radiation dosimetry from PET/CT data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DosimeTron, an agentic AI system in which a large language model directs specialized tools to extract DICOM metadata, preprocess images, run Monte Carlo simulations, segment organs, and generate dose reports in response to natural-language instructions. Evaluated on 597 PSMA-PET/CT studies from 378 patients, the system completed every task with no execution failures, pipeline errors, or hallucinated outputs across varied single-turn and multi-turn prompts. On a 114-case validation subset, dose calculations agreed with the established OpenDose3D tool, with a median Pearson correlation of 0.997 and mean absolute percentage differences below 5% for 19 of 22 organs. Processing took an average of 32.3 minutes per study.

Core claim

DosimeTron autonomously executed complex dosimetry pipelines across diverse prompt configurations and achieved high dosimetric agreement with OpenDose3D at clinically acceptable processing times, demonstrating the feasibility of agentic AI for patient-specific Monte Carlo dosimetry in PET/CT.

What carries the argument

An agentic AI framework that uses GPT-5.2 as its reasoning engine, together with 23 tools exposed via four Model Context Protocol servers, to handle DICOM metadata extraction, image preprocessing, Monte Carlo simulation, organ segmentation, and dosimetric reporting through natural-language interaction.

Load-bearing premise

The system and its tools will continue to run without errors and with comparable dosimetric accuracy when applied to live clinical data and arbitrary user prompts outside the controlled retrospective dataset.

What would settle it

Execution failures, pipeline errors, or mean absolute percentage differences above 5 percent for most organs when DosimeTron is run on a fresh collection of patient PET/CT scans from additional scanner models or with live hospital data.
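The dosimetric part of this criterion can be written down as a simple check. The sketch below is an assumption-laden illustration, not the paper's analysis: it takes the reference tool's dose as the denominator of the percentage difference, reads "most organs" as "more than half", and uses hypothetical function names and toy dose values.

```python
from statistics import mean
from typing import Dict, List


def mean_abs_pct_diff(test: List[float], reference: List[float]) -> float:
    """Mean absolute percentage difference, relative to the reference values."""
    return 100.0 * mean([abs(t - r) / abs(r) for t, r in zip(test, reference)])


def organs_exceeding(doses_test: Dict[str, List[float]],
                     doses_ref: Dict[str, List[float]],
                     threshold_pct: float = 5.0) -> List[str]:
    """Organs whose mean absolute percentage difference is above the threshold."""
    return sorted(
        organ for organ in doses_ref
        if mean_abs_pct_diff(doses_test[organ], doses_ref[organ]) > threshold_pct
    )


def criterion_triggered(doses_test: Dict[str, List[float]],
                        doses_ref: Dict[str, List[float]],
                        threshold_pct: float = 5.0) -> bool:
    """True when more than half of the organs exceed the threshold (assumed reading of 'most')."""
    exceeding = organs_exceeding(doses_test, doses_ref, threshold_pct)
    return len(exceeding) > len(doses_ref) / 2


# Toy per-organ doses (arbitrary units), one value per case; illustrative only.
toy_ref = {"liver": [10.0, 20.0], "kidney": [5.0, 10.0], "spleen": [3.0]}
toy_test = {"liver": [10.2, 19.8], "kidney": [5.5, 11.0], "spleen": [3.0]}
```

With the toy values, only the kidney exceeds the 5% threshold, so the criterion is not triggered; a fresh dataset where most organs behaved like the toy kidney would trigger it.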

Original abstract

Purpose: To develop and evaluate DosimeTron, an agentic AI system for automated patient-specific MC internal radiation dosimetry in PET/CT examinations. Materials and Methods: In this retrospective study, DosimeTron was evaluated on a publicly available PSMA-PET/CT dataset comprising 597 studies from 378 male patients acquired on three scanner models (18-F, n = 369; 68-Ga, n = 228). The system uses GPT-5.2 as its reasoning engine and 23 tools exposed via four Model Context Protocol servers, automating DICOM metadata extraction, image preprocessing, MC simulation, organ segmentation, and dosimetric reporting through natural-language interaction. Agentic performance was assessed using diverse prompt templates spanning single-turn instructions of varying specificity and multi-turn conversational exchanges, monitored via OpenTelemetry traces. Dosimetric accuracy was validated against OpenDose3D across 114 cases and 22 organs using Pearson's r, Lin's concordance correlation coefficient (CCC), and Bland-Altman analysis. Results: Across all prompt templates and all runs, no execution failures, pipeline errors, or hallucinated outputs were observed. Pearson's r ranged from 0.965 to 1.000 (median 0.997; all p < 0.001) and CCC from 0.963 to 1.000 (median 0.996). Mean absolute percentage difference was below 5% for 19 of 22 organs (median 2.5%). Total per-study processing time (SD) was 32.3 (6.0) minutes. Conclusion: DosimeTron autonomously executed complex dosimetry pipelines across diverse prompt configurations and achieved high dosimetric agreement with OpenDose3D at clinically acceptable processing times, demonstrating the feasibility of agentic AI for patient-specific Monte Carlo dosimetry in PET/CT.
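The three agreement statistics named in the abstract can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' analysis code; the function names and the toy dose values are assumptions made for the example. Lin's CCC is computed here with population (1/n) moments, which is one common convention, and the Bland-Altman limits use the usual bias ± 1.96 SD form.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation: measures linear association, ignores constant offsets."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lin_ccc(x: List[float], y: List[float]) -> float:
    """Lin's concordance correlation coefficient using 1/n moments."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    # The (mx - my)^2 term penalizes systematic bias between the two methods.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


def bland_altman(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for the paired differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Toy example: method B reads a constant 0.1 higher than method A.
method_a = [1.0, 2.0, 3.0, 4.0]
method_b = [1.1, 2.1, 3.1, 4.1]
```

On the toy data the Pearson correlation is essentially 1 while the CCC is slightly lower, because only the CCC accounts for the constant offset. That difference is why reporting both, as the abstract does, says more than reporting r alone.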

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents DosimeTron, an agentic AI system using GPT-5.2 as the reasoning engine and 23 tools via four Model Context Protocol servers to automate the full pipeline of DICOM handling, image preprocessing, organ segmentation, Monte Carlo simulation, and dosimetric reporting for patient-specific internal radiation dosimetry in PET/CT. In a retrospective evaluation on a public PSMA-PET/CT dataset of 597 studies, the system recorded zero execution failures or hallucinations across single- and multi-turn prompt templates; dosimetric outputs on 114 cases and 22 organs showed Pearson r from 0.965 to 1.000 (median 0.997) and CCC from 0.963 to 1.000 (median 0.996) versus OpenDose3D, with mean absolute percentage difference (MAPE) below 5% for 19 of 22 organs and a mean processing time of 32.3 minutes.

Significance. If the reported zero-failure rate and high concordance metrics hold under broader conditions, the work establishes technical feasibility for agentic AI to autonomously manage complex, multi-step Monte Carlo dosimetry workflows on sizable public datasets. The independent validation against OpenDose3D, absence of observed pipeline errors, and clinically plausible processing times constitute concrete strengths that could support future automation efforts in personalized radiation dosimetry.

major comments (2)
  1. [Results] Results section: dosimetric agreement is reported only on a subset of 114 cases drawn from the 597-study cohort; without explicit selection criteria or analysis of whether these cases are representative with respect to scanner model, tracer (18-F vs 68-Ga), or organ coverage, it is unclear whether the median r = 0.997 and CCC = 0.996 generalize to the full dataset or to more challenging anatomies.
  2. [Conclusion] Conclusion: the claim that DosimeTron demonstrates feasibility of agentic AI for patient-specific Monte Carlo dosimetry rests on retrospective, pre-curated data and scripted prompts. No evidence is provided that the same zero-error performance and <5% MAPE persist under live scanner variability, motion artifacts, atypical anatomy, or open-ended clinician prompts that could trigger tool mis-calls or segmentation drift; this gap directly limits the strength of the clinical-feasibility assertion.
minor comments (2)
  1. [Materials and Methods] Materials and Methods: the description of the 23 tools and four MCP servers is high-level; expanding the list of tool functions (especially DICOM metadata extraction and MC parameter handling) would improve reproducibility.
  2. The manuscript would benefit from an explicit limitations paragraph addressing the retrospective design, controlled prompt conditions, and absence of prospective or multi-center testing.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. The comments highlight important aspects of generalizability and scope that we address below with proposed revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Results] Results section: dosimetric agreement is reported only on a subset of 114 cases drawn from the 597-study cohort; without explicit selection criteria or analysis of whether these cases are representative with respect to scanner model, tracer (18-F vs 68-Ga), or organ coverage, it is unclear whether the median r = 0.997 and CCC = 0.996 generalize to the full dataset or to more challenging anatomies.

    Authors: We agree that the manuscript did not explicitly describe the selection criteria for the 114-case validation subset. The subset was obtained via stratified random sampling to maintain proportional representation of the three scanner models and both tracers (18-F, n=369; 68-Ga, n=228) while ensuring at least one case per major organ segmentation category present in the full cohort. In the revised manuscript we will insert a dedicated paragraph in the Materials and Methods section detailing the sampling procedure, together with a supplementary table comparing scanner distribution, tracer type, patient age/weight, and organ coverage statistics between the 114 cases and the full 597 studies. This addition will allow readers to evaluate representativeness directly.

    revision: yes

  2. Referee: [Conclusion] Conclusion: the claim that DosimeTron demonstrates feasibility of agentic AI for patient-specific Monte Carlo dosimetry rests on retrospective, pre-curated data and scripted prompts. No evidence is provided that the same zero-error performance and <5% MAPE persist under live scanner variability, motion artifacts, atypical anatomy, or open-ended clinician prompts that could trigger tool mis-calls or segmentation drift; this gap directly limits the strength of the clinical-feasibility assertion.

    Authors: We concur that the present study is confined to retrospective, pre-curated data and structured prompts, and therefore cannot furnish direct evidence of performance under live clinical conditions. The reported zero-failure rate and dosimetric concordance establish technical feasibility within this controlled retrospective setting, which constitutes a necessary first step. In revision we will (i) rephrase the Conclusion to state that the work demonstrates feasibility for retrospective automated Monte Carlo dosimetry pipelines and (ii) add a new Limitations paragraph that explicitly enumerates the untested scenarios (scanner variability, motion, atypical anatomy, open-ended prompts) and calls for prospective validation studies. These changes will align the strength of the claims with the evidence provided.

    revision: partial
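The first response above describes proportional stratified random sampling. Since the rebuttal is simulated, the procedure is not confirmed by the paper itself; the sketch below only shows what such sampling could look like. It stratifies by tracer alone, using the tracer counts quoted above, and does not implement the scanner strata or the "at least one case per organ category" constraint. The function names, record layout, and seed are hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List


def proportional_allocation(stratum_sizes: Dict[Hashable, int],
                            n_total: int) -> Dict[Hashable, int]:
    """Split n_total across strata in proportion to size (largest-remainder rule)."""
    total = sum(stratum_sizes.values())
    quotas = {k: n_total * v / total for k, v in stratum_sizes.items()}
    alloc = {k: int(q) for k, q in quotas.items()}
    leftover = n_total - sum(alloc.values())
    # Hand the remaining slots to the strata with the largest fractional parts.
    by_fraction = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_fraction[:leftover]:
        alloc[k] += 1
    return alloc


def stratified_sample(records: List[dict],
                      key: Callable[[dict], Hashable],
                      n_total: int,
                      seed: int = 0) -> List[dict]:
    """Draw a random sample without replacement, proportional within each stratum."""
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    alloc = proportional_allocation({k: len(v) for k, v in strata.items()}, n_total)
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sample = []
    for k, members in strata.items():
        sample.extend(rng.sample(members, alloc[k]))
    return sample


# Synthetic cohort mirroring the quoted tracer counts (369 and 228 studies).
cohort = ([{"id": i, "tracer": "18F"} for i in range(369)]
          + [{"id": 369 + i, "tracer": "68Ga"} for i in range(228)])
subset = stratified_sample(cohort, key=lambda r: r["tracer"], n_total=114, seed=42)
```

Under these assumptions a 114-case subset contains 70 of the 18-F studies and 44 of the 68-Ga studies. A supplementary table of the kind the response promises would let readers compare the real subset against such proportions.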

Circularity Check

0 steps flagged

No circularity: empirical validation against external reference

The paper describes an agentic AI dosimetry pipeline and reports its performance on a public retrospective PSMA-PET/CT dataset of 597 studies. Dosimetric outputs are compared directly to the independent external tool OpenDose3D using Pearson r, CCC, and MAPE on 114 cases; no equations, parameters, or predictions are fitted to the target metrics or defined in terms of themselves. All load-bearing claims (zero execution failures, r/CCC >0.96, <5% MAPE, 32.3 min runtime) are statistical summaries of observed runs under controlled conditions, not derivations that reduce to the inputs by construction. Self-citations are absent from the central evaluation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied systems paper that introduces and validates a software pipeline rather than deriving new physical laws or constants. No free parameters are fitted to produce the central performance claims, no domain axioms beyond standard Monte Carlo radiation transport are invoked, and no new physical entities are postulated.

pith-pipeline@v0.9.0 · 5659 in / 1305 out tokens · 53246 ms · 2026-05-10T18:55:16.018335+00:00
