
arxiv: 2606.25108 · v1 · pith:H65GULHRnew · submitted 2026-06-23 · 💻 cs.AI · cs.HC

The Clinician's Veto: Navigating Trust, Liability, and Uncertainty in Autonomous AI Prescribing

Pith reviewed 2026-06-25 22:52 UTC · model grok-4.3

classification 💻 cs.AI cs.HC
keywords autonomous AI prescribing · clinician survey · epistemic uncertainty · aleatoric uncertainty · liability allocation · confidence calibration · inferential transparency · AI regulation

The pith

Clinicians would block autonomous AI prescribing absent calibrated confidence escalation, uncertainty-type signals, and inferential transparency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that autonomous AI for medication prescribing needs three minimum architectural features to be acceptable to clinicians: calibrated per-prediction confidence for escalation thresholds, separate communication of epistemic versus aleatoric uncertainty, and inferential transparency at decision time. A survey of 136 U.S. prescribing clinicians found they would not permit autonomy without the confidence mechanism, would favor option summaries for aleatoric uncertainty but abstention for epistemic uncertainty, and would accept added liability only when transparency allows substantive judgment. These requirements would shift the system toward a supervised tool rather than an independent agent. This matters because new laws and pilots are authorizing AI prescribing, and clinician adoption plus proper liability allocation depend on addressing uncertainty and trust.

Core claim

Prescribing clinicians would not permit autonomous prescribing without a calibrated confidence-based escalation mechanism, preferred a competing-options summary when uncertainty was aleatoric but shifted to abstention when uncertainty was epistemic, and were only willing to accept additional liability when inferential transparency enabled a substantive judgment under acknowledged uncertainty. These findings indicate the recommended architectural features would encourage higher rates of clinician adoption by collapsing much of what autonomy conventionally means, turning the system into a heavily supervised decision-support tool.
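
The epistemic/aleatoric split in the core claim can be made concrete with a small sketch. It uses the common entropy-based decomposition over an ensemble of models, which is an assumption made here for illustration and not necessarily the estimator the paper relies on. The function names (`entropy`, `decompose_uncertainty`) and the toy ensemble outputs are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def decompose_uncertainty(member_probs):
    """Split predictive uncertainty for one case into two parts.

    member_probs: one probability vector per ensemble member (hypothetical
    models), all over the same candidate prescriptions.
    Returns (total, aleatoric, epistemic) where
      total     = entropy of the averaged prediction,
      aleatoric = average entropy of the individual members,
      epistemic = total - aleatoric (disagreement between members).
    """
    n_members = len(member_probs)
    n_options = len(member_probs[0])
    mean = [sum(m[i] for m in member_probs) / n_members for i in range(n_options)]
    total = entropy(mean)
    aleatoric = sum(entropy(m) for m in member_probs) / n_members
    return total, aleatoric, total - aleatoric


# Members agree that two drugs are equally plausible: ambiguity in the case itself.
ambiguous_case = decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]])

# Members are each certain but contradict one another: the model lacks knowledge.
unfamiliar_case = decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]])
```

In the first toy case all of the uncertainty lands in the aleatoric term, and in the second all of it lands in the epistemic term, which is the distinction the surveyed clinicians wanted surfaced differently.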

What carries the argument

Three architectural requirements for safe autonomous prescribing: calibrated confidence-based escalation, differentiated uncertainty communication (epistemic versus aleatoric), and inferential transparency for liability allocation.
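
A minimal sketch of how the first two requirements could combine into a routing rule, assuming the system already produces a calibrated confidence and separate epistemic and aleatoric scores. The threshold value, the field names, and the `Route` labels are all hypothetical; the paper reports clinician preferences, not an implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_PRESCRIBE = "auto_prescribe"                 # no clinician sign-off
    ESCALATE_WITH_OPTIONS = "escalate_with_options"   # competing options plus confidences
    ESCALATE_DATA_ONLY = "escalate_data_only"         # abstain, show clinical data only


@dataclass
class Prediction:
    confidence: float  # calibrated probability of the top recommendation, in [0, 1]
    epistemic: float   # hypothetical score: uncertainty from model ignorance
    aleatoric: float   # hypothetical score: uncertainty from genuine clinical ambiguity


def route(pred: Prediction, confidence_threshold: float = 0.95) -> Route:
    """Action-gated escalation under the assumptions stated above."""
    # Requirement 1: act autonomously only above a calibrated threshold.
    if pred.confidence >= confidence_threshold:
        return Route.AUTO_PRESCRIBE
    # Requirement 2: the escalated interface depends on the uncertainty type.
    if pred.epistemic > pred.aleatoric:
        # Surveyed clinicians preferred abstention when the model lacks knowledge.
        return Route.ESCALATE_DATA_ONLY
    # Surveyed clinicians preferred a competing-options summary for real ambiguity.
    return Route.ESCALATE_WITH_OPTIONS


routine_refill = route(Prediction(confidence=0.99, epistemic=0.05, aleatoric=0.05))
ambiguous_case = route(Prediction(confidence=0.70, epistemic=0.10, aleatoric=0.40))
unfamiliar_case = route(Prediction(confidence=0.70, epistemic=0.50, aleatoric=0.10))
```

Who sets `confidence_threshold` is itself a governance question in the survey, so it is left as a parameter here instead of a constant. The third requirement, inferential transparency, would attach a rationale to each escalated case and is not modelled in this sketch.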

If this is right

  • Clinicians would adopt AI prescribing systems at higher rates when the three features are present.
  • The AI would function less as an autonomous agent and more as a heavily supervised decision-support tool.
  • Liability would align with the institutional actors who control system design and deployment.
  • Regulation could constrain the degree of autonomy granted to AI in prescribing while matching liability to control.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • These requirements could be directly tested in ongoing state pilots such as Utah's prescription-renewal program.
  • Similar architectural constraints on uncertainty handling might apply to other high-stakes AI medical decisions beyond prescribing.
  • Approval based solely on aggregate performance metrics could result in low real-world adoption if these features are absent.

Load-bearing premise

The preferences expressed by the 136 surveyed U.S. prescribing clinicians accurately predict real-world adoption behavior, liability tolerance, and safety impact in deployed systems.

What would settle it

A real-world pilot deployment of autonomous AI prescribing that achieves high clinician acceptance and use rates without implementing calibrated confidence escalation, uncertainty-type differentiation, or inferential transparency.

Figures

Figures reproduced from arXiv: 2606.25108 by Adarsh Subbaswamy, Andrew Taylor, Anne Andrews, Chirag Agarwal, Cree Gaskin, Eileanor LaRocco, Sarah Tan.

Figure 1: A Clinician-driven Case for Constraining Autonomous AI in Prescribing. I) …
Figure 2: Distributed self-reported familiarity with generative AI vs. agentic AI. Results show that familiarity skews higher for generative AI but lower and more dispersed for agentic AI.
Do clinicians support Autonomous AI prescribing? On average, we found that the support for autonomous AI prescribing was very low across all drug categories. For new prescriptions of both low and high-risk drugs and refills of hig…
Figure 3: Clinician support for autonomous AI prescribing without requiring clinician sign-off by drug category and patient complexity (non-medically complex patients, top; medically complex patients, bottom). Results show that opposition exceeds 83% for all new prescriptions and high-risk refills across both patient types, with the only non-trivial support shown for low-risk refills.
5.2 H1: Calibrated, Action-Gat…
Figure 4: Rated patient safety of agents with calibrated vs. uncalibrated confidence-based escalation mechanisms, each relative to an agent with no escalation at all. Results show clinicians rated the calibrated agent as substantially safer than the uncalibrated agent.
Figure 5: Preferred party for setting the confidence threshold at which an AI agent escalates to clinician review. Results show that no single party commanded a majority, but organization and oversight-level options combined account for 59.5% of responses, while only 5.3% of clinicians endorsed the AI company to set the threshold.
Findings. Clinicians responded primarily that these two scenarios require fundamentall…
Figure 6: Preferred content of the escalated interface by source of uncertainty (aleatoric vs. epistemic). Results show that clinicians preferred to be shown competing options with associated confidence scores when aleatoric uncertainty was high, but when epistemic uncertainty was high, the preference shifted towards wanting the agent to abstain from recommending and present the clinical data only.
Scenarios 3 & 4. …
Figure 7: Bar graph of mean responsibility (from 0-5) assigned to each party across four scenarios in the case of an adverse outcome. Results show that organizational parties are assigned more responsibility than individuals in every scenario except when the agent escalates, with patients being assigned the least responsibility throughout despite consent.
Figure 8: Sentiment distribution of the 64 substantive open-ended responses to Q18. Results show that sentiment is roughly balanced.
Liability. The overriding concern was that the liability for AI mistakes (H3) will unfairly default to the clinician, outweighing any potential efficiency gain. Respondents specifically noted that they are unwilling to use AI if they bear full responsibility, suggesting compensation is…
Figure 9: Q1. Bar chart; axis: number of respondents. Categories, in extraction order: Internal Medicine, Surgery, Pediatrics, Family Medicine/General Practice, Neurology, Other, Anesthesiology, Psychiatry, Obstetrics and Gynecology, Physical Medicine and Rehabilitation, Critical Care, Emergency Medicine, Radiology. Bar labels, in extraction order: 26.7%, 12.6%, 11.1%, 9.6%, 7.4%, 7.4%, 6.7%, 5.2%, 3.0%, 3.0%, 3.0%, 2.2%, 2.2%.
Figure 10: Q2.
Figure 11: Q3. Bar chart; axis: number of respondents. Categories, in extraction order: 1-5 years, 6-10 years, 11-15 years, 16-20 years, > 20 years. Bar labels, in extraction order: 30.1%, 16.2%, 17.6%, 10.3%, 25.7%.
Figure 12: Q4. Bar chart; axis: number of respondents. Categories, in extraction order: Uses AI, Does not use AI. Bar labels, in extraction order: 86.0%, 14.0%.
Figure 13: Q7. Bar chart; axis: number of respondents. Categories, in extraction order: Not at all, Very little, Somewhat, To a great extent. Bar labels, in extraction order: 2.2%, 13.3%, 17.0%, 67.4%.
Figure 14: Q10. Bar chart; axis: number of respondents. Categories, in extraction order: Yes, they require a fundamentally different approach; No, I would approach both the same way; Yes, but only minor differences; Unsure. Bar labels, in extraction order: 59.8%, 22.7%, 9.8%, 7.6%.
Figure 15.

Original abstract

Autonomous AI systems are transitioning from advisory to autonomous roles for medication prescriptions. Recent United States bill H.R. 238 and Utah's prescription-renewal pilot both authorize AI to prescribe medications in an agentic capacity. While some regulatory guidelines suggest aggregate model performance metrics for clearance, they do not require i) calibrated per-prediction confidence for action-gated thresholds, ii) differentiated communication of uncertainty arising from model ignorance (epistemic) versus genuine clinical ambiguity (aleatoric), and iii) inferential transparency at the moment of decision that allows for liability allocation. Here, we present a regulatory and technical argument (tested with a survey of 136 U.S. prescribing clinicians) positioning these as minimum architectural requirements for safe autonomous prescribing. Our results suggest prescribing clinicians i) would not permit autonomous prescribing without a calibrated confidence-based escalation mechanism, ii) preferred a competing-options summary when uncertainty was aleatoric but shifted to abstention when uncertainty was epistemic, and iii) were only willing to accept additional liability when inferential transparency enabled a substantive judgment under acknowledged uncertainty. These findings indicate our recommended architectural features would encourage higher rates of clinician adoption, largely through collapsing much of what "autonomy" conventionally means. A system meeting these requirements would function less as an autonomous agent and more as a heavily supervised decision-support tool. As legislation and state pilots proceed, our technical argument backed by clinician perspectives provides opportunities for regulation to constrain the degree of autonomy ethically granted to AI in prescribing while aligning liability with the institutional actors who control system design and deployment.
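
The abstract contrasts aggregate performance metrics with calibrated per-prediction confidence. One standard way to check whether stated confidences can be trusted as thresholds is expected calibration error (ECE), sketched below with equal-width bins. The function name, the bin count, and the toy inputs are assumptions for illustration; the abstract does not say which calibration measure, if any, the authors have in mind.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error with equal-width confidence bins.

    confidences: predicted probabilities in [0, 1] for the chosen action.
    correct:     1 if the action turned out to be appropriate, else 0.
    Returns the bin-size-weighted mean gap between confidence and accuracy.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, outcome in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, outcome))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_confidence = sum(c for c, _ in members) / len(members)
        accuracy = sum(o for _, o in members) / len(members)
        ece += (len(members) / total) * abs(avg_confidence - accuracy)
    return ece


# Toy data: a model that says 0.9 and is right 9 times out of 10 is well calibrated;
# one that says 0.9 and is right 5 times out of 10 is overconfident.
well_calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0])
overconfident = expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5)
```

Two models with the same aggregate accuracy can differ sharply on this measure, which is why an escalation threshold set on uncalibrated confidence may not mean what it appears to mean.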

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims that safe autonomous AI prescribing requires three minimum architectural features—calibrated per-prediction confidence with action-gated thresholds, differentiated handling of aleatoric versus epistemic uncertainty, and inferential transparency for liability allocation—supported by survey responses from 136 U.S. prescribing clinicians. The survey indicates clinicians would reject autonomy without confidence-based escalation, prefer competing-options summaries for aleatoric uncertainty but abstention for epistemic, and accept added liability only with transparency enabling substantive judgment. These features are positioned as regulatory necessities given legislation like H.R. 238 and Utah pilots, with the argument that they would increase adoption by reducing true autonomy.

Significance. If the survey evidence is robust, the work offers clinician-grounded input on AI autonomy limits in a high-stakes domain, directly relevant to ongoing U.S. regulatory developments. It explicitly links technical design choices to liability and trust outcomes, providing a concrete framework that could inform policy. The survey-based testing of the three requirements is a strength, as is the explicit acknowledgment that the resulting systems would function more as supervised tools than fully autonomous agents.

major comments (2)
  1. [Survey methods and results section] The central claims rest on the survey of 136 clinicians, yet the manuscript provides no details on survey design, sampling method, statistical analysis, response rates, vignette construction, or power calculations (see abstract and the section reporting survey results). Without this information, the strength of evidence for the three requirements cannot be evaluated, undermining the regulatory argument.
  2. [Discussion] The extrapolation from vignette-based preferences to real-world adoption, override rates, and liability tolerance under actual patient outcomes is asserted but not tested or externally validated against observed AI tool deployment data or liability records (see discussion of the three findings).
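
To illustrate the kind of statistical detail the first comment asks for, the sketch below computes a Wilson score interval for a single survey proportion. The count of 113 out of 136 is hypothetical, chosen only so the proportion is close to the 83% opposition level quoted in the Figure 3 caption; the paper's per-question counts and denominators are not given on this page.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    Returns (lower, upper).
    """
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical count, not taken from the paper: 113 of 136 respondents opposed.
lower, upper = wilson_interval(113, 136)
```

Reporting intervals like this alongside each headline percentage would let readers judge how much a sample of 136 can support, independent of the separate question of sampling bias.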

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important areas for improving the transparency and scope of our claims. We address each major comment below and indicate planned revisions.

point-by-point responses
  1. Referee: [Survey methods and results section] The central claims rest on the survey of 136 clinicians, yet the manuscript provides no details on survey design, sampling method, statistical analysis, response rates, vignette construction, or power calculations (see abstract and the section reporting survey results). Without this information, the strength of evidence for the three requirements cannot be evaluated, undermining the regulatory argument.

    Authors: We agree that the submitted manuscript does not provide sufficient methodological detail on the survey. In the revised version, we will insert a dedicated methods section describing survey design (including how vignettes were constructed from standard prescribing scenarios), sampling method (recruitment through U.S. clinician professional networks and associations), achieved response rate, statistical analyses (preference comparisons and descriptive statistics), and any power considerations. This will allow readers to assess the robustness of the evidence for the three architectural requirements. revision: yes

  2. Referee: [Discussion] The extrapolation from vignette-based preferences to real-world adoption, override rates, and liability tolerance under actual patient outcomes is asserted but not tested or externally validated against observed AI tool deployment data or liability records (see discussion of the three findings).

    Authors: We acknowledge that the discussion extrapolates from vignette-based survey preferences to potential real-world effects on adoption and liability. The survey directly measures clinician preferences under controlled hypothetical conditions, but the manuscript does not include or claim external validation against existing deployment or liability datasets, as autonomous AI prescribing systems remain limited in deployment. We will revise the discussion to frame these as hypothesized implications supported by the survey data, explicitly note the absence of real-world validation, and call for future empirical studies. This maintains the regulatory argument while accurately bounding the evidence. revision: partial

Circularity Check

0 steps flagged

No significant circularity; argument rests on external survey data

full rationale

The paper advances a regulatory argument for three architectural features in autonomous AI prescribing systems, supported by results from a survey of 136 U.S. prescribing clinicians and references to external legislation (H.R. 238 and Utah pilot). No equations, fitted parameters, predictions derived from internal models, or self-citation chains appear in the derivation. Claims are presented as empirical findings from the survey rather than reductions to self-defined inputs or prior author work. The central premise is externally grounded and does not reduce to its own assumptions by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central recommendations rest on the unverified assumption that the clinician survey captures representative views that will translate to deployment settings; no free parameters or new physical entities are introduced.

axioms (1)
  • domain assumption: Survey responses from 136 U.S. prescribing clinicians reflect the preferences that would govern real-world adoption and liability decisions for autonomous AI prescribing.
    The paper positions the survey results as evidence that the three architectural features are minimum requirements; this premise is invoked to support the regulatory argument.
invented entities (1)
  • Clinician's Veto (no independent evidence)
    purpose: Conceptual mechanism allowing clinicians to override or escalate AI prescriptions based on confidence and transparency signals.
    The title and argument introduce this as a framing device for the required features, with no independent evidence provided beyond the survey interpretation.

pith-pipeline@v0.9.1-grok · 5834 in / 1510 out tokens · 33198 ms · 2026-06-25T22:52:25.036700+00:00 · methodology

