Principled Uncertainty in Clinical AI: End-to-End Bayesian Modelling and Algorithmic Equity Auditing Across Multimodal Patient Data

Dimeji Abdulsobur Olawuyi; Joseph Odamo; Oladimeji Anthonio; Oloruntoba Ajayi; Temiloluwa Aderemi

arxiv: 2606.09789 · v1 · pith:FBM4J7XEnew · submitted 2026-06-08 · 💻 cs.CY

Pith reviewed 2026-06-27 14:42 UTC · model grok-4.3

Classification: cs.CY

Keywords: Bayesian deep learning, multimodal clinical data, epistemic uncertainty, algorithmic fairness, equity auditing, variational encoders, uncertainty calibration

The pith

A Bayesian multimodal model's epistemic uncertainty flags equity gaps of 15.3 percent for rural-facility patients and 6.8 percent for low-income patients in simulated data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds an end-to-end Bayesian deep learning system for multimodal clinical records that outputs separate aleatoric and epistemic uncertainty estimates. It trains the system on 1,000 simulated patients and audits the uncertainty values across facility type, socioeconomic status, age, and sex. The audit finds that epistemic uncertainty is significantly higher for patients at primary/rural facilities, patients of low socioeconomic status, and elderly patients, while no significant sex difference appears. These results position calibrated epistemic uncertainty as a direct signal of algorithmic inequity that requires no outcome labels.

Core claim

The central claim is that a precision-weighted late-fusion Bayesian architecture, trained with a composite loss of binary cross-entropy, KL divergence, and an uncertainty calibration penalty, produces epistemic uncertainty estimates that systematically differ across patient subgroups in simulated multimodal data, with primary/rural patients showing a 15.3 percent uncertainty gap, low socioeconomic status patients a 6.8 percent gap, and elderly patients a 3.9 percent gap.
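The composite objective in the core claim can be sketched as below. The summary does not give the term weights or the form of the calibration penalty. This sketch therefore makes two assumptions:

- a diagonal-Gaussian latent with a standard-normal prior, which gives a closed-form KL term;
- a hypothetical penalty that asks the predicted uncertainty `sigma` to track the absolute error |y - p|.

The function names and the weights `beta` and `lam` are illustrative, not the authors'.

```python
import math


def bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


def kl_diag_gaussian_to_std_normal(mus, log_vars):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)), averaged over patients.
    `mus` and `log_vars` are lists of per-patient latent vectors."""
    total = 0.0
    for mu, lv in zip(mus, log_vars):
        total += 0.5 * sum(math.exp(v) + m * m - 1.0 - v for m, v in zip(mu, lv))
    return total / len(mus)


def calibration_penalty(probs, labels, sigmas):
    """HYPOTHETICAL penalty: predicted uncertainty should track |y - p|.
    The paper's actual calibration term is not specified."""
    return sum((s - abs(y - p)) ** 2 for p, y, s in zip(probs, labels, sigmas)) / len(probs)


def composite_loss(probs, labels, mus, log_vars, sigmas, beta=1e-3, lam=0.1):
    """BCE + beta * KL + lam * calibration penalty; beta and lam are free weights."""
    return (bce(probs, labels)
            + beta * kl_diag_gaussian_to_std_normal(mus, log_vars)
            + lam * calibration_penalty(probs, labels, sigmas))


# Toy usage with invented values for two patients and a 2-d latent.
probs, labels = [0.9, 0.2], [1, 0]
mus, log_vars = [[0.0, 0.0], [0.5, -0.5]], [[0.0, 0.0], [0.0, 0.0]]
print(composite_loss(probs, labels, mus, log_vars, sigmas=[0.1, 0.2]))
```

Because `beta` and `lam` are free weights, the referee's requested sensitivity sweep amounts to three steps:

- retrain while varying `lam`;
- include `lam=0` among the settings;
- rerun the subgroup audit for each setting.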

What carries the argument

Modality-specific variational encoders combined with precision-weighted late fusion and a decomposed uncertainty output head that isolates epistemic uncertainty.
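The paper's fusion equation is not reproduced here (the referee asks for one). The sketch below assumes the common product-of-Gaussians form:

- each modality encoder emits a mean and a variance for a latent dimension;
- precisions (inverse variances) add;
- the fused mean is the precision-weighted average of the modality means.

Missing modalities are simply skipped. The function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def precision_weighted_fusion(
    means: Sequence[Optional[float]],
    variances: Sequence[Optional[float]],
) -> Tuple[float, float]:
    """Fuse per-modality Gaussian estimates N(mean_m, var_m) of one latent dimension.

    Assumed rule (product of Gaussians): precisions add, and the fused mean is the
    precision-weighted average of the modality means. A missing modality is passed
    as None and contributes nothing.
    """
    pairs = [(m, v) for m, v in zip(means, variances) if m is not None and v is not None]
    if not pairs:
        raise ValueError("at least one modality must be observed")
    total_precision = sum(1.0 / v for _, v in pairs)
    fused_mean = sum(m / v for m, v in pairs) / total_precision
    fused_var = 1.0 / total_precision
    return fused_mean, fused_var


# Labs and imaging observed, clinical notes missing (illustrative values).
print(precision_weighted_fusion([1.0, 2.0, None], [0.5, 0.5, None]))
```

One consequence of this assumed rule is that the fused variance can only shrink as modalities are added. A record with missing modalities therefore keeps a larger fused variance. If the simulated rural or low-SES records are sparser, that alone would raise their uncertainty.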

If this is right

Epistemic uncertainty can serve as an automated flag to route predictions from primary or rural facilities for additional human review.
Model retraining or data collection can be prioritized for the subgroups that exhibit the largest epistemic uncertainty gaps.
Calibration penalties in the loss function can be tuned specifically to reduce subgroup differences in uncertainty.
Uncertainty-based auditing can be applied at deployment time without requiring outcome labels for the audited cases.
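A label-free audit of this kind needs only each patient's uncertainty and subgroup membership. The sketch below computes three quantities:

- a relative gap in mean uncertainty;
- Cohen's d;
- a two-sided permutation p-value.

The paper does not state how its equity gap or effect size is defined. The relative-gap formula and the choice of Cohen's d are therefore assumptions, and the example values are invented.

```python
import random
import statistics
from typing import Sequence


def relative_gap(group: Sequence[float], reference: Sequence[float]) -> float:
    """Assumed gap definition: (mean_group - mean_reference) / mean_reference."""
    ref_mean = statistics.fmean(reference)
    return (statistics.fmean(group) - ref_mean) / ref_mean


def cohens_d(group: Sequence[float], reference: Sequence[float]) -> float:
    """Standardised mean difference using the pooled sample standard deviation."""
    n1, n2 = len(group), len(reference)
    pooled_var = ((n1 - 1) * statistics.variance(group)
                  + (n2 - 1) * statistics.variance(reference)) / (n1 + n2 - 2)
    return (statistics.fmean(group) - statistics.fmean(reference)) / pooled_var ** 0.5


def permutation_p_value(group, reference, n_perm=5000, seed=0) -> float:
    """Two-sided permutation test on the difference in mean uncertainty.
    Needs only uncertainties and subgroup membership, no outcome labels."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group) - statistics.fmean(reference))
    pooled = list(group) + list(reference)
    n = len(group)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of subgroup membership
        if abs(statistics.fmean(pooled[:n]) - statistics.fmean(pooled[n:])) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # add-one correction keeps p > 0


# Invented epistemic-uncertainty values for two facility subgroups.
rural = [0.30, 0.34, 0.29, 0.36, 0.33, 0.31]
urban = [0.27, 0.28, 0.30, 0.26, 0.29, 0.27]
print(relative_gap(rural, urban), cohens_d(rural, urban), permutation_p_value(rural, urban))
```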

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the simulation faithfully captures real disparities, uncertainty auditing could reduce the need for fairness audits that require ground-truth outcome labels, although subgroup membership would still be needed to compute the gaps.
The same architecture could be tested on longitudinal patient trajectories to check whether uncertainty gaps widen or narrow over time.
Connecting uncertainty gaps to downstream clinical decisions would show whether high-uncertainty predictions actually lead to different treatment rates.

Load-bearing premise

The generative process that created the 1,000 simulated patient records and their labels accurately reproduces the statistical structure and disparity patterns of real clinical data.

What would settle it

Re-running the equity audit on a real clinical dataset of comparable size and modality structure and finding no statistically significant uncertainty gaps for the same subgroups.

Figures

Figures reproduced from arXiv: 2606.09789 by Dimeji Abdulsobur Olawuyi, Joseph Odamo, Oladimeji Anthonio, Oloruntoba Ajayi, Temiloluwa Aderemi.

**Figure 2.** Reliability diagram (model calibration). Mean predicted confidence plotted against observed accuracy across 10 bins. Points near the dashed diagonal indicate well-calibrated predictions; bin size is encoded by colour intensity. ECE = 0.096.
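For reference, here is a minimal sketch of expected calibration error with 10 equal-width bins. It follows the confidence-versus-accuracy binning of Guo et al. [4]. Treating a binary prediction's confidence as max(p, 1 - p) is an assumption about how Figure 2 was built, and the example inputs are invented.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int],
                               n_bins: int = 10) -> float:
    """ECE = sum_b (|B_b| / N) * |accuracy(B_b) - mean_confidence(B_b)| over equal-width bins."""
    bins: List[List[Tuple[float, float]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1.0 - p         # confidence in the predicted class
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 falls in the last bin
        bins[idx].append((conf, 1.0 if pred == y else 0.0))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(a for _, a in b) / len(b)
            ece += (len(b) / n) * abs(accuracy - mean_conf)
    return ece


# Invented predictions and labels.
print(expected_calibration_error([0.92, 0.81, 0.35, 0.67, 0.12, 0.58], [1, 1, 0, 0, 0, 1]))
```

Figure 3's counts imply an evaluation set of 300 patients (257 + 43). With 10 bins on a set of that size, some bins may hold few patients. The single ECE value of 0.096 therefore carries sampling noise that bootstrapped error bars would expose.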

**Figure 3.** Latent uncertainty distribution by prediction outcome. Histogram of the fused latent standard deviation (uncertainty) for correctly classified patients (blue, n = 257) versus incorrectly classified patients (red, n = 43). Higher uncertainty among incorrect predictions is consistent with the expected calibration behaviour.
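The histogram shows a shift but not its size. One way to quantify it, assumed here rather than taken from the paper, is the AUROC of uncertainty as a misclassification detector. This is the probability that a randomly chosen misclassified patient has higher uncertainty than a randomly chosen correctly classified one. A value of 0.5 means no separation.

```python
from typing import Sequence


def error_detection_auroc(unc_correct: Sequence[float],
                          unc_incorrect: Sequence[float]) -> float:
    """P(uncertainty of a random misclassified patient > that of a random correct one),
    counting ties as 1/2. This equals the Mann-Whitney U statistic divided by n1 * n2."""
    wins = 0.0
    for u_bad in unc_incorrect:
        for u_ok in unc_correct:
            if u_bad > u_ok:
                wins += 1.0
            elif u_bad == u_ok:
                wins += 0.5
    return wins / (len(unc_incorrect) * len(unc_correct))


# Invented example values.
correct = [0.12, 0.18, 0.15, 0.22, 0.10]
incorrect = [0.25, 0.19, 0.31]
print(error_detection_auroc(correct, incorrect))
```

The pairwise loop is O(n1 x n2), which is trivial for the 257 x 43 pairs in Figure 3.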

**Figure 8.** High-uncertainty patient overrepresentation by facility. Bar chart of the percentage of each facility subgroup flagged as high-uncertainty (top quartile). The dashed line marks the population average (25%). Primary/rural patients are overrepresented at 35.7%, a relative excess of 42.8% over the 25% average.
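The +42.8% figure is a relative excess: 35.7% / 25% = 1.428, or 42.8% above the population rate. The sketch below reproduces that style of calculation from per-patient uncertainties and facility labels. It assumes the top quartile is the highest-uncertainty 25% of patients. The function names and toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def top_quartile_flags(uncertainties: Sequence[float]) -> List[bool]:
    """Flag the 25% of patients with the highest uncertainty (ties broken by input order)."""
    k = len(uncertainties) // 4
    order = sorted(range(len(uncertainties)), key=lambda i: uncertainties[i], reverse=True)
    flagged = set(order[:k])
    return [i in flagged for i in range(len(uncertainties))]


def overrepresentation(flags: Sequence[bool],
                       groups: Sequence[str]) -> Dict[str, Tuple[float, float]]:
    """Return {group: (flag_rate, relative_excess_over_overall_rate)}."""
    overall = sum(flags) / len(flags)
    counts = defaultdict(lambda: [0, 0])  # group -> [flagged, total]
    for flag, group in zip(flags, groups):
        counts[group][0] += int(flag)
        counts[group][1] += 1
    return {g: (f / n, (f / n) / overall - 1.0) for g, (f, n) in counts.items()}


# Invented uncertainties and facility labels for eight patients.
unc = [0.41, 0.22, 0.35, 0.18, 0.29, 0.44, 0.20, 0.25]
fac = ["rural", "urban", "rural", "urban", "urban", "rural", "urban", "urban"]
print(overrepresentation(top_quartile_flags(unc), fac))
```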

Original abstract

Clinical artificial intelligence (AI) systems routinely produce predictions without principled quantification of uncertainty, limiting their trustworthiness in high-stakes medical environments. This paper presents an integrated research programme addressing two interconnected problems: (1) the development of a fully end-to-end Bayesian uncertainty modelling framework for multimodal clinical data, and (2) the application of calibrated uncertainty estimates as a formal measure of algorithmic equity across patient subgroups. We construct a probabilistic deep learning architecture comprising modality-specific variational encoders, a precision-weighted late fusion mechanism, and a decomposed uncertainty output head that separates aleatoric from epistemic uncertainty. The system is trained with a composite Bayesian loss incorporating binary cross-entropy, Kullback-Leibler divergence regularisation, and an uncertainty calibration penalty. We evaluate model calibration using Expected Calibration Error (ECE = 0.096) and conduct a subgroup equity audit across facility type, socioeconomic status, age group, and biological sex on a dataset of 1,000 simulated patients. Results demonstrate that epistemic uncertainty systematically identifies underserved populations: primary/rural facility patients show a 15.3% uncertainty equity gap (p < 0.001, effect size = 0.698), low socioeconomic status patients exhibit a 6.8% gap (p < 0.001), and elderly patients show a 3.9% gap (p < 0.001), whilst no significant sex-based disparity is detected. These findings establish that calibrated uncertainty is not merely a technical property of probabilistic models but constitutes an actionable equity signal with direct clinical relevance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies standard Bayesian multimodal fusion to equity auditing, but all reported gaps rest on an unspecified simulation of 1,000 patients.

The main thing to know is that this work frames epistemic uncertainty from a Bayesian model as an operational equity metric for clinical AI. The reported gaps are 15.3% for rural facilities, 6.8% for low SES, and 3.9% for elderly patients, all on its simulated data. The technical setup is competent, but the results cannot be read as independent evidence.

The architecture combines modality-specific variational encoders, precision-weighted late fusion, and a decomposed uncertainty head, trained with cross-entropy, KL, and a calibration penalty. That design yields separate aleatoric and epistemic uncertainty estimates, and the authors report a reasonable ECE of 0.096. Treating uncertainty itself as the equity signal is a straightforward extension of existing Bayesian deep learning. The subgroup analysis is presented with p-values and effect sizes.
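The paper does not specify how its decomposed head separates the two components, and Figure 3 suggests the reported uncertainty is a fused latent standard deviation. For comparison, the sketch below shows a widely used alternative, not the authors' method. Given T stochastic forward passes (e.g. MC dropout, Gal and Ghahramani [1]), predictive entropy splits into expected entropy (aleatoric) plus mutual information (epistemic).

```python
import math
from typing import Sequence, Tuple


def binary_entropy(p: float) -> float:
    """Entropy (nats) of a Bernoulli(p) prediction; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def decompose_uncertainty(sampled_probs: Sequence[float]) -> Tuple[float, float, float]:
    """Split predictive uncertainty for one patient from T stochastic forward passes.

    total     = H[mean_t p_t]      (predictive entropy)
    aleatoric = mean_t H[p_t]      (expected entropy)
    epistemic = total - aleatoric  (mutual information, >= 0 by concavity of H)
    """
    mean_p = sum(sampled_probs) / len(sampled_probs)
    total = binary_entropy(mean_p)
    aleatoric = sum(binary_entropy(p) for p in sampled_probs) / len(sampled_probs)
    return total, aleatoric, total - aleatoric


# Passes that disagree imply high epistemic uncertainty; agreeing passes at 0.5 imply aleatoric.
print(decompose_uncertainty([0.1, 0.9, 0.2, 0.8]))
print(decompose_uncertainty([0.5, 0.5, 0.5, 0.5]))
```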

The load-bearing weakness is the data. All numbers come from a single set of 1,000 simulated patients whose generative process is not described. If the simulation already injects higher noise, missingness, or label uncertainty into rural or low-SES records, then elevated epistemic uncertainty in those groups follows directly from the data-generating assumptions rather than from any property of the model. There is no real patient data, no external cohort, no ablation of the calibration term, and no sensitivity analysis on the simulation parameters.
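The concern that elevated epistemic uncertainty can follow from the data-generating assumptions is easy to illustrate with a toy Beta-Binomial model (not the paper's simulation):

- Both subgroups share the same true event rate of 20%.
- The hypothetical rural group contributes 100 records; the urban group contributes 800.
- The rural posterior standard deviation comes out roughly sqrt(800/100) ≈ 2.8 times larger, purely because of sample size.

```python
def beta_posterior_sd(events: int, n: int, a: float = 1.0, b: float = 1.0) -> float:
    """Posterior SD of an event rate under a Beta(a, b) prior after observing
    `events` positives in `n` records (Beta variance = ab / ((a+b)^2 (a+b+1)))."""
    a_post, b_post = a + events, b + (n - events)
    s = a_post + b_post
    return (a_post * b_post / (s * s * (s + 1.0))) ** 0.5


# Hypothetical subgroups with the same true rate (20%) but different sample sizes.
rural_sd = beta_posterior_sd(events=20, n=100)
urban_sd = beta_posterior_sd(events=160, n=800)
print(f"rural {rural_sd:.4f}  urban {urban_sd:.4f}  ratio {rural_sd / urban_sd:.2f}")
```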

This is for readers already working on uncertainty quantification in medical ML who want to see it applied to fairness questions. Someone looking for new modeling techniques or reproducible results on real data will not get much. It deserves serious refereeing: the technical components are standard and coherent, and the equity framing is worth testing properly. The simulation issue is fixable with transparency and validation experiments.

I would send it to review with a clear request for the simulation details and at least one real dataset check.

Referee Report

3 major / 2 minor

Summary. The paper presents an end-to-end Bayesian deep learning architecture for multimodal clinical data, using modality-specific variational encoders, precision-weighted late fusion, and a decomposed uncertainty head separating aleatoric from epistemic uncertainty. Trained via a composite loss (BCE + KL + calibration penalty), it reports ECE=0.096 on 1,000 simulated patients and claims that epistemic uncertainty identifies underserved subgroups via equity gaps of 15.3% (primary/rural facility, p<0.001, effect size 0.698), 6.8% (low SES), and 3.9% (elderly), with no sex disparity.

Significance. If the simulation were shown to be independent of the model's assumptions and validated externally, the linkage of calibrated epistemic uncertainty to algorithmic equity would constitute a substantive contribution to trustworthy clinical AI. The integrated framework and explicit subgroup audit are positive elements, but the current results do not yet support that assessment.

major comments (3)

[Simulated Dataset] Simulated Dataset section: The generative process for creating the 1,000 patients' multimodal records, labels, and subgroup-specific noise/missingness patterns is unspecified. This is load-bearing for the central claim, because the reported equity gaps (15.3% facility, 6.8% SES, 3.9% age) are computed directly from the model's epistemic uncertainty on data generated under the same modeling assumptions.
[Results] Results section: All quantitative findings (ECE=0.096, p-values, effect sizes) rest on a single simulated dataset with no external validation cohort, no ablation removing the uncertainty calibration penalty, and no error bars or sensitivity analysis on the reported gaps.
[Methods] Methods (composite Bayesian loss): The weight of the uncertainty calibration penalty is a free parameter, yet no sensitivity analysis demonstrates whether the equity gaps persist when this term is varied or removed; the gaps may therefore be an artifact of the loss design rather than an independent property of the Bayesian model.

minor comments (2)

[Abstract] Abstract and Results: The p-values and effect sizes are presented without stating the exact statistical test or correction for multiple comparisons.
Notation: The precision-weighted late fusion mechanism would benefit from an explicit equation defining how modality precisions are combined.
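On the multiple-comparisons point: the audit runs at least four subgroup tests (facility, SES, age, sex). A minimal Holm step-down adjustment is sketched below, with invented p-values in the example. Any p < 0.001 stays below 0.004 after Holm or Bonferroni correction across four tests. The headline results would therefore survive such a correction, provided the underlying tests are valid.

```python
from typing import List, Sequence


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p (rank k-1) is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # keep adjusted p-values monotone
        adjusted[i] = running_max
    return adjusted


# Invented p-values for the facility, SES, age and sex comparisons.
tests = {"facility": 0.0004, "ses": 0.0008, "age": 0.0009, "sex": 0.41}
print(dict(zip(tests, holm_adjust(list(tests.values())))))
```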

Simulated Author's Rebuttal

3 responses · 2 unresolved

We thank the referee for the constructive and detailed review. The comments highlight important issues of transparency, robustness, and scope that we address below. We outline specific revisions to the manuscript while noting limitations that cannot be resolved within the current study.

Referee: [Simulated Dataset] Simulated Dataset section: The generative process for creating the 1,000 patients' multimodal records, labels, and subgroup-specific noise/missingness patterns is unspecified. This is load-bearing for the central claim, because the reported equity gaps (15.3% facility, 6.8% SES, 3.9% age) are computed directly from the model's epistemic uncertainty on data generated under the same modeling assumptions.

Authors: We agree that the generative process must be fully specified. In the revised manuscript we will expand the Simulated Dataset section with a complete description of the data generation procedure, including the probabilistic models for each modality, the mechanisms for introducing subgroup-specific noise and missingness, and the label generation process. This addition will allow readers to evaluate the degree of independence between the simulation and the modeling assumptions. revision: yes
Referee: [Results] Results section: All quantitative findings (ECE=0.096, p-values, effect sizes) rest on a single simulated dataset with no external validation cohort, no ablation removing the uncertainty calibration penalty, and no error bars or sensitivity analysis on the reported gaps.

Authors: We acknowledge these limitations of the current evaluation. We will add bootstrapped error bars to all reported metrics and gaps, and we will include an ablation that removes the calibration penalty term. External validation on a real clinical cohort is outside the scope of this work, which is designed to demonstrate the framework under controlled conditions; we will state this explicitly as a limitation. revision: partial
Referee: [Methods] Methods (composite Bayesian loss): The weight of the uncertainty calibration penalty is a free parameter, yet no sensitivity analysis demonstrates whether the equity gaps persist when this term is varied or removed; the gaps may therefore be an artifact of the loss design rather than an independent property of the Bayesian model.

Authors: We will perform a sensitivity analysis over a range of weights for the calibration penalty term, including the case where the term is removed entirely. The results of this analysis will be reported in the revised Methods and Results sections to demonstrate whether the equity gaps are robust to this hyperparameter. revision: yes

standing simulated objections not resolved

External validation on a real clinical cohort, as the study uses only simulated data.
Demonstration that the simulation is fully independent of the model's assumptions without conducting additional experiments.

Circularity Check

0 steps flagged

No significant circularity detected

The paper presents a Bayesian multimodal architecture (modality-specific variational encoders, precision-weighted fusion, decomposed uncertainty head) trained via a composite loss (BCE + KL + calibration penalty) and evaluates calibration (ECE=0.096) plus subgroup uncertainty gaps on 1,000 simulated patients. No equations, generative-process description, or self-citation chain is provided that reduces the reported equity gaps (15.3% facility, 6.8% SES, 3.9% age) to the model inputs or simulation design by construction. The simulation is treated as an independent testbed; the derivation of the uncertainty model itself does not presuppose the subgroup findings.

Axiom & Free-Parameter Ledger

3 free parameters · 3 axioms · 0 invented entities

The central claim depends on the unstated generative model that produced the 1,000 simulated patients, the assumption that variational inference yields well-calibrated epistemic uncertainty, and the decision to treat uncertainty magnitude itself as a direct equity metric; none of these receive independent empirical support in the abstract.

free parameters (3)

precision weights in late fusion
Learned or hand-tuned scalars that control modality contribution; directly affect the fused representation and downstream uncertainty values.
weight of uncertainty calibration penalty
Hyperparameter in the composite loss that trades off prediction accuracy against calibration; its value shapes the reported ECE and subgroup gaps.
variational posterior parameters
Means and variances of the approximate posteriors in each modality encoder; fitted during training and determine epistemic uncertainty.

axioms (3)

standard math: Variational inference produces a faithful approximation to the true posterior over network weights.
Invoked by the use of variational encoders without further justification.
domain assumption: The simulated patient records and labels preserve the statistical relationships and disparity structure of real multimodal clinical data.
Required for the equity gaps measured on the simulation to generalize.
ad hoc to paper: Higher epistemic uncertainty is a valid proxy for algorithmic inequity.
The paper treats uncertainty magnitude as an equity signal without external validation against clinical outcomes.

pith-pipeline@v0.9.1-grok · 5845 in / 1894 out tokens · 30443 ms · 2026-06-27T14:42:54.556395+00:00 · methodology

Reference graph

Works this paper leans on

30 extracted references · 2 canonical work pages

[1] Gal Y, Ghahramani Z. Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning. PMLR; 2016. p. 1050-1059.

[2] Kendall A, Gal Y. What uncertainties do we need in Bayesian deep learning for computer vision? In: Advances in Neural Information Processing Systems 30 (NIPS 2017). 2017. p. 5580-5590.

[3] Lakshminarayanan B, Pritzel A, Blundell C. Simple and scalable predictive uncertainty estimation using deep ensembles. In: Advances in Neural Information Processing Systems 30 (NIPS 2017). 2017. p. 6402-6413.

[4] Guo C, Pleiss G, Sun Y, Weinberger KQ. On calibration of modern neural networks. In: Proceedings of the 34th International Conference on Machine Learning. PMLR; 2017. p. 1321-1330.

[5] Li Y, Rao S, Hassaine A, Ramakrishnan R, Canoy D, Salimi-Khorshidi G, et al. Deep Bayesian Gaussian processes for uncertainty estimation in electronic health records. Scientific Reports. 2021;11(1):20685.

[6] Ghoshal B, Tucker A. Estimating uncertainty and interpretability in deep learning for coronavirus (COVID-19) detection. arXiv preprint arXiv:2003.10769. 2020.

[7] Leibig C, Allken V, Ayhan MS, Berens P, Wahl S. Leveraging uncertainty information from deep neural networks for disease detection. Scientific Reports. 2017;7(1):17816.

[8] Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447-453.

[9] Chen RJ, Wang JJ, Williamson DFK, Chen TY, Lipkova J, Lu MY, et al. Algorithmic fairness in artificial intelligence for medicine and healthcare. Nature Biomedical Engineering. 2023;7(6):719-742.

[10] Celi LA, Cellini J, Charpignon ML, Dee EC, Dernoncourt F, Eber R, et al. Sources of bias in artificial intelligence that perpetuate healthcare disparities: a global review. PLOS Digital Health. 2022;1(3):e0000022.

[11] Seyyed-Kalantari L, Zhang H, McDermott M, Chen IY, Ghassemi M. Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations. Nature Medicine. 2021;27(12):2176-2182.

[12] Kingma DP, Welling M. Auto-encoding variational Bayes. In: Proceedings of the 2nd International Conference on Learning Representations (ICLR). 2014.

[13] Johnson AEW, Bulgarelli L, Shen L, Gayles A, Shammout A, Horng S, et al. MIMIC-IV, a freely accessible electronic health record dataset. Scientific Data. 2023;10(1):1.

[14] Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PC, Mark RG, et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation. 2000;101(23):e215-e220.

[15] Afonja T, Sink A, Ige O, Jagun M. Towards equitable AI in Africa: challenges and opportunities. arXiv preprint arXiv:2301.09528. 2023.

[16] Blundell C, Cornebise J, Kavukcuoglu K, Wierstra D. Weight uncertainty in neural networks. In: Proceedings of the 32nd International Conference on Machine Learning. PMLR; 2015. p. 1613-1622.

[17] Abdar M, Pourpanah F, Hussain S, Rezazadegan D, Liu L, Ghavamzadeh M, et al. A review of uncertainty quantification in deep learning: techniques, applications and challenges. Information Fusion. 2021;76:243-297.

[18] Begoli E, Bhattacharya T, Kusnezov D. The need for uncertainty quantification in machine-assisted medical decision making. Nature Machine Intelligence. 2019;1(1):20-23.

[19] Rajpurkar P, Chen E, Banerjee O, Topol EJ. AI in health and medicine. Nature Medicine. 2022;28(1):31-38.

[20] Winkler JK, Fink C, Toberer F, Enk A, Deinlein T, Hofmann-Wellenhof R, et al. Association between surgical skin markings in dermoscopic images and diagnostic performance of a deep learning convolutional neural network for melanoma recognition. JAMA Dermatology. 2019;155(10):1135-1141.

[21] Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nature Medicine. 2019;25(1):44-56.

[22] Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115-118.

[23] Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316(22):2402-2410.

[24] Glocker B, Jones C, Bernhardt M, Winzeck S. Algorithmic encoding of protected characteristics in chest X-ray disease detection models. eBioMedicine. 2023;89:104467.

[25] Fletcher RR, Nakeshimana A, Olubeko O. Addressing fairness, bias, and appropriate use of artificial intelligence and machine learning in global health. Frontiers in Artificial Intelligence. 2021;3:561802.

[26] Char DS, Shah NH, Magnus D. Implementing machine learning in health care: addressing ethical challenges. New England Journal of Medicine. 2018;378(11):981-983.

[27] Wachter S, Mittelstadt B, Russell C. Counterfactual explanations without opening the black box: automated decisions and the GDPR. Harvard Journal of Law and Technology. 2017;31(2):841-887.

[28] Hernandez-Boussard T, Bozkurt S, Ioannidis JPA, Shah NH. MINIMAR (MINimum Information for Medical AI Reporting): developing reporting standards for artificial intelligence in health care. Journal of the American Medical Informatics Association. 2020;27(12):2011-2015.

[29] Barocas S, Hardt M, Narayanan A. Fairness and Machine Learning: Limitations and Opportunities. MIT Press; 2023.

[30] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. In: Advances in Neural Information Processing Systems 30 (NIPS 2017). 2017. p. 5998-6008.
