arxiv: 2604.09656 · v1 · submitted 2026-03-30 · 💻 cs.LG · cs.AI· stat.AP· stat.ME

Recognition: 2 theorem links

· Lean Theorem

Fairboard: a quantitative framework for equity assessment of healthcare models

James K. Ruffle , Samia Mohinta , Chris Foulon , Mohamad Zeina , Zicheng Wang , Sebastian Brandner , Harpreet Hyare , Parashkev Nachev

Authors on Pith no claims yet

Pith reviewed 2026-05-14 21:04 UTC · model grok-4.3

classification 💻 cs.LG cs.AIstat.APstat.ME

keywords equity assessmentbrain tumor segmentationAI fairnessgliomamodel performance variancepatient subgroupsspatial bias analysisFairboard dashboard

0 comments

The pith

Patient identity explains more variance in brain tumor segmentation accuracy than model architecture or choice.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Fairboard, a framework that measures equity by testing 18 open-source brain tumor segmentation models on 648 glioma patients from two datasets. It shows through univariate, Bayesian multivariate, spatial, and high-dimensional analyses that who the patient is—specifically factors like molecular diagnosis, tumor grade, and extent of resection—accounts for more performance differences than which algorithm is applied. Spatial mapping reveals consistent, compartment-specific biases across models, while performance clusters in the space of lesion and demographic features point to patient-level axes of vulnerability. Newer models trend toward better equity but none deliver formal fairness guarantees. The work releases a no-code dashboard to make routine equity monitoring accessible for medical imaging AI.

Core claim

Across 11,664 model inferences, patient identity consistently accounts for greater performance variance than model choice, with clinical variables including molecular diagnosis, tumor grade, and extent of resection emerging as stronger predictors of segmentation accuracy than architecture; voxel-wise meta-analysis shows localized neuroanatomical biases that are compartment-specific yet often shared across models, and high-dimensional clustering of lesion masks with clinic-demographic features identifies patient feature axes along which models are systematically vulnerable.

What carries the argument

Fairboard equity assessment framework, which combines univariate statistics, Bayesian multivariate modeling, voxel-wise spatial meta-analysis, and latent-space clustering of lesion masks with clinic-demographic features to quantify how patient subgroups affect segmentation performance.

If this is right

Newer segmentation models achieve greater equity than older ones but still lack formal fairness guarantees.
Performance clusters in the high-dimensional space of lesion masks and clinic-demographic features indicate systematic patient-level vulnerabilities.
Localized neuroanatomical biases identified in voxel-wise analysis are compartment-specific and consistent across models.
Equity monitoring should prioritize patient identity and clinical factors over selection among current model architectures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Improving training data diversity across molecular subtypes and resection extents may yield larger equity gains than further architectural changes.
The same multi-dimensional assessment approach could be applied to other medical imaging tasks such as organ segmentation or lesion detection to reveal analogous patient-driven biases.
Regulatory pathways for medical AI might eventually require quantitative equity reports like those produced by Fairboard before approval.
Extending the framework to longitudinal patient data could test whether biases persist or evolve with disease progression.

Load-bearing premise

The two independent datasets totaling 648 patients sufficiently represent real-world glioma populations and that the chosen metrics and multivariate models capture equity without unmeasured confounding.

What would settle it

A replication study on an independent cohort of at least 500 glioma patients in which model architecture explains more performance variance than patient identity or clinical factors would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.09656 by Chris Foulon, Harpreet Hyare, James K. Ruffle, Mohamad Zeina, Parashkev Nachev, Samia Mohinta, Sebastian Brandner, Zicheng Wang.

**Figure 2.** Figure 2: Qualitative segmentation comparison across diverse patient cases. [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: A composite league table of model performance and distributional equity. [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Bayesian linear mixed-effects model of clinico-demographic predictors of segmentation per [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: Spatial equity meta-analysis of voxel-wise segmentation bias. [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗

**Figure 6.** Figure 6: Representational equity analysis of Dice similarity across tumour compartments. [PITH_FULL_IMAGE:figures/full_fig_p013_6.png] view at source ↗

read the original abstract

Despite there now being more than 1,000 FDA-authorised AI medical devices, formal equity assessments -- whether model performance is uniform across patient subgroups -- are rare. Here, we evaluate the equity of 18 open-source brain tumour segmentation models across 648 glioma patients from two independent datasets (n = 11,664 model inferences) along distinct univariate, Bayesian multivariate, spatial, and representational dimensions. We find that patient identity consistently explains more performance variance than model choice, with clinical factors, including molecular diagnosis, tumour grade, and extent of resection, predicting segmentation accuracy more strongly than model architecture. A voxel-wise spatial meta-analysis identifies neuroanatomically localised biases that are compartment-specific yet often consistent across models. Within a high-dimensional latent space of lesion masks and clinic-demographic features, model performance clusters significantly, indicating that the patient feature space contains axes of algorithmic vulnerability. Although newer models tend toward greater equity, none provide a formal fairness guarantee. Lastly, we release Fairboard, an open-source, no-code dashboard that lowers barriers to equitable model monitoring in medical imaging.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Patient identity explains more segmentation performance variance than model choice across 18 models on two glioma datasets, backed by multi-angle analysis and a released dashboard, though source confounding needs explicit checking.

read the letter

The main point is that patient identity explains more of the variance in segmentation performance than the choice of model, across 18 open-source brain tumor models tested on 648 glioma patients from two datasets. Clinical factors such as molecular diagnosis, tumor grade, and extent of resection also come out stronger than architecture in the multivariate models. They apply this through univariate analysis, Bayesian multivariate modeling, voxel-wise spatial meta-analysis, and clustering in a latent space combining lesion masks with clinic-demographic features. The spatial meta-analysis reveals localized biases tied to specific neuroanatomical compartments that often appear across different models. Within the latent space, performance clusters in ways that highlight axes of vulnerability in the patient feature space. They also release Fairboard, a no-code dashboard for ongoing equity monitoring, which is a direct practical output. This multi-method approach on a solid number of inferences (11,664) adds breadth to equity work in medical imaging. The observation that newer models show better equity trends is worth noting, though none achieve a formal fairness guarantee. The patterns hold across the two datasets, which strengthens the case. One area that needs care is the potential for dataset-level effects to inflate the patient identity variance. Since the data come from only two sources, differences in acquisition or annotation could be absorbed into the patient random effect if dataset identity is not explicitly included as a covariate in the Bayesian model. The abstract does not indicate this was done, so the central claim about patient identity dominating model choice could partly stem from source heterogeneity. A reviewer should check the methods section for how this was handled. This is relevant for anyone building or evaluating AI models for glioma segmentation or similar tasks where subgroup performance matters. It provides both the analysis framework and a usable tool, so it merits peer review to verify the statistical details and confirm the robustness of the findings.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces Fairboard, a quantitative framework for equity assessment of medical imaging AI. It evaluates 18 open-source brain tumour segmentation models across 648 glioma patients from two independent datasets (totaling 11,664 inferences) using univariate, Bayesian multivariate, spatial, and representational analyses. Central claims are that patient identity explains more performance variance than model choice, clinical factors (molecular diagnosis, tumour grade, extent of resection) predict segmentation accuracy more strongly than architecture, voxel-wise biases are neuroanatomically localised and often model-consistent, and performance clusters in a high-dimensional latent space of lesion and clinic-demographic features. Newer models show greater equity but none offer formal fairness guarantees; the work releases an open-source no-code dashboard for monitoring.

Significance. If the variance decomposition and clustering results hold after addressing potential dataset confounding, the paper provides a valuable multi-dimensional toolkit for equity evaluation in healthcare AI, where such formal assessments remain rare despite over 1,000 FDA-authorised devices. The empirical demonstration that patient-level and clinical factors dominate model architecture, combined with the release of Fairboard, could meaningfully advance reproducible fairness monitoring in medical imaging.

major comments (3)

[Methods] Methods (Bayesian multivariate model): The description does not indicate that dataset ID (the two sources) was entered as a fixed or random covariate. With total n=648 drawn from only two datasets, any unmodeled scanner, protocol, or acquisition effects will be absorbed into the patient-identity random effect, directly undermining the central claim that patient identity consistently explains more variance than model choice.
[Results] Results (variance decomposition): No error bars, posterior intervals, or exact model specification (e.g., priors, convergence diagnostics) are referenced for the claim that patient identity > model choice and clinical factors > architecture. Without these, it is impossible to assess whether the reported dominance is robust or sensitive to post-hoc modeling choices.
[Results] Results (spatial meta-analysis): The voxel-wise analysis identifies compartment-specific biases consistent across models, but the manuscript does not report the multiple-comparison correction or the exact statistical threshold used to declare localisation, which is load-bearing for the claim of neuroanatomically specific equity gaps.

minor comments (2)

[Abstract] Abstract: The phrase 'formal fairness guarantee' is used without definition; clarify whether this refers to a specific metric (e.g., demographic parity) or a statistical test.
[Figures] Figure captions: Several spatial and clustering figures lack axis labels or scale bars, reducing interpretability of the reported neuroanatomical biases.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major point below and have revised the manuscript to incorporate the requested clarifications and analyses.

read point-by-point responses

Referee: [Methods] Methods (Bayesian multivariate model): The description does not indicate that dataset ID (the two sources) was entered as a fixed or random covariate. With total n=648 drawn from only two datasets, any unmodeled scanner, protocol, or acquisition effects will be absorbed into the patient-identity random effect, directly undermining the central claim that patient identity consistently explains more variance than model choice.

Authors: We agree this is a critical methodological detail. In the revised manuscript we have added dataset ID as a fixed effect in the Bayesian multivariate model. Re-fitting the model shows that patient identity still accounts for substantially more performance variance than model choice (posterior mean difference remains >2x larger), and we have updated the Methods with the full model equation, priors, and convergence diagnostics. revision: yes
Referee: [Results] Results (variance decomposition): No error bars, posterior intervals, or exact model specification (e.g., priors, convergence diagnostics) are referenced for the claim that patient identity > model choice and clinical factors > architecture. Without these, it is impossible to assess whether the reported dominance is robust or sensitive to post-hoc modeling choices.

Authors: We have revised the Results to display 95% credible intervals on all variance-component estimates. The Methods section now specifies the exact hierarchical Bayesian model (weakly informative normal(0,1) priors on fixed effects, half-Cauchy(0,1) on variance terms), sampling details (4 chains, 2000 iterations post-warmup), and convergence criteria (R-hat < 1.01, bulk ESS > 4000). These additions confirm the robustness of the reported dominance ordering. revision: yes
Referee: [Results] Results (spatial meta-analysis): The voxel-wise analysis identifies compartment-specific biases consistent across models, but the manuscript does not report the multiple-comparison correction or the exact statistical threshold used to declare localisation, which is load-bearing for the claim of neuroanatomically specific equity gaps.

Authors: We have clarified the spatial meta-analysis procedure in the revised Methods: voxel-wise threshold of p < 0.001 followed by cluster-level family-wise error correction via 5000 permutations (alpha = 0.05). The Results now explicitly report this threshold and correction, supporting the neuroanatomically localised and model-consistent bias claims. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical variance decomposition is self-contained

full rationale

The paper performs direct empirical analysis via univariate statistics, Bayesian multivariate modeling, spatial meta-analysis, and clustering on performance metrics from 18 models evaluated on 648 patients. No load-bearing step reduces a claimed prediction or result to a fitted parameter by construction, invokes self-citation for uniqueness theorems, or renames known patterns as novel derivations. The central finding that patient identity explains more variance than model choice follows from standard variance partitioning applied to the observed data without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claims rest on standard statistical assumptions for variance partitioning and clustering; no free parameters, ad-hoc axioms, or invented entities are described in the abstract.

pith-pipeline@v0.9.0 · 5522 in / 1174 out tokens · 30411 ms · 2026-05-14T21:04:25.919224+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Bayesian linear mixed-effects (LME) models with crossed random intercepts for patient (n=569) and model (n=18)... Variance decomposition revealed that patient identity consistently explained more variance than model identity. Patient-level intraclass correlation coefficients (ICCs) ranged from 0.31... whereas model-level ICCs ranged from 0.04...
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

DerSimonian–Laird random-effects meta-analysis of voxel-wise segmentation performance bias across 18 models... UMAP... latent-space GLMs

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

106 extracted references · 87 canonical work pages · 2 internal anchors

[1]

van der Laak, Bram van Ginneken, and Clara I

Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen A.W.M. van der Laak, Bram van Ginneken, and Clara I. Sánchez. A survey on deep learning in medical image analysis.Medical Image Analysis, 42:60–88, 2017. doi:10.1016/j.media.2017.07.005

work page doi:10.1016/j.media.2017.07.005 2017
[2]

Eric J. Topol. High-performance medicine: the convergence of human and artificial intelligence.Nature Medicine, 25(1):44–56, 2019. doi:10.1038/s41591-018-0300-7

work page doi:10.1038/s41591-018-0300-7 2019
[3]

Ting, Alan Karthikesalingam, Dominic King, Hutan Ashrafian, and Ara Darzi

Ravi Aggarwal, Viknesh Sounderajah, Guy Martin, Daniel S.W. Ting, Alan Karthikesalingam, Dominic King, Hutan Ashrafian, and Ara Darzi. Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis.npj Digital Medicine, 4(1):65, 2021. doi:10.1038/s41746-021- 00438-z

work page doi:10.1038/s41746-021- 2021
[4]

Menze, Andras Jakab, Stefan Bauer, Jayashree Kalpathy-Cramer, Keyvan Farahani, Justin Kirby, Yuliya Burren, Nicole Porz, Johannes Slotboom, Roland Wiest, et al

Bjoern H. Menze, Andras Jakab, Stefan Bauer, Jayashree Kalpathy-Cramer, Keyvan Farahani, Justin Kirby, Yuliya Burren, Nicole Porz, Johannes Slotboom, Roland Wiest, et al. The Multimodal Brain 22 Tumor Image Segmentation Benchmark (BRATS).IEEE Transactions on Medical Imaging, 34(10): 1993–2024, 2015. doi:10.1109/TMI.2014.2377694

work page doi:10.1109/tmi.2014.2377694 1993
[5]

Kirby, John B

Spyridon Bakas, Hamed Akbari, Aristeidis Sotiras, Michel Bilello, Martin Rozycki, Justin S. Kirby, John B. Freymann, Keyvan Farahani, and Christos Davatzikos. Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features.Scientific Data, 4: 170117, 2017. doi:10.1038/sdata.2017.117

work page doi:10.1038/sdata.2017.117 2017
[6]

Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge

Spyridon Bakas, Mauricio Reyes, Andras Jakab, Stefan Bauer, Markus Rempfler, Alessandro Crimi, Russell T. Shinohara, Christoph Berger, Sung Min Ha, Martin Rozycki, et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in a multi-institutional multi-site dataset.arXiv p...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1811.02629 2018
[7]

The RSNA-ASNR-MICCAI BraTS 2021 Benchmark on Brain Tumor Segmentation and Radiogenomic Classification

Ujjwal Baid, Satyam Ghodasara, Suyash Mohan, Michel Bilello, Evan Calabrese, Errol Colak, Keyvan Farahani, Jayashree Kalpathy-Cramer, Felipe C. Kitamura, Sarthak Pati, et al. The RSNA-ASNR- MICCAI BraTS 2021 benchmark on brain tumor segmentation and radiogenomic classification.arXiv preprint, 2021. doi:10.48550/arXiv.2107.02314

work page internal anchor Pith review doi:10.48550/arxiv.2107.02314 2021
[8]

Nature Methods 18(2), 203–211 (2021)

Fabian Isensee, Paul F. Jaeger, Simon A. A. Kohl, Jens Petersen, and Klaus H. Maier-Hein. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation.Nature Methods, 18 (2):203–211, 2021. doi:10.1038/s41592-020-01008-z

work page doi:10.1038/s41592-020-01008-z 2021
[9]

3D MRI brain tumor segmentation using autoencoder regularization

Andriy Myronenko. 3D MRI brain tumor segmentation using autoencoder regularization. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2018. LNCS, volume 11384. Springer, 2019. doi:10.1007/978-3-030-11726-9_28

work page doi:10.1007/978-3-030-11726-9_28 2018
[10]

Brain tu- mour segmentation with incomplete imaging data.Brain Communications, 5(2):fcad118, 2023

James K Ruffle, Samia Mohinta, Robert Gray, Harpreet Hyare, and Parashkev Nachev. Brain tu- mour segmentation with incomplete imaging data.Brain Communications, 5(2):fcad118, 2023. doi:10.1093/braincomms/fcad118

work page doi:10.1093/braincomms/fcad118 2023
[11]

Predicting brain tumour enhancement from non-contrast MR imaging with artificial intelligence.arXiv preprint, 2025

James K Ruffle, Samia Mohinta, Guilherme Pombo, Asthik Biswas, Alan Campbell, Indran Davagnanam, David Doig, Ahmed Hammam, Harpreet Hyare, Farrah Jabeen, Emma Lim, Dermot Mallon, Stephanie Owen, Sophie Wilkinson, Sebastian Brandner, and Parashkev Nachev. Predicting brain tumour enhancement from non-contrast MR imaging with artificial intelligence.arXiv pr...

work page doi:10.48550/arxiv.2508.16650 2025
[12]

Roth, and Daguang Xu

Ali Hatamizadeh, Vishwesh Nath, Yucheng Tang, Dong Yang, Holger R. Roth, and Daguang Xu. Swin UNETR: Swin transformers for semantic segmentation of brain tumors in MRI images. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2021. LNCS, volume 12962. Springer, 2022. doi:10.1007/978-3-031-08999-2_22

work page doi:10.1007/978-3-031-08999-2_22 2021
[13]

BraTS toolkit: Translating BraTS brain tumor segmentation algorithms into clinical and scientific practice.Frontiers in Neuroscience, 14:125, 2020

Florian Kofler, Christoph Berger, Diana Waldmannstetter, Jana Lipkova, Ivan Ezhov, Giles Tran, Bjoern Menze, et al. BraTS toolkit: Translating BraTS brain tumor segmentation algorithms into clinical and scientific practice.Frontiers in Neuroscience, 14:125, 2020. doi:10.3389/fnins.2020.00125

work page doi:10.3389/fnins.2020.00125 2020
[14]

Health equity

World Health Organization. Health equity. https://www.who.int/health-topics/ health-equity, 2025. Accessed 25 March 2026

2025
[15]

Muehlematter, Paola Daniore, and Kerstin N

Urs J. Muehlematter, Paola Daniore, and Kerstin N. V okinger. Approval of artificial intelligence and machine learning-based medical devices in the USA and Europe (2015–20): a comparative analysis. The Lancet Digital Health, 3(3):e195–e203, 2021. doi:10.1016/S2589-7500(20)30292-2. 23

work page doi:10.1016/s2589-7500(20)30292-2 2015
[16]

Erickson

Kang Zhang, Bardia Khosravi, Shahriar Vahdati, and Bradley J. Erickson. FDA review of radiologic AI algorithms: Process and challenges.Radiology, 310(1):e230242, 2024. doi:10.1148/radiol.230242

work page doi:10.1148/radiol.230242 2024
[17]

Milam and Chi Wan Koo

Morgan E. Milam and Chi Wan Koo. The current status and future of FDA-approved artificial in- telligence tools in chest radiology in the United States.Clinical Radiology, 78(2):115–122, 2023. doi:10.1016/j.crad.2022.08.135

work page doi:10.1016/j.crad.2022.08.135 2023
[18]

Lin, Bhav Jain, Jay M

John C. Lin, Bhav Jain, Jay M. Iyer, Ishan Rola, Anusha R. Srinivasan, Chaerim Kang, Heta Patel, and Ravi B. Parikh. Benefit-risk reporting for FDA-cleared artificial intelligence-enabled medical devices. JAMA Health Forum, 6(9):e253351, 2025. doi:10.1001/jamahealthforum.2025.3351

work page doi:10.1001/jamahealthforum.2025.3351 2025
[19]

A survey on bias and fairness in machine learning.ACM Computing Surveys, 54(6):1–35, 2022

Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. A survey on bias and fairness in machine learning.ACM Computing Surveys, 54(6):1–35, 2022. doi:10.1145/3457607

work page doi:10.1145/3457607 2022
[20]

Fair prediction with disparate impact: A study of bias in recidivism prediction instruments.Big data, 5(2):153–163, 2017

Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments.Big Data, 5(2):153–163, 2017. doi:10.1089/big.2016.0047

work page doi:10.1089/big.2016.0047 2017
[21]

Shenkman, Jiang Bian, and Fei Wang

Jie Xu, Yunyu Xiao, Wendy Hui Wang, Yue Ning, Elizabeth A. Shenkman, Jiang Bian, and Fei Wang. Algorithmic fairness in computational medicine.eBioMedicine, 84:104250, 2022. doi:10.1016/j.ebiom.2022.104250

work page doi:10.1016/j.ebiom.2022.104250 2022
[22]

Gender shades: Intersectional accuracy disparities in commercial gender classification

Joy Buolamwini and Timnit Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification. InProceedings of the 1st Conference on Fairness, Accountability and Transparency, volume 81 ofProceedings of Machine Learning Research, pages 77–91. PMLR, 2018. URL https: //proceedings.mlr.press/v81/buolamwini18a.html

2018
[23]

Dissecting racial bias in an algorithm used to manage the health of populations.Science, 366(6464):447–453, 2019

Ziad Obermeyer, Brian Powers, Christine V ogeli, and Sendhil Mullainathan. Dissecting racial bias in an algorithm used to manage the health of populations.Science, 366(6464):447–453, 2019. doi:10.1126/science.aax2342

work page doi:10.1126/science.aax2342 2019
[24]

McDermott, Irene Y

Laleh Seyyed-Kalantari, Haoran Zhang, Matthew B.A. McDermott, Irene Y . Chen, and Marzyeh Ghassemi. Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations.Nature Medicine, 27(12):2176–2182, 2021. doi:10.1038/s41591-021- 01595-0

work page doi:10.1038/s41591-021- 2021
[25]

Larrazabal, Nicolás Nieto, Victoria Peterson, Diego H

Agostina J. Larrazabal, Nicolás Nieto, Victoria Peterson, Diego H. Milone, and Enzo Ferrante. Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis.Proceed- ings of the National Academy of Sciences, 117(23):12592–12594, 2020. doi:10.1073/pnas.1919012117

work page doi:10.1073/pnas.1919012117 2020
[26]

Algorithmic encoding of protected characteristics in chest X-ray disease detection models.eBioMedicine, 89:104467, 2023

Ben Glocker, Charles Jones, Mélanie Bernhardt, and Stefan Winzeck. Algorithmic encoding of protected characteristics in chest X-ray disease detection models.eBioMedicine, 89:104467, 2023. doi:10.1016/j.ebiom.2023.104467

work page doi:10.1016/j.ebiom.2023.104467 2023
[27]

Com- putational limits to the legibility of the imaged human brain.NeuroImage, 291:120600, 2024

James K Ruffle, Samia Mohinta, Robert Gray, Harpreet Hyare, and Parashkev Nachev. Com- putational limits to the legibility of the imaged human brain.NeuroImage, 291:120600, 2024. doi:10.1016/j.neuroimage.2024.120600

work page doi:10.1016/j.neuroimage.2024.120600 2024
[28]

Representational ethical model calibration.npj Digital Medicine, 5(1):170, 2022

Robert Carruthers, Isabel Straw, James K Ruffle, Daniel Herron, Amy Nelson, Danilo Bzdok, Delmiro Fernandez-Reyes, Geraint Rees, and Parashkev Nachev. Representational ethical model calibration.npj Digital Medicine, 5(1):170, 2022. doi:10.1038/s41746-022-00716-4. 24

work page doi:10.1038/s41746-022-00716-4 2022
[29]

Gichoya, Dina Katabi, and Marzyeh Ghassemi

Yuzhe Yang, Haoran Zhang, Judy W. Gichoya, Dina Katabi, and Marzyeh Ghassemi. The limits of fair medical imaging AI in real-world generalization.Nature Medicine, 30(10):2838–2848, 2024. doi:10.1038/s41591-024-03113-4

work page doi:10.1038/s41591-024-03113-4 2024
[30]

Chen, Emma Pierson, Sherri Rose, Shalmali Joshi, Kadija Ferryman, and Marzyeh Ghassemi

Irene Y . Chen, Emma Pierson, Sherri Rose, Shalmali Joshi, Kadija Ferryman, and Marzyeh Ghassemi. Ethical machine learning in healthcare.Annual Review of Biomedical Data Science, 4:123–144, 2021. doi:10.1146/annurev-biodatasci-092820-114757

work page doi:10.1146/annurev-biodatasci-092820-114757 2021
[31]

McCradden, Shalmali Joshi, Mjaye Mazwi, and James A

Melissa D. McCradden, Shalmali Joshi, Mjaye Mazwi, and James A. Anderson. Ethical limitations of algorithmic fairness solutions in health care machine learning.The Lancet Digital Health, 2(5): e221–e223, 2020. doi:10.1016/S2589-7500(20)30065-0

work page doi:10.1016/s2589-7500(20)30065-0 2020
[32]

Parikh, Stephanie Teeple, and Amol S

Ravi B. Parikh, Stephanie Teeple, and Amol S. Navathe. Addressing bias in artificial intelligence in health care.JAMA, 322(24):2377–2378, 2019. doi:10.1001/jama.2019.18058

work page doi:10.1001/jama.2019.18058 2019
[33]

Piechnik, Stefan Neubauer, Steffen E

Esther Puyol-Antón, Bram Ruijsink, Stefan K. Piechnik, Stefan Neubauer, Steffen E. Petersen, Reza Razavi, and Andrew P. King. Fairness in cardiac magnetic resonance imaging: Assessing sex and racial bias in deep learning-based segmentation.Frontiers in Cardiovascular Medicine, 9:859310, 2022. doi:10.3389/fcvm.2022.859310

work page doi:10.3389/fcvm.2022.859310 2022
[34]

Addressing fairness in artificial intelligence for medical imaging.Nature Communications, 13:4581, 2022

María Agustina Ricci Lara, Rodrigo Echeveste, and Enzo Ferrante. Addressing fairness in artificial intelligence for medical imaging.Nature Communications, 13:4581, 2022. doi:10.1038/s41467-022- 32186-3

work page doi:10.1038/s41467-022- 2022
[35]

On (assessing) the fairness of risk score models

Eike Petersen, Melanie Ganz, Søren Holm, and Aasa Feragen. On (assessing) the fairness of risk score models. InProceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pages 817–829, 2023. doi:10.1145/3593013.3594045

work page doi:10.1145/3593013.3594045 2023
[36]

Kevin Zhou

Zikang Xu, Jun Li, Qingsong Yao, Han Li, Mingyue Zhao, and S. Kevin Zhou. Addressing fairness issues in deep learning-based medical image analysis: a systematic review.npj Digital Medicine, 7(1): 286, 2024. doi:10.1038/s41746-024-01276-5

work page doi:10.1038/s41746-024-01276-5 2024
[37]

Villanueva-Meyer, Jeffrey D

Evan Calabrese, Javier E. Villanueva-Meyer, and Soonmee Cha. The University of California San Francisco Preoperative Diffuse Glioma MRI (UCSF-PDGM) dataset.Radiology: Artificial Intelligence, 4(6):e220058, 2022. doi:10.1148/ryai.220058

work page doi:10.1148/ryai.220058 2022
[38]

Rudie, Nazím Flores Santamaría, Anahita Fathi Kazerooni, Sarthak Pati, et al

Spyridon Bakas, Chiharu Sako, Hamed Akbari, Michel Bilello, Aristeidis Sotiras, Gaurav Shukla, Jeffrey D. Rudie, Nazím Flores Santamaría, Anahita Fathi Kazerooni, Sarthak Pati, et al. The University of Pennsylvania Glioblastoma (UPenn-GBM) cohort: advanced MRI, clinical, genomics, & radiomics. Scientific Data, 9(1):453, 2022. doi:10.1038/s41597-022-01560-7

work page doi:10.1038/s41597-022-01560-7 2022
[39]

On the measurement of inequalities in health

Adam Wagstaff, Pierella Paci, and Eddy van Doorslaer. On the measurement of inequalities in health. Social Science & Medicine, 51(5):667–681, 2000. doi:10.1016/S0277-9536(99)00382-2

work page doi:10.1016/s0277-9536(99)00382-2 2000
[40]

Variabilità e mutabilità: contributo allo studio delle distribuzioni e delle relazioni statistiche.Studi Economico-Giuridici della Regia Università di Cagliari, 3:3–159, 1912

Corrado Gini. Variabilità e mutabilità: contributo allo studio delle distribuzioni e delle relazioni statistiche.Studi Economico-Giuridici della Regia Università di Cagliari, 3:3–159, 1912

1912
[41]

Atkinson

Anthony B. Atkinson. On the measurement of inequality.Journal of Economic Theory, 2(3):244–263,
[42]

doi:10.1016/0022-0531(70)90039-6

work page doi:10.1016/0022-0531(70)90039-6
[43]

North-Holland, Amsterdam, 1967

Henri Theil.Economics and Information Theory. North-Holland, Amsterdam, 1967. ISBN 978-0-7204- 3347-3. 25

1967
[44]

Hoover, Jr

Edgar M. Hoover, Jr. The measurement of industrial localization.The Review of Economics and Statistics, 18(4):162–171, 1936. doi:10.2307/1927875

work page doi:10.2307/1927875 1936
[45]

Shorrocks

Anthony F. Shorrocks. The class of additively decomposable inequality measures.Econometrica, 48(3): 613–625, 1980. doi:10.2307/1913126

work page doi:10.2307/1913126 1980
[46]

Homogeneous middles vs

José Gabriel Palma. Homogeneous middles vs. heterogeneous tails, and the end of the ‘inverted-U’: It’s all about the share of the rich.Development and Change, 42(1):87–153, 2011. doi:10.1111/j.1467- 7660.2011.01694.x

work page doi:10.1111/j.1467- 2011
[47]

Fairboard: a quantitative framework for equity assessment of healthcare models

James K. Ruffle, Samia Mohinta, Chris Foulon, Mohamad Zeina, Zicheng Wang, Sebastian Brandner, Harpreet Hyare, and Parashkev Nachev. Model inferences for “Fairboard: a quantitative framework for equity assessment of healthcare models”. Zenodo, 2026. doi:10.5281/zenodo.19207798

work page doi:10.5281/zenodo.19207798 2026
[48]

Auto3DSeg for brain tumor segmentation from 3D MRI in BraTS 2023 challenge.arXiv preprint, 2025

Andriy Myronenko, Dong Yang, Yufan He, and Daguang Xu. Auto3DSeg for brain tumor segmentation from 3D MRI in BraTS 2023 challenge.arXiv preprint, 2025. doi:10.48550/arXiv.2510.25058

work page doi:10.48550/arxiv.2510.25058 2023
[49]

Enhanced data augmentation using synthetic data for brain tumour segmentation

André Ferreira, Naida Solak, Jianning Li, Philipp Dammann, Jens Kleesiek, Victor Alves, and Jan Egger. Enhanced data augmentation using synthetic data for brain tumour segmentation. InBrain Tumor Segmentation, and Cross-Modality Domain Adaptation for Medical Image Segmentation. BraTS 2023. LNCS, volume 14669. Springer, 2024. doi:10.1007/978-3-031-76163-8_8

work page doi:10.1007/978-3-031-76163-8_8 2023
[50]

Advanced tumor segmentation in medical imaging: An ensemble approach for BraTS 2023 adult glioma and pediatric tumor tasks

Fadillah Maani, Anees Ur Rehman Hashmi, Mariam Aljuboory, Numan Saeed, Ikboljon Sobirov, and Mohammad Yaqub. Advanced tumor segmentation in medical imaging: An ensemble approach for BraTS 2023 adult glioma and pediatric tumor tasks. InBrain Tumor Segmentation, and Cross-Modality Domain Adaptation for Medical Image Segmentation. BraTS 2023. LNCS, volume 14...

2023
[51]

doi:10.1007/978-3-031-76163-8_24

work page doi:10.1007/978-3-031-76163-8_24
[52]

Extending nn-UNet for brain tumor segmentation

Huan Minh Luu and Sung-Hong Park. Extending nn-UNet for brain tumor segmentation. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2021. LNCS, volume 12963. Springer, 2022. doi:10.1007/978-3-031-09002-8_16

work page doi:10.1007/978-3-031-09002-8_16 2021
[53]

Tomás Capretto, Camen Piho, Ravin Kumar, Jacob Westfall, Tal Yarkoni, and Osvaldo A. Martin. Bambi: A simple interface for fitting Bayesian linear models in Python.Journal of Statistical Software, 103(15): 1–29, 2022. doi:10.18637/jss.v103.i15

work page doi:10.18637/jss.v103.i15 2022
[54]

Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC.Statistics and Computing, 27(5):1413–1432, 2017

Aki Vehtari, Andrew Gelman, and Jonah Gabry. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC.Statistics and Computing, 27(5):1413–1432, 2017. doi:10.1007/s11222- 016-9696-4

work page doi:10.1007/s11222- 2017
[55]

Meta-analysis in clinical trials.Controlled Clinical Trials, 7(3): 177–188, 1986

Rebecca DerSimonian and Nan Laird. Meta-analysis in clinical trials.Controlled Clinical Trials, 7(3): 177–188, 1986. doi:10.1016/0197-2456(86)90046-2

work page doi:10.1016/0197-2456(86)90046-2 1986
[56]

UMAP: Uniform manifold approximation and projection for dimension reduction.Journal of Open Source Software, 3(29):861, 2018

Leland McInnes, John Healy, and James Melville. UMAP: Uniform manifold approximation and projection for dimension reduction.Journal of Open Source Software, 3(29):861, 2018. doi:10.21105/joss.00861

work page doi:10.21105/joss.00861 2018
[57]

Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing.Journal of the Royal Statistical Society: Series B (Methodological), 57(1): 289–300, 1995. doi:10.1111/j.2517-6161.1995.tb02031.x. 26

work page doi:10.1111/j.2517-6161.1995.tb02031.x 1995
[59]

Emma A. M. Stanley, Roger Y . Tsang, Haley Gillett, Raissa Souza, Vibujithan Vigneshwaran, Chris Kang, Melissa D. McCradden, Matthias Wilms, and Nils D. Forkert. Connecting algorithmic fairness and fair outcomes in a sociotechnical simulation case study of AI-assisted healthcare.Nature Communications, 17(1):788, 2025. doi:10.1038/s41467-025-67470-5

work page doi:10.1038/s41467-025-67470-5 2025
[60]

Louis, Arie Perry, Pieter Wesseling, Daniel J

David N. Louis, Arie Perry, Pieter Wesseling, Daniel J. Brat, Ian A. Cree, Dominique Figarella-Branger, Cynthia Hawkins, H.K. Ng, Scott M. Pfister, Guido Reifenberger, Riccardo Soffietti, Andreas von Deimling, and David W. Ellison. The 2021 WHO classification of tumors of the central nervous system: a summary.Neuro-Oncology, 23(8):1231–1251, 2021. doi:10....

work page doi:10.1093/neuonc/noab106 2021
[61]

Brain tumour genetic network signatures of survival.Brain, 146(11):4736–4754, 2023

James K Ruffle, Samia Mohinta, Guilherme Pombo, Robert Gray, Valeriya Kopanitsa, Faith Lee, Sebastian Brandner, Harpreet Hyare, and Parashkev Nachev. Brain tumour genetic network signatures of survival.Brain, 146(11):4736–4754, 2023. doi:10.1093/brain/awad199

work page doi:10.1093/brain/awad199 2023
[62]

Learning fair representations

Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In Proceedings of the 30th International Conference on Machine Learning, volume 28 ofProceedings of Machine Learning Research, pages 325–333. PMLR, 2013

2013
[63]

Preventing fairness gerrymandering: Auditing and learning for subgroup fairness

Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. InProceedings of the 35th International Conference on Machine Learning, volume 80 ofProceedings of Machine Learning Research, pages 2564–2572. PMLR, 2018

2018
[64]

Kimberlé Crenshaw. Demarginalizing the intersection of race and sex: A black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics.University of Chicago Legal Forum, 1989(1):139–167, 1989

1989
[65]

Elle Lett and William G. La Cava. Translating intersectionality to fair machine learning in health sciences.Nature Machine Intelligence, 5(5):476–479, 2023. doi:10.1038/s42256-023-00651-3

work page doi:10.1038/s42256-023-00651-3 2023
[66]

Burns, Leo Anthony Celi, Li-Ching Chen, Ramon Correa, Natalie Dullerud, Marzyeh Ghassemi, Shih-Cheng Huang, Po-Chih Kuo, Matthew P

Judy Wawira Gichoya, Imon Banerjee, Ananth Reddy Bhimireddy, John L. Burns, Leo Anthony Celi, Li-Ching Chen, Ramon Correa, Natalie Dullerud, Marzyeh Ghassemi, Shih-Cheng Huang, Po-Chih Kuo, Matthew P. Lungren, Lyle J. Palmer, Brandon J. Price, Saptarshi Purkayastha, Ayis T. Pyrros, Lauren Oakden-Rayner, Chima Okechukwu, Laleh Seyyed-Kalantari, Hari Trived...

work page doi:10.1016/s2589-7500(22)00063-2 2022
[67]

Stanislav Nikolov, Sam Blackwell, Alexei Zverovitch, Ruheena Menber, Jeffrey De Fauw, Nenad Patel, Clemens Meyer, Harry Askham, Bernadino Romera-Paredes, Christopher Kelly, et al. Clinically appli- cable segmentation of head and neck anatomy for radiotherapy: Deep learning algorithm development and validation study.Journal of Medical Internet Research, 23...

work page doi:10.2196/26151 2021
[68]

Medical domain knowledge in domain-agnostic generative AI.npj Digital Medicine, 5(1):90, 2022

Jakob Nikolas Kather, Narmin Ghaffari Laleh, Sebastian Foersch, and Daniel Truhn. Medical domain knowledge in domain-agnostic generative AI.npj Digital Medicine, 5(1):90, 2022. doi:10.1038/s41746- 022-00634-5

work page doi:10.1038/s41746- 2022
[69]

Collins, Johannes B

Gary S. Collins, Johannes B. Reitsma, Douglas G. Altman, and Karel G. M. Moons. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement.Annals of Internal Medicine, 162(1):55–63, 2015. doi:10.7326/M14-0697. 27

work page doi:10.7326/m14-0697 2015
[70]

Uncertainty-driven refinement of tumor-core segmentation using 3D-to-2D networks with label uncertainty

Richard McKinley, Michael Rebsamen, Katrin Daetwyler, Raphael Meier, Piotr Radojewski, and Roland Wiest. Uncertainty-driven refinement of tumor-core segmentation using 3D-to-2D networks with label uncertainty. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes
[71]

Springer, 2021

LNCS, volume 12658. Springer, 2021. doi:10.1007/978-3-030-72084-1_36

work page doi:10.1007/978-3-030-72084-1_36 2021
[72]

Jaeger, Peter M

Fabian Isensee, Paul F. Jaeger, Peter M. Full, Philipp V ollmuth, and Klaus H. Maier-Hein. nnU-Net for brain tumor segmentation. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2020. LNCS, volume 12659. Springer, 2021. doi:10.1007/978-3-030-72087-2_11

work page doi:10.1007/978-3-030-72087-2_11 2020
[73]

Automatic brain tumor segmentation with scale attention network

Yading Yuan. Automatic brain tumor segmentation with scale attention network. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2020. LNCS, volume 12658. Springer,

2020
[74]

doi:10.1007/978-3-030-72084-1_26

work page doi:10.1007/978-3-030-72084-1_26
[75]

Modality-pairing learning for brain tumor segmentation

Yixin Wang, Yao Zhang, Feng Hou, Yang Liu, Jie Tian, Cheng Zhong, Yang Zhang, and Zhiqiang He. Modality-pairing learning for brain tumor segmentation. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2020. LNCS, volume 12658. Springer, 2021. doi:10.1007/978-3-030-72084-1_21

work page doi:10.1007/978-3-030-72084-1_21 2020
[76]

H2NF-Net for brain tumor segmentation using multimodal MR imaging: 2nd place solution to BraTS challenge 2020 segmentation task

Haozhe Jia, Weidong Cai, Heng Huang, and Yong Xia. H2NF-Net for brain tumor segmentation using multimodal MR imaging: 2nd place solution to BraTS challenge 2020 segmentation task. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2020. LNCS, volume 12659. Springer, 2021. doi:10.1007/978-3-030-72087-2_6

work page doi:10.1007/978-3-030-72087-2_6 2020
[77]

Bag of tricks for 3D MRI brain tumor segmentation

Yuan-Xing Zhao, Yan-Ming Zhang, and Cheng-Lin Liu. Bag of tricks for 3D MRI brain tumor segmentation. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes
[78]

Springer, 2020

LNCS, volume 11992. Springer, 2020. doi:10.1007/978-3-030-46640-4_20

work page doi:10.1007/978-3-030-46640-4_20 2020
[79]

Maier-Hein

Fabian Isensee, Philipp Kickingereder, Wolfgang Wick, Martin Bendszus, and Klaus H. Maier-Hein. No new-net. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes
[80]

Springer, 2019

LNCS, volume 11384. Springer, 2019. doi:10.1007/978-3-030-11726-9_21

work page doi:10.1007/978-3-030-11726-9_21 2019
[81]

Tustison, Sohil H

Xue Feng, Nicholas J. Tustison, Sohil H. Patel, and Craig H. Meyer. Brain tumor segmentation using an ensemble of 3D U-Nets and overall survival prediction using radiomic features.Frontiers in Computational Neuroscience, 14:25, 2020. doi:10.3389/fncom.2020.00025

work page doi:10.3389/fncom.2020.00025 2020

Showing first 80 references.