pith. machine review for the scientific record. sign in

arxiv: 2604.09656 · v1 · submitted 2026-03-30 · 💻 cs.LG · cs.AI· stat.AP· stat.ME

Recognition: 2 theorem links

· Lean Theorem

Fairboard: a quantitative framework for equity assessment of healthcare models

Authors on Pith no claims yet

Pith reviewed 2026-05-14 21:04 UTC · model grok-4.3

classification 💻 cs.LG cs.AIstat.APstat.ME
keywords equity assessmentbrain tumor segmentationAI fairnessgliomamodel performance variancepatient subgroupsspatial bias analysisFairboard dashboard
0
0 comments X

The pith

Patient identity explains more variance in brain tumor segmentation accuracy than model architecture or choice.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Fairboard, a framework that measures equity by testing 18 open-source brain tumor segmentation models on 648 glioma patients from two datasets. It shows through univariate, Bayesian multivariate, spatial, and high-dimensional analyses that who the patient is—specifically factors like molecular diagnosis, tumor grade, and extent of resection—accounts for more performance differences than which algorithm is applied. Spatial mapping reveals consistent, compartment-specific biases across models, while performance clusters in the space of lesion and demographic features point to patient-level axes of vulnerability. Newer models trend toward better equity but none deliver formal fairness guarantees. The work releases a no-code dashboard to make routine equity monitoring accessible for medical imaging AI.

Core claim

Across 11,664 model inferences, patient identity consistently accounts for greater performance variance than model choice, with clinical variables including molecular diagnosis, tumor grade, and extent of resection emerging as stronger predictors of segmentation accuracy than architecture; voxel-wise meta-analysis shows localized neuroanatomical biases that are compartment-specific yet often shared across models, and high-dimensional clustering of lesion masks with clinic-demographic features identifies patient feature axes along which models are systematically vulnerable.

What carries the argument

Fairboard equity assessment framework, which combines univariate statistics, Bayesian multivariate modeling, voxel-wise spatial meta-analysis, and latent-space clustering of lesion masks with clinic-demographic features to quantify how patient subgroups affect segmentation performance.

If this is right

  • Newer segmentation models achieve greater equity than older ones but still lack formal fairness guarantees.
  • Performance clusters in the high-dimensional space of lesion masks and clinic-demographic features indicate systematic patient-level vulnerabilities.
  • Localized neuroanatomical biases identified in voxel-wise analysis are compartment-specific and consistent across models.
  • Equity monitoring should prioritize patient identity and clinical factors over selection among current model architectures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Improving training data diversity across molecular subtypes and resection extents may yield larger equity gains than further architectural changes.
  • The same multi-dimensional assessment approach could be applied to other medical imaging tasks such as organ segmentation or lesion detection to reveal analogous patient-driven biases.
  • Regulatory pathways for medical AI might eventually require quantitative equity reports like those produced by Fairboard before approval.
  • Extending the framework to longitudinal patient data could test whether biases persist or evolve with disease progression.

Load-bearing premise

The two independent datasets totaling 648 patients sufficiently represent real-world glioma populations and that the chosen metrics and multivariate models capture equity without unmeasured confounding.

What would settle it

A replication study on an independent cohort of at least 500 glioma patients in which model architecture explains more performance variance than patient identity or clinical factors would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.09656 by Chris Foulon, Harpreet Hyare, James K. Ruffle, Mohamad Zeina, Parashkev Nachev, Samia Mohinta, Sebastian Brandner, Zicheng Wang.

Figure 1
Figure 1. Figure 1: Fairboard: an equitable model monitoring dashboard. [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Qualitative segmentation comparison across diverse patient cases. [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: A composite league table of model performance and distributional equity. [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Bayesian linear mixed-effects model of clinico-demographic predictors of segmentation per [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Spatial equity meta-analysis of voxel-wise segmentation bias. [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Representational equity analysis of Dice similarity across tumour compartments. [PITH_FULL_IMAGE:figures/full_fig_p013_6.png] view at source ↗
read the original abstract

Despite there now being more than 1,000 FDA-authorised AI medical devices, formal equity assessments -- whether model performance is uniform across patient subgroups -- are rare. Here, we evaluate the equity of 18 open-source brain tumour segmentation models across 648 glioma patients from two independent datasets (n = 11,664 model inferences) along distinct univariate, Bayesian multivariate, spatial, and representational dimensions. We find that patient identity consistently explains more performance variance than model choice, with clinical factors, including molecular diagnosis, tumour grade, and extent of resection, predicting segmentation accuracy more strongly than model architecture. A voxel-wise spatial meta-analysis identifies neuroanatomically localised biases that are compartment-specific yet often consistent across models. Within a high-dimensional latent space of lesion masks and clinic-demographic features, model performance clusters significantly, indicating that the patient feature space contains axes of algorithmic vulnerability. Although newer models tend toward greater equity, none provide a formal fairness guarantee. Lastly, we release Fairboard, an open-source, no-code dashboard that lowers barriers to equitable model monitoring in medical imaging.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces Fairboard, a quantitative framework for equity assessment of medical imaging AI. It evaluates 18 open-source brain tumour segmentation models across 648 glioma patients from two independent datasets (totaling 11,664 inferences) using univariate, Bayesian multivariate, spatial, and representational analyses. Central claims are that patient identity explains more performance variance than model choice, clinical factors (molecular diagnosis, tumour grade, extent of resection) predict segmentation accuracy more strongly than architecture, voxel-wise biases are neuroanatomically localised and often model-consistent, and performance clusters in a high-dimensional latent space of lesion and clinic-demographic features. Newer models show greater equity but none offer formal fairness guarantees; the work releases an open-source no-code dashboard for monitoring.

Significance. If the variance decomposition and clustering results hold after addressing potential dataset confounding, the paper provides a valuable multi-dimensional toolkit for equity evaluation in healthcare AI, where such formal assessments remain rare despite over 1,000 FDA-authorised devices. The empirical demonstration that patient-level and clinical factors dominate model architecture, combined with the release of Fairboard, could meaningfully advance reproducible fairness monitoring in medical imaging.

major comments (3)
  1. [Methods] Methods (Bayesian multivariate model): The description does not indicate that dataset ID (the two sources) was entered as a fixed or random covariate. With total n=648 drawn from only two datasets, any unmodeled scanner, protocol, or acquisition effects will be absorbed into the patient-identity random effect, directly undermining the central claim that patient identity consistently explains more variance than model choice.
  2. [Results] Results (variance decomposition): No error bars, posterior intervals, or exact model specification (e.g., priors, convergence diagnostics) are referenced for the claim that patient identity > model choice and clinical factors > architecture. Without these, it is impossible to assess whether the reported dominance is robust or sensitive to post-hoc modeling choices.
  3. [Results] Results (spatial meta-analysis): The voxel-wise analysis identifies compartment-specific biases consistent across models, but the manuscript does not report the multiple-comparison correction or the exact statistical threshold used to declare localisation, which is load-bearing for the claim of neuroanatomically specific equity gaps.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'formal fairness guarantee' is used without definition; clarify whether this refers to a specific metric (e.g., demographic parity) or a statistical test.
  2. [Figures] Figure captions: Several spatial and clustering figures lack axis labels or scale bars, reducing interpretability of the reported neuroanatomical biases.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major point below and have revised the manuscript to incorporate the requested clarifications and analyses.

read point-by-point responses
  1. Referee: [Methods] Methods (Bayesian multivariate model): The description does not indicate that dataset ID (the two sources) was entered as a fixed or random covariate. With total n=648 drawn from only two datasets, any unmodeled scanner, protocol, or acquisition effects will be absorbed into the patient-identity random effect, directly undermining the central claim that patient identity consistently explains more variance than model choice.

    Authors: We agree this is a critical methodological detail. In the revised manuscript we have added dataset ID as a fixed effect in the Bayesian multivariate model. Re-fitting the model shows that patient identity still accounts for substantially more performance variance than model choice (posterior mean difference remains >2x larger), and we have updated the Methods with the full model equation, priors, and convergence diagnostics. revision: yes

  2. Referee: [Results] Results (variance decomposition): No error bars, posterior intervals, or exact model specification (e.g., priors, convergence diagnostics) are referenced for the claim that patient identity > model choice and clinical factors > architecture. Without these, it is impossible to assess whether the reported dominance is robust or sensitive to post-hoc modeling choices.

    Authors: We have revised the Results to display 95% credible intervals on all variance-component estimates. The Methods section now specifies the exact hierarchical Bayesian model (weakly informative normal(0,1) priors on fixed effects, half-Cauchy(0,1) on variance terms), sampling details (4 chains, 2000 iterations post-warmup), and convergence criteria (R-hat < 1.01, bulk ESS > 4000). These additions confirm the robustness of the reported dominance ordering. revision: yes

  3. Referee: [Results] Results (spatial meta-analysis): The voxel-wise analysis identifies compartment-specific biases consistent across models, but the manuscript does not report the multiple-comparison correction or the exact statistical threshold used to declare localisation, which is load-bearing for the claim of neuroanatomically specific equity gaps.

    Authors: We have clarified the spatial meta-analysis procedure in the revised Methods: voxel-wise threshold of p < 0.001 followed by cluster-level family-wise error correction via 5000 permutations (alpha = 0.05). The Results now explicitly report this threshold and correction, supporting the neuroanatomically localised and model-consistent bias claims. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical variance decomposition is self-contained

full rationale

The paper performs direct empirical analysis via univariate statistics, Bayesian multivariate modeling, spatial meta-analysis, and clustering on performance metrics from 18 models evaluated on 648 patients. No load-bearing step reduces a claimed prediction or result to a fitted parameter by construction, invokes self-citation for uniqueness theorems, or renames known patterns as novel derivations. The central finding that patient identity explains more variance than model choice follows from standard variance partitioning applied to the observed data without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claims rest on standard statistical assumptions for variance partitioning and clustering; no free parameters, ad-hoc axioms, or invented entities are described in the abstract.

pith-pipeline@v0.9.0 · 5522 in / 1174 out tokens · 30411 ms · 2026-05-14T21:04:25.919224+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

106 extracted references · 87 canonical work pages · 2 internal anchors

  1. [1]

    van der Laak, Bram van Ginneken, and Clara I

    Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen A.W.M. van der Laak, Bram van Ginneken, and Clara I. Sánchez. A survey on deep learning in medical image analysis.Medical Image Analysis, 42:60–88, 2017. doi:10.1016/j.media.2017.07.005

  2. [2]

    Eric J. Topol. High-performance medicine: the convergence of human and artificial intelligence.Nature Medicine, 25(1):44–56, 2019. doi:10.1038/s41591-018-0300-7

  3. [3]

    Ting, Alan Karthikesalingam, Dominic King, Hutan Ashrafian, and Ara Darzi

    Ravi Aggarwal, Viknesh Sounderajah, Guy Martin, Daniel S.W. Ting, Alan Karthikesalingam, Dominic King, Hutan Ashrafian, and Ara Darzi. Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis.npj Digital Medicine, 4(1):65, 2021. doi:10.1038/s41746-021- 00438-z

  4. [4]

    Menze, Andras Jakab, Stefan Bauer, Jayashree Kalpathy-Cramer, Keyvan Farahani, Justin Kirby, Yuliya Burren, Nicole Porz, Johannes Slotboom, Roland Wiest, et al

    Bjoern H. Menze, Andras Jakab, Stefan Bauer, Jayashree Kalpathy-Cramer, Keyvan Farahani, Justin Kirby, Yuliya Burren, Nicole Porz, Johannes Slotboom, Roland Wiest, et al. The Multimodal Brain 22 Tumor Image Segmentation Benchmark (BRATS).IEEE Transactions on Medical Imaging, 34(10): 1993–2024, 2015. doi:10.1109/TMI.2014.2377694

  5. [5]

    Kirby, John B

    Spyridon Bakas, Hamed Akbari, Aristeidis Sotiras, Michel Bilello, Martin Rozycki, Justin S. Kirby, John B. Freymann, Keyvan Farahani, and Christos Davatzikos. Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features.Scientific Data, 4: 170117, 2017. doi:10.1038/sdata.2017.117

  6. [6]

    Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge

    Spyridon Bakas, Mauricio Reyes, Andras Jakab, Stefan Bauer, Markus Rempfler, Alessandro Crimi, Russell T. Shinohara, Christoph Berger, Sung Min Ha, Martin Rozycki, et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in a multi-institutional multi-site dataset.arXiv p...

  7. [7]

    The RSNA-ASNR-MICCAI BraTS 2021 Benchmark on Brain Tumor Segmentation and Radiogenomic Classification

    Ujjwal Baid, Satyam Ghodasara, Suyash Mohan, Michel Bilello, Evan Calabrese, Errol Colak, Keyvan Farahani, Jayashree Kalpathy-Cramer, Felipe C. Kitamura, Sarthak Pati, et al. The RSNA-ASNR- MICCAI BraTS 2021 benchmark on brain tumor segmentation and radiogenomic classification.arXiv preprint, 2021. doi:10.48550/arXiv.2107.02314

  8. [8]

    Nature Methods 18(2), 203–211 (2021)

    Fabian Isensee, Paul F. Jaeger, Simon A. A. Kohl, Jens Petersen, and Klaus H. Maier-Hein. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation.Nature Methods, 18 (2):203–211, 2021. doi:10.1038/s41592-020-01008-z

  9. [9]

    3D MRI brain tumor segmentation using autoencoder regularization

    Andriy Myronenko. 3D MRI brain tumor segmentation using autoencoder regularization. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2018. LNCS, volume 11384. Springer, 2019. doi:10.1007/978-3-030-11726-9_28

  10. [10]

    Brain tu- mour segmentation with incomplete imaging data.Brain Communications, 5(2):fcad118, 2023

    James K Ruffle, Samia Mohinta, Robert Gray, Harpreet Hyare, and Parashkev Nachev. Brain tu- mour segmentation with incomplete imaging data.Brain Communications, 5(2):fcad118, 2023. doi:10.1093/braincomms/fcad118

  11. [11]

    Predicting brain tumour enhancement from non-contrast MR imaging with artificial intelligence.arXiv preprint, 2025

    James K Ruffle, Samia Mohinta, Guilherme Pombo, Asthik Biswas, Alan Campbell, Indran Davagnanam, David Doig, Ahmed Hammam, Harpreet Hyare, Farrah Jabeen, Emma Lim, Dermot Mallon, Stephanie Owen, Sophie Wilkinson, Sebastian Brandner, and Parashkev Nachev. Predicting brain tumour enhancement from non-contrast MR imaging with artificial intelligence.arXiv pr...

  12. [12]

    Roth, and Daguang Xu

    Ali Hatamizadeh, Vishwesh Nath, Yucheng Tang, Dong Yang, Holger R. Roth, and Daguang Xu. Swin UNETR: Swin transformers for semantic segmentation of brain tumors in MRI images. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2021. LNCS, volume 12962. Springer, 2022. doi:10.1007/978-3-031-08999-2_22

  13. [13]

    BraTS toolkit: Translating BraTS brain tumor segmentation algorithms into clinical and scientific practice.Frontiers in Neuroscience, 14:125, 2020

    Florian Kofler, Christoph Berger, Diana Waldmannstetter, Jana Lipkova, Ivan Ezhov, Giles Tran, Bjoern Menze, et al. BraTS toolkit: Translating BraTS brain tumor segmentation algorithms into clinical and scientific practice.Frontiers in Neuroscience, 14:125, 2020. doi:10.3389/fnins.2020.00125

  14. [14]

    Health equity

    World Health Organization. Health equity. https://www.who.int/health-topics/ health-equity, 2025. Accessed 25 March 2026

  15. [15]

    Muehlematter, Paola Daniore, and Kerstin N

    Urs J. Muehlematter, Paola Daniore, and Kerstin N. V okinger. Approval of artificial intelligence and machine learning-based medical devices in the USA and Europe (2015–20): a comparative analysis. The Lancet Digital Health, 3(3):e195–e203, 2021. doi:10.1016/S2589-7500(20)30292-2. 23

  16. [16]

    Erickson

    Kang Zhang, Bardia Khosravi, Shahriar Vahdati, and Bradley J. Erickson. FDA review of radiologic AI algorithms: Process and challenges.Radiology, 310(1):e230242, 2024. doi:10.1148/radiol.230242

  17. [17]

    Milam and Chi Wan Koo

    Morgan E. Milam and Chi Wan Koo. The current status and future of FDA-approved artificial in- telligence tools in chest radiology in the United States.Clinical Radiology, 78(2):115–122, 2023. doi:10.1016/j.crad.2022.08.135

  18. [18]

    Lin, Bhav Jain, Jay M

    John C. Lin, Bhav Jain, Jay M. Iyer, Ishan Rola, Anusha R. Srinivasan, Chaerim Kang, Heta Patel, and Ravi B. Parikh. Benefit-risk reporting for FDA-cleared artificial intelligence-enabled medical devices. JAMA Health Forum, 6(9):e253351, 2025. doi:10.1001/jamahealthforum.2025.3351

  19. [19]

    A survey on bias and fairness in machine learning.ACM Computing Surveys, 54(6):1–35, 2022

    Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. A survey on bias and fairness in machine learning.ACM Computing Surveys, 54(6):1–35, 2022. doi:10.1145/3457607

  20. [20]

    Fair prediction with disparate impact: A study of bias in recidivism prediction instruments.Big data, 5(2):153–163, 2017

    Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments.Big Data, 5(2):153–163, 2017. doi:10.1089/big.2016.0047

  21. [21]

    Shenkman, Jiang Bian, and Fei Wang

    Jie Xu, Yunyu Xiao, Wendy Hui Wang, Yue Ning, Elizabeth A. Shenkman, Jiang Bian, and Fei Wang. Algorithmic fairness in computational medicine.eBioMedicine, 84:104250, 2022. doi:10.1016/j.ebiom.2022.104250

  22. [22]

    Gender shades: Intersectional accuracy disparities in commercial gender classification

    Joy Buolamwini and Timnit Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification. InProceedings of the 1st Conference on Fairness, Accountability and Transparency, volume 81 ofProceedings of Machine Learning Research, pages 77–91. PMLR, 2018. URL https: //proceedings.mlr.press/v81/buolamwini18a.html

  23. [23]

    Dissecting racial bias in an algorithm used to manage the health of populations.Science, 366(6464):447–453, 2019

    Ziad Obermeyer, Brian Powers, Christine V ogeli, and Sendhil Mullainathan. Dissecting racial bias in an algorithm used to manage the health of populations.Science, 366(6464):447–453, 2019. doi:10.1126/science.aax2342

  24. [24]

    McDermott, Irene Y

    Laleh Seyyed-Kalantari, Haoran Zhang, Matthew B.A. McDermott, Irene Y . Chen, and Marzyeh Ghassemi. Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations.Nature Medicine, 27(12):2176–2182, 2021. doi:10.1038/s41591-021- 01595-0

  25. [25]

    Larrazabal, Nicolás Nieto, Victoria Peterson, Diego H

    Agostina J. Larrazabal, Nicolás Nieto, Victoria Peterson, Diego H. Milone, and Enzo Ferrante. Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis.Proceed- ings of the National Academy of Sciences, 117(23):12592–12594, 2020. doi:10.1073/pnas.1919012117

  26. [26]

    Algorithmic encoding of protected characteristics in chest X-ray disease detection models.eBioMedicine, 89:104467, 2023

    Ben Glocker, Charles Jones, Mélanie Bernhardt, and Stefan Winzeck. Algorithmic encoding of protected characteristics in chest X-ray disease detection models.eBioMedicine, 89:104467, 2023. doi:10.1016/j.ebiom.2023.104467

  27. [27]

    Com- putational limits to the legibility of the imaged human brain.NeuroImage, 291:120600, 2024

    James K Ruffle, Samia Mohinta, Robert Gray, Harpreet Hyare, and Parashkev Nachev. Com- putational limits to the legibility of the imaged human brain.NeuroImage, 291:120600, 2024. doi:10.1016/j.neuroimage.2024.120600

  28. [28]

    Representational ethical model calibration.npj Digital Medicine, 5(1):170, 2022

    Robert Carruthers, Isabel Straw, James K Ruffle, Daniel Herron, Amy Nelson, Danilo Bzdok, Delmiro Fernandez-Reyes, Geraint Rees, and Parashkev Nachev. Representational ethical model calibration.npj Digital Medicine, 5(1):170, 2022. doi:10.1038/s41746-022-00716-4. 24

  29. [29]

    Gichoya, Dina Katabi, and Marzyeh Ghassemi

    Yuzhe Yang, Haoran Zhang, Judy W. Gichoya, Dina Katabi, and Marzyeh Ghassemi. The limits of fair medical imaging AI in real-world generalization.Nature Medicine, 30(10):2838–2848, 2024. doi:10.1038/s41591-024-03113-4

  30. [30]

    Chen, Emma Pierson, Sherri Rose, Shalmali Joshi, Kadija Ferryman, and Marzyeh Ghassemi

    Irene Y . Chen, Emma Pierson, Sherri Rose, Shalmali Joshi, Kadija Ferryman, and Marzyeh Ghassemi. Ethical machine learning in healthcare.Annual Review of Biomedical Data Science, 4:123–144, 2021. doi:10.1146/annurev-biodatasci-092820-114757

  31. [31]

    McCradden, Shalmali Joshi, Mjaye Mazwi, and James A

    Melissa D. McCradden, Shalmali Joshi, Mjaye Mazwi, and James A. Anderson. Ethical limitations of algorithmic fairness solutions in health care machine learning.The Lancet Digital Health, 2(5): e221–e223, 2020. doi:10.1016/S2589-7500(20)30065-0

  32. [32]

    Parikh, Stephanie Teeple, and Amol S

    Ravi B. Parikh, Stephanie Teeple, and Amol S. Navathe. Addressing bias in artificial intelligence in health care.JAMA, 322(24):2377–2378, 2019. doi:10.1001/jama.2019.18058

  33. [33]

    Piechnik, Stefan Neubauer, Steffen E

    Esther Puyol-Antón, Bram Ruijsink, Stefan K. Piechnik, Stefan Neubauer, Steffen E. Petersen, Reza Razavi, and Andrew P. King. Fairness in cardiac magnetic resonance imaging: Assessing sex and racial bias in deep learning-based segmentation.Frontiers in Cardiovascular Medicine, 9:859310, 2022. doi:10.3389/fcvm.2022.859310

  34. [34]

    Addressing fairness in artificial intelligence for medical imaging.Nature Communications, 13:4581, 2022

    María Agustina Ricci Lara, Rodrigo Echeveste, and Enzo Ferrante. Addressing fairness in artificial intelligence for medical imaging.Nature Communications, 13:4581, 2022. doi:10.1038/s41467-022- 32186-3

  35. [35]

    On (assessing) the fairness of risk score models

    Eike Petersen, Melanie Ganz, Søren Holm, and Aasa Feragen. On (assessing) the fairness of risk score models. InProceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pages 817–829, 2023. doi:10.1145/3593013.3594045

  36. [36]

    Kevin Zhou

    Zikang Xu, Jun Li, Qingsong Yao, Han Li, Mingyue Zhao, and S. Kevin Zhou. Addressing fairness issues in deep learning-based medical image analysis: a systematic review.npj Digital Medicine, 7(1): 286, 2024. doi:10.1038/s41746-024-01276-5

  37. [37]

    Villanueva-Meyer, Jeffrey D

    Evan Calabrese, Javier E. Villanueva-Meyer, and Soonmee Cha. The University of California San Francisco Preoperative Diffuse Glioma MRI (UCSF-PDGM) dataset.Radiology: Artificial Intelligence, 4(6):e220058, 2022. doi:10.1148/ryai.220058

  38. [38]

    Rudie, Nazím Flores Santamaría, Anahita Fathi Kazerooni, Sarthak Pati, et al

    Spyridon Bakas, Chiharu Sako, Hamed Akbari, Michel Bilello, Aristeidis Sotiras, Gaurav Shukla, Jeffrey D. Rudie, Nazím Flores Santamaría, Anahita Fathi Kazerooni, Sarthak Pati, et al. The University of Pennsylvania Glioblastoma (UPenn-GBM) cohort: advanced MRI, clinical, genomics, & radiomics. Scientific Data, 9(1):453, 2022. doi:10.1038/s41597-022-01560-7

  39. [39]

    On the measurement of inequalities in health

    Adam Wagstaff, Pierella Paci, and Eddy van Doorslaer. On the measurement of inequalities in health. Social Science & Medicine, 51(5):667–681, 2000. doi:10.1016/S0277-9536(99)00382-2

  40. [40]

    Variabilità e mutabilità: contributo allo studio delle distribuzioni e delle relazioni statistiche.Studi Economico-Giuridici della Regia Università di Cagliari, 3:3–159, 1912

    Corrado Gini. Variabilità e mutabilità: contributo allo studio delle distribuzioni e delle relazioni statistiche.Studi Economico-Giuridici della Regia Università di Cagliari, 3:3–159, 1912

  41. [41]

    Atkinson

    Anthony B. Atkinson. On the measurement of inequality.Journal of Economic Theory, 2(3):244–263,

  42. [42]

    doi:10.1016/0022-0531(70)90039-6

  43. [43]

    North-Holland, Amsterdam, 1967

    Henri Theil.Economics and Information Theory. North-Holland, Amsterdam, 1967. ISBN 978-0-7204- 3347-3. 25

  44. [44]

    Hoover, Jr

    Edgar M. Hoover, Jr. The measurement of industrial localization.The Review of Economics and Statistics, 18(4):162–171, 1936. doi:10.2307/1927875

  45. [45]

    Shorrocks

    Anthony F. Shorrocks. The class of additively decomposable inequality measures.Econometrica, 48(3): 613–625, 1980. doi:10.2307/1913126

  46. [46]

    Homogeneous middles vs

    José Gabriel Palma. Homogeneous middles vs. heterogeneous tails, and the end of the ‘inverted-U’: It’s all about the share of the rich.Development and Change, 42(1):87–153, 2011. doi:10.1111/j.1467- 7660.2011.01694.x

  47. [47]

    Fairboard: a quantitative framework for equity assessment of healthcare models

    James K. Ruffle, Samia Mohinta, Chris Foulon, Mohamad Zeina, Zicheng Wang, Sebastian Brandner, Harpreet Hyare, and Parashkev Nachev. Model inferences for “Fairboard: a quantitative framework for equity assessment of healthcare models”. Zenodo, 2026. doi:10.5281/zenodo.19207798

  48. [48]

    Auto3DSeg for brain tumor segmentation from 3D MRI in BraTS 2023 challenge.arXiv preprint, 2025

    Andriy Myronenko, Dong Yang, Yufan He, and Daguang Xu. Auto3DSeg for brain tumor segmentation from 3D MRI in BraTS 2023 challenge.arXiv preprint, 2025. doi:10.48550/arXiv.2510.25058

  49. [49]

    Enhanced data augmentation using synthetic data for brain tumour segmentation

    André Ferreira, Naida Solak, Jianning Li, Philipp Dammann, Jens Kleesiek, Victor Alves, and Jan Egger. Enhanced data augmentation using synthetic data for brain tumour segmentation. InBrain Tumor Segmentation, and Cross-Modality Domain Adaptation for Medical Image Segmentation. BraTS 2023. LNCS, volume 14669. Springer, 2024. doi:10.1007/978-3-031-76163-8_8

  50. [50]

    Advanced tumor segmentation in medical imaging: An ensemble approach for BraTS 2023 adult glioma and pediatric tumor tasks

    Fadillah Maani, Anees Ur Rehman Hashmi, Mariam Aljuboory, Numan Saeed, Ikboljon Sobirov, and Mohammad Yaqub. Advanced tumor segmentation in medical imaging: An ensemble approach for BraTS 2023 adult glioma and pediatric tumor tasks. InBrain Tumor Segmentation, and Cross-Modality Domain Adaptation for Medical Image Segmentation. BraTS 2023. LNCS, volume 14...

  51. [51]

    doi:10.1007/978-3-031-76163-8_24

  52. [52]

    Extending nn-UNet for brain tumor segmentation

    Huan Minh Luu and Sung-Hong Park. Extending nn-UNet for brain tumor segmentation. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2021. LNCS, volume 12963. Springer, 2022. doi:10.1007/978-3-031-09002-8_16

  53. [53]

    Tomás Capretto, Camen Piho, Ravin Kumar, Jacob Westfall, Tal Yarkoni, and Osvaldo A. Martin. Bambi: A simple interface for fitting Bayesian linear models in Python.Journal of Statistical Software, 103(15): 1–29, 2022. doi:10.18637/jss.v103.i15

  54. [54]

    Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC.Statistics and Computing, 27(5):1413–1432, 2017

    Aki Vehtari, Andrew Gelman, and Jonah Gabry. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC.Statistics and Computing, 27(5):1413–1432, 2017. doi:10.1007/s11222- 016-9696-4

  55. [55]

    Meta-analysis in clinical trials.Controlled Clinical Trials, 7(3): 177–188, 1986

    Rebecca DerSimonian and Nan Laird. Meta-analysis in clinical trials.Controlled Clinical Trials, 7(3): 177–188, 1986. doi:10.1016/0197-2456(86)90046-2

  56. [56]

    UMAP: Uniform manifold approximation and projection for dimension reduction.Journal of Open Source Software, 3(29):861, 2018

    Leland McInnes, John Healy, and James Melville. UMAP: Uniform manifold approximation and projection for dimension reduction.Journal of Open Source Software, 3(29):861, 2018. doi:10.21105/joss.00861

  57. [57]

    Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing.Journal of the Royal Statistical Society: Series B (Methodological), 57(1): 289–300, 1995. doi:10.1111/j.2517-6161.1995.tb02031.x. 26

  58. [59]

    Emma A. M. Stanley, Roger Y . Tsang, Haley Gillett, Raissa Souza, Vibujithan Vigneshwaran, Chris Kang, Melissa D. McCradden, Matthias Wilms, and Nils D. Forkert. Connecting algorithmic fairness and fair outcomes in a sociotechnical simulation case study of AI-assisted healthcare.Nature Communications, 17(1):788, 2025. doi:10.1038/s41467-025-67470-5

  59. [60]

    Louis, Arie Perry, Pieter Wesseling, Daniel J

    David N. Louis, Arie Perry, Pieter Wesseling, Daniel J. Brat, Ian A. Cree, Dominique Figarella-Branger, Cynthia Hawkins, H.K. Ng, Scott M. Pfister, Guido Reifenberger, Riccardo Soffietti, Andreas von Deimling, and David W. Ellison. The 2021 WHO classification of tumors of the central nervous system: a summary.Neuro-Oncology, 23(8):1231–1251, 2021. doi:10....

  60. [61]

    Brain tumour genetic network signatures of survival.Brain, 146(11):4736–4754, 2023

    James K Ruffle, Samia Mohinta, Guilherme Pombo, Robert Gray, Valeriya Kopanitsa, Faith Lee, Sebastian Brandner, Harpreet Hyare, and Parashkev Nachev. Brain tumour genetic network signatures of survival.Brain, 146(11):4736–4754, 2023. doi:10.1093/brain/awad199

  61. [62]

    Learning fair representations

    Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In Proceedings of the 30th International Conference on Machine Learning, volume 28 ofProceedings of Machine Learning Research, pages 325–333. PMLR, 2013

  62. [63]

    Preventing fairness gerrymandering: Auditing and learning for subgroup fairness

    Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. InProceedings of the 35th International Conference on Machine Learning, volume 80 ofProceedings of Machine Learning Research, pages 2564–2572. PMLR, 2018

  63. [64]

    Kimberlé Crenshaw. Demarginalizing the intersection of race and sex: A black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics.University of Chicago Legal Forum, 1989(1):139–167, 1989

  64. [65]

    Elle Lett and William G. La Cava. Translating intersectionality to fair machine learning in health sciences.Nature Machine Intelligence, 5(5):476–479, 2023. doi:10.1038/s42256-023-00651-3

  65. [66]

    Burns, Leo Anthony Celi, Li-Ching Chen, Ramon Correa, Natalie Dullerud, Marzyeh Ghassemi, Shih-Cheng Huang, Po-Chih Kuo, Matthew P

    Judy Wawira Gichoya, Imon Banerjee, Ananth Reddy Bhimireddy, John L. Burns, Leo Anthony Celi, Li-Ching Chen, Ramon Correa, Natalie Dullerud, Marzyeh Ghassemi, Shih-Cheng Huang, Po-Chih Kuo, Matthew P. Lungren, Lyle J. Palmer, Brandon J. Price, Saptarshi Purkayastha, Ayis T. Pyrros, Lauren Oakden-Rayner, Chima Okechukwu, Laleh Seyyed-Kalantari, Hari Trived...

  66. [67]

    Stanislav Nikolov, Sam Blackwell, Alexei Zverovitch, Ruheena Menber, Jeffrey De Fauw, Nenad Patel, Clemens Meyer, Harry Askham, Bernadino Romera-Paredes, Christopher Kelly, et al. Clinically appli- cable segmentation of head and neck anatomy for radiotherapy: Deep learning algorithm development and validation study.Journal of Medical Internet Research, 23...

  67. [68]

    Medical domain knowledge in domain-agnostic generative AI.npj Digital Medicine, 5(1):90, 2022

    Jakob Nikolas Kather, Narmin Ghaffari Laleh, Sebastian Foersch, and Daniel Truhn. Medical domain knowledge in domain-agnostic generative AI.npj Digital Medicine, 5(1):90, 2022. doi:10.1038/s41746- 022-00634-5

  68. [69]

    Collins, Johannes B

    Gary S. Collins, Johannes B. Reitsma, Douglas G. Altman, and Karel G. M. Moons. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement.Annals of Internal Medicine, 162(1):55–63, 2015. doi:10.7326/M14-0697. 27

  69. [70]

    Uncertainty-driven refinement of tumor-core segmentation using 3D-to-2D networks with label uncertainty

    Richard McKinley, Michael Rebsamen, Katrin Daetwyler, Raphael Meier, Piotr Radojewski, and Roland Wiest. Uncertainty-driven refinement of tumor-core segmentation using 3D-to-2D networks with label uncertainty. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes

  70. [71]

    Springer, 2021

    LNCS, volume 12658. Springer, 2021. doi:10.1007/978-3-030-72084-1_36

  71. [72]

    Jaeger, Peter M

    Fabian Isensee, Paul F. Jaeger, Peter M. Full, Philipp V ollmuth, and Klaus H. Maier-Hein. nnU-Net for brain tumor segmentation. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2020. LNCS, volume 12659. Springer, 2021. doi:10.1007/978-3-030-72087-2_11

  72. [73]

    Automatic brain tumor segmentation with scale attention network

    Yading Yuan. Automatic brain tumor segmentation with scale attention network. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2020. LNCS, volume 12658. Springer,

  73. [74]

    doi:10.1007/978-3-030-72084-1_26

  74. [75]

    Modality-pairing learning for brain tumor segmentation

    Yixin Wang, Yao Zhang, Feng Hou, Yang Liu, Jie Tian, Cheng Zhong, Yang Zhang, and Zhiqiang He. Modality-pairing learning for brain tumor segmentation. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2020. LNCS, volume 12658. Springer, 2021. doi:10.1007/978-3-030-72084-1_21

  75. [76]

    H2NF-Net for brain tumor segmentation using multimodal MR imaging: 2nd place solution to BraTS challenge 2020 segmentation task

    Haozhe Jia, Weidong Cai, Heng Huang, and Yong Xia. H2NF-Net for brain tumor segmentation using multimodal MR imaging: 2nd place solution to BraTS challenge 2020 segmentation task. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2020. LNCS, volume 12659. Springer, 2021. doi:10.1007/978-3-030-72087-2_6

  76. [77]

    Bag of tricks for 3D MRI brain tumor segmentation

    Yuan-Xing Zhao, Yan-Ming Zhang, and Cheng-Lin Liu. Bag of tricks for 3D MRI brain tumor segmentation. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes

  77. [78]

    Springer, 2020

    LNCS, volume 11992. Springer, 2020. doi:10.1007/978-3-030-46640-4_20

  78. [79]

    Maier-Hein

    Fabian Isensee, Philipp Kickingereder, Wolfgang Wick, Martin Bendszus, and Klaus H. Maier-Hein. No new-net. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes

  79. [80]

    Springer, 2019

    LNCS, volume 11384. Springer, 2019. doi:10.1007/978-3-030-11726-9_21

  80. [81]

    Tustison, Sohil H

    Xue Feng, Nicholas J. Tustison, Sohil H. Patel, and Craig H. Meyer. Brain tumor segmentation using an ensemble of 3D U-Nets and overall survival prediction using radiomic features.Frontiers in Computational Neuroscience, 14:25, 2020. doi:10.3389/fncom.2020.00025

Showing first 80 references.