Exploring Entropy-based Active Learning for Fair Brain Segmentation
Pith reviewed 2026-05-10 15:53 UTC · model grok-4.3
The pith
A weighted entropy strategy in active learning reduces performance gaps between demographic groups in brain MRI segmentation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The weighted entropy selection strategy modulates uncertainty scores by the inverse of group-specific performance estimates computed on the growing labeled set. When applied to segment the left caudate in synthetic T1-weighted brain MRIs that contain strong or weak controlled bias in volume, this selection produces final models with markedly smaller differences in segmentation accuracy between the biased subgroups than either random selection or standard entropy selection, while also attaining the highest equity-scaled performance scores.
What carries the argument
Weighted Entropy selection strategy that scales voxel-wise uncertainty by current group performance on the labeled set, using masked scaled entropy confined to the region of interest.
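This summary does not reproduce the paper's equations, so the following is a minimal numpy sketch of one plausible reading of "masked scaled entropy": voxel-wise predictive entropy, normalized by log of the class count and averaged only over the region-of-interest mask. The function name and the choice to average (rather than sum) over the mask are assumptions, not the authors' published formulation.

```python
import numpy as np

def masked_scaled_entropy(probs, roi_mask, eps=1e-8):
    """Voxel-wise entropy averaged over an ROI mask (hypothetical reading).

    probs: (C, D, H, W) softmax probabilities; roi_mask: boolean (D, H, W).
    Dividing by log(C) scales entropy into [0, 1]; averaging over the mask
    decouples the score from ROI volume, as the summary describes.
    """
    p = np.clip(probs, eps, 1.0)
    voxel_entropy = -(p * np.log(p)).sum(axis=0) / np.log(probs.shape[0])
    return voxel_entropy[roi_mask].mean()
```

Averaging inside the mask means a large caudate and a small one with the same per-voxel uncertainty get the same score, which is the stated point of the masking.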
If this is right
- Disparity between groups falls by 75 percent under strong bias and 86 percent under weak bias compared with standard entropy at the end of the labeling budget.
- The method reaches the highest equity-scaled performance of the strategies tested.
- It improves fairness whether the initial labeled set is balanced or strongly imbalanced across groups.
- By repeatedly choosing samples from poorly segmented subgroups, the loop reduces gaps without requiring extra total labels.
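The loop described in the last bullet can be sketched as follows. All names (`train_fn`, `score_fn`, `group_perf_fn`) are hypothetical stand-ins for components the paper does not specify here; the inverse-performance weighting is one straightforward reading of "performance-modulated" selection.

```python
def fair_al_loop(pool, labeled, groups, budget, batch_size,
                 train_fn, score_fn, group_perf_fn):
    """Sketch of a fairness-aware active learning cycle (hypothetical API).

    Each round: train on the labeled set, estimate per-group performance,
    up-weight uncertainty for poorly performing groups, label top scores.
    """
    while len(labeled) < budget and pool:
        model = train_fn(labeled)
        perf = group_perf_fn(model, labeled)   # e.g. mean Dice per group
        weights = {g: 1.0 / max(p, 1e-3) for g, p in perf.items()}
        scored = sorted(pool,
                        key=lambda s: weights[groups[s]] * score_fn(model, s),
                        reverse=True)
        picked, pool = scored[:batch_size], scored[batch_size:]
        labeled.extend(picked)                 # oracle labels the picks
    return labeled
```

With equal uncertainty everywhere, samples from the worst-performing group dominate each batch, which is the mechanism the bullet credits for closing gaps without extra total labels.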
Where Pith is reading between the lines
- The same performance-modulated weighting could be tested on other segmentation targets such as tumors or white-matter lesions where demographic biases appear in training data.
- In clinical deployment the approach might allow hospitals to reach equitable model performance with smaller annotation budgets than current practice.
- Direct comparison on multi-site real MRI collections would show whether the synthetic bias control captures the main sources of disparity encountered in practice.
Load-bearing premise
That performance estimates calculated on the current labeled set give a stable and unbiased signal for deciding which groups need more samples to close accuracy gaps.
What would settle it
Apply the same weighted entropy selection to a collection of real clinical brain MRIs that contain documented demographic or anatomical subgroups and check whether the final performance disparity between those subgroups drops by a comparable fraction relative to standard entropy sampling.
Figures
Original abstract
Active learning (AL) has emerged as a crucial strategy for reducing the prohibitive costs associated with medical image segmentation. However, standard uncertainty-based AL methods typically focus on maximizing performance metrics, ignoring performance disparities or fairness across groups with sensitive attributes. While fair active learning has been explored in classification tasks, its intersection with medical image segmentation remains unaddressed. In this work, we introduced a fairness-aware active learning framework with a Weighted Entropy selection strategy that modulates uncertainty based on current group-specific performance estimates on the labeled set. To decouple true epistemic uncertainty from anatomical volume variances, we further utilized a masked, scaled entropy restricted to the region of interest. The framework was evaluated on synthetic T1-weighted brain MRIs with controlled left caudate bias in both strong and weak bias settings. A 3D U-Net was trained to segment the left caudate under several AL strategies, starting from both demographically balanced and strongly imbalanced initial labeled sets. Experiments demonstrated that our method markedly reduces performance disparities between groups compared to random sampling and standard uncertainty sampling. By prioritizing poorly segmented subgroups during the AL cycles, our method consistently achieved the highest equity-scaled performance and reduced the disparity metric by 75% (strong bias) and 86% (weak bias) relative to standard entropy at the final budget. Overall, this work is among the first studies on fair AL for medical image segmentation, offering an efficient strategy to train more equitable models in resource-constrained environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a fairness-aware active learning framework for 3D brain MRI segmentation using a Weighted Entropy selection strategy. This modulates standard entropy-based uncertainty by incorporating group-specific performance estimates computed on the labeled set and applies masked scaled entropy restricted to the region of interest to isolate epistemic uncertainty from anatomical variance. Experiments on synthetic T1 MRIs with controlled left caudate volume bias (strong and weak settings), starting from balanced or imbalanced initial labeled sets, show the method reduces performance disparities by 75% and 86% relative to standard entropy sampling while achieving the highest equity-scaled performance.
Significance. If the results hold, the work is significant as one of the first explorations of fair active learning specifically for medical image segmentation. The controlled synthetic setup with explicit bias injection and comparisons to random and standard uncertainty baselines provides a clean demonstration of disparity reduction, which is valuable for annotation-efficient training of equitable models in clinical settings. The approach of prioritizing poorly performing subgroups during selection cycles is a practical contribution.
major comments (3)
- [Abstract and Methods] The exact mathematical definition of the Weighted Entropy (including the weighting coefficient for group performance modulation and the formula for computing group-specific performance estimates on the labeled set) is not provided. This is load-bearing for the central claim, as the modulation mechanism is the core novelty and without the formula the reported disparity reductions cannot be reproduced or verified.
- [Experiments] The 75% (strong bias) and 86% (weak bias) disparity reductions are presented without reporting the number of runs, standard deviations, or statistical significance tests. Given the stochastic nature of active learning selection and model training, this absence undermines confidence in the robustness of the equity improvements.
- [Evaluation] The framework is evaluated exclusively on synthetic data with a single controlled volume bias in the left caudate. While this isolates the effect, the assumption that group performance signals derived from this artificial bias will behave similarly under real intersecting demographic and anatomical variations is untested and directly affects the generalizability of the disparity-reduction claims.
minor comments (1)
- [Abstract] The term 'equity-scaled performance' is referenced in the abstract but not defined; a short definition or reference to its computation would improve clarity.
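The referee's point stands: the page never defines "equity-scaled performance". One published formulation (the equity-scaled metrics of FairSeg, Tian et al., ICLR 2024) divides the overall score by one plus the summed absolute deviations of the group scores from it. The sketch below assumes that formulation; the paper may use a different one.

```python
def equity_scaled(overall, group_scores):
    """Equity-scaled performance under an assumed FairSeg-style formulation:
    the overall score penalized by total absolute deviation across groups."""
    deviation = sum(abs(g - overall) for g in group_scores)
    return overall / (1.0 + deviation)
```

Under this definition, equal group scores leave the overall metric unchanged, while any disparity shrinks it, so the metric rewards accuracy and equity jointly.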
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which highlight important aspects for clarity, robustness, and generalizability. We address each major comment point by point below and will make the necessary revisions to strengthen the manuscript.
Point-by-point responses
Referee: [Abstract and Methods] The exact mathematical definition of the Weighted Entropy (including the weighting coefficient for group performance modulation and the formula for computing group-specific performance estimates on the labeled set) is not provided. This is load-bearing for the central claim, as the modulation mechanism is the core novelty and without the formula the reported disparity reductions cannot be reproduced or verified.
Authors: We agree that the precise equations are essential for reproducibility and verification of the core contribution. While the approach is described in the text, the explicit mathematical formulation was omitted. We will add the full definition in the Methods section, specifying the group performance estimate as the mean Dice score per group on the labeled set and the weighting coefficient as the normalized inverse of these estimates applied to the masked scaled entropy. revision: yes
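The rule the authors promise to formalize (mean Dice per group on the labeled set; weights as the normalized inverse of those means, applied to the masked scaled entropy) can be sketched directly. Function names and the normalization-to-sum-one choice are illustrative assumptions pending the revised manuscript.

```python
import numpy as np

def group_weights(dice_by_group):
    """Normalized inverse of mean per-group Dice (sketch of the stated rule).

    dice_by_group: {group: list of Dice scores on the labeled set}.
    Returns weights summing to 1 that up-weight poorly segmented groups.
    """
    inv = {g: 1.0 / max(np.mean(d), 1e-6) for g, d in dice_by_group.items()}
    total = sum(inv.values())
    return {g: v / total for g, v in inv.items()}

def weighted_entropy(entropy_score, group, weights):
    # Acquisition score: masked scaled entropy modulated by the group weight.
    return weights[group] * entropy_score
```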
Referee: [Experiments] The 75% (strong bias) and 86% (weak bias) disparity reductions are presented without reporting the number of runs, standard deviations, or statistical significance tests. Given the stochastic nature of active learning selection and model training, this absence undermines confidence in the robustness of the equity improvements.
Authors: We acknowledge that variability reporting is critical given the stochastic elements in active learning. The reported figures are from single runs per setting. In the revised manuscript we will include results from multiple independent runs with varied random seeds, report means and standard deviations for the disparity reductions, and add statistical significance tests (e.g., paired t-tests) against the baselines. revision: yes
Referee: [Evaluation] The framework is evaluated exclusively on synthetic data with a single controlled volume bias in the left caudate. While this isolates the effect, the assumption that group performance signals derived from this artificial bias will behave similarly under real intersecting demographic and anatomical variations is untested and directly affects the generalizability of the disparity-reduction claims.
Authors: We agree that exclusive reliance on synthetic data with one controlled bias limits direct extrapolation to real-world intersecting variations. The synthetic setup was deliberately selected to enable precise bias injection and isolation of the fairness mechanism's effect. We will add an expanded limitations and future work discussion addressing this point and will explore adding preliminary results on a real brain MRI dataset if feasible within the revision timeline. revision: partial
Circularity Check
No significant circularity in the proposed fair AL method or experiments
Full rationale
The paper proposes an empirical fairness-aware active learning heuristic (Weighted Entropy modulated by group performance on the labeled set, plus masked scaled entropy on ROI) and evaluates it via controlled experiments on synthetic T1 MRIs with injected left-caudate volume bias. No mathematical derivation chain is presented that reduces outputs to inputs by construction; the selection rule is a standard adaptive AL design choice, not a fitted parameter renamed as a prediction. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are invoked. The disparity-reduction claims (75%/86%) are experimental measurements against baselines, not tautological results. The framework remains self-contained against its stated synthetic benchmarks and assumptions.
Axiom & Free-Parameter Ledger
free parameters (1)
- weighting coefficient for group performance modulation
axioms (2)
- domain assumption: Group-specific performance estimates on the labeled set are sufficiently accurate and stable to guide fair sample selection without introducing selection bias.
- ad hoc to paper: Masked and scaled entropy restricted to the region of interest successfully isolates epistemic uncertainty from anatomical volume variance.