People-Centred Medical Image Analysis via Fairness-Aware Human-AI Cooperation

Cuong Nguyen; David Rosewarne; Gustavo Carneiro; Kevin Wells; Milad Masroor; Tahir Hassan; Thanh-Toan Do; Yuanhong Chen; Zheng Zhang

arxiv: 2604.26991 · v2 · pith:K4U2ZSUGnew · submitted 2026-04-28 · 💻 cs.LG · cs.AI


Pith reviewed 2026-05-07 16:30 UTC · model grok-4.3


keywords: medical image analysis · AI fairness · human-AI collaboration · workflow integration · clinical adoption · dynamic gating · benchmark


The pith

PecMan uses a dynamic gating mechanism to jointly optimize fairness, accuracy, and clinician workload in medical image analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Medical AI systems often achieve high accuracy but struggle with adoption due to biases across patient groups and poor fit with clinical routines. The paper contends that addressing fairness and human collaboration together, rather than separately, under the constraint of limited clinician time, can create more viable tools. It introduces PecMan, a framework with a gating system that decides case assignments, along with a benchmark to measure the balance of these factors. Experiments indicate this integrated approach outperforms methods that handle the issues in isolation.

Core claim

The central discovery is that a people-centred approach to medical image analysis, implemented via PecMan's dynamic gating that routes cases to AI, human clinicians, or joint review while respecting workload limits, achieves better combined performance on accuracy, fairness across diverse populations, and workflow integration than prior separate solutions.

What carries the argument

The argument rests on PecMan's dynamic gating mechanism, which assigns each medical image case to the AI alone, a clinician alone, or both. Assignments are subject to an overall clinician availability constraint, and the mechanism is trained to jointly optimise diagnostic accuracy and fairness.
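To make the routing idea concrete, here is a minimal sketch of budget-constrained case assignment. It is not the paper's learned gating model: the greedy rule, the function name `route_cases`, the option labels, and the per-case scores are all assumptions made for illustration. It only shows how a cap on clinician involvement can shape which cases are routed to a human.

```python
from typing import List, Sequence

# Hypothetical option labels; the page describes three routes but does not name them.
OPTIONS = ("ai", "human", "both")


def route_cases(scores: Sequence[Sequence[float]], human_budget: int) -> List[str]:
    """Assign each case to "ai", "human", or "both".

    scores[i] holds assumed gating scores for case i in the order of OPTIONS
    (higher means the option is expected to do better on that case).
    At most `human_budget` cases may involve a clinician.
    """
    gains = []
    for i, (ai, human, both) in enumerate(scores):
        # Best clinician-involving option for this case.
        best_opt = "human" if human >= both else "both"
        # Expected gain from involving a clinician instead of using AI alone.
        gains.append((max(human, both) - ai, i, best_opt))

    routes = ["ai"] * len(scores)
    # Spend the limited clinician budget where the expected gain is largest.
    ranked = sorted(gains, key=lambda g: g[0], reverse=True)
    for gain, i, opt in ranked[:human_budget]:
        if gain > 0:
            routes[i] = opt
    return routes


example_scores = [
    [0.9, 0.5, 0.6],
    [0.2, 0.8, 0.7],
    [0.4, 0.5, 0.9],
    [0.3, 0.35, 0.3],
]
example_routes = route_cases(example_scores, human_budget=2)
```

With a budget of two, only the two cases with the largest expected gain reach a clinician; the remaining cases stay with the AI even when a clinician would help slightly. A learned gate, as described on this page, would instead produce such assignments directly from the input image.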

If this is right

Performance biases that hinder regulatory approval can be mitigated by explicit fairness optimization.
Clinician adoption increases when AI does not disrupt established workflows or overload staff.
Trade-offs between the three goals can be quantified and managed using the FairHAI benchmark.
The framework demonstrates consistent gains over methods optimizing only subsets of these objectives.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Applying similar gating logic could help in other high-stakes domains with scarce expert time.
Real-world deployment might require adapting the workload model to specific hospital schedules and team structures.
The benchmark provides a template for testing other human-AI systems on fairness and integration metrics simultaneously.

Load-bearing premise

The argument assumes that clinician availability can be modeled as a simple dynamic constraint that captures real clinical settings, without overlooking workflow disruptions or introducing new barriers to adoption.

What would settle it

A study in an actual clinic showing that PecMan yields lower overall diagnostic quality or higher clinician burnout than separate fairness and deferral tools would falsify the claimed benefit of joint optimization.

Figures

Figures reproduced from arXiv: 2604.26991 by Cuong Nguyen, David Rosewarne, Gustavo Carneiro, Kevin Wells, Milad Masroor, Tahir Hassan, Thanh-Toan Do, Yuanhong Chen, Zheng Zhang.

**Figure 1.** PecMan: A unified framework that jointly optimises fairness and human-AI collaboration. A gating mechanism selects the appropriate cohort-specific AI model and determines whether clinician input is needed, ensuring high accuracy, balanced group performance, and adherence to workload constraints. Although AI fairness, L2D, and L2C all aim to improve AI-assisted medical decision-making, they have traditiona…

**Figure 2.** Step 0 – Backbone Training: PecMan initialises its backbone model using the FIS loss [82], which jointly optimises overall classification accuracy and fairness across patient groups. Here the groups are defined by the sensitive attribute sex, with values “male” and “female”.

**Figure 3.** Step 1 – Group-specific Model Training: This step focuses on training classifiers tailored to individual patient cohorts, enabling fairness-aware performance across demographic groups.

weights are defined as follows:

$$
s^{I}(x, y, B) = \frac{\exp\left(\ell_{\mathrm{BCE}}(h_{\phi}(f_{\theta}(x)), y)\right)}{\sum_{(\tilde{x}, \tilde{M}, \tilde{y}, \tilde{a}) \in B} \exp\left(\ell_{\mathrm{BCE}}(h_{\phi}(f_{\theta}(\tilde{x})), \tilde{y})\right)},
\qquad
s^{G}(a, B) = \frac{\exp\left(\mathrm{DOT}(L(B), L_{a}(B))\right)}{\sum_{j \in A} \exp\left(\mathrm{DOT}(L(B), L_{j}(B))\right)},
\tag{3}
$$

where $\mathrm{DOT}(L(B), L_{a}(B))$ is the optim…

**Figure 4.** Step 2 – L2D+L2C Unbiased Training: PecMan trains the gating and consolidator models using the FIS loss, enabling unbiased decision-making that combines L2D and L2C strategies.

**Figure 5.** The AUC vs. coverage (top row) and ES-AUC vs. coverage (bottom row) of com…

**Figure 6.** Performance analysis of PecMan on the testing samples of HAM10000. (a) The…

**Figure 7.** The cohort-specific AUC (a,b), overall AUC (c), and ES-AUC vs. coverage…

**Figure 8.** Training time of PecMan and competing methods on the HAM10000 dataset.

**Figure 9.** Inference time of PecMan and competing methods on the HAM10000 dataset.
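The weighting scheme in Eq. (3), quoted under Figure 3, is a pair of softmax normalisations: one over per-sample BCE losses within a batch, and one over per-group DOT values. The sketch below illustrates that structure only. The helper names are hypothetical, and because the definition of DOT is truncated on this page, the group-level values are taken as given inputs rather than computed.

```python
import math
from typing import Dict, List, Sequence


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one predicted probability p and label y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def softmax(values: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def instance_weights(probs: Sequence[float], labels: Sequence[int]) -> List[float]:
    """s^I: softmax of per-sample BCE losses over the batch.

    Samples with higher loss receive larger weight.
    """
    return softmax([bce(p, y) for p, y in zip(probs, labels)])


def group_weights(dot_by_group: Dict[str, float]) -> Dict[str, float]:
    """s^G: softmax over per-group DOT values.

    The DOT values are assumed to be precomputed; this page does not
    give their full definition.
    """
    groups = list(dot_by_group)
    weights = softmax([dot_by_group[g] for g in groups])
    return dict(zip(groups, weights))


w_inst = instance_weights([0.9, 0.2], [1, 1])
w_group = group_weights({"female": 0.1, "male": 0.4})
```

In this toy batch the poorly predicted positive sample (probability 0.2) receives the larger instance weight, and the group with the larger DOT value receives the larger group weight, which matches the direction of the exponentials in Eq. (3).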

read the original abstract

Machine learning models for medical image analysis often exhibit subgroup-dependent performance, which impacts how decisions should be allocated between automated systems and human experts under limited resources. Prior work on AI fairness and human-AI cooperation, including learning to defer (L2D) and learning to complement (L2C), typically addresses these problems in isolation. We propose People-Centred Medical Image Analysis (PecMan), a framework for fairness-aware human-AI co-operative classification that jointly models subgroup-dependent reliability, decision allocation, and collaborative prediction. PecMan combines subgroup-specialised predictors with a gating and consolidation mechanism that dynamically assigns cases to automated models, human experts, or their combination, without requiring sensitive attributes at test time. We also introduce the FairHAI benchmark for evaluating trade-offs between predictive accuracy, subgroup equity, and human involvement. In addition, we provide a theoretical analysis of multi-agent gating via selection regret and characterise fairness-coverage trade-offs under input-dependent allocation. Experiments across multiple medical imaging datasets demonstrate that PecMan achieves consistently improved trade-offs compared to methods that address fairness or human-AI cooperation separately.
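The figure captions on this page report AUC and ES-AUC against coverage. As a rough illustration of how such a trade-off point could be computed, the sketch below makes two assumptions that this page does not confirm: that ES-AUC is the equity-scaled AUC, defined as overall AUC divided by one plus the summed absolute gaps between overall and per-group AUC, and that coverage is the fraction of cases handled by the AI alone. All function names are hypothetical.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise AUC: probability a positive outranks a negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def es_auc(scores: Sequence[float], labels: Sequence[int], groups: Sequence[str]) -> float:
    """Assumed equity-scaled AUC: overall AUC / (1 + sum of |overall - group AUC|)."""
    overall = auc(scores, labels)
    gap = 0.0
    for g in sorted(set(groups)):
        idx = [i for i, a in enumerate(groups) if a == g]
        group_auc = auc([scores[i] for i in idx], [labels[i] for i in idx])
        gap += abs(overall - group_auc)
    return overall / (1.0 + gap)


def coverage(routes: Sequence[str]) -> float:
    """Assumed definition: share of cases routed to the AI alone."""
    return sum(r == "ai" for r in routes) / len(routes)


toy_scores: List[float] = [0.9, 0.8, 0.3, 0.1, 0.6, 0.4, 0.7, 0.2]
toy_labels: List[int] = [1, 1, 0, 0, 1, 0, 0, 1]
toy_groups: List[str] = ["female"] * 4 + ["male"] * 4

overall_auc = auc(toy_scores, toy_labels)
equity_auc = es_auc(toy_scores, toy_labels, toy_groups)
toy_coverage = coverage(["ai", "human", "both", "ai"])
```

On this toy data the overall AUC is 0.75, but one group scores 1.0 and the other 0.25, so the equity-scaled value falls well below the overall AUC. Sweeping the clinician budget and recomputing both metrics at each coverage level would trace a curve of the kind the captions describe.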

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper proposes People-Centred Medical Image Analysis (PecMan), a human-AI framework that jointly optimizes fairness, diagnostic accuracy, and workflow effectiveness through a dynamic gating mechanism that assigns cases to AI, clinicians, or both under clinician workload constraints. It introduces the FairHAI benchmark for evaluating trade-offs between accuracy, fairness, and clinician workload, and reports that experiments show PecMan consistently outperforms existing methods.

Significance. If the results hold and the modeled constraints align with real clinical environments, this work would be significant in advancing clinically viable medical AI by addressing the interdependence of fairness and workflow integration, areas previously studied in isolation. The FairHAI benchmark could serve as a useful tool for future research in human-centred AI.

major comments (1)

The central claim that PecMan outperforms baselines on FairHAI depends on the dynamic gating jointly optimizing under a modeled clinician availability constraint. However, this treats availability as a clean resource allocation problem, while real clinical settings introduce unmodeled factors including communication costs, decision latency, EHR integration friction, and variable case complexity that could invert the trade-offs. Without validation that the synthetic constraint matches observed clinical logs, the outperformance does not establish clinical viability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their insightful comments on our manuscript. We address the major concern point by point below, acknowledging the limitations of our modeled constraints while clarifying the scope of our claims.

read point-by-point responses

Referee: The central claim that PecMan outperforms baselines on FairHAI depends on the dynamic gating jointly optimizing under a modeled clinician availability constraint. However, this treats availability as a clean resource allocation problem, while real clinical settings introduce unmodeled factors including communication costs, decision latency, EHR integration friction, and variable case complexity that could invert the trade-offs. Without validation that the synthetic constraint matches observed clinical logs, the outperformance does not establish clinical viability.

Authors: We agree that the clinician availability constraint in PecMan and FairHAI is modeled as a simplified resource allocation problem and does not incorporate additional real-world factors such as communication costs, decision latency, EHR integration friction, and variable case complexity. The FairHAI benchmark is a controlled, synthetic environment intended to isolate and evaluate the effects of joint optimization of fairness, accuracy, and workflow under workload constraints. Our central claim is limited to outperformance within this benchmark; we do not assert that the results establish clinical viability. In the revised manuscript, we will expand the limitations and discussion sections to explicitly address these unmodeled factors, analyze how they could alter the observed trade-offs, and propose directions for empirical validation against clinical logs and real workflow data.

Revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain; framework and benchmark are independently proposed

full rationale

The paper introduces PecMan as a joint optimization framework via dynamic gating under workload constraints, together with the FairHAI benchmark. Its performance claims rest on experimental comparisons rather than on a closed-form derivation, a fitted parameter renamed as a prediction, or a self-referential definition. The provided text presents no equations, ansatzes, or uniqueness theorems that reduce to their inputs by construction. Prior work on L2D/L2C is cited externally, and no self-citation bears the weight of the central claim. The derivation chain is self-contained as a proposal validated by new experiments.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only view yields no explicit free parameters, axioms, or invented entities; the framework description implies unstated modeling choices for gating and workload but provides no details for auditing.

