pith. machine review for the scientific record.

arxiv: 2605.11060 · v1 · submitted 2026-05-11 · 📡 eess.IV · cs.CV


SplitFed-CL: A Split Federated Co-Learning Framework for Medical Image Segmentation with Inaccurate Labels

Hadi Hadizadeh, Parvaneh Saeedi, Zahra Hafezi Kafshgari

Pith reviewed 2026-05-13 01:15 UTC · model grok-4.3

classification 📡 eess.IV cs.CV
keywords split federated learning · medical image segmentation · noisy labels · co-learning · teacher-student refinement · annotation errors · privacy preservation · consistency regularization

The pith

SplitFed-CL uses a global teacher to guide local students in refining unreliable annotations during split federated medical image segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes SplitFed-CL to maintain performance in privacy-preserving collaborative training when medical labels vary in quality across clients. A global teacher model helps local students detect and correct unreliable annotations, applying direct supervision only to reliable labels and weighted refinement to the rest. Consistency regularization adds robustness against input changes while a trainable module balances the loss terms adaptively. The method also includes a strategy that perturbs boundaries according to shape complexity to mimic realistic human annotation errors. These elements together allow institutions to train segmentation models jointly without sharing raw images or perfect labels.

Core claim

SplitFed-CL is a co-learning framework in which a global teacher guides local student models to detect and refine unreliable annotations during split federated training for medical image segmentation. Reliable labels supervise training directly, unreliable labels undergo weighted student-teacher refinement, consistency regularization ensures robustness to perturbations, and a trainable weighting module balances the losses. A difficulty-guided strategy simulates human-like annotation errors centered on complex boundaries.

What carries the argument

The teacher-student co-learning mechanism that refines unreliable local annotations using the global model's guidance, combined with consistency regularization and a trainable weighting module.
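Read literally, that mechanism admits a compact sketch. The confidence threshold, reliability rule, and mixing weight below are illustrative assumptions for a binary mask, not the paper's exact formulation:

```python
import numpy as np

def refine_labels(y_noisy, p_student, p_teacher, tau=0.8, alpha=0.5):
    """Hedged sketch of co-learning label refinement (not the paper's exact rule).

    y_noisy:   (H, W) noisy binary annotation
    p_student: (H, W) student foreground probabilities
    p_teacher: (H, W) global-teacher foreground probabilities
    tau:       confidence threshold separating reliable from unreliable pixels
    alpha:     student/teacher mixing weight for the refined soft label
    """
    # A pixel's label is deemed reliable when the teacher confidently agrees with it.
    agree = (p_teacher > 0.5) == (y_noisy > 0.5)
    confident = np.maximum(p_teacher, 1.0 - p_teacher) > tau
    reliable = agree & confident

    # Unreliable pixels receive a weighted student-teacher soft label instead.
    refined_soft = alpha * p_student + (1.0 - alpha) * p_teacher
    return np.where(reliable, y_noisy.astype(float), refined_soft), reliable
```

Reliable pixels keep their original hard label for direct supervision; the rest are replaced by the weighted blend, mirroring the "direct supervision only for reliable labels, weighted refinement for the rest" split described above.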

If this is right

  • Segmentation accuracy remains higher across clients even when some contribute noisy labels.
  • The framework supports collaboration among medical sites without centralizing sensitive data.
  • Robustness increases against both label noise and input perturbations during inference.
  • The difficulty-guided noise simulation provides a controlled way to test methods on boundary-centric errors.
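The difficulty-guided noise simulation in the last bullet can likewise be sketched. The complexity proxy (an isoperimetric ratio) and flip rate below are illustrative assumptions, not the paper's governing rule:

```python
import numpy as np

def perturb_boundary(mask, rng, base_rate=0.1):
    """Hedged sketch of difficulty-guided annotation noise (illustrative only).

    Boundary pixels of a binary mask are flipped with a probability that grows
    with a simple shape-complexity proxy (perimeter^2 / area), mimicking human
    errors that concentrate on hard, irregular boundaries.
    """
    m = mask.astype(bool)
    # 4-neighbour boundary: pixels adjacent to the opposite class (both sides).
    pad = np.pad(m, 1, mode="edge")
    boundary = (
        (pad[:-2, 1:-1] != m) | (pad[2:, 1:-1] != m) |
        (pad[1:-1, :-2] != m) | (pad[1:-1, 2:] != m)
    )

    perimeter = boundary.sum()
    area = max(m.sum(), 1)
    complexity = perimeter ** 2 / (4 * np.pi * area)  # ~1 for a disc, larger when irregular
    rate = min(base_rate * complexity, 1.0)

    # Flip a random subset of boundary pixels; interior pixels are untouched.
    noisy = m.copy()
    flip = boundary & (rng.random(m.shape) < rate)
    noisy[flip] = ~noisy[flip]
    return noisy.astype(mask.dtype), rate
```

By construction, only boundary pixels are corrupted and more convoluted shapes receive more corruption, which is the property a boundary-centric benchmark needs.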

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The co-learning structure could transfer to other federated tasks such as classification or detection when label quality differs across participants.
  • If the refinement process proves stable, it might reduce reliance on repeated expert review of annotations collected at different institutions.
  • Extending the adaptive weighting to include client-specific reliability scores could further tailor the training to varying data qualities.

Load-bearing premise

The global teacher model can reliably identify and correct unreliable annotations from heterogeneous clients without introducing new errors.

What would settle it

On the binary segmentation dataset with real annotation errors, if SplitFed-CL shows no improvement in Dice coefficient or Hausdorff distance over standard SplitFed or the seven baselines, the central claim of consistent outperformance would be falsified.
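The two falsification metrics named above are standard; a minimal brute-force implementation for binary masks:

```python
import numpy as np

def dice(pred, gt):
    """Dice coefficient between two binary masks (1.0 = perfect overlap)."""
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom else 1.0

def hausdorff(pred, gt):
    """Symmetric Hausdorff distance between foreground point sets (brute force)."""
    a = np.argwhere(pred)
    b = np.argwhere(gt)
    if len(a) == 0 or len(b) == 0:
        return float("inf")
    # Pairwise distances; directed distances are max-of-min in each direction.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```

Higher Dice and lower Hausdorff distance than standard SplitFed and the baselines is what the paper's central claim requires; the O(|A|·|B|) distance computation here is fine for sanity checks, though production pipelines use distance-transform variants.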

read the original abstract

Split Federated Learning (SplitFed) combines federated and split learning to preserve privacy while reducing client-side computation. However, in medical image segmentation, heterogeneous label quality across clients can significantly degrade performance. We propose SplitFed-CL, a co-learning framework where a global teacher guides local students to detect and refine unreliable annotations. Reliable labels supervise training directly, while unreliable labels are corrected via weighted student-teacher refinement. SplitFed-CL further incorporates consistency regularization for robustness to input perturbations and a trainable weighting module to balance loss terms adaptively. We also introduce a novel difficulty-guided strategy to simulate human-like, boundary-centric annotation errors, where the degree of perturbation is governed by shape complexity and the associated annotation difficulty. Experiments on two multiclass segmentation datasets with controlled synthetic noise, together with a binary segmentation dataset containing real-world annotation errors, demonstrate that SplitFed-CL consistently outperforms seven state-of-the-art baselines, yielding improved segmentation quality and robustness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript presents SplitFed-CL, a split federated co-learning framework for medical image segmentation under inaccurate labels. A global teacher model detects unreliable annotations from heterogeneous clients and performs weighted refinement with local students; reliable labels supervise training directly while unreliable ones are corrected via student-teacher agreement. The method adds consistency regularization against input perturbations and a trainable weighting module for adaptive loss balancing. It also introduces a difficulty-guided synthetic noise strategy that perturbs boundaries proportionally to shape complexity. Experiments on two multiclass datasets with controlled synthetic noise and one binary dataset with real-world annotation errors claim consistent outperformance over seven state-of-the-art baselines in segmentation quality and robustness.

Significance. If the empirical results hold, the work is significant for privacy-preserving collaborative training in medical imaging, where label noise is common across institutions. It offers a concrete mechanism to mitigate heterogeneous annotation quality without data sharing. Credit is given for evaluating on a real-world error dataset in addition to synthetic cases and for the difficulty-guided noise simulation, which provides a more realistic benchmarking tool than uniform noise.

major comments (1)
  1. [Experiments section (real-world dataset results)] Experiments section (real-world dataset results): The central claim attributes gains to the global teacher's detection and correction of unreliable labels. However, no direct quantitative validation of this mechanism is provided, such as pixel-level precision/recall or F1-score of detected unreliable pixels against the known real-world annotation errors. Without these metrics or an ablation that isolates the teacher's correction accuracy from consistency regularization and the trainable weighting module, it remains unclear whether the reported improvements over baselines are due to the proposed co-learning correction or to other components.
minor comments (2)
  1. [Abstract] The abstract refers to 'seven state-of-the-art baselines' without naming them; explicitly listing the baselines (e.g., in a table or sentence) would improve readability.
  2. [Method] The description of the trainable weighting module lacks detail on its input features, architecture, and joint optimization procedure with the rest of the framework.
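The pixel-level validation requested in the major comment reduces to standard detection metrics over a ground-truth error map; a minimal sketch (the function and argument names are hypothetical):

```python
import numpy as np

def detection_scores(flagged, true_error_map):
    """Pixel-level precision/recall/F1 for unreliable-label detection.

    flagged:        (H, W) boolean map of pixels the teacher marks unreliable
    true_error_map: (H, W) boolean ground-truth map of actually corrupted pixels
    """
    tp = np.logical_and(flagged, true_error_map).sum()
    fp = np.logical_and(flagged, ~true_error_map).sum()
    fn = np.logical_and(~flagged, true_error_map).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

On the synthetic-noise datasets the `true_error_map` is known by construction (clean label XOR corrupted label), so these scores would directly quantify the teacher's detection mechanism.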

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment point by point below, providing clarifications and outlining revisions where appropriate.

read point-by-point responses
  1. Referee: Experiments section (real-world dataset results): The central claim attributes gains to the global teacher's detection and correction of unreliable labels. However, no direct quantitative validation of this mechanism is provided, such as pixel-level precision/recall or F1-score of detected unreliable pixels against the known real-world annotation errors. Without these metrics or an ablation that isolates the teacher's correction accuracy from consistency regularization and the trainable weighting module, it remains unclear whether the reported improvements over baselines are due to the proposed co-learning correction or to other components.

    Authors: We appreciate this observation, which highlights a need for stronger mechanistic validation. For the real-world binary dataset, the annotation errors are documented at the image or case level but lack per-pixel ground-truth error maps or clean reference labels, precluding direct computation of pixel-level precision, recall, or F1 for the teacher's unreliable-pixel detections. This data limitation prevents the requested quantitative validation on that specific dataset. To address the broader concern about isolating contributions, we will add ablation experiments in the revised manuscript that systematically disable the teacher-driven correction while retaining consistency regularization and the trainable weighting module. These ablations will quantify the incremental benefit of the co-learning refinement on both the real-world and synthetic-noise datasets. Additionally, on the synthetic-noise datasets (where exact error locations are known by construction), we will report the teacher's detection accuracy (precision/recall/F1) to provide direct evidence of the correction mechanism's effectiveness. We believe these changes will clarify that the observed gains stem from the proposed components rather than ancillary factors. revision: partial

standing simulated objections (not resolved)
  • Direct pixel-level precision/recall/F1 validation of unreliable label detection specifically on the real-world dataset, due to the absence of per-pixel ground-truth error annotations in the available data.

Circularity Check

0 steps flagged

No circularity: empirical framework proposal without derivation chain

full rationale

The paper introduces SplitFed-CL as a practical co-learning method combining split federated learning with teacher-student refinement for noisy labels. No equations, derivations, or first-principles predictions appear in the provided text. Claims rest on experimental outperformance against baselines on synthetic and real annotation-error datasets, not on any self-referential fitting, ansatz smuggling, or uniqueness theorems. Any self-citations (standard for SplitFed foundations) are not load-bearing for the core contribution, which is the specific framework design and its empirical validation. This matches the default non-circular outcome for applied ML framework papers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Review based on abstract only; limited visibility into parameters or assumptions. The framework relies on the premise that teacher guidance can correct labels and that synthetic noise matches real errors.

axioms (1)
  • domain assumption A global teacher model can accurately detect and refine unreliable local annotations in heterogeneous client settings
    Central mechanism of the co-learning framework described in the abstract.
invented entities (1)
  • Trainable weighting module no independent evidence
    purpose: Adaptively balance loss terms between reliable and unreliable label paths
    Introduced as part of the framework to handle varying label quality
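One standard instantiation of such a module is the uncertainty-based loss weighting of the cited Kendall et al.; whether SplitFed-CL uses this exact form is an assumption, so the sketch below is illustrative:

```python
import math

def weighted_total_loss(losses, log_vars):
    """Uncertainty-style trainable loss weighting (illustrative assumption).

    Each loss term L_i is scaled by exp(-s_i) and penalised by +s_i, where
    s_i = log(sigma_i^2) is a trainable scalar. Optimising the s_i jointly
    with the network down-weights noisy terms without hand-tuned coefficients.
    """
    return sum(math.exp(-s) * L + s for L, s in zip(losses, log_vars))
```

With all `s_i = 0` this reduces to a plain sum; raising `s_i` shrinks that term's effective weight at the cost of the `+s_i` penalty, which is the balancing mechanism.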

pith-pipeline@v0.9.0 · 5473 in / 1308 out tokens · 82763 ms · 2026-05-13T01:15:59.036347+00:00 · methodology



Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 1 internal anchor

  1. [1]

    A difficulty-guided framework for simulating human-like annotation errors

  2. [2]

    A global confidence-based mechanism for identifying reliable and unreliable annotations across clients

  3. [3]

    A student–teacher strategy for refining unreliable labels

  4. [4]

    SplitFed-CL: A Split Federated Co-Learning Framework for Medical Image Segmentation with Inaccurate Labels

    A trainable loss-weighting module that automatically optimizes component contributions. 2 Related Work: Several FL methods proposed to mitigate the effect of annotation noise for classification tasks. For example, FedLN [6] improves robustness through interpolation-based regularization and energy-driven scoring, while FedNoiL [7] identifies reliable cl...

  5. [5]

    so local models leverage most data; as training proceeds and λ shrinks, τ tightens to focus learning on more reliable samples, improving robustness. 4.3 Label Correction: After identifying reliable and unreliable labels in Y, unreliable labels Y_un are locally modified to Ȳ_un using predictions from both the student and teacher models. A difference mask R...

  6. [6]

    Split learning for health: Distributed deep learning without sharing raw patient data

    P. Vepakomma, O. Gupta, T. Swedish, and R. Raskar, “Split learning for health: Distributed deep learning without sharing raw patient data,” arXiv preprint arXiv:1812.00564, 2018.

  7. [7]

    Communication-efficient learning of deep networks from decentralized data

    B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial Intelligence and Statistics. PMLR, 2017, pp. 1273–1282.

  8. [8]

    Federated learning for medical image analysis: A survey

    H. Guan, P. T. Yap, A. Bozoki, and M. Liu, “Federated learning for medical image analysis: A survey,” Pattern Recognition, p. 110424, 2024.

  9. [9]

    Splitfed: When federated learning meets split learning

    Ch. Thapa, P. Ch. M. Arachchige, S. Camtepe, and L. Sun, “Splitfed: When federated learning meets split learning,” in AAAI Conference on Artificial Intelligence, 2022, vol. 36, pp. 8485–8493.

  10. [10]

    Denoising and segmentation in medical image analysis: A comprehensive review on machine learning and deep learning approaches

    R. R. Kumar and R. Priyadarshi, “Denoising and segmentation in medical image analysis: A comprehensive review on machine learning and deep learning approaches,” Multimedia Tools and Applications, vol. 84, no. 12, pp. 10817–10875, 2025.

  11. [11]

    Federated learning with noisy labels: Achieving generalization in the face of label noise

    V. Tsouvalas, A. Saeed, T. Özçelebi, and N. Meratnia, “Federated learning with noisy labels: Achieving generalization in the face of label noise,” in First Workshop on Interpolation Regularizers and Beyond at NeurIPS 2022, 2022.

  12. [12]

    Fednoil: A simple two-level sampling method for federated learning with noisy labels

    Zh. Wang, T. Zhou, G. Long, B. Han, and J. Jiang, “Fednoil: A simple two-level sampling method for federated learning with noisy labels,” arXiv preprint arXiv:2205.10110, 2022.

  13. [13]

    Fedcorr: Multi-stage federated learning for label noise correction

    J. Xu, Z. Chen, T. QS. Quek, and K. F. E. Chong, “Fedcorr: Multi-stage federated learning for label noise correction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10184–10193.

  14. [14]

    Fnbench: Benchmarking robust federated learning against noisy labels

    X. Jiang, J. Li, N. Wu, Zh. Wu, X. Li, Sh. Sun, G. Xu, Y. Wang, Q. Li, and M. Liu, “Fnbench: Benchmarking robust federated learning against noisy labels,” arXiv preprint arXiv:2505.06684, 2025.

  15. [15]

    Federated noisy client learning

    K. Tam, L. Li, B. Han, Ch. Xu, and H. Fu, “Federated noisy client learning,” arXiv preprint arXiv:2106.13239, 2021.

  16. [16]

    Auto-weighted robust federated learning with corrupted data sources

    Sh. Li, E. Ngai, F. Ye, and Th. Voigt, “Auto-weighted robust federated learning with corrupted data sources,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 13, no. 5, pp. 1–20, 2022.

  17. [17]

    Fedmix: Mixed supervised federated learning for medical image segmentation

    J. Wicaksana, Z. Yan, D. Zhang, X. Huang, H. Wu, X. Yang, and K. T. Cheng, “Fedmix: Mixed supervised federated learning for medical image segmentation,” IEEE Transactions on Medical Imaging, 2022.

  18. [18]

    Feda3i: Annotation quality-aware aggregation for federated medical image segmentation against heterogeneous annotation noise

    N. Wu, Zh. Sun, Z. Yan, and L. Yu, “Feda3i: Annotation quality-aware aggregation for federated medical image segmentation against heterogeneous annotation noise,” in AAAI Conference on Artificial Intelligence, 2024, vol. 38, pp. 15943–15951.

  19. [19]

    Feddm: Federated weakly supervised segmentation via annotation calibration and gradient de-conflicting

    M. Zhu, Zh. Chen, and Y. Yuan, “Feddm: Federated weakly supervised segmentation via annotation calibration and gradient de-conflicting,” IEEE Transactions on Medical Imaging, vol. 42, no. 6, pp. 1632–1643, 2023.

  20. [20]

    Quality-adaptive split-federated learning for segmenting medical images with inaccurate annotations

    Z. H. Kafshgari, Ch. Shiranthika, P. Saeedi, and I. Bajić, “Quality-adaptive split-federated learning for segmenting medical images with inaccurate annotations,” in 20th International Symposium on Biomedical Imaging. IEEE, 2023, pp. 1–5.

  21. [21]

    Improving multiple sclerosis lesion segmentation across clinical sites: A federated learning approach with noise-resilient training

    L. Bai, D. Wang, H. Wang, M. Barnett, M. Cabezas, W. Cai, F. Calamante, K. Kyle, D. Liu, L. Ly, et al., “Improving multiple sclerosis lesion segmentation across clinical sites: A federated learning approach with noise-resilient training,” Artificial Intelligence in Medicine, vol. 152, pp. 102872, 2024.

  22. [22]

    Robust edge-stop functions for edge-based active contour models in medical image segmentation

    A. Pratondo, Ch. K. Chui, and S. H. Ong, “Robust edge-stop functions for edge-based active contour models in medical image segmentation,” IEEE Signal Processing Letters, vol. 23, no. 2, pp. 222–226, 2016.

  23. [23]

    The blur effect: Perception and estimation with a new no-reference perceptual blur metric

    F. Crété, T. Dolmiere, P. Ladret, and M. Nicolas, “The blur effect: Perception and estimation with a new no-reference perceptual blur metric,” in Proc. SPIE, 2007.

  24. [24]

    Learning geodesic active contours for embedding object global information in segmentation CNNs

    J. Ma, J. He, and X. Yang, “Learning geodesic active contours for embedding object global information in segmentation CNNs,” IEEE Transactions on Medical Imaging, vol. 40, no. 1, pp. 93–104, 2020.

  25. [25]

    Deep residual learning for image recognition

    K. He, X. Zhang, Sh. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

  26. [26]

    Encoder-decoder with atrous separable convolution for semantic image segmentation

    L. Ch. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in the European Conference on Computer Vision (ECCV), 2018, pp. 801–818.

  27. [27]

    Deep co-training for semi-supervised image segmentation

    J. Peng, G. Estrada, M. Pedersoli, and Ch. Desrosiers, “Deep co-training for semi-supervised image segmentation,” Pattern Recognition, vol. 107, pp. 107269, 2020.

  28. [28]

    Semi-supervised tissue segmentation from histopathological images with consistency regularization and uncertainty estimation

    G. Sudhamsh, S. Girisha, and R. Rashmi, “Semi-supervised tissue segmentation from histopathological images with consistency regularization and uncertainty estimation,” Scientific Reports, vol. 15, no. 1, pp. 6506, 2025.

  29. [29]

    Multi-task learning using uncertainty to weigh losses for scene geometry and semantics

    A. Kendall, Y. Gal, and R. Cipolla, “Multi-task learning using uncertainty to weigh losses for scene geometry and semantics,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7482–7491.

  30. [30]

    Automatic identification of human blastocyst components via texture

    P. Saeedi, D. Yee, J. Au, and J. Havelock, “Automatic identification of human blastocyst components via texture,” IEEE Transactions on Biomedical Engineering, vol. 64, no. 12, pp. 2968–2978, 2017.

  31. [31]

    Psfhs: Intrapartum ultrasound image dataset for AI-based segmentation of pubic symphysis and fetal head

    G. Chen, J. Bai, Zh. Ou, Y. Lu, and H. Wang, “Psfhs: Intrapartum ultrasound image dataset for AI-based segmentation of pubic symphysis and fetal head,” Scientific Data, vol. 11, no. 1, pp. 436, 2024.

  32. [32]

    Skin lesion analysis toward melanoma detection: A challenge at ISBI 2017

    N. C. Codella, D. Gutman, M. E. Celebi, B. Helba, M. A. Marchetti, S. W. Dusza, A. Kalloo, K. Liopyris, N. Mishra, H. Kittler, and A. Halpern, “Skin lesion analysis toward melanoma detection: A challenge at ISBI 2017,” in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). IEEE, 2018, pp. 168–172.

  33. [33]

    Segmentation style discovery: Application to skin lesion images

    K. Abhishek, J. Kawahara, and Gh. Hamarneh, “Segmentation style discovery: Application to skin lesion images,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2024, pp. 24–34.

  34. [34]

    What can we learn from inter-annotator variability in skin lesion segmentation?

    K. Abhishek, J. Kawahara, and Gh. Hamarneh, “What can we learn from inter-annotator variability in skin lesion segmentation?,” in MICCAI Workshop on Deep Generative Models. Springer, 2025, pp. 23–33.