pith. machine review for the scientific record. sign in

arxiv: 2604.10754 · v1 · submitted 2026-04-12 · 📡 eess.IV

Recognition: unknown

Human Gaze-based Dual Teacher Guidance Learning for Semi-Supervised Medical Image Segmentation

Authors on Pith no claims yet

Pith reviewed 2026-05-10 15:28 UTC · model grok-4.3

classification 📡 eess.IV
keywords semi-supervised segmentationmedical image segmentationhuman gazemean teacherdual teachergaze guidancedata mixingmulti-scale perception
0
0 comments X

The pith

Human gaze data acts as an extra teacher to improve semi-supervised medical image segmentation

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to establish that human gaze recordings, which are cheaper to collect than full pixel annotations, can function as a second teacher signal inside the standard mean-teacher semi-supervised learning setup for medical images. By adding a data-mixing step guided by gaze, a multi-scale perception module that incorporates gaze patterns, and a loss term that pulls the network's attention toward gaze locations, the approach aims to enlarge the effective training set and sharpen the model's focus on target structures even when most images lack labels. A reader would care because this could let segmentation models reach higher accuracy using far fewer expensive manual outlines, while still drawing on the large pool of unlabeled scans that clinics already produce. The authors test the claim on several imaging modalities and ten different organs or tissues to argue that the benefit is not limited to one narrow setting.

Core claim

The authors introduce the Human Gaze-based Dual Teacher Guidance Learning (HG-DTGL) model, in which human gaze serves as an additional hidden teacher within the mean-teacher framework. They create GazeMix to produce reliable mixed training examples that carry gaze information, add a Multi-scale Gaze Perception module to extract gaze-informed features at multiple resolutions, and define a Gaze Loss that forces the network output to align with human gaze patterns. Extensive experiments show the resulting model outperforms prior semi-supervised baselines on multiple datasets spanning different modalities and ten organs or tissues.

What carries the argument

The dual-teacher guidance structure that treats human gaze as the second teacher, realized through GazeMix for gaze-informed data mixing, the Multi-scale Gaze Perception module for feature extraction, and the Gaze Loss for aligning model predictions with gaze locations.

If this is right

  • GazeMix expands the diversity and effective size of the training set without requiring additional full annotations.
  • The Multi-scale Gaze Perception module strengthens the network's ability to locate target regions at varying sizes and contexts.
  • The Gaze Loss term pulls the model's internal attention maps into agreement with human visual focus.
  • The combined system delivers higher segmentation accuracy than standard mean-teacher training across varied modalities.
  • Performance advantages hold for a total of ten different organs and tissues, indicating broad applicability.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Routine collection of gaze data during clinical reading could become a low-cost supplement to existing annotation pipelines.
  • The same gaze-guidance idea might transfer to other semi-supervised vision tasks where eye-tracking recordings can be obtained cheaply.
  • Gains may vary with the consistency of gaze collection hardware and instructions, pointing to a need for standardized protocols.
  • If the method scales, it could lower the annotation burden for training segmentation networks in resource-limited medical settings.

Load-bearing premise

Expert gaze data remains a reliable and generalizable signal that does not introduce new biases when applied across different imaging modalities and anatomical targets.

What would settle it

Retraining HG-DTGL on a fresh multi-modal dataset with expert gaze annotations and finding that it yields no accuracy gain over a plain mean-teacher baseline that ignores gaze.

Figures

Figures reproduced from arXiv: 2604.10754 by Chengyu Liu, Chong Wang, Chunqiang Lu, Cong Xia, Daoqiang Zhang, Fangyi Xu, Rongjun Ge, Shuo Li, Yang Chen, Yehui Jiang, Yinsu Zhu, Yuting He, Yuxin Liu.

Figure 1
Figure 1. Figure 1: (a) Human gaze heatmap: Humans are able to accurately gaze and iden￾tify the correct foreground organ region even with various backgrounds. (b) Network attention heatmap: Due to the similarity between the segmentation foreground and background in medical images, the network faces challenges in focusing on the target. Human beings possess the ability to accurately fixate on the same tar￾get even when the su… view at source ↗
Figure 2
Figure 2. Figure 2: Overall architecture of our proposed HG-DTGL model for semi-supervised med [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Gaze points before (a) and after (b) the saccades filtered out. The generated [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Illustration of the human visual perception-based GazeMix procedure. [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Visualization comparison between our proposed HG-DTGL model and different [PITH_FULL_IMAGE:figures/full_fig_p017_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Visualization comparison between our proposed HG-DTGL model and different [PITH_FULL_IMAGE:figures/full_fig_p018_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Gaze Loss guides the mode to deeply focus on the target tissue like human [PITH_FULL_IMAGE:figures/full_fig_p019_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Visualization comparison of our proposed GazeMix with Mixup and CutMix, [PITH_FULL_IMAGE:figures/full_fig_p020_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Visualization comparison of our proposed GazeMix with Mixup and CutMix, [PITH_FULL_IMAGE:figures/full_fig_p021_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Ablation study of the weights λ in the loss function, model performs the best performance in both ACDC and CAMUS datasets with λ = 0.5 . ance of human perception, so that enhances the segmentation ability of the model, even with only a small amount of labeled training data. c) Performance Analysis of MGP vs. Spatial Attention vs. CBAM: As shown in [PITH_FULL_IMAGE:figures/full_fig_p023_10.png] view at source ↗
read the original abstract

In the field of medical image segmentation, the scarcity of labeled data poses a major challenge for existing models to accurately perceive target regions. Compared with manual annotation, gaze data is easier and cheaper to obtain. As a classical semi-supervised learning framework, mean-teacher can effectively use a large number of unlabeled medical images for stable training through self-teaching and collaborative optimization. Our study is based on the mean-teacher framework. By combining gaze data, it aims to address two crucial issues in semi-supervised medical image segmentation: 1) expand the scale and diversity of the dataset with limited labeled data; 2) enhance the network's perception ability. We propose the Human Gaze-based Dual Teacher Guidance Learning model (HG-DTGL). In this model, human gaze serves as an additional hidden `teacher' in the mean-teacher architecture. We introduce the GazeMix to generate reliable mixed data to expand the diversity and scale of the dataset, and the Multi-scale Gaze Perception (MGP) module is used to extract the multi-scale perception of the network. A Gaze Loss is designed to align the model's perception with human gaze. We have verified HG-DTGL on multiple datasets of different modalities and achieved superior performance on a total of ten different organs/tissues, with extensive experiments. This demonstrates that our method has strong generalization ability for medical images of different modalities, and shows the great application potential of gaze data in semi-supervised medical image segmentation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes HG-DTGL, a semi-supervised medical image segmentation model extending the mean-teacher framework by treating human gaze data as an additional 'hidden teacher.' It introduces GazeMix to expand dataset scale and diversity via mixed samples, the Multi-scale Gaze Perception (MGP) module to extract multi-scale gaze-aligned features, and a Gaze Loss to enforce alignment between model predictions and expert gaze patterns. The authors claim verification across multiple modalities with superior performance on ten organs/tissues.

Significance. If the empirical claims hold after proper validation, the work could be significant for demonstrating how low-cost gaze data can serve as a stable auxiliary signal in mean-teacher semi-supervised learning, potentially improving generalization in data-scarce medical imaging settings across CT, MRI, and ultrasound.

major comments (3)
  1. [Abstract] Abstract: the central claim of 'superior performance on a total of ten different organs/tissues, with extensive experiments' is unsupported by any quantitative metrics, tables, baselines, error bars, or ablation results in the manuscript, rendering the primary empirical assertion unevaluable.
  2. [Method] Method section (description of GazeMix, MGP, and Gaze Loss): the assumption that expert gaze signals remain reliable and distributionally consistent across modalities and organs is load-bearing for the dual-teacher claim yet is not tested via cross-modality transfer experiments or bias analysis.
  3. [Experiments] Experiments section: no details are supplied on gaze collection protocol standardization, inter-expert variability, or whether fixation heatmaps require modality-specific recalibration, which directly affects whether the reported gains can be attributed to gaze guidance rather than dataset expansion alone.
minor comments (2)
  1. [Abstract] The abstract uses 'hidden teacher' without a precise definition relative to the standard mean-teacher student-teacher pair.
  2. [Abstract] Acronym HG-DTGL is introduced before its full expansion.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We appreciate the opportunity to clarify aspects of our work and strengthen the presentation. Below we respond point-by-point to the major comments, indicating where revisions will be made to address the concerns.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim of 'superior performance on a total of ten different organs/tissues, with extensive experiments' is unsupported by any quantitative metrics, tables, baselines, error bars, or ablation results in the manuscript, rendering the primary empirical assertion unevaluable.

    Authors: We agree that the abstract would be strengthened by including concrete quantitative support for the claims. Although the experiments section of the manuscript contains the relevant tables, baseline comparisons, ablation studies, and performance metrics (including Dice scores across the ten organs), these are not summarized in the abstract. In the revised manuscript we will update the abstract to explicitly reference key results, such as average Dice improvements and comparisons to mean-teacher baselines, with pointers to the corresponding tables and figures. revision: yes

  2. Referee: [Method] Method section (description of GazeMix, MGP, and Gaze Loss): the assumption that expert gaze signals remain reliable and distributionally consistent across modalities and organs is load-bearing for the dual-teacher claim yet is not tested via cross-modality transfer experiments or bias analysis.

    Authors: The consistency of gaze signals across modalities is indeed important for the dual-teacher framework. Our current experiments already demonstrate performance gains on CT, MRI, and ultrasound datasets spanning ten organs, which provides indirect support for cross-modal reliability. However, we acknowledge the absence of dedicated cross-modality transfer experiments and explicit bias analysis. In the revision we will add a new subsection presenting cross-modality transfer results (training on gaze from one modality and evaluating on another) together with a bias discussion to directly test and substantiate this assumption. revision: yes

  3. Referee: [Experiments] Experiments section: no details are supplied on gaze collection protocol standardization, inter-expert variability, or whether fixation heatmaps require modality-specific recalibration, which directly affects whether the reported gains can be attributed to gaze guidance rather than dataset expansion alone.

    Authors: We regret the omission of these methodological details. The revised manuscript will include a new subsection under Experiments that fully describes the gaze collection protocol, including standardization procedures, the number of participating experts, quantitative inter-expert variability measures (e.g., overlap statistics on fixation maps), and any modality-specific recalibration steps applied to the heatmaps. This addition will allow readers to better assess the contribution of gaze guidance versus simple data augmentation. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical extension of mean-teacher with independent components

full rationale

The paper extends the established mean-teacher framework by adding human gaze as a second teacher signal, along with the GazeMix augmentation, Multi-scale Gaze Perception module, and Gaze Loss. These additions are presented as novel architectural and loss-function contributions without any equations, parameter fits, or derivations that reduce by construction to inputs defined inside the paper. Claims rest on experimental results across ten organs and multiple modalities rather than self-referential mathematical steps. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are invoked in the derivation chain. The method is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 3 invented entities

The central claim rests on the assumption that gaze data can be treated as reliable supervisory signal and that the three new modules integrate stably into mean-teacher without introducing instability or bias. No free parameters are explicitly named in the abstract, but loss weighting coefficients between gaze loss and standard teacher losses are implicitly required. The new modules themselves are invented constructs whose independent evidence is limited to the performance claim.

axioms (2)
  • domain assumption Mean-teacher self-teaching produces stable training on unlabeled medical images
    Invoked as the base framework the authors extend.
  • domain assumption Human gaze provides informative and consistent cues for target region perception
    Core premise that justifies adding gaze as an extra teacher.
invented entities (3)
  • GazeMix no independent evidence
    purpose: Generate reliable mixed data to expand dataset diversity
    New data-augmentation procedure introduced in the model.
  • Multi-scale Gaze Perception (MGP) module no independent evidence
    purpose: Extract multi-scale gaze-informed features
    New network component added to the architecture.
  • Gaze Loss no independent evidence
    purpose: Align model perception with recorded human gaze
    New loss term designed to enforce gaze consistency.

pith-pipeline@v0.9.0 · 5601 in / 1607 out tokens · 35114 ms · 2026-05-10T15:28:27.142552+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

47 extracted references · 3 canonical work pages · 2 internal anchors

  1. [1]

    Sohn, et al., Fixmatch: Simplifying semi-supervised learning with consistency and confidence, Advances in neural information processing systems 33 (2020) 596–608

    K. Sohn, et al., Fixmatch: Simplifying semi-supervised learning with consistency and confidence, Advances in neural information processing systems 33 (2020) 596–608

  2. [2]

    T. Miyato, et al., Virtual adversarial training: a regularization method for supervised and semi-supervised learning, IEEE transactions on pat- tern analysis and machine intelligence 41 (8) (2018) 1979–1993

  3. [3]

    X. Li, L. Yu, H. Chen, C.-W. Fu, L. Xing, P.-A. Heng, Transformation- consistent self-ensembling model for semisupervised medical image seg- mentation, IEEE Transactions on Neural Networks and Learning Sys- tems 32 (2) (2020) 523–534. 26

  4. [4]

    X. Lai, et al., Semi-supervised semantic segmentation with directional context-aware consistency, in: Proceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition, 2021, pp. 1205–1214

  5. [5]

    Q. Xie, Z. Dai, E. Hovy, T. Luong, Q. Le, Unsupervised data augmenta- tion for consistency training, Advances in neural information processing systems 33 (2020) 6256–6268

  6. [6]

    mixup: Beyond Empirical Risk Minimization

    H. Zhang, M. Cisse, Y. N. Dauphin, D. Lopez-Paz, mixup: Beyond empirical risk minimization, arXiv preprint arXiv:1710.09412 (2017)

  7. [7]

    S. Yun, et al., Cutmix: Regularization strategy to train strong classi- fiers with localizable features, in: Proceedings of the IEEE/CVF inter- national conference on computer vision, 2019, pp. 6023–6032

  8. [8]

    J. Wang, X. Li, Y. Han, J. Qin, L. Wang, Z. Qichao, Separated con- trastive learning for organ-at-risk and gross-tumor-volume segmentation with limited annotation, in: Proceedings of the AAAI Conference on Ar- tificial Intelligence, Vol. 36, 2022, pp. 2459–2467

  9. [9]

    J. T. Cheng, et al., Eye gaze and visual attention as a window into leadership and followership: A review of empirical insights and future directions, The Leadership Quarterly (2022) 101654

  10. [10]

    Narganes-Pineda, A

    C. Narganes-Pineda, A. B. Chica, J. Lupiáñez, A. Marotta, Explicit vs. implicit spatial processing in arrow vs. eye-gaze spatial congruency effects, Psychological research 87 (1) (2023) 242–259

  11. [11]

    Khosravan, H

    N. Khosravan, H. Celik, B. Turkbey, R. Cheng, E. McCreedy, M. McAuliffe, S. Bednarova, E. Jones, X. Chen, P. Choyke, et al., Gaze2segment: A pilot study for integrating eye-tracking technology into medical image segmentation, in: Medical Computer Vision and Bayesian and Graphical Models for Biomedical Imaging: MICCAI 2016 International Workshops, MCV and ...

  12. [12]

    Ronneberger, P

    O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedicalimagesegmentation, in: InternationalConferenceonMedical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241. 27

  13. [13]

    Milletari, N

    F. Milletari, N. Navab, S.-A. Ahmadi, V-net: Fully convolutional neural networks for volumetric medical image segmentation, in: 2016 fourth international conference on 3D vision (3DV), Ieee, 2016, pp. 565–571

  14. [14]

    TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

    J. Chen, et al., Transunet: Transformers make strong encoders for med- ical image segmentation, arXiv preprint arXiv:2102.04306 (2021)

  15. [15]

    Cao, et al., Swin-unet: Unet-like pure transformer for medical image segmentation, in: European conference on computer vision, 2022, pp

    H. Cao, et al., Swin-unet: Unet-like pure transformer for medical image segmentation, in: European conference on computer vision, 2022, pp. 205–218

  16. [16]

    Zhou, et al., Learning deep features for discriminative localization, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp

    B. Zhou, et al., Learning deep features for discriminative localization, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2921–2929

  17. [17]

    J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, in: Proceed- ings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 7132–7141

  18. [18]

    Woo, et al., Cbam: Convolutional block attention module, in: the European conference on computer vision, 2018, pp

    S. Woo, et al., Cbam: Convolutional block attention module, in: the European conference on computer vision, 2018, pp. 3–19

  19. [19]

    D.-H.L.Pseudo-Label, Thesimpleandefficientsemi-supervisedlearning method for deep neural networks, in: ICML 2013 Workshop: Challenges in Representation Learning, 2013, pp. 1–6

  20. [20]

    Dong-DongChen, Z

    W. Dong-DongChen, Z. WeiGao, Tri-net for semi-supervised deep learn- ing, in: Proceedings of twenty-seventh international joint conference on artificial intelligence, 2018, pp. 2014–2020

  21. [21]

    Z.-H. Zhou, M. Li, Tri-training: Exploiting unlabeled data using three classifiers, IEEE Transactions on knowledge and Data Engineering 17 (11) (2005) 1529–1541

  22. [22]

    M. N. Rizve, et al., In defense of pseudo-labeling: An uncertainty-aware pseudo-label selection framework for semi-supervised learning, arXiv preprint arXiv:2101.06329 (2021)

  23. [23]

    Bortsova, F

    G. Bortsova, F. Dubost, L. Hogeweg, I. Katramados, M. De Bruijne, Semi-supervised medical image segmentation via learning consistency under transformations, in: MICCAI, Springer, 2019, pp. 810–818. 28

  24. [24]

    Fang, W.-J

    K. Fang, W.-J. Li, Dmnet: difference minimization network for semi- supervised segmentation in medical images, in: MICCAI, Springer, 2020, pp. 532–541

  25. [25]

    S. Li, et al., Shape-aware semi-supervised 3d semantic segmentation for medical images, in: International Conference on Medical Image Com- puting and Computer Assisted Intervention, 2020, pp. 552–561

  26. [26]

    Luo, et al., Semi-supervised medical image segmentation through dual-task consistency, in: Proceedings of the AAAI Conference on Ar- tificial Intelligence, Vol

    X. Luo, et al., Semi-supervised medical image segmentation through dual-task consistency, in: Proceedings of the AAAI Conference on Ar- tificial Intelligence, Vol. 35, 2021, pp. 8801–8809

  27. [27]

    Wang, et al., Semi-supervised medical image segmentation via a tripled-uncertainty guided mean teacher model with contrastive learn- ing, Medical Image Analysis 79 (2022) 102447

    K. Wang, et al., Semi-supervised medical image segmentation via a tripled-uncertainty guided mean teacher model with contrastive learn- ing, Medical Image Analysis 79 (2022) 102447

  28. [28]

    Xia, et al., Uncertainty-aware multi-view co-training for semi- supervised medical imagesegmentation and domainadaptation, Medical image analysis 65 (2020) 101766

    Y. Xia, et al., Uncertainty-aware multi-view co-training for semi- supervised medical imagesegmentation and domainadaptation, Medical image analysis 65 (2020) 101766

  29. [29]

    L. Yu, S. Wang, X. Li, C.-W. Fu, P.-A. Heng, Uncertainty-aware self- ensembling model for semi-supervised 3d left atrium segmentation, in: MICCAI, Springer, 2019, pp. 605–613

  30. [30]

    Y. Wu, Z. Wu, Q. Wu, Z. Ge, J. Cai, Exploring smoothness and class- separation for semi-supervised medical image segmentation, in: MIC- CAI, Springer, 2022, pp. 34–43

  31. [31]

    C. You, et al., Simcvd: Simple contrastive voxel-wise representation dis- tillation for semi-supervised medical image segmentation, IEEE Trans- actions on Medical Imaging 41 (9) (2022) 2228–2237

  32. [32]

    J. N. Stember, et al., Integrating eye tracking and speech recognition ac- curately annotates mr brain images for deep learning: proof of principle, Radiology: Artificial Intelligence 3 (2020) e200047

  33. [33]

    S. Wang, X. Ouyang, T. Liu, Q. Wang, D. Shen, Follow my eye: Us- ing gaze to supervise computer-aided diagnosis, IEEE Transactions on Medical Imaging 41 (7) (2022) 1688–1698. 29

  34. [34]

    C. Wang, D. Zhang, R. Ge, Eye-guided dual-path network for multi- organ segmentation of abdomen, in: International Conference on Med- ical Image Computing and Computer-Assisted Intervention, Springer, 2023, pp. 23–32

  35. [35]

    C. Ma, L. Zhao, Y. Chen, S. Wang, L. Guo, T. Zhang, D. Shen, X. Jiang, T. Liu, Eye-gaze-guided vision transformer for rectifying shortcut learn- ing, IEEE Transactions on Medical Imaging 42 (11) (2023) 3384–3394

  36. [36]

    S.Wang, Z.Zhao, Z.Zhuang, X.Ouyang, L.Zhang, Z.Li, C.Ma, T.Liu, D. Shen, Q. Wang, Learning better contrastive view from radiologist’s gaze, Pattern Recognition 162 (2025) 111350

  37. [37]

    Z. Zhao, S. Wang, Q. Wang, D. Shen, Mining gaze for contrastive learn- ing toward computer-assisted diagnosis, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, 2024, pp. 7543–7551

  38. [38]

    O. Bernard, et al., Deep learning techniques for automatic mri cardiac multi-structures segmentation and diagnosis: is the problem solved?, IEEE transactions on medical imaging 37 (11) (2018) 2514–2525

  39. [39]

    Leclerc, et al., Deep learning for segmentation using an open large- scale dataset in 2d echocardiography, IEEE transactions on medical imaging 38 (9) (2019) 2198–2210

    S. Leclerc, et al., Deep learning for segmentation using an open large- scale dataset in 2d echocardiography, IEEE transactions on medical imaging 38 (9) (2019) 2198–2210

  40. [40]

    MICCAI multi-atlas labeling beyond cranial vault—workshop challenge, Vol

    B.Landman, Z.Xu, J.Igelsias, M.Styner, T.Langerak, A.Klein, Miccai multi-atlas labeling beyond the cranial vault–workshop and challenge, in: Proc. MICCAI multi-atlas labeling beyond cranial vault—workshop challenge, Vol. 5, Munich, Germany, 2015, p. 12

  41. [41]

    Shiraishi, S

    J. Shiraishi, S. Katsuragawa, J. Ikezoe, T. Matsumoto, T. Kobayashi, K.-i. Komatsu, M. Matsui, H. Fujita, Y. Kodera, K. Doi, Development of a digital image database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists’ detec- tion of pulmonary nodules, American journal of roentgenology 174 (1) (2...

  42. [42]

    Y. Bai, et al., Bidirectional copy-paste for semi-supervised medical im- age segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 11514–11524. 30

  43. [43]

    L. Yu, et al., Uncertainty-aware self-ensembling model for semi- supervised 3d left atrium segmentation, in: International Conference on Medical Image Computing and Computer Assisted Intervention, 2019

  44. [44]

    Y. Wu, et al., Semi-supervised left atrium segmentation with mutual consistency training, in: International Conference on Medical Image Computing and Computer Assisted Intervention, 2021, pp. 297–306

  45. [45]

    S. Wang, Z. Zhao, L. Zhang, D. Shen, Q. Wang, Crafting good views of medical images for contrastive learning via expert-level visual attention, in: Gaze Meets Machine Learning Workshop, PMLR, 2024, pp. 266–279

  46. [46]

    J. Hu, L. Shen, S. Albanie, G. Sun, A. Vedaldi, Gather-excite: Exploit- ing feature context in convolutional neural networks, Advances in neural information processing systems 31 (2018)

  47. [47]

    J. Xie, Q. Zhang, Z. Cui, C. Ma, Y. Zhou, W. Wang, D. Shen, Integrat- ing eye tracking with grouped fusion networks for semantic segmentation on mammogram images, IEEE Transactions on Medical Imaging (2024). 31