Recognition: unknown
Human Gaze-based Dual Teacher Guidance Learning for Semi-Supervised Medical Image Segmentation
Pith reviewed 2026-05-10 15:28 UTC · model grok-4.3
The pith
Human gaze data acts as an extra teacher to improve semi-supervised medical image segmentation
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors introduce the Human Gaze-based Dual Teacher Guidance Learning (HG-DTGL) model, in which human gaze serves as an additional hidden teacher within the mean-teacher framework. They create GazeMix to produce reliable mixed training examples that carry gaze information, add a Multi-scale Gaze Perception module to extract gaze-informed features at multiple resolutions, and define a Gaze Loss that forces the network output to align with human gaze patterns. Extensive experiments show the resulting model outperforms prior semi-supervised baselines on multiple datasets spanning different modalities and ten organs or tissues.
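The review does not reproduce GazeMix's exact formulation. As a rough illustration only, here is a minimal sketch of what gaze-conditioned mixing could look like, assuming GazeMix belongs to the CutMix/Mixup family with the pasted region chosen by a thresholded gaze heatmap; the function name, threshold, and masking rule are our assumptions, not the authors' method.

```python
import numpy as np

def gazemix(img_a, img_b, gaze_a, threshold=0.6):
    """Hypothetical gaze-conditioned mixing (name and rule assumed, not
    taken from the paper): keep the gaze-salient region of img_a and
    fill the remainder from img_b, CutMix-style."""
    mask = (gaze_a >= threshold).astype(img_a.dtype)  # 1 where experts fixated
    mixed = mask * img_a + (1.0 - mask) * img_b
    return mixed, mask

# Toy usage: a synthetic Gaussian gaze blob centered in a 64x64 image.
yy, xx = np.mgrid[0:64, 0:64]
gaze = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 8.0 ** 2))
img_a, img_b = np.random.rand(64, 64), np.random.rand(64, 64)
mixed, mask = gazemix(img_a, img_b, gaze)
print(mask.sum(), mixed.shape)  # size of the pasted region, (64, 64)
```

Because the mask tracks where experts actually looked, the mixed sample inherits a plausible target region from img_a while the surrounding context varies, which is one way such mixing could expand diversity without new annotations.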
What carries the argument
The dual-teacher guidance structure that treats human gaze as the second teacher, realized through GazeMix for gaze-informed data mixing, the Multi-scale Gaze Perception module for feature extraction, and the Gaze Loss for aligning model predictions with gaze locations.
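To make the dual-teacher structure concrete, a minimal mean-teacher step with a gaze term added might look like the following. This is a sketch under assumptions, not the paper's actual objective: binary foreground/background segmentation, a standard EMA teacher, gaze entering as a weighted addend, and hypothetical weights lam_cons and lam_gaze.

```python
import torch
import torch.nn.functional as F

# Setup (done once, assumed standard mean-teacher): the first teacher
# starts as a frozen deep copy of the student, e.g.
# teacher = copy.deepcopy(student), and is only ever updated by EMA.

def ema_update(teacher, student, alpha=0.99):
    # Classic mean-teacher rule: teacher weights track an exponential
    # moving average of the student's weights.
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(alpha).add_(s, alpha=1.0 - alpha)

def training_step(student, teacher, x_lab, y_lab, x_unlab, gaze_map,
                  lam_cons=1.0, lam_gaze=0.1):
    # Supervised term on the scarce labeled images.
    loss_sup = F.cross_entropy(student(x_lab), y_lab)

    # Teacher 1 (EMA model): consistency on unlabeled images.
    with torch.no_grad():
        pseudo = teacher(x_unlab).softmax(dim=1)
    loss_cons = F.mse_loss(student(x_unlab).softmax(dim=1), pseudo)

    # Teacher 2 (human gaze): pull the foreground probability map toward
    # the expert gaze heatmap (one plausible reading; see the Gaze Loss
    # sketch further below for a distributional alternative).
    fg_prob = student(x_unlab).softmax(dim=1)[:, 1]
    loss_gaze = F.mse_loss(fg_prob, gaze_map)

    return loss_sup + lam_cons * loss_cons + lam_gaze * loss_gaze
```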
If this is right
- GazeMix expands the diversity and effective size of the training set without requiring additional full annotations.
- The Multi-scale Gaze Perception module strengthens the network's ability to locate target regions at varying sizes and contexts.
- The Gaze Loss term pulls the model's internal attention maps into agreement with human visual focus (one plausible form is sketched just after this list).
- The combined system delivers higher segmentation accuracy than standard mean-teacher training across varied modalities.
- Performance advantages hold for a total of ten different organs and tissues, indicating broad applicability.
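As flagged in the Gaze Loss bullet above, here is one plausible form such an alignment term could take. This is our illustration, not the paper's definition: it treats the spatially renormalized foreground probability map and the gaze heatmap as distributions over pixels and penalizes their KL divergence.

```python
import torch
import torch.nn.functional as F

def gaze_loss(pred_logits, gaze_map, eps=1e-8):
    """One plausible alignment term (assumed, not the authors' formula):
    KL divergence between the model's foreground probability map and the
    expert gaze heatmap, both renormalized to sum to 1 over the grid.

    pred_logits : (B, C, H, W) raw network outputs
    gaze_map    : (B, H, W) non-negative gaze heatmaps
    """
    b = pred_logits.shape[0]
    fg = pred_logits.softmax(dim=1)[:, 1]          # (B, H, W) foreground prob
    p = fg.reshape(b, -1)
    p = p / (p.sum(dim=1, keepdim=True) + eps)     # model attention distribution
    q = gaze_map.reshape(b, -1)
    q = q / (q.sum(dim=1, keepdim=True) + eps)     # gaze distribution
    # KL(q || p): penalize the model for ignoring gazed-at locations.
    return (q * ((q + eps).log() - (p + eps).log())).sum(dim=1).mean()

# Toy check: random logits against a random gaze map.
logits = torch.randn(2, 2, 32, 32)
gaze = torch.rand(2, 32, 32)
print(gaze_loss(logits, gaze))
```

A plain MSE between the raw maps, as used in the training-step sketch earlier, is an equally plausible reading of the abstract's description.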
Where Pith is reading between the lines
- Routine collection of gaze data during clinical reading could become a low-cost supplement to existing annotation pipelines.
- The same gaze-guidance idea might transfer to other semi-supervised vision tasks where eye-tracking recordings can be obtained cheaply.
- Gains may vary with the consistency of gaze collection hardware and instructions, pointing to a need for standardized protocols.
- If the method scales, it could lower the annotation burden for training segmentation networks in resource-limited medical settings.
Load-bearing premise
Expert gaze data remains a reliable and generalizable signal that does not introduce new biases when applied across different imaging modalities and anatomical targets.
What would settle it
Retraining HG-DTGL on a fresh multi-modal dataset with expert gaze annotations and finding that it yields no accuracy gain over a plain mean-teacher baseline that ignores gaze.
Original abstract
In the field of medical image segmentation, the scarcity of labeled data poses a major challenge for existing models to accurately perceive target regions. Compared with manual annotation, gaze data is easier and cheaper to obtain. As a classical semi-supervised learning framework, mean-teacher can effectively use a large number of unlabeled medical images for stable training through self-teaching and collaborative optimization. Our study is based on the mean-teacher framework. By combining gaze data, it aims to address two crucial issues in semi-supervised medical image segmentation: 1) expand the scale and diversity of the dataset with limited labeled data; 2) enhance the network's perception ability. We propose the Human Gaze-based Dual Teacher Guidance Learning model (HG-DTGL). In this model, human gaze serves as an additional hidden 'teacher' in the mean-teacher architecture. We introduce the GazeMix to generate reliable mixed data to expand the diversity and scale of the dataset, and the Multi-scale Gaze Perception (MGP) module is used to extract the multi-scale perception of the network. A Gaze Loss is designed to align the model's perception with human gaze. We have verified HG-DTGL on multiple datasets of different modalities and achieved superior performance on a total of ten different organs/tissues, with extensive experiments. This demonstrates that our method has strong generalization ability for medical images of different modalities, and shows the great application potential of gaze data in semi-supervised medical image segmentation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes HG-DTGL, a semi-supervised medical image segmentation model extending the mean-teacher framework by treating human gaze data as an additional 'hidden teacher.' It introduces GazeMix to expand dataset scale and diversity via mixed samples, the Multi-scale Gaze Perception (MGP) module to extract multi-scale gaze-aligned features, and a Gaze Loss to enforce alignment between model predictions and expert gaze patterns. The authors claim verification across multiple modalities with superior performance on ten organs/tissues.
Significance. If the empirical claims hold after proper validation, the work could be significant for demonstrating how low-cost gaze data can serve as a stable auxiliary signal in mean-teacher semi-supervised learning, potentially improving generalization in data-scarce medical imaging settings across CT, MRI, and ultrasound.
major comments (3)
- [Abstract] The central claim of 'superior performance on a total of ten different organs/tissues, with extensive experiments' is unsupported by any quantitative metrics, tables, baselines, error bars, or ablation results in the manuscript, rendering the primary empirical assertion unevaluable.
- [Method] Description of GazeMix, MGP, and the Gaze Loss: the assumption that expert gaze signals remain reliable and distributionally consistent across modalities and organs is load-bearing for the dual-teacher claim, yet it is not tested via cross-modality transfer experiments or bias analysis.
- [Experiments] No details are supplied on gaze collection protocol standardization, inter-expert variability, or whether fixation heatmaps require modality-specific recalibration, which directly affects whether the reported gains can be attributed to gaze guidance rather than dataset expansion alone.
minor comments (2)
- [Abstract] The abstract uses 'hidden teacher' without a precise definition relative to the standard mean-teacher student-teacher pair.
- [Abstract] Acronym HG-DTGL is introduced before its full expansion.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We appreciate the opportunity to clarify aspects of our work and strengthen the presentation. Below we respond point-by-point to the major comments, indicating where revisions will be made to address the concerns.
Point-by-point responses
-
Referee: [Abstract] The central claim of 'superior performance on a total of ten different organs/tissues, with extensive experiments' is unsupported by any quantitative metrics, tables, baselines, error bars, or ablation results in the manuscript, rendering the primary empirical assertion unevaluable.
Authors: We agree that the abstract would be strengthened by including concrete quantitative support for the claims. Although the experiments section of the manuscript contains the relevant tables, baseline comparisons, ablation studies, and performance metrics (including Dice scores across the ten organs), these are not summarized in the abstract. In the revised manuscript we will update the abstract to explicitly reference key results, such as average Dice improvements and comparisons to mean-teacher baselines, with pointers to the corresponding tables and figures. revision: yes
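As context for the Dice numbers promised in this response: the Dice similarity coefficient is the standard overlap score for segmentation masks. A minimal reference implementation (ours, for orientation only):

```python
import numpy as np

def dice_score(pred, target, eps=1e-8):
    """Dice similarity coefficient between two binary masks:
    2|A ∩ B| / (|A| + |B|), with eps guarding the empty-mask case."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# Example: two overlapping 32x32 square masks.
a = np.zeros((64, 64), dtype=bool); a[16:48, 16:48] = True
b = np.zeros((64, 64), dtype=bool); b[24:56, 24:56] = True
print(round(dice_score(a, b), 3))  # 0.562 for this overlap
```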
-
Referee: [Method] Description of GazeMix, MGP, and the Gaze Loss: the assumption that expert gaze signals remain reliable and distributionally consistent across modalities and organs is load-bearing for the dual-teacher claim, yet it is not tested via cross-modality transfer experiments or bias analysis.
Authors: The consistency of gaze signals across modalities is indeed important for the dual-teacher framework. Our current experiments already demonstrate performance gains on CT, MRI, and ultrasound datasets spanning ten organs, which provides indirect support for cross-modal reliability. However, we acknowledge the absence of dedicated cross-modality transfer experiments and explicit bias analysis. In the revision we will add a new subsection presenting cross-modality transfer results (training on gaze from one modality and evaluating on another) together with a bias discussion to directly test and substantiate this assumption. revision: yes
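The transfer design promised here has a simple shape: train with gaze from one modality, evaluate on every modality, and compare off-diagonal cells to the diagonal. A schematic harness with stand-in functions (train_model and eval_dice are placeholders we invented, not the authors' code):

```python
from itertools import product

def train_model(gaze_modality: str):
    """Stand-in for HG-DTGL training with gaze from one modality."""
    return f"model(gaze={gaze_modality})"

def eval_dice(model: str, target_modality: str) -> float:
    """Stand-in for Dice evaluation on a held-out target-modality set."""
    return 0.0  # placeholder; a real harness would run inference here

# Full source x target transfer matrix, as the referee requests.
modalities = ["CT", "MRI", "Ultrasound"]
transfer = {(src, tgt): eval_dice(train_model(src), tgt)
            for src, tgt in product(modalities, repeat=2)}
# Off-diagonal cells (src != tgt) test whether gaze guidance survives a
# modality shift; the gap to the diagonal quantifies the degradation.
```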
-
Referee: [Experiments] No details are supplied on gaze collection protocol standardization, inter-expert variability, or whether fixation heatmaps require modality-specific recalibration, which directly affects whether the reported gains can be attributed to gaze guidance rather than dataset expansion alone.
Authors: We regret the omission of these methodological details. The revised manuscript will include a new subsection under Experiments that fully describes the gaze collection protocol, including standardization procedures, the number of participating experts, quantitative inter-expert variability measures (e.g., overlap statistics on fixation maps), and any modality-specific recalibration steps applied to the heatmaps. This addition will allow readers to better assess the contribution of gaze guidance versus simple data augmentation. revision: yes
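The overlap statistics mentioned above could take many forms; one simple possibility (our illustration, not the authors' protocol) is pairwise Pearson correlation of the raw heatmaps plus Dice overlap of their thresholded fixation regions:

```python
import numpy as np
from itertools import combinations

def pairwise_fixation_overlap(heatmaps, threshold=0.5):
    """Inter-expert variability over fixation heatmaps: pairwise Pearson
    correlation of the raw maps and Dice overlap of their thresholded
    versions. heatmaps is a list of (H, W) arrays in [0, 1], one per
    expert; the threshold is an assumption for illustration."""
    stats = []
    for (i, a), (j, b) in combinations(enumerate(heatmaps), 2):
        corr = np.corrcoef(a.ravel(), b.ravel())[0, 1]
        ma, mb = a >= threshold, b >= threshold
        dice = 2.0 * np.logical_and(ma, mb).sum() / max(ma.sum() + mb.sum(), 1)
        stats.append((i, j, corr, dice))
    return stats

# Three synthetic experts fixating near the same region with jitter.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
experts = [np.exp(-(((yy - 32 - d) ** 2 + (xx - 32 + d) ** 2) / 128.0))
           for d in rng.integers(-4, 5, size=3)]
for i, j, corr, dice in pairwise_fixation_overlap(experts):
    print(f"experts {i}&{j}: corr={corr:.2f}, dice={dice:.2f}")
```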
Circularity Check
No significant circularity; empirical extension of mean-teacher with independent components
full rationale
The paper extends the established mean-teacher framework by adding human gaze as a second teacher signal, along with the GazeMix augmentation, Multi-scale Gaze Perception module, and Gaze Loss. These additions are presented as novel architectural and loss-function contributions without any equations, parameter fits, or derivations that reduce by construction to inputs defined inside the paper. Claims rest on experimental results across ten organs and multiple modalities rather than self-referential mathematical steps. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are invoked in the derivation chain. The method's case therefore rests on external benchmarks rather than on its own constructions.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Mean-teacher self-teaching produces stable training on unlabeled medical images
- domain assumption Human gaze provides informative and consistent cues for target region perception
invented entities (3)
-
GazeMix
no independent evidence
-
Multi-scale Gaze Perception (MGP) module
no independent evidence
-
Gaze Loss
no independent evidence
Reference graph
Works this paper leans on
-
[1]
K. Sohn, et al., Fixmatch: Simplifying semi-supervised learning with consistency and confidence, Advances in neural information processing systems 33 (2020) 596–608
2020
-
[2]
T. Miyato, et al., Virtual adversarial training: a regularization method for supervised and semi-supervised learning, IEEE Transactions on Pattern Analysis and Machine Intelligence 41 (8) (2018) 1979–1993
2018
-
[3]
X. Li, L. Yu, H. Chen, C.-W. Fu, L. Xing, P.-A. Heng, Transformation-consistent self-ensembling model for semi-supervised medical image segmentation, IEEE Transactions on Neural Networks and Learning Systems 32 (2) (2020) 523–534
2020
-
[4]
X. Lai, et al., Semi-supervised semantic segmentation with directional context-aware consistency, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1205–1214
2021
-
[5]
Q. Xie, Z. Dai, E. Hovy, T. Luong, Q. Le, Unsupervised data augmentation for consistency training, Advances in neural information processing systems 33 (2020) 6256–6268
2020
-
[6]
H. Zhang, M. Cisse, Y. N. Dauphin, D. Lopez-Paz, mixup: Beyond empirical risk minimization, arXiv preprint arXiv:1710.09412 (2017)
2017
-
[7]
S. Yun, et al., Cutmix: Regularization strategy to train strong classifiers with localizable features, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 6023–6032
2019
-
[8]
J. Wang, X. Li, Y. Han, J. Qin, L. Wang, Z. Qichao, Separated contrastive learning for organ-at-risk and gross-tumor-volume segmentation with limited annotation, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36, 2022, pp. 2459–2467
2022
-
[9]
J. T. Cheng, et al., Eye gaze and visual attention as a window into leadership and followership: A review of empirical insights and future directions, The Leadership Quarterly (2022) 101654
2022
-
[10]
C. Narganes-Pineda, A. B. Chica, J. Lupiáñez, A. Marotta, Explicit vs. implicit spatial processing in arrow vs. eye-gaze spatial congruency effects, Psychological research 87 (1) (2023) 242–259
2023
-
[11]
N. Khosravan, H. Celik, B. Turkbey, R. Cheng, E. McCreedy, M. McAuliffe, S. Bednarova, E. Jones, X. Chen, P. Choyke, et al., Gaze2segment: A pilot study for integrating eye-tracking technology into medical image segmentation, in: Medical Computer Vision and Bayesian and Graphical Models for Biomedical Imaging: MICCAI 2016 International Workshops, MCV and ...
2016
-
[12]
O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241
2015
-
[13]
F. Milletari, N. Navab, S.-A. Ahmadi, V-net: Fully convolutional neural networks for volumetric medical image segmentation, in: 2016 Fourth International Conference on 3D Vision (3DV), IEEE, 2016, pp. 565–571
2016
-
[14]
J. Chen, et al., Transunet: Transformers make strong encoders for medical image segmentation, arXiv preprint arXiv:2102.04306 (2021)
2021
-
[15]
H. Cao, et al., Swin-unet: Unet-like pure transformer for medical image segmentation, in: European conference on computer vision, 2022, pp. 205–218
2022
-
[16]
B. Zhou, et al., Learning deep features for discriminative localization, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2921–2929
2016
-
[17]
J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141
2018
-
[18]
S. Woo, et al., Cbam: Convolutional block attention module, in: European Conference on Computer Vision, 2018, pp. 3–19
2018
-
[19]
D.-H. Lee, Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks, in: ICML 2013 Workshop: Challenges in Representation Learning, 2013, pp. 1–6
2013
-
[20]
D.-D. Chen, W. Wang, W. Gao, Z.-H. Zhou, Tri-net for semi-supervised deep learning, in: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018, pp. 2014–2020
2018
-
[21]
Z.-H. Zhou, M. Li, Tri-training: Exploiting unlabeled data using three classifiers, IEEE Transactions on Knowledge and Data Engineering 17 (11) (2005) 1529–1541
2005
- [22]
-
[23]
G. Bortsova, F. Dubost, L. Hogeweg, I. Katramados, M. De Bruijne, Semi-supervised medical image segmentation via learning consistency under transformations, in: MICCAI, Springer, 2019, pp. 810–818
2019
-
[24]
K. Fang, W.-J. Li, Dmnet: Difference minimization network for semi-supervised segmentation in medical images, in: MICCAI, Springer, 2020, pp. 532–541
2020
-
[25]
S. Li, et al., Shape-aware semi-supervised 3d semantic segmentation for medical images, in: International Conference on Medical Image Computing and Computer Assisted Intervention, 2020, pp. 552–561
2020
-
[26]
X. Luo, et al., Semi-supervised medical image segmentation through dual-task consistency, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, 2021, pp. 8801–8809
2021
-
[27]
K. Wang, et al., Semi-supervised medical image segmentation via a tripled-uncertainty guided mean teacher model with contrastive learning, Medical Image Analysis 79 (2022) 102447
2022
-
[28]
Y. Xia, et al., Uncertainty-aware multi-view co-training for semi-supervised medical image segmentation and domain adaptation, Medical Image Analysis 65 (2020) 101766
2020
-
[29]
L. Yu, S. Wang, X. Li, C.-W. Fu, P.-A. Heng, Uncertainty-aware self-ensembling model for semi-supervised 3d left atrium segmentation, in: MICCAI, Springer, 2019, pp. 605–613
2019
-
[30]
Y. Wu, Z. Wu, Q. Wu, Z. Ge, J. Cai, Exploring smoothness and class-separation for semi-supervised medical image segmentation, in: MICCAI, Springer, 2022, pp. 34–43
2022
-
[31]
C. You, et al., Simcvd: Simple contrastive voxel-wise representation distillation for semi-supervised medical image segmentation, IEEE Transactions on Medical Imaging 41 (9) (2022) 2228–2237
2022
-
[32]
J. N. Stember, et al., Integrating eye tracking and speech recognition accurately annotates MR brain images for deep learning: proof of principle, Radiology: Artificial Intelligence 3 (2020) e200047
2020
-
[33]
S. Wang, X. Ouyang, T. Liu, Q. Wang, D. Shen, Follow my eye: Using gaze to supervise computer-aided diagnosis, IEEE Transactions on Medical Imaging 41 (7) (2022) 1688–1698
2022
-
[34]
C. Wang, D. Zhang, R. Ge, Eye-guided dual-path network for multi-organ segmentation of abdomen, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2023, pp. 23–32
2023
-
[35]
C. Ma, L. Zhao, Y. Chen, S. Wang, L. Guo, T. Zhang, D. Shen, X. Jiang, T. Liu, Eye-gaze-guided vision transformer for rectifying shortcut learning, IEEE Transactions on Medical Imaging 42 (11) (2023) 3384–3394
2023
-
[36]
S. Wang, Z. Zhao, Z. Zhuang, X. Ouyang, L. Zhang, Z. Li, C. Ma, T. Liu, D. Shen, Q. Wang, Learning better contrastive view from radiologist’s gaze, Pattern Recognition 162 (2025) 111350
2025
-
[37]
Z. Zhao, S. Wang, Q. Wang, D. Shen, Mining gaze for contrastive learning toward computer-assisted diagnosis, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, 2024, pp. 7543–7551
2024
-
[38]
O. Bernard, et al., Deep learning techniques for automatic mri cardiac multi-structures segmentation and diagnosis: is the problem solved?, IEEE transactions on medical imaging 37 (11) (2018) 2514–2525
2018
-
[39]
S. Leclerc, et al., Deep learning for segmentation using an open large-scale dataset in 2d echocardiography, IEEE Transactions on Medical Imaging 38 (9) (2019) 2198–2210
2019
-
[40]
B. Landman, Z. Xu, J. Iglesias, M. Styner, T. Langerak, A. Klein, MICCAI multi-atlas labeling beyond the cranial vault–workshop and challenge, in: Proc. MICCAI multi-atlas labeling beyond cranial vault—workshop challenge, Vol. 5, Munich, Germany, 2015, p. 12
2015
-
[41]
J. Shiraishi, S. Katsuragawa, J. Ikezoe, T. Matsumoto, T. Kobayashi, K.-i. Komatsu, M. Matsui, H. Fujita, Y. Kodera, K. Doi, Development of a digital image database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists’ detection of pulmonary nodules, American Journal of Roentgenology 174 (1) (2000) 71–74
2000
-
[42]
Y. Bai, et al., Bidirectional copy-paste for semi-supervised medical image segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 11514–11524
2023
-
[43]
L. Yu, et al., Uncertainty-aware self-ensembling model for semi-supervised 3d left atrium segmentation, in: International Conference on Medical Image Computing and Computer Assisted Intervention, 2019
2019
-
[44]
Y. Wu, et al., Semi-supervised left atrium segmentation with mutual consistency training, in: International Conference on Medical Image Computing and Computer Assisted Intervention, 2021, pp. 297–306
2021
-
[45]
S. Wang, Z. Zhao, L. Zhang, D. Shen, Q. Wang, Crafting good views of medical images for contrastive learning via expert-level visual attention, in: Gaze Meets Machine Learning Workshop, PMLR, 2024, pp. 266–279
2024
-
[46]
J. Hu, L. Shen, S. Albanie, G. Sun, A. Vedaldi, Gather-excite: Exploiting feature context in convolutional neural networks, Advances in neural information processing systems 31 (2018)
2018
-
[47]
J. Xie, Q. Zhang, Z. Cui, C. Ma, Y. Zhou, W. Wang, D. Shen, Integrating eye tracking with grouped fusion networks for semantic segmentation on mammogram images, IEEE Transactions on Medical Imaging (2024)
2024