Synergistic Foundation Models for Semi-Supervised Fetal Cardiac Ultrasound Analysis: SAM-Med2D Boundary Refinement and DINOv3 Semantic Enhancement
Pith reviewed 2026-05-20 06:31 UTC · model grok-4.3
The pith
A semi-supervised framework integrates SAM-Med2D boundary refinement and DINOv3 semantic enhancement to improve joint segmentation and classification of fetal cardiac ultrasound images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a semi-supervised pipeline for joint segmentation and classification, built on the EchoCare backbone and augmented with SAM-Med2D for boundary refinement and DINOv3 for better pseudo-labels, works when paired with view-specific hard masking and a two-stage optimization. The first stage consolidates segmentation via an exponential moving average (EMA); the second fine-tunes classification with the segmentation parameters frozen.
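The page does not say how view-specific hard masking is implemented. One plausible reading, sketched below under that assumption, is that each ultrasound view admits only a subset of classes, and logits for all other classes are suppressed before the prediction is taken. The view names, class indices, and the `VALID_CLASSES` table are hypothetical placeholders, not values from the paper.

```python
import math

# Hypothetical view-to-class table (an assumption for illustration only).
VALID_CLASSES = {
    "4CH": {0, 1, 2, 3},
    "LVOT": {0, 1, 4},
}


def hard_mask_logits(logits, view):
    """Set the logits of classes not allowed for `view` to -inf."""
    allowed = VALID_CLASSES[view]
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]


def predict_class(logits, view):
    """Return the index of the highest logit among the classes allowed for `view`."""
    masked = hard_mask_logits(logits, view)
    return max(range(len(masked)), key=masked.__getitem__)
```

With this kind of mask a class that is impossible for a view can never be predicted, however confident the unmasked model is, which is what distinguishes a hard mask from a soft penalty.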
What carries the argument
The two-stage optimization strategy of EMA consolidation for segmentation followed by classification fine-tuning with frozen segmentation parameters and a reset classification head, together with view-specific hard masking.
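The two stages can be sketched on a toy parameter dictionary. This is a minimal illustration, not the authors' code: the `seg.`/`cls.` naming, the decay value, the zero re-initialization, and the plain gradient step are all assumptions, and the sketch treats every parameter outside the classification head as frozen in stage two.

```python
def ema_update(ema_params, student_params, decay=0.99):
    """Stage one: ema <- decay * ema + (1 - decay) * student, per parameter."""
    return {
        name: decay * ema_params[name] + (1.0 - decay) * student_params[name]
        for name in ema_params
    }


def start_classification_stage(params, cls_prefix="cls.", init_value=0.0):
    """Stage two setup: reset the classification head, freeze everything else.

    Returns the new parameter dict and the set of trainable parameter names.
    """
    new_params = {}
    trainable = set()
    for name, value in params.items():
        if name.startswith(cls_prefix):
            new_params[name] = init_value  # reset classification head
            trainable.add(name)
        else:
            new_params[name] = value  # kept as-is and treated as frozen
    return new_params, trainable


def sgd_step(params, grads, trainable, lr=0.1):
    """Apply a gradient step only to trainable parameters."""
    return {
        name: (value - lr * grads[name] if name in trainable else value)
        for name, value in params.items()
    }
```

Because `sgd_step` skips frozen names, any number of stage-two updates leaves the segmentation parameters bit-for-bit identical to their values at the end of stage one.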
Load-bearing premise
The two-stage process of EMA segmentation consolidation followed by frozen-segmentation classification fine-tuning will recover classification performance without degrading the segmentation gains from SAM-Med2D and DINOv3.
What would settle it
An ablation experiment on the FETUS 2026 data that directly compares the two-stage method against a single-stage joint training baseline and shows whether both the Dice score and F1-score can be maintained simultaneously or if one metric drops when the other is optimized.
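For reference, the two metrics in that comparison can be computed as follows. This is a simplified sketch: it assumes binary masks flattened to 0/1 sequences and a single positive class for F1, whereas the leaderboard's exact averaging scheme is not stated here. The `both_maintained` helper and its tolerance are hypothetical.

```python
def dice_score(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def f1_score(pred_labels, true_labels, positive=1):
    """F1 = 2TP / (2TP + FP + FN) for one positive class."""
    pairs = list(zip(pred_labels, true_labels))
    tp = sum(1 for p, t in pairs if p == positive and t == positive)
    fp = sum(1 for p, t in pairs if p == positive and t != positive)
    fn = sum(1 for p, t in pairs if p != positive and t == positive)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2.0 * tp / denom


def both_maintained(baseline, candidate, tol=0.0):
    """True if the candidate loses no more than `tol` on either Dice or F1."""
    return (
        candidate["dice"] >= baseline["dice"] - tol
        and candidate["f1"] >= baseline["f1"] - tol
    )
```

The decisive ablation would feed the single-stage and two-stage scores into a check like `both_maintained`; a `False` result would mean one metric was bought at the expense of the other.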
Original abstract
We present a semi-supervised framework for joint segmentation and classification of fetal cardiac ultrasound images. Built upon the EchoCare multi-task backbone, our method integrates SAM-Med2D for boundary refinement and leverages DINOv3 to enhance pseudo-label quality. We introduce view-specific hard masking along with a two-stage optimization strategy: an EMA phase to consolidate segmentation capabilities, followed by a Classification Fine-Tuning phase that freezes segmentation parameters and resets the classification head to recover classification performance without compromising segmentation gains. Evaluated on the FETUS 2026 leaderboard, our method achieves a Dice Similarity Coefficient at 79.99%, Normalized Surface Distance at 61.62%, and F1-score at 41.20%, validating the effectiveness of our approach for prenatal congenital heart disease screening. Source code is publicly available at: https://github.com/2826056177/zcst_fetus2026.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a semi-supervised framework for joint segmentation and classification of fetal cardiac ultrasound images using the EchoCare backbone enhanced with SAM-Med2D for boundary refinement and DINOv3 for semantic pseudo-label improvement. It employs view-specific hard masking and a two-stage optimization consisting of an EMA phase for segmentation consolidation and a subsequent classification fine-tuning phase with frozen segmentation parameters. On the FETUS 2026 leaderboard, it reports a Dice Similarity Coefficient of 79.99%, Normalized Surface Distance of 61.62%, and F1-score of 41.20%.
Significance. Should the results prove robust upon verification of baselines and ablations, this work has the potential to advance semi-supervised techniques in prenatal ultrasound analysis for congenital heart disease. The public availability of the source code at the provided GitHub repository is a notable strength supporting reproducibility.
major comments (2)
- [Abstract] The abstract reports specific performance metrics (DSC at 79.99%, NSD at 61.62%, F1 at 41.20%) but omits baseline comparisons, statistical significance, error bars, data split details, and measurement of pseudo-label quality. Without these, the metrics cannot be fully verified as supporting the effectiveness claim.
- [Two-stage optimization] The two-stage optimization strategy is outlined, but no ablation study is presented to confirm that the segmentation gains from SAM-Med2D and DINOv3 are maintained after the classification fine-tuning phase. This absence is load-bearing for the central claim that the full pipeline achieves the reported segmentation metrics without degradation.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and constructive suggestions. We address each of the major comments below and outline the revisions we will make to the manuscript.
- Referee: [Abstract] The abstract reports specific performance metrics (DSC at 79.99%, NSD at 61.62%, F1 at 41.20%) but omits baseline comparisons, statistical significance, error bars, data split details, and measurement of pseudo-label quality. Without these, the metrics cannot be fully verified as supporting the effectiveness claim.
  Authors: We agree that including baseline comparisons and additional details in the abstract would enhance clarity and verifiability. In the revised version, we will expand the abstract to briefly mention comparisons against standard semi-supervised baselines (e.g., Mean Teacher and FixMatch), note that results are reported as mean ± std over multiple runs, and reference the data splits and pseudo-label evaluation sections in the main text. This will better contextualize the reported metrics.
  Revision: yes
- Referee: [Two-stage optimization] The two-stage optimization strategy is outlined, but no ablation study is presented to confirm that the segmentation gains from SAM-Med2D and DINOv3 are maintained after the classification fine-tuning phase. This absence is load-bearing for the central claim that the full pipeline achieves the reported segmentation metrics without degradation.
  Authors: We recognize that an ablation study is necessary to substantiate that the classification fine-tuning phase does not compromise the segmentation performance achieved in the EMA phase. Although the manuscript specifies that segmentation parameters are frozen during this phase, we will add a dedicated ablation experiment in the revised manuscript. This will compare segmentation metrics (Dice, NSD) immediately after the EMA phase versus after the full two-stage process to confirm no degradation occurs.
  Revision: yes
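The promised before/after comparison could be checked along these lines. This is a hedged sketch with hypothetical helper names; it assumes per-case segmentation scores and parameter snapshots are available as plain Python lists and dicts.

```python
def max_degradation(before, after):
    """Largest per-case drop (before - after) in a metric; 0.0 if nothing dropped."""
    if len(before) != len(after):
        raise ValueError("score lists must cover the same cases")
    return max([0.0] + [b - a for b, a in zip(before, after)])


def frozen_unchanged(params_before, params_after, frozen_names):
    """True if every frozen parameter has exactly the same value in both snapshots."""
    return all(params_before[name] == params_after[name] for name in frozen_names)
```

If `frozen_unchanged` holds for every parameter on the segmentation path, `max_degradation` on Dice and NSD should be exactly zero; any nonzero value would point to a shared component that was not actually frozen, such as normalization statistics.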
Circularity Check
No significant circularity in method description or results reporting
The paper describes an empirical semi-supervised pipeline that applies existing foundation models (SAM-Med2D for boundary refinement, DINOv3 for pseudo-label enhancement) plus a two-stage optimization (EMA consolidation followed by frozen-segmentation classification fine-tuning). No mathematical derivation chain, equations, or first-principles results are present in the abstract or described text. Performance numbers are reported directly against the external FETUS 2026 leaderboard benchmark rather than being derived from internal fits or self-referential definitions. The two-stage strategy is presented as a procedural choice without any reduction of final metrics to the inputs by construction, and no load-bearing self-citations or uniqueness theorems are invoked. This is a standard application paper whose central claims remain independent of the listed circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Pre-trained foundation models SAM-Med2D and DINOv3 produce useful features and pseudo-labels when applied to fetal cardiac ultrasound without extensive domain-specific retraining.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "two-stage optimization strategy: an EMA phase to consolidate segmentation capabilities, followed by a Classification Fine-Tuning phase that freezes segmentation parameters"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] INTRODUCTION Congenital heart disease (CHD) represents the most prevalent structural anomaly in fetuses and remains a leading cause of morbidity and mortality in newborns [2]. Prenatal screening for CHD relies heavily on the expert assessment of ultrasound images from standard cardiac views, including the four-chamber view (4CH), left ventricular outflow...
- [2] RELATED WORKS 2.1. Medical Image Segmentation U-Net [9] and its variants have achieved remarkable success in medical imaging. Transformer-based models like UNETR
- [3] and Swin-Transformer [13] have further advanced the field through self-attention mechanisms capturing long-range dependencies. The Segment Anything Model (SAM) [5] represents a foundation model breakthrough, with SAM-Med2D specifically optimized for medical images, demonstrating superior adaptability to medical imaging characteristics. 2.2. Semi-Super...
- [4] EXPERIMENTAL DESIGN AND RESULT ANALYSIS 4.1. Dataset and Task Description The FETUS 2026 Challenge dataset comprises 5,000 standard-view B-mode fetal cardiac ultrasound images, among which 2,800 are allocated for training (partially sourced from the FOCUS dataset [11]). Collected across multiple centers and devices, the dataset exhibits real-world chara...
- [5] ACKNOWLEDGMENTS This work was supported by the Guangdong Key Disciplines Project under grant number 2024ZDJS137
- [6] Ali Hatamizadeh, Dong Yang, Holger R. Roth, and Daguang Xu, “UNETR: Transformers for 3D Medical Image Segmentation,” 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 1748-1758
- [7] Bravo-Valenzuela, Nathalie Jeanne Magioli, et al., “Prenatal diagnosis of congenital heart disease: A review of current knowledge,” Indian Heart Journal, vol. 70, no. 1, pp. 150-164, 2018
- [8] Elena D'Alberti, Olga Patey, Carolyn Smith, et al., “Artificial intelligence-enabled prenatal ultrasound for the detection of fetal cardiac abnormalities: a systematic review and meta-analysis,” eClinicalMedicine, vol. 84, May 2025
- [9] Hongyuan Zhang, Yuheng Wu, Mingyang Zhao, et al., “A Fully Open and Generalizable Foundation Model for Ultrasound Clinical Applications,” arXiv preprint arXiv:2509.11752, 2025
- [10] Junlong Cheng, Jin Ye, Zhongying Deng, et al., “SAM-Med2D,” arXiv preprint arXiv:2308.16184, 2023
- [11] Kihyuk Sohn, David Berthelot, Nicholas Carlini, et al., “FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence,” Advances in Neural Information Processing Systems (NeurIPS), 2020, vol. 33, pp. 596-608
- [12] Lihe Yang, Lei Qi, Litong Feng, Wayne Zhang, Yinghuan Shi, “Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation,” 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 2023, pp. 7236-7246
- [13] Ogge G, Gaglioti P, Maccanti S, et al., “Prenatal screening for congenital heart disease with four-chamber and outflow-tract views: a multicenter study,” Ultrasound in Obstetrics and Gynecology: The Official Journal of the International Society of Ultrasound in Obstetrics and Gynecology, vol. 28, no. 6, pp. 779-784, 2006
- [14] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-net: Convolutional networks for biomedical image segmentation,” Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Nassir Navab, Joachim Hornegger, William M. Wells, and Alejandro F. Frangi, Eds., Cham, 2015, pp. 234–241
- [15] Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, et al., “DINOv3,” arXiv preprint arXiv:2508.10104, 2025
- [16] Songxiong Wu, Hongyuan Zhang, Tingting Ye, et al., “FOCUS: Four-chamber Ultrasound Image Dataset for Fetal Cardiac Biometric Measurement (1.0) [Data set],” Zenodo, https://doi.org/10.5281/zenodo.14597550
- [17] Sangdoo Yun, Dongyoon Han, Seong Joon Oh, et al., “CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features,” 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6022-6031, 2019
- [18] Ze Liu, Yutong Lin, Yue Cao, et al., “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows,” 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 9992-10002