
arxiv: 2605.19799 · v1 · pith:E32QJUT6new · submitted 2026-05-19 · 💻 cs.CV · cs.AI

Synergistic Foundation Models for Semi-Supervised Fetal Cardiac Ultrasound Analysis: SAM-Med2D Boundary Refinement and DINOv3 Semantic Enhancement

Pith reviewed 2026-05-20 06:31 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords semi-supervised learning · fetal cardiac ultrasound · image segmentation · image classification · SAM-Med2D · DINOv3 · congenital heart disease

The pith

A semi-supervised framework integrates SAM-Med2D boundary refinement and DINOv3 semantic enhancement to improve joint segmentation and classification of fetal cardiac ultrasound images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a semi-supervised approach for fetal cardiac ultrasound that jointly segments heart structures and classifies images or views. It starts from the EchoCare multi-task model and adds SAM-Med2D to sharpen segmentation boundaries while using DINOv3 to raise the quality of pseudo-labels generated from unlabeled scans. View-specific hard masking is applied during training, and a two-stage optimization first uses an exponential moving average (EMA) phase to consolidate segmentation, then freezes the segmentation parameters, resets the classification head, and retrains only that head. On the FETUS 2026 leaderboard the method reports a Dice Similarity Coefficient of 79.99%, a Normalized Surface Distance of 61.62%, and an F1-score of 41.20%; the abstract gives no baseline, so the size of any gain cannot be read from these numbers alone.

Core claim

The central claim is that a semi-supervised joint segmentation and classification pipeline, built on the EchoCare backbone and augmented with SAM-Med2D for boundary refinement plus DINOv3 for better pseudo-labels, succeeds when equipped with view-specific hard masking and a two-stage optimization that first consolidates segmentation via EMA and then performs classification fine-tuning with segmentation parameters frozen.
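One way to read "view-specific hard masking" is that segmentation classes which cannot appear in a given cardiac view are excluded before the prediction is made. The sketch below illustrates that reading only. The view names, the class ids, and the `VIEW_ALLOWED_CLASSES` mapping are hypothetical placeholders, not taken from the paper, and the authors' actual masking rule may differ.

```python
import math

# Hypothetical mapping from view name to the class ids allowed in that view
# (0 is background). Illustrative only; the paper's real mapping is not given.
VIEW_ALLOWED_CLASSES = {
    "4CH": {0, 1, 2, 3, 4},
    "LVOT": {0, 1, 5},
}


def hard_mask_logits(logits, view):
    """Set the logits of classes that cannot occur in `view` to -inf."""
    allowed = VIEW_ALLOWED_CLASSES[view]
    return [x if c in allowed else -math.inf for c, x in enumerate(logits)]


def softmax(logits):
    """Numerically stable softmax; -inf logits receive probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Per-pixel logits over 6 classes; class 5 scores highest before masking.
pixel_logits = [0.2, 1.0, 0.1, 0.3, 0.0, 2.5]
probs = softmax(hard_mask_logits(pixel_logits, "4CH"))
pred = max(range(len(probs)), key=probs.__getitem__)
```

In this toy case class 5 is not allowed in the assumed 4CH view, so its probability becomes exactly zero and the prediction falls to the best remaining class. The mask is "hard" in the sense that it removes classes outright instead of down-weighting them.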

What carries the argument

The two-stage optimization strategy of EMA consolidation for segmentation followed by classification fine-tuning with frozen segmentation parameters and a reset classification head, together with view-specific hard masking.
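The two stages can be sketched with plain Python dictionaries standing in for model parameters. This is a minimal illustration under stated assumptions, not the authors' implementation: the decay value, the `cls_head.` naming convention, the zero reset value, and the choice to start stage two from the EMA weights are all assumptions made for the example.

```python
def ema_update(teacher, student, decay=0.99):
    """In-place EMA: teacher <- decay * teacher + (1 - decay) * student."""
    for name, value in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * value


def start_classification_stage(params, init_cls_value=0.0):
    """Stage 2 setup: reset the classification head, return trainable names.

    Names starting with "cls_head." (an assumed convention) are treated as
    the classification head; every other parameter stays frozen.
    """
    trainable = set()
    for name in params:
        if name.startswith("cls_head."):
            params[name] = init_cls_value  # reset the head
            trainable.add(name)
    return trainable


def sgd_step(params, grads, trainable, lr=0.1):
    """Apply a gradient step only to the parameters listed in `trainable`."""
    for name in trainable:
        params[name] -= lr * grads[name]


# Stage 1: the EMA copy tracks the student weights.
student = {"seg.w": 1.0, "cls_head.w": 0.5}
teacher = {"seg.w": 0.0, "cls_head.w": 0.0}
ema_update(teacher, student, decay=0.9)

# Stage 2: freeze segmentation, reset and retrain the classification head.
trainable = start_classification_stage(teacher)
seg_before = teacher["seg.w"]
sgd_step(teacher, {"seg.w": 5.0, "cls_head.w": 1.0}, trainable)
```

Even though a gradient is supplied for `seg.w`, the update skips it because the name is not in `trainable`, which is the property the load-bearing premise depends on.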

Load-bearing premise

The two-stage process of EMA segmentation consolidation followed by frozen-segmentation classification fine-tuning will recover classification performance without degrading the segmentation gains from SAM-Med2D and DINOv3.

What would settle it

An ablation experiment on the FETUS 2026 data that directly compares the two-stage method against a single-stage joint training baseline and shows whether both the Dice score and F1-score can be maintained simultaneously or if one metric drops when the other is optimized.
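A comparison of this kind needs both metrics computed the same way for each training variant. The sketch below uses the standard Dice and per-class F1 formulas on toy inputs. Macro-averaging the F1 and the `no_tradeoff` helper are assumptions for illustration; the leaderboard's exact averaging scheme is not stated here.

```python
def dice(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks given as flat 0/1 lists."""
    inter = sum(p and t for p, t in zip(pred, target))
    size = sum(pred) + sum(target)
    return 1.0 if size == 0 else 2.0 * inter / size


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(0.0 if denom == 0 else 2 * tp / denom)
    return sum(scores) / num_classes


def no_tradeoff(baseline, candidate, tol=0.0):
    """True if the candidate is within `tol` of the baseline on both metrics."""
    return all(candidate[k] >= baseline[k] - tol for k in ("dice", "f1"))


toy_dice = dice([1, 1, 0, 0], [1, 0, 1, 0])
toy_f1 = macro_f1([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

With metrics for a single-stage baseline and the two-stage method in hand, `no_tradeoff` expresses the question the ablation is meant to answer: whether one score was bought at the expense of the other.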

Figures

Figures reproduced from arXiv: 2605.19799 by Shanglong Hu (1), Tonghao Zhuang (1), Yongsheng Luo (1), Zhiqi Zhang (1), Yu Li (1) ((1) Zhuhai College of Science and Technology, Zhuhai, China).

Figure 2: Overall Framework
Figure 2: Comparison between Ground Truth and Method

Excerpt from Section 5 (Conclusion): This paper presents a semi-supervised framework for joint fetal cardiac ultrasound image segmentation and classification, built upon the EchoCare multi-task backbone and UniMatch training paradigm. By integrating SAM-Med2D for boundary refinement and DINOv3 for semantic enhancement of pseudo-labels, our method achieves significant improvements o…
read the original abstract

We present a semi-supervised framework for joint segmentation and classification of fetal cardiac ultrasound images. Built upon the EchoCare multi-task backbone, our method integrates SAM-Med2D for boundary refinement and leverages DINOv3 to enhance pseudo-label quality. We introduce view-specific hard masking along with a two-stage optimization strategy: an EMA phase to consolidate segmentation capabilities, followed by a Classification Fine-Tuning phase that freezes segmentation parameters and resets the classification head to recover classification performance without compromising segmentation gains. Evaluated on the FETUS 2026 leaderboard, our method achieves a Dice Similarity Coefficient at 79.99%, Normalized Surface Distance at 61.62%, and F1-score at 41.20%, validating the effectiveness of our approach for prenatal congenital heart disease screening. Source code is publicly available at: https://github.com/2826056177/zcst_fetus2026.
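For reference, the three reported metrics are usually defined as follows. These are the standard textbook forms, written here with symbols introduced for illustration: $P$ and $G$ are the predicted and ground-truth masks, $\partial P$ and $\partial G$ their boundaries, $d(\cdot,\cdot)$ the distance from a point to a boundary, and $\tau$ the surface tolerance. The tolerance and the F1 averaging used by the FETUS 2026 leaderboard are not stated in the abstract.

```latex
\mathrm{DSC}(P, G) = \frac{2\,|P \cap G|}{|P| + |G|}

\mathrm{NSD}_{\tau}(P, G) =
  \frac{\bigl|\{x \in \partial P : d(x, \partial G) \le \tau\}\bigr|
      + \bigl|\{y \in \partial G : d(y, \partial P) \le \tau\}\bigr|}
       {|\partial P| + |\partial G|}

F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}
           {\mathrm{Precision} + \mathrm{Recall}}
    = \frac{2\,TP}{2\,TP + FP + FN}
```

DSC rewards region overlap, while NSD counts only boundary points that fall within $\tau$ of the other boundary, so a method can score well on one and poorly on the other.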

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces a semi-supervised framework for joint segmentation and classification of fetal cardiac ultrasound images using the EchoCare backbone enhanced with SAM-Med2D for boundary refinement and DINOv3 for semantic pseudo-label improvement. It employs view-specific hard masking and a two-stage optimization consisting of an EMA phase for segmentation consolidation and a subsequent classification fine-tuning phase with frozen segmentation parameters. On the FETUS 2026 leaderboard, it reports a Dice Similarity Coefficient of 79.99%, Normalized Surface Distance of 61.62%, and F1-score of 41.20%.

Significance. Should the results prove robust upon verification of baselines and ablations, this work has the potential to advance semi-supervised techniques in prenatal ultrasound analysis for congenital heart disease. The public availability of the source code at the provided GitHub repository is a notable strength supporting reproducibility.

major comments (2)
  1. [Abstract] The abstract reports specific performance metrics (DSC at 79.99%, NSD at 61.62%, F1 at 41.20%) but omits baseline comparisons, statistical significance, error bars, data split details, and measurement of pseudo-label quality. Without these, the metrics cannot be fully verified as supporting the effectiveness claim.
  2. [Two-stage optimization] The two-stage optimization strategy is outlined, but no ablation study is presented to confirm that the segmentation gains from SAM-Med2D and DINOv3 are maintained after the classification fine-tuning phase. This absence is load-bearing for the central claim that the full pipeline achieves the reported segmentation metrics without degradation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive suggestions. We address each of the major comments below and outline the revisions we will make to the manuscript.

read point-by-point responses
  1. Referee: [Abstract] The abstract reports specific performance metrics (DSC at 79.99%, NSD at 61.62%, F1 at 41.20%) but omits baseline comparisons, statistical significance, error bars, data split details, and measurement of pseudo-label quality. Without these, the metrics cannot be fully verified as supporting the effectiveness claim.

    Authors: We agree that including baseline comparisons and additional details in the abstract would enhance clarity and verifiability. In the revised version, we will expand the abstract to briefly mention comparisons against standard semi-supervised baselines (e.g., Mean Teacher and FixMatch), note that results are reported as mean ± std over multiple runs, and reference the data splits and pseudo-label evaluation sections in the main text. This will better contextualize the reported metrics. revision: yes

  2. Referee: [Two-stage optimization] The two-stage optimization strategy is outlined, but no ablation study is presented to confirm that the segmentation gains from SAM-Med2D and DINOv3 are maintained after the classification fine-tuning phase. This absence is load-bearing for the central claim that the full pipeline achieves the reported segmentation metrics without degradation.

    Authors: We recognize that an ablation study is necessary to substantiate that the classification fine-tuning phase does not compromise the segmentation performance achieved in the EMA phase. Although the manuscript specifies that segmentation parameters are frozen during this phase, we will add a dedicated ablation experiment in the revised manuscript. This will compare segmentation metrics (Dice, NSD) immediately after the EMA phase versus after the full two-stage process to confirm no degradation occurs. revision: yes
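Alongside the metric comparison, a cheap consistency check is to confirm that the frozen parameters are bit-identical before and after the second stage. The sketch below fingerprints every parameter under a name prefix. The `seg.` and `cls_head.` prefixes and the dictionary-of-floats checkpoint format are hypothetical stand-ins for a real checkpoint.

```python
import hashlib
import struct


def fingerprint(params, prefix):
    """SHA-256 over the names and float64 values of parameters under `prefix`."""
    h = hashlib.sha256()
    for name in sorted(params):
        if name.startswith(prefix):
            h.update(name.encode("utf-8"))
            h.update(struct.pack("<d", params[name]))
    return h.hexdigest()


# Toy checkpoints: after the EMA phase and after classification fine-tuning.
ckpt_ema = {"seg.w": 0.1, "seg.b": -0.2, "cls_head.w": 0.05}
ckpt_final = {"seg.w": 0.1, "seg.b": -0.2, "cls_head.w": -0.1}

seg_unchanged = fingerprint(ckpt_ema, "seg.") == fingerprint(ckpt_final, "seg.")
head_changed = fingerprint(ckpt_ema, "cls_head.") != fingerprint(ckpt_final, "cls_head.")
```

Matching fingerprints only show that the listed parameters did not move. If any component the segmentation path depends on sits outside the frozen set, such as shared layers or normalization statistics, segmentation outputs could still shift, which is why the metric comparison remains necessary.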

Circularity Check

0 steps flagged

No significant circularity in method description or results reporting

full rationale

The paper describes an empirical semi-supervised pipeline that applies existing foundation models (SAM-Med2D for boundary refinement, DINOv3 for pseudo-label enhancement) plus a two-stage optimization (EMA consolidation followed by frozen-segmentation classification fine-tuning). No mathematical derivation chain, equations, or first-principles results are present in the abstract or described text. Performance numbers are reported directly against the external FETUS 2026 leaderboard benchmark rather than being derived from internal fits or self-referential definitions. The two-stage strategy is presented as a procedural choice without any reduction of final metrics to the inputs by construction, and no load-bearing self-citations or uniqueness theorems are invoked. This is a standard application paper whose central claims remain independent of the listed circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Only the abstract is available, so the ledger is necessarily incomplete; the method rests on the unexamined transferability of SAM-Med2D and DINOv3 features to fetal ultrasound and on the unproven stability of the two-stage optimization.

axioms (1)
  • domain assumption: Pre-trained foundation models SAM-Med2D and DINOv3 produce useful features and pseudo-labels when applied to fetal cardiac ultrasound without extensive domain-specific retraining.
    The abstract invokes these models for boundary refinement and semantic enhancement but supplies no ablation or domain-adaptation analysis.

pith-pipeline@v0.9.0 · 5731 in / 1521 out tokens · 64001 ms · 2026-05-20T06:31:54.466841+00:00 · methodology



Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 1 internal anchor

  1. [1]

    INTRODUCTION Congenital heart disease (CHD) represents the most prevalent structural anomaly in fetuses and remains a leading cause of morbidity and mortality in newborns [2]. Prenatal screening for CHD relies heavily on the expert assessment of ultrasound images from standard cardiac views, including the four-chamber view (4CH), left ventricular outflow...

  2. [2]

    RELATED WORKS 2.1. Medical Image Segmentation. U-Net [9] and its variants have achieved remarkable success in medical imaging. Transformer-based models like UNETR

  3. [3]

    and Swin-Transformer [13] have further advanced the field through self-attention mechanisms capturing long-range dependencies. The Segment Anything Model (SAM) [5] represents a foundation model breakthrough, with SAM-Med2D specifically optimized for medical images, demonstrating superior adaptability to medical imaging characteristics. 2.2. Semi-Super...

  4. [4]

    EXPERIMENTAL DESIGN AND RESULT ANALYSIS 4.1. Dataset and Task Description The FETUS 2026 Challenge dataset comprises 5,000 standard-view B-mode fetal cardiac ultrasound images, among which 2,800 are allocated for training (partially sourced from the FOCUS dataset [11]). Collected across multiple centers and devices, the dataset exhibits real-world chara...

  5. [5]

    ACKNOWLEDGMENTS This work was supported by the Guangdong Key Disciplines Project under grant number 2024ZDJS137

  6. [6]

    UNETR: Transformers for 3D Medical Image Segmentation

    Ali Hatamizadeh, Dong Yang, Holger R. Roth, and Daguang Xu, “UNETR: Transformers for 3D Medical Image Segmentation,” 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 1748-1758

  7. [7]

    Prenatal diagnosis of congenital heart disease: A review of current knowledge

    Bravo-Valenzuela, Nathalie Jeanne Magioli, et al., “Prenatal diagnosis of congenital heart disease: A review of current knowledge,” Indian Heart Journal, vol. 70, no. 1, pp. 150-164, 2018

  8. [8]

    Artificial intelligence-enabled prenatal ultrasound for the detection of fetal cardiac abnormalities: a systematic review and meta-analysis

    Elena D'Alberti, Olga Patey, Carolyn Smith, et al., “Artificial intelligence-enabled prenatal ultrasound for the detection of fetal cardiac abnormalities: a systematic review and meta-analysis,” eClinicalMedicine, vol. 84, May 2025

  9. [9]

    A Fully Open and Generalizable Foundation Model for Ultrasound Clinical Applications

    Hongyuan Zhang, Yuheng Wu, Mingyang Zhao, et al., “A Fully Open and Generalizable Foundation Model for Ultrasound Clinical Applications.” arXiv preprint arXiv:2509.11752, 2025

  10. [10]

    SAM-Med2D

    Junlong Cheng, Jin Ye, Zhongying Deng, et al., “SAM-Med2D,” arXiv preprint arXiv:2308.16184, 2023

  11. [11]

    FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence

    Kihyuk Sohn, David Berthelot, Nicholas Carlini, et al., “FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence,” Advances in Neural Information Processing Systems (NeurIPS), 2020, vol. 33, pp. 596-608

  12. [12]

    Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation

    Lihe Yang, Lei Qi, Litong Feng, Wayne Zhang, Yinghuan Shi, “Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation,” 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 2023, pp. 7236-7246

  13. [13]

    Prenatal screening for congenital heart disease with four‐chamber and outflow‐tract views: a multicenter study

    Ogge G, Gaglioti P, Maccanti S, et al., “Prenatal screening for congenital heart disease with four‐chamber and outflow‐tract views: a multicenter study,” Ultrasound in Obstetrics and Gynecology: The Official Journal of the International Society of Ultrasound in Obstetrics and Gynecology, vol. 28, no. 6, pp. 779-784, 2006

  14. [14]

    U-net: Convolutional networks for biomedical image segmentation

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-net: Convolutional networks for biomedical image segmentation,” Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Nassir Navab, Joachim Hornegger, William M. Wells, and Alejandro F. Frangi, Eds., Cham, 2015, pp. 234–241

  15. [15]

    DINOv3

    Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, et al., “DINOv3,” arXiv preprint arXiv:2508.10104, 2025

  16. [16]

    FOCUS: Four-chamber Ultrasound Image Dataset for Fetal Cardiac Biometric Measurement (1.0) [Data set]

    Songxiong Wu, Hongyuan Zhang, Tingting Ye, et al., “FOCUS: Four-chamber Ultrasound Image Dataset for Fetal Cardiac Biometric Measurement (1.0) [Data set],” Zenodo, https://doi.org/10.5281/zenodo.14597550

  17. [17]

    CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features

    Sangdoo Yun, Dongyoon Han, Seong Joon Oh, et al., “CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features,” 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6022-6031, 2019

  18. [18]

    Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

    Ze Liu, Yutong Lin, Yue Cao, et al., “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows,” 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 9992-10002