FADA: Accessible fetal ultrasound interpretation and annotation with a selectively distilled unified vision-language model
Pith reviewed 2026-06-27 13:06 UTC · model grok-4.3
The pith
FADA builds a single vision-language model that unifies fetal ultrasound interpretation, detection, segmentation, and classification through selective distillation from four domain models without external labels at inference.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Selectively distilling four domain-specific foundation models into Qwen3.5-VL via offline feature caching yields a unified vision-language model that runs a complete five-phase fetal ultrasound pipeline with no external labels or separate models at inference. The recommended FADA-SKD variant reaches 0.8820 mean Dice, 0.7671 mAP@0.50, and 100% structured interpretation compliance, while remaining trainable on one consumer GPU and deployable on edge devices.
What carries the argument
Selective distillation with offline pre-computed feature caching from four domain-specific foundation models, restricting feature alignment to annotation tasks only.
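To make the mechanism concrete, here is a minimal sketch of a selective distillation loss. It is an illustration under stated assumptions, not the authors' implementation: the task names, the cosine alignment loss, the `kd_weight` value, the in-memory `feature_cache`, and the assumption that student and teacher features share a dimension are all hypothetical. In particular, treating only detection and segmentation as "annotation tasks" is a guess; the summary does not say how classification is handled.

```python
import numpy as np

# Assumption: which tasks count as "annotation tasks" is illustrative only.
ANNOTATION_TASKS = {"detection", "segmentation"}

# Hypothetical offline cache: teacher features computed once, keyed by image id.
feature_cache = {"img_001": np.array([[1.0, 0.0], [0.0, 2.0]])}


def cosine_alignment_loss(student_feat, teacher_feat):
    """Mean of (1 - cosine similarity) over rows; an assumed alignment loss."""
    s = student_feat / np.linalg.norm(student_feat, axis=-1, keepdims=True)
    t = teacher_feat / np.linalg.norm(teacher_feat, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(s * t, axis=-1)))


def selective_kd_loss(task, task_loss, student_feat, cached_teacher_feat, kd_weight=0.5):
    """Add the feature-alignment term only for annotation tasks.

    Interpretation (and any task without cached teacher features) falls back
    to the plain fine-tuning loss.
    """
    if task not in ANNOTATION_TASKS or cached_teacher_feat is None:
        return task_loss
    alignment = cosine_alignment_loss(student_feat, cached_teacher_feat)
    return task_loss + kd_weight * alignment


# Usage: teacher features are looked up from the cache, never recomputed.
teacher = feature_cache.get("img_001")
student = np.array([[0.9, 0.1], [0.1, 1.8]])
seg_loss = selective_kd_loss("segmentation", 0.30, student, teacher)
interp_loss = selective_kd_loss("interpretation", 0.30, student, teacher)
```

Because the teacher features are read from a cache, the four source models never need to be loaded during training, which is consistent with the single-GPU claim.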
If this is right
- A single model replaces the need for separate task-specific networks for fetal ultrasound analysis.
- No expert-specified labels or external models are required at inference for any task.
- Clinically acceptable outputs are produced in both fully autonomous and human-guided modes.
- Full offline execution on commodity smartphones enables deployment in settings without internet or cloud access.
- Training fits on a single consumer GPU, lowering the barrier to local adaptation.
Where Pith is reading between the lines
- The selective restriction of distillation to annotation tasks may help preserve the base model's interpretive strengths compared with uniform alignment.
- The same caching-plus-selective-distillation pattern could be tested on other ultrasound domains such as cardiac or abdominal imaging.
- Direct integration with portable probe hardware would create an end-to-end offline prenatal screening workflow.
- Reducing the number of source models while monitoring performance could further simplify the pipeline.
Load-bearing premise
The pre-computed features from the four domain-specific models combined with selective distillation will produce a unified model that generalizes reliably to new clinical data without external labels or additional models at inference.
What would settle it
Running the model on a new, unseen dataset from different ultrasound machines or patient populations and observing Dice scores below 0.80 or interpretation compliance below 90 percent would falsify reliable generalization.
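The falsification test above can be sketched in a few lines. This is a minimal illustration assuming binary segmentation masks and a per-image boolean compliance flag; the function names and the convention that two empty masks score 1.0 are assumptions, while the 0.80 and 0.90 floors come from the criterion stated above.

```python
import numpy as np


def dice_score(pred_mask, true_mask):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    denom = pred.sum() + true.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, true).sum() / denom)


def generalization_holds(dice_scores, compliant_flags,
                         dice_floor=0.80, compliance_floor=0.90):
    """True only if mean Dice and the compliance rate both meet their floors."""
    mean_dice = float(np.mean(dice_scores))
    compliance = float(np.mean(compliant_flags))
    return mean_dice >= dice_floor and compliance >= compliance_floor
```

Applied to an external dataset, `generalization_holds` returning `False` would correspond to the falsifying outcome described above.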
Original abstract
A global shortage of trained sonographers limits prenatal ultrasound screening in low- and middle-income countries, where over half of pregnant women receive no skilled sonography. Current deep learning approaches address detection, segmentation, or classification in isolation, each demanding a separate model and expert-specified labels at inference. We present FADA, a unified vision-language model built on Qwen3.5-VL that performs clinical interpretation, classification, detection, and segmentation through a single interpretation-first pipeline without external labels. FADA distills knowledge from four domain-specific foundation models (FetalCLIP, UltraSAM, USF-MAE, UltraFedFM) via offline pre-computed feature caching. Selective distillation, which applies feature alignment only to annotation tasks while interpretation relies on standard fine-tuning, consistently outperforms full distillation across most evaluation axes. The recommended variant, FADA-SKD, achieves 0.8820 mean Dice for segmentation, 0.7671 mAP@0.50 for detection, and 100% structured interpretation compliance. Expert sonographer validation across 237 images confirms clinically acceptable outputs in both autonomous and human-in-the-loop modes, with 73.5% of interpretations scoring perfectly under clinician guidance. The system is trainable on a single consumer GPU and deployable without cloud connectivity. We validate edge deployment by running the compressed 0.8B model on a commodity smartphone (Qualcomm Snapdragon 7 Gen 1, 12 GB RAM) using llama.cpp with GGUF quantization, completing the full 5-phase pipeline in approximately 60 seconds entirely offline. This establishes a practical pathway for integrating AI-assisted fetal assessment with portable ultrasound devices, directly addressing diagnostic access gaps in resource-constrained settings. Code, models, and data are available at https://github.com/mahmoodphd/FADA.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces FADA, a unified vision-language model based on Qwen3.5-VL for fetal ultrasound that performs clinical interpretation, classification, detection, and segmentation via a single interpretation-first pipeline. It employs selective knowledge distillation from four offline domain-specific models (FetalCLIP, UltraSAM, USF-MAE, UltraFedFM) with feature caching, claiming that the FADA-SKD variant achieves 0.8820 mean Dice for segmentation, 0.7671 mAP@0.50 for detection, and 100% structured interpretation compliance. Expert sonographer review on 237 images confirms clinical acceptability in autonomous and human-in-the-loop modes, with the compressed model runnable offline on a smartphone in ~60 seconds. The work targets accessibility in low-resource settings without requiring external labels or source models at inference.
Significance. If the performance and generalization claims hold, the work has substantial significance for prenatal care in LMICs by unifying multiple ultrasound tasks into one deployable model that eliminates per-task labeling at inference and supports edge deployment on commodity hardware. The selective distillation approach and expert validation on real images are notable strengths if supported by fuller experimental detail.
major comments (4)
- [Methods] Methods/Results: The manuscript provides no description of the training dataset (size, sources, acquisition parameters, or train/val/test splits) or the composition of the 237-image expert validation set, which is load-bearing for interpreting the headline metrics of 0.8820 Dice and 0.7671 mAP.
- [Results] Results: No ablation tables or quantitative comparisons between selective distillation (SKD) and full distillation are shown, despite the explicit claim that SKD 'consistently outperforms full distillation across most evaluation axes'; this omission weakens the justification for the recommended variant.
- [Evaluation] Evaluation: Generalization is asserted for 'new clinical data' and 'unseen clinical distributions,' yet the only external check is expert review on a single 237-image internal set with no cross-site, multi-scanner, or geographic-shift experiments reported; this bears directly on the central claim of reliable out-of-distribution performance without source models at inference.
- [Results] Results: No statistical tests, confidence intervals, or inter-rater agreement metrics accompany the performance numbers or the 73.5% perfect-score clinician guidance result, limiting assessment of whether the reported figures reliably support the clinical-acceptability conclusion.
minor comments (2)
- [Abstract] The abstract states that the system is 'trainable on a single consumer GPU' but provides no training protocol details (optimizer, learning rate schedule, epochs, or hardware specifications) that would allow reproduction.
- Consider adding a summary table comparing all FADA variants on the key metrics (Dice, mAP, compliance) to improve readability of the selective-distillation advantage.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help improve the clarity and rigor of the manuscript. We address each major comment point by point below, indicating where revisions will be made.
Point-by-point responses
- Referee: [Methods] Methods/Results: The manuscript provides no description of the training dataset (size, sources, acquisition parameters, or train/val/test splits) or the composition of the 237-image expert validation set, which is load-bearing for interpreting the headline metrics of 0.8820 Dice and 0.7671 mAP.
  Authors: We agree that a detailed description of the datasets is essential. In the revised manuscript, we will add a dedicated subsection in Methods providing the training dataset size, sources, acquisition parameters, and train/val/test splits, along with the composition, demographics, and selection criteria for the 237-image expert validation set.
  Revision: yes
- Referee: [Results] Results: No ablation tables or quantitative comparisons between selective distillation (SKD) and full distillation are shown, despite the explicit claim that SKD 'consistently outperforms full distillation across most evaluation axes'; this omission weakens the justification for the recommended variant.
  Authors: We acknowledge this gap. We will include new ablation tables in the revised Results section with quantitative comparisons between FADA-SKD and full distillation variants across segmentation, detection, classification, and interpretation metrics to support the stated performance advantages.
  Revision: yes
- Referee: [Evaluation] Evaluation: Generalization is asserted for 'new clinical data' and 'unseen clinical distributions,' yet the only external check is expert review on a single 237-image internal set with no cross-site, multi-scanner, or geographic-shift experiments reported; this bears directly on the central claim of reliable out-of-distribution performance without source models at inference.
  Authors: The 237-image set consists of images from new clinical acquisitions not used in training. We will revise the Evaluation section to clarify this and to explicitly note the limitations regarding multi-site and geographic generalization. Broader cross-site experiments are beyond the scope of the current resources and will be listed as future work.
  Revision: partial
- Referee: [Results] Results: No statistical tests, confidence intervals, or inter-rater agreement metrics accompany the performance numbers or the 73.5% perfect-score clinician guidance result, limiting assessment of whether the reported figures reliably support the clinical-acceptability conclusion.
  Authors: We agree that statistical support is needed. In the revised manuscript, we will add statistical tests, 95% confidence intervals for the key metrics (Dice, mAP), and inter-rater agreement metrics (e.g., Cohen's kappa) for the expert sonographer evaluations.
  Revision: yes
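For readers unfamiliar with the statistics promised in the last response, the following is a minimal sketch of a percentile bootstrap 95% confidence interval for a mean metric and of Cohen's kappa for two raters. It is not the authors' analysis code: the function names, the resample count, the fixed seed, and the choice of a percentile bootstrap over other interval methods are all assumptions made for illustration.

```python
import numpy as np


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample image-level scores with replacement, one row per resample.
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    means = values[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (p_observed - p_expected) / (1 - p_expected)."""
    a = np.asarray(rater_a)
    b = np.asarray(rater_b)
    p_observed = float(np.mean(a == b))
    # Chance agreement from each rater's marginal label frequencies.
    p_expected = float(sum(np.mean(a == label) * np.mean(b == label)
                           for label in np.union1d(a, b)))
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)
```

With per-image Dice scores as `values`, `bootstrap_ci` gives an interval around the reported mean; with two sonographers' categorical ratings of the same images, `cohens_kappa` gives chance-corrected agreement.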
Circularity Check
No significant circularity; empirical results independent of inputs
Full rationale
The paper describes an empirical pipeline: offline feature caching from four external foundation models, selective distillation into a Qwen3.5-VL backbone, and standard fine-tuning for interpretation. Reported metrics (0.8820 Dice, 0.7671 mAP, 100% compliance) and clinician review on 237 held-out images are obtained via conventional train/test splits and external validation, not by algebraic reduction to the training inputs or by re-labeling fitted parameters as predictions. No equations, self-definitions, or load-bearing self-citations appear in the method; the derivation chain consists of standard supervised training followed by independent evaluation and therefore remains self-contained.
Axiom & Free-Parameter Ledger
free parameters (1)
- selective distillation hyperparameters
axioms (1)
- Domain assumption: pre-computed features from FetalCLIP, UltraSAM, USF-MAE, and UltraFedFM are sufficiently rich and aligned for the target fetal ultrasound tasks.