FetSelect: Task-Specific Architectures and Self-Supervised Learning for Automated Fetal Ultrasound Frame Selection

Khalid Alyafei; Mahmood Alzubaidi; Marco Agus; Mohammed Ammar; Mowafa Househ; Raden Muaz; Uzair Shah

arxiv: 2606.22487 · v1 · pith:TVVE3DGPnew · submitted 2026-06-21 · 💻 cs.CV

Mahmood Alzubaidi, Raden Muaz, Uzair Shah, Mohammed Ammar, Khalid Alyafei, Mowafa Househ, Marco Agus

Pith reviewed 2026-06-26 10:43 UTC · model grok-4.3

classification 💻 cs.CV

keywords: fetal ultrasound, frame selection, self-supervised learning, quality assessment, biometry, computer vision, deep learning

The pith

FetSelect uses a frozen vision backbone with BYOL pretraining and a hybrid multi-head design to select quality fetal ultrasound frames for biometry.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents FetSelect as a task-specific system for picking suitable frames from fetal ultrasound videos across four targets: crown-rump length (CRL), nuchal translucency (NT), nasal bone (NB), and the scalebar. Most earlier methods either score general image quality or assume that good frames are already at hand. FetSelect adapts a vision foundation model through BYOL self-supervision on 19,019 unlabeled images, then keeps that backbone frozen while routing its features through a Task-Gated classification head and a detection-derived quality head whose outputs are fused by learned weights. On 974 held-out expert-labeled frames the model reaches a mean AUROC of 0.956 and a mean correlation of 0.818 with the expert scores, and the authors report task-specific discrimination on external clinical videos and 509 external CRL images.

Core claim

FetSelect shows that pairing a frozen vision backbone pretrained with BYOL on unlabeled fetal ultrasound images with a hybrid architecture of task-gated classification and detection-derived quality scoring fused by learned weights produces frame selections that align closely with expert quality judgments across four distinct biometry targets.
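For readers unfamiliar with BYOL, its two core ingredients are a regression loss between normalized embeddings of two augmented views and a target network updated as an exponential moving average of the online network. The sketch below illustrates both in plain Python. It is not the authors' training code: the function names, the toy vectors, and the momentum value `tau` are assumptions made for illustration, and a real setup would apply these operations to backbone and projector outputs.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (small epsilon avoids division by zero)."""
    norm = math.sqrt(sum(x * x for x in v)) + 1e-12
    return [x / norm for x in v]

def byol_loss(online_prediction, target_projection):
    """BYOL regression loss for one view pair: 2 - 2 * cosine similarity."""
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    return 2.0 - 2.0 * sum(a * b for a, b in zip(p, z))

def ema_update(target_params, online_params, tau=0.99):
    """Move target weights toward online weights; the target gets no gradient."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]

# Embeddings pointing the same way give a loss near 0; orthogonal ones give about 2.
print(byol_loss([1.0, 0.0], [1.0, 0.0]))
print(byol_loss([1.0, 0.0], [0.0, 1.0]))
# Hypothetical two-parameter "network" to show the moving-average step.
print(ema_update([0.0, 1.0], [1.0, 1.0], tau=0.9))
```

The loss needs no negative pairs, which is one reason BYOL is a common choice when only unlabeled images from a narrow domain are available.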

What carries the argument

Hybrid multi-head design of a Task-Gated classification head and a Detection-derived quality head combined via learned fusion, applied after BYOL adaptation of a frozen vision backbone.

If this is right

Downstream fetal biometry pipelines can receive frames directly from video without manual curation.
Self-supervision on unlabeled scans improves discrimination without extra expert annotations.
Task-specific heads allow quality criteria to differ across targets such as CRL versus NT.
Performance observed on external clinical videos and additional CRL images suggests the method can transfer beyond the original training distribution.
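The first consequence above amounts to a ranking step: score every frame for each target, then keep the best few. A minimal sketch follows, assuming the model emits one score per frame and task. `select_top_frames`, the task keys, and the example scores are hypothetical and not taken from the paper.

```python
import heapq

TASKS = ("CRL", "NT", "NB", "scalebar")

def select_top_frames(frame_scores, k=3):
    """Return, per task, the indices of the k highest-scoring frames.

    frame_scores: list of dicts, one per video frame, mapping task -> score.
    """
    selected = {}
    for task in TASKS:
        scored = [(scores[task], idx) for idx, scores in enumerate(frame_scores)]
        best = heapq.nlargest(k, scored)  # sorted from highest score down
        selected[task] = [idx for _, idx in best]
    return selected

# Hypothetical scores for a four-frame clip.
clip = [
    {"CRL": 0.10, "NT": 0.80, "NB": 0.30, "scalebar": 0.90},
    {"CRL": 0.95, "NT": 0.20, "NB": 0.40, "scalebar": 0.85},
    {"CRL": 0.60, "NT": 0.70, "NB": 0.90, "scalebar": 0.10},
    {"CRL": 0.30, "NT": 0.90, "NB": 0.20, "scalebar": 0.50},
]
print(select_top_frames(clip, k=2))
```

Ranking each target independently reflects the point that the best CRL frame need not be the best NT frame.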

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same hybrid-head pattern could be tested on frame selection for other ultrasound examinations such as cardiac or abdominal scans.
Adding explicit temporal modeling across consecutive video frames might further raise agreement with experts.
The reliance on a frozen backbone implies that larger foundation models pretrained on broader medical data could raise the ceiling without retraining the entire network.

Load-bearing premise

Expert-labeled frames supply a reliable and unbiased ground truth for quality, and the held-out test set together with external evaluations are enough to show generalization across sites and equipment.

What would settle it

A prospective study that feeds FetSelect-selected frames into actual clinical biometry software and measures whether the resulting length or thickness values differ systematically from those obtained by expert-selected frames on the same patients and machines.
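One conventional way to analyze such a paired comparison is a Bland-Altman summary: the mean difference (bias) between the two measurement sets and the 95% limits of agreement around it. The sketch below assumes paired measurements in millimetres from the same patients. The numbers are made up for illustration and `bland_altman` is a hypothetical helper, not something the paper provides.

```python
import statistics

def bland_altman(model_mm, expert_mm):
    """Return (bias, lower, upper): mean paired difference and 95% limits of agreement."""
    diffs = [m - e for m, e in zip(model_mm, expert_mm)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical CRL measurements (mm) on the same five patients.
model_selected = [61.2, 58.9, 70.4, 65.0, 55.3]
expert_selected = [60.8, 59.1, 70.0, 64.5, 55.6]
bias, lower, upper = bland_altman(model_selected, expert_selected)
print(f"bias={bias:.2f} mm, limits of agreement=({lower:.2f}, {upper:.2f}) mm")
```

A bias whose limits of agreement exclude zero, or limits wider than the clinically acceptable tolerance, would indicate a systematic difference of the kind this test is meant to detect.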

Figures

Figures reproduced from arXiv: 2606.22487 by Khalid Alyafei, Mahmood Alzubaidi, Marco Agus, Mohammed Ammar, Mowafa Househ, Raden Muaz, Uzair Shah.

**Figure 1.** FetSelect overview. FetSelect ingests midsagittal ultrasound videos or frames and outputs task-specific frame-quality scores for CRL, NT, NB, and scalebar; the highest-scoring frames are selected for clinical use. Medical image quality assessment [15], ultrasound frame selection [1,11,22], clip retrieval [21], and foundation-model IQA [12] typically emphasize global quality or single endpoints and do not e…

**Figure 2.** FetSelect architecture. Phase 1: BYOL pretraining on 19K unlabeled images with C-RADIO-B backbone. Phase 2: Supervised training of Task-Gated, Detection, and Fusion heads (backbone frozen). Phase 3: Inference-time frame ranking across four clinical tasks. Annotation quality and label definition. The dataset was annotated by a senior sonographer (4 years' experience). We employ rule-guided scoring (Equatio…

Original abstract

Automated frame selection for fetal biometry remains under-addressed, with most prior work targeting generic quality assessment or downstream measurement pipelines that assume suitable frames are available. We introduce FetSelect, a task-specific framework that pairs a frozen vision foundation backbone with a hybrid multi-head design: a Task-Gated classification head and a Detection-derived quality head combined via learned fusion. We curate 6,486 expert-labeled frames across four targets: Crown-Rump Length (CRL), Nuchal Translucency (NT), Nasal Bone (NB), and Scalebar, and adapt the backbone with BYOL pretraining on 19,019 unlabeled images. On a held-out test set (974 frames), FetSelect achieves mean AUROC 0.956 and mean correlation 0.818 with expert quality annotations. Ablations confirm that hybrid fusion surpasses single-head variants, and ultrasound-specific self-supervision yields consistent gains. Evaluation on external clinical videos and 509 external CRL images demonstrates task-specific discrimination.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

FetSelect applies a hybrid head plus ultrasound-tuned BYOL to fetal frame selection and reports AUROC 0.956 on held-out data, but the single-expert labels without agreement stats are the main limit on what those numbers can tell us.

The colleague should know two things up front. First, the paper takes existing pieces—frozen vision backbone, BYOL on unlabeled ultrasound, and a hybrid classification-plus-detection head fused by learned weights—and shows they work together for picking usable frames in four fetal biometry targets. Second, the numbers on the 974-frame test set (mean AUROC 0.956, correlation 0.818) plus external video and CRL image checks are the concrete output; ablations indicate the fusion and the ultrasound-specific pretraining each add something.

The dataset curation (6,486 expert-labeled frames, plus 19,019 unlabeled images for pretraining) and the external evaluations are the parts that feel like real work. The task is narrow but practical, and the paper stays focused on it rather than claiming broad new methods.

The soft spot is the ground truth. All metrics rest on expert quality annotations, yet the abstract gives no inter-rater overlap, no kappa or ICC, and no protocol details. If those labels carry rater-specific noise or bias, the reported gains and the external results inherit the same uncertainty. That is not fatal for an application paper, but it caps how strongly the numbers can be read as evidence of clinical reliability.

This is for groups already working on medical imaging pipelines or prenatal workflow tools. A reader who needs a ready frame-selection component for fetal US would get value from the architecture choices and the ablation results. It is not a foundational methods paper, so most people outside that niche can skip it.

I would send it to peer review. The empirical framing is honest, the task gap is real, and the missing label validation is fixable with added statistics rather than a rewrite of the core claim.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces FetSelect, a task-specific framework for automated fetal ultrasound frame selection. It pairs a frozen vision foundation backbone (adapted via BYOL self-supervision on 19,019 unlabeled images) with a hybrid multi-head architecture consisting of a Task-Gated classification head and a Detection-derived quality head, fused via learned weights. The authors curate 6,486 expert-labeled frames across four targets (CRL, NT, NB, Scalebar) and report mean AUROC 0.956 and mean correlation 0.818 with expert annotations on a 974-frame held-out test set. Ablations are presented to support the hybrid design and self-supervision benefits, with additional results on external clinical videos and 509 external CRL images.

Significance. If the central results hold after addressing label reliability, the work provides a concrete empirical demonstration that hybrid task-specific heads plus ultrasound-adapted self-supervision can yield strong discrimination for frame quality in fetal biometry, an area with limited prior task-specific methods. The external evaluations offer modest support for generalization claims beyond the primary dataset.

major comments (1)

[Abstract] Abstract (performance claims) and data curation description: All reported metrics (mean AUROC 0.956 and mean correlation 0.818 on the 974-frame held-out set, plus external results) are computed exclusively against expert quality annotations as ground truth. No inter-rater agreement statistics (kappa, ICC, or multi-rater overlap) are supplied for the 6,486 labeled frames. This is load-bearing for the central claim of reliable task-specific selection, because without evidence that the labels reflect a stable quality signal rather than rater-specific noise, the ablation gains and external-video discrimination cannot be interpreted as evidence of clinical utility.
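To make the missing statistic concrete, the sketch below computes Cohen's kappa, which corrects raw agreement between two raters for the agreement expected by chance. It assumes categorical labels (here a hypothetical accept/reject scheme) and exactly two raters. The paper's actual quality scale is not given in the abstract, and ordinal scores would call for a weighted kappa or ICC instead.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items with categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal frequency per label.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical accept/reject labels from two sonographers on ten frames.
a = ["ok", "ok", "bad", "ok", "bad", "ok", "ok", "bad", "ok", "bad"]
b = ["ok", "ok", "bad", "bad", "bad", "ok", "ok", "ok", "ok", "bad"]
print(round(cohens_kappa(a, b), 3))
```

In this toy case the raters agree on 8 of 10 frames, yet kappa is noticeably lower than 0.8 because about half of that agreement would be expected by chance alone.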

minor comments (2)

[Abstract] Abstract: clarify whether the reported 'mean AUROC' and 'mean correlation' are macro-averages across the four targets or computed differently; this affects interpretation of the aggregate numbers.
The manuscript would benefit from an explicit limitations paragraph addressing potential site/equipment biases in the external sets and the dependence on single-rater labels.
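The first minor comment matters because the two readings can give different numbers. The sketch below computes AUROC from pairwise comparisons and contrasts a macro-average over targets with a single AUROC over pooled frames. The data are hypothetical and the function names are illustrative; the paper does not state which averaging it uses.

```python
def auroc(labels, scores):
    """AUROC as the probability a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_auroc(per_task):
    """Unweighted mean of per-task AUROCs; per_task maps task -> (labels, scores)."""
    values = [auroc(y, s) for y, s in per_task.values()]
    return sum(values) / len(values)

def pooled_auroc(per_task):
    """Single AUROC over all frames from all tasks concatenated."""
    labels = [y for ys, _ in per_task.values() for y in ys]
    scores = [s for _, ss in per_task.values() for s in ss]
    return auroc(labels, scores)

# Hypothetical two-task example: each task is ranked perfectly,
# but the tasks occupy different score ranges.
data = {
    "CRL": ([1, 1, 0, 0], [0.9, 0.8, 0.7, 0.6]),
    "NT": ([1, 1, 0, 0], [0.5, 0.4, 0.3, 0.2]),
}
print(macro_auroc(data), pooled_auroc(data))  # prints: 1.0 0.75
```

The gap arises because pooling compares scores across tasks, so differences in per-task score calibration lower the pooled figure even when every task is ranked perfectly on its own.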

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for highlighting the importance of label reliability. We address the single major comment below and note that our response is limited by the data collection protocol used in the study.

Referee: [Abstract] Abstract (performance claims) and data curation description: All reported metrics (mean AUROC 0.956 and mean correlation 0.818 on the 974-frame held-out set, plus external results) are computed exclusively against expert quality annotations as ground truth. No inter-rater agreement statistics (kappa, ICC, or multi-rater overlap) are supplied for the 6,486 labeled frames. This is load-bearing for the central claim of reliable task-specific selection, because without evidence that the labels reflect a stable quality signal rather than rater-specific noise, the ablation gains and external-video discrimination cannot be interpreted as evidence of clinical utility.

Authors: We agree that the absence of inter-rater agreement metrics is a substantive limitation. Each of the 6,486 frames was annotated by a single expert following standard clinical protocols for the four targets, and no multi-rater overlap was collected. Consequently we cannot compute kappa, ICC, or similar statistics. We will revise the manuscript to (i) state this limitation explicitly in a new Limitations paragraph, (ii) qualify all performance claims as being relative to the available single-expert annotations rather than a proven stable ground truth, and (iii) moderate language concerning immediate clinical utility. The external-video and external-CRL results remain informative as consistency checks but cannot substitute for multi-rater reliability data.

revision: partial

standing simulated objections not resolved

Provision of inter-rater agreement statistics (kappa, ICC, or multi-rater overlap) for the 6,486 labeled frames, as multiple independent annotations per frame were never collected.

Circularity Check

0 steps flagged

No circularity: purely empirical reporting with no derivations or self-referential predictions

The manuscript contains no equations, derivations, or first-principles claims. All reported results are direct empirical measurements (AUROC 0.956, correlation 0.818) computed on a held-out test set against external expert labels. Self-supervision (BYOL) and hybrid-head ablations are standard training procedures whose outputs are evaluated on independent data; none reduce to the inputs by construction. No self-citation chains, uniqueness theorems, or fitted-parameter-as-prediction patterns appear. The work is therefore self-contained as standard ML experimentation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Based solely on the abstract, the central claim rests on standard machine-learning assumptions plus the domain-specific premise that expert frame quality labels are reliable and that the curated dataset distribution matches real clinical use.

free parameters (1)

learned fusion weights
The hybrid combination of the two heads uses learned fusion parameters that are fitted during training on the labeled data.
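As an illustration of what such a parameter could look like, the sketch below fuses two head scores with a single weight `w = sigmoid(theta)` and fits `theta` by gradient descent on squared error against expert scores. This is an assumed form: the abstract says only that fusion is learned, so the convex combination, the loss, and all names here are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fuse(cls_score, det_score, theta):
    """Convex combination of two head scores; w = sigmoid(theta) stays in (0, 1)."""
    w = sigmoid(theta)
    return w * cls_score + (1.0 - w) * det_score

def fit_theta(samples, steps=2000, lr=0.5):
    """Fit theta by gradient descent on mean squared error against expert scores.

    samples: list of (cls_score, det_score, expert_score) triples.
    """
    theta = 0.0  # start from equal weighting, w = 0.5
    for _ in range(steps):
        w = sigmoid(theta)
        grad = 0.0
        for c, d, y in samples:
            err = w * c + (1.0 - w) * d - y
            # d(err^2)/d(theta) = 2 * err * (c - d) * w * (1 - w)
            grad += 2.0 * err * (c - d) * w * (1.0 - w)
        theta -= lr * grad / len(samples)
    return theta

# Hypothetical data in which the expert score equals 0.8 * cls + 0.2 * det.
data = [(0.9, 0.1, 0.74), (0.2, 0.8, 0.32), (0.6, 0.4, 0.56), (0.3, 0.9, 0.42)]
theta = fit_theta(data)
print(round(sigmoid(theta), 2))
```

Parameterizing the weight through a sigmoid keeps it between 0 and 1 without a constrained optimizer. Because the weight is fitted to the same expert labels used for evaluation design, it inherits whatever bias those labels carry.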

axioms (1)

domain assumption: Expert annotations serve as reliable ground truth for ultrasound frame quality.
All reported AUROC and correlation numbers are computed against these expert labels.

Reference graph

Works this paper leans on

32 extracted references · 19 canonical work pages

[1] Akumu, T., Elbatel, M., Campello, V.M., Osuala, R., Martin-Isla, C., Valenzuela, I., Li, X., Khanal, B., Lekadir, K.: Adaptive frame selection for gestational age estimation from blind sweep fetal ultrasound videos. In: MICCAI. pp. 3–13. Springer (2025). https://doi.org/10.1007/978-3-032-05185-1_1
[2] Alzubaidi, M., Agus, M., Makhlouf, M., Anver, F., Alyafei, K., Househ, M.: Large-scale annotation dataset for fetal head biometry in ultrasound images. Data in Brief 51, 109708 (2023). https://doi.org/10.1016/j.dib.2023.109708
[3] Bardes, A., Ponce, J., LeCun, Y.: Vicreg: Variance-invariance-covariance regularization for self-supervised learning (2022), https://arxiv.org/abs/2105.04906
[4] Baumgartner, C.F., Kamnitsas, K., Matthew, J., Smith, S., Kainz, B., Rueckert, D.: Sononet: Real-time detection and localisation of fetal standard scan planes in freehand ultrasound. IEEE Transactions on Medical Imaging 36(11), 2204–2215 (2017). https://doi.org/10.1109/TMI.2017.2712367
[5] Burgos-Artizzu, X., Coronado-Gutiérrez, D., Valenzuela-Alcaraz, B., Bonet-Carne, E., Eixarch, E., Crispi, F., Gratacós, E.: Evaluation of deep convolutional neural networks for automatic classification of common maternal fetal ultrasound planes. Nature Scientific Reports 10, 10200 (2020). https://doi.org/10.1038/s41598-020-67076-5
[6] Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers (2021), https://arxiv.org/abs/2104.14294
[7] Chen, G., Bai, J., Ou, Z., Lu, Y., Wang, H.: Psfhs: Intrapartum ultrasound image dataset for ai-based segmentation of pubic symphysis and fetal head. Scientific Data 11(1), 436 (2024). https://doi.org/10.1038/s41597-024-03266-4
[8] Correggio, K.S.D., Galluzzo, R.N., Santos, L.O., Barroso, F.S.M., Chaves, T.Z.L., Onofre, A.S.C., von Wangenheim, A.: Fetal abdominal structures segmentation dataset using ultrasonic images (2023). https://doi.org/10.17632/4gcpm9dsc3.1, https://data.mendeley.com/datasets/4gcpm9dsc3/1
[9] Ghelichoghli, M.: CRL (9 2021). https://doi.org/10.6084/m9.figshare.16570518.v1, https://figshare.com/articles/dataset/CRL/16570518
[10] Grill, J.B., Strub, F., Altché, F., Tallec, C., Richemond, P.H., Buchatskaya, E., Doersch, C., Pires, B.A., Guo, Z.D., Azar, M.G., Piot, B., Kavukcuoglu, K., Munos, R., Valko, M.: Bootstrap your own latent: a new approach to self-supervised learning. In: Proceedings of the 34th International Conference on Neural Information Processing Systems (2020)
[11] Guo, X., Men, Q., Noble, J.A.: Mmsummary: Multimodal summary generation for fetal ultrasound video. In: MICCAI. pp. 678–688. Springer (2024). https://doi.org/10.1007/978-3-031-72083-3_63
[12] He, D., Wang, H., Yaqub, M.: Advancing fetal ultrasound image quality assessment in low-resource settings (2025), https://arxiv.org/abs/2507.22802
[13] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2020). https://doi.org/10.1109/CVPR42600.2020.00975
[14] Heinrich, G., Ranzinger, M., Yin, H., Lu, Y., Kautz, J., Tao, A., Catanzaro, B., Molchanov, P.: Radiov2.5: Improved baselines for agglomerative vision foundation models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 22487–22497 (June 2025)
[15] Herath, H.M.S.S., Herath, H.M.K.K.M.B., Madusanka, N., Lee, B.I.: A systematic review of medical image quality assessment. Journal of Imaging 11(4) (2025). https://doi.org/10.3390/jimaging11040100
[16] Jiao, J., Droste, R., Drukker, L., Papageorghiou, A.T., Noble, J.A.: Self-supervised representation learning for ultrasound video. In: IEEE 17th International Symposium on Biomedical Imaging (ISBI). pp. 1125–1129. IEEE (2020). https://doi.org/10.1109/ISBI45749.2020.9098666
[17] Jiao, J., Zhou, J., Li, X., Xia, M., Huang, Y., Huang, L., Wang, N., Zhang, X., Zhou, S., Wang, Y., et al.: Usfm: A universal ultrasound foundation model generalized to tasks and organs towards label efficient image analysis. Medical image analysis 96, 103202 (2024)
[18] Lin, Q., Zhou, Y., Shi, S., Zhang, Y., Yin, S., Liu, X., Peng, Q., Huang, S., Jiang, Y., Cui, C., She, R., Xu, J., Dong, F.: How much can ai see in early pregnancy: A multi-center study of fetus head characterization in week 10–14 in ultrasound using deep learning. Computer Methods and Programs in Biomedicine 226, 107170 (2022). https://doi.org/10.1016/j.cmpb.2022.107170
[19] Maani, F., Saeed, N., Saleem, T., Farooq, Z., Alasmawi, H., Diehl, W., Mohammad, A., Waring, G., Valappi, S., Bricker, L., Yaqub, M.: Fetalclip: A visual-language foundation model for fetal ultrasound image analysis (2025), https://arxiv.org/abs/2502.14807
[20] Megahed, Y., Ducharme, R., Erman, A., Walker, M.C., Hawken, S., Chan, A.D.: Usf-mae: Ultrasound self-supervised foundation model with masked autoencoding. Biomedical Signal Processing and Control 122, 110313 (2026). https://doi.org/10.1016/j.bspc.2026.110313
[21] Mishra, D., Saha, P., Zhao, H., Hernandez-Cruz, N., Patey, O., Papageorghiou, A.T., Noble, J.A.: Tier-loc: Visual query-based video clip localization in fetal ultrasound videos with a multi-tier transformer. Medical Image Analysis 103, 103611 (2025). https://doi.org/10.1016/j.media.2025.103611, https://www.sciencedirect.com/science/artic...
[22] Nehary, E.A., Rajan, S., Rossa, C.: Metric-based frame selection and deep learning model with multi-head self attention for classification of ultrasound lung video images. IEEE Access 12, 79297–79310 (2024). https://doi.org/10.1109/ACCESS.2024.10547274
[23] Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W., Howes, R., Huang, P.Y., Li, S.W., Misra, I., Rabbat, M., Sharma, V., Synnaeve, G., Xu, H., Jegou, H., Mairal, J., Labatut, P., Joulin, A., Bojanowski, P.: Dinov2: Learning robust visual features without su... arXiv (2024)
[24] Persico, N., Molina, F., Azumendi, G., Fedele, L., Nicolaides, K.H.: Nasal bone assessment in fetuses with trisomy 21 at 16–24 weeks of gestation by three-dimensional ultrasound. Prenatal diagnosis 32(3), 240–244 (2012)
[25] Płotka, S., Pustelnik, K., Szenejko, P., Żebrowska, K., Rzucidło-Szymańska, I., Szymecka-Samaha, N., Łęgowik, T., Kosińska-Kaczyńska, K., Korzeniowski, P., Biliński, P., Khalil, A., Brawura-Biskupski-Samaha, R., Išgum, I., Sánchez, C.I., Sitek, A.: Direct estimation of fetal biometry measurements from ultrasound video scans through deep learning. American Journal of Obstetrics & Gynecology MFM 7(4) (2025). https://doi.org/10.1016/j.ajogmf.2025.101623
[26] Płotka, S.S., Grzeszczyk, M.K., Szenejko, P.I., Żebrowska, K., Szymecka-Samaha, N.A., Łęgowik, T., Lipa, M.A., Kosińska-Kaczyńska, K., Brawura-Biskupski-Samaha, R., Išgum, I., Sánchez, C.I., Sitek, A.: Deep learning for estimation of fetal weight throughout the pregnancy from fetal abdominal ultrasound. American Journal of Obstetrics & Gynecology MFM 5(12), 101182 (2023). https://doi.org/10.1016/j.ajogmf.2023.101182
[27] Ramesh, J., Bacher, V., Eid, M.C., Kalabizadeh, H., Rupprecht, C., Namburete, A.I.L., Yeung, P.H., Wyburd, M.K., Dinsdale, N.K.: Automated fetal biometry assessment with deep ensembles using sparse-sampling of 2d intrapartum ultrasound images. In: Intrapartum Ultrasound. pp. 46–60. Springer Nature Switzerland (2025). https://doi.org/10.1007/978-3-031-96318-6_5
[28] Ranzinger, M., Heinrich, G., Kautz, J., Molchanov, P.: Am-radio: Agglomerative vision foundation model reduce all domains into one. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 12490–12500 (June 2024)
[29] Stebler, Y., Sutter, T.M., Ozkan, E., Vogt, J.E.: Temporal representation learning for real-time ultrasound analysis (2025), https://arxiv.org/abs/2509.01433
[30] Venturini, L., Budd, S., Farruggia, A., Wright, R., Matthew, J., Day, T.G., Kainz, B., Razavi, R., Hajnal, J.V.: Whole-examination ai estimation of fetal biometrics from 20-week ultrasound scans. npj Digital Medicine 8(1), 22 (2025). https://doi.org/10.1038/s41746-024-01406-z
[31] Yasrab, R., Fu, Z., Drukker, L., Lee, L.H., Zhao, H., Papageorghiou, A.T., Noble, J.A.: End-to-end first trimester fetal ultrasound video automated crl and nt segmentation. In: 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI). pp. 1–5 (2022). https://doi.org/10.1109/ISBI52829.2022.9761400
[32] Zhang, B., Liu, H., Luo, H., Li, K.: Automatic quality assessment for 2d fetal sonographic standard plane based on multitask learning. Medicine 100(4) (2021)

[1] [1]

In: MICCAI

Akumu, T., Elbatel, M., Campello, V.M., Osuala, R., Martin-Isla, C., Valenzuela, I., Li, X., Khanal, B., Lekadir, K.: Adaptive frame selection for gestational age es- timation from blind sweep fetal ultrasound videos. In: MICCAI. pp. 3–13. Springer (2025). https://doi.org/10.1007/978-3-032-05185-1_1

work page doi:10.1007/978-3-032-05185-1_1 2025

[2] [2]

https://doi.org/https://doi.org/10.1016/j.dib.2023.109708

Alzubaidi, M., Agus, M., Makhlouf, M., Anver, F., Alyafei, K., Househ, M.: Large- scaleannotationdatasetforfetalheadbiometryinultrasoundimages.DatainBrief 51, 109708 (2023). https://doi.org/https://doi.org/10.1016/j.dib.2023.109708

work page doi:10.1016/j.dib.2023.109708 2023

[3] [3]

Bardes, A., Ponce, J., LeCun, Y.: Vicreg: Variance-invariance-covariance regular- ization for self-supervised learning (2022), https://arxiv.org/abs/2105.04906

Pith/arXiv arXiv 2022

[4] [4]

IEEE Transactions on Medical Imaging36(11), 2204–2215 (2017)

Baumgartner, C.F., Kamnitsas, K., Matthew, J., Smith, S., Kainz, B., Rueckert, D.: Sononet: Real-time detection and localisation of fetal standard scan planes in freehand ultrasound. IEEE Transactions on Medical Imaging36(11), 2204–2215 (2017). https://doi.org/10.1109/TMI.2017.2712367

work page doi:10.1109/tmi.2017.2712367 2017

[5] [5]

Nature Scientific Reports10, 10200 (2020)

Burgos-Artizzu, X., Coronado-Gutiérrez, D., Valenzuela-Alcaraz, B., Bonet-Carne, E., Eixarch, E., Crispi, F., Gratacós, E.: Evaluation of deep convolutional neu- ral networks for automatic classification of common maternal fetal ultrasound planes. Nature Scientific Reports10, 10200 (2020). https://doi.org/10.1038/ s41598-020-67076-5

2020

[6] [6]

org/abs/2104.14294

Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers (2021), https://arxiv. org/abs/2104.14294

Pith/arXiv arXiv 2021

[7] [7]

Scientific Data11(1), 436 (2024)

Chen, G., Bai, J., Ou, Z., Lu, Y., Wang, H.: Psfhs: Intrapartum ultrasound image dataset for ai-based segmentation of pubic symphysis and fetal head. Scientific Data11(1), 436 (2024). https://doi.org/10.1038/s41597-024-03266-4

work page doi:10.1038/s41597-024-03266-4 2024

[8] [8]

https://doi.org/10.17632/4gcpm9dsc3.1, https://data.mendeley.com/datasets/4gcpm9dsc3/1

Correggio, K.S.D., Galluzzo, R.N., Santos, L.O., Barroso, F.S.M., Chaves, T.Z.L., Onofre, A.S.C., von Wangenheim, A.: Fetal abdominal structures segmentation dataset using ultrasonic images (2023). https://doi.org/10.17632/4gcpm9dsc3.1, https://data.mendeley.com/datasets/4gcpm9dsc3/1

work page doi:10.17632/4gcpm9dsc3.1 2023

[9] [9]

https://doi.org/10.6084/m9.figshare.16570518.v1, https://figshare.com/articles/dataset/CRL/16570518

Ghelichoghli, M.: CRL (9 2021). https://doi.org/10.6084/m9.figshare.16570518.v1, https://figshare.com/articles/dataset/CRL/16570518

work page doi:10.6084/m9.figshare.16570518.v1 2021

[10] [10]

In: Proceedings of the 34th International Conference on Neural Information Processing Systems

Grill, J.B., Strub, F., Altché, F., Tallec, C., Richemond, P.H., Buchatskaya, E., Do- ersch, C., Pires, B.A., Guo, Z.D., Azar, M.G., Piot, B., Kavukcuoglu, K., Munos, FetSelect: Automated Fetal Ultrasound Frame Selection 13 R., Valko, M.: Bootstrap your own latent a new approach to self-supervised learn- ing. In: Proceedings of the 34th International Conf...

2020

[11] [11]

In: MICCAI

Guo, X., Men, Q., Noble, J.A.: Mmsummary: Multimodal summary generation for fetal ultrasound video. In: MICCAI. pp. 678–688. Springer (2024). https://doi. org/10.1007/978-3-031-72083-3_63

work page doi:10.1007/978-3-031-72083-3_63 2024

[12] [12]

He, D., Wang, H., Yaqub, M.: Advancing fetal ultrasound image quality assessment in low-resource settings (2025), https://arxiv.org/abs/2507.22802

arXiv 2025

[13] [13]

Local deep im- plicit functions for 3d shape

He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2020). https://doi.org/ 10.1109/CVPR42600.2020.00975

work page doi:10.1109/cvpr42600.2020.00975 2020

[14] [14]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Heinrich, G., Ranzinger, M., Yin, H., Lu, Y., Kautz, J., Tao, A., Catanzaro, B., Molchanov, P.: Radiov2.5: Improved baselines for agglomerative vision foundation models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 22487–22497 (June 2025)

2025

[15] [15]

Herath, H.M.S.S., Herath, H.M.K.K.M.B., Madusanka, N., Lee, B.I.: A systematic reviewofmedicalimagequalityassessment.JournalofImaging11(4)(2025).https: //doi.org/10.3390/jimaging11040100

work page doi:10.3390/jimaging11040100 2025

[16] [16]

Synaptic Partner Assignment Using Attentional V oxel Association Networks

Jiao, J., Droste, R., Drukker, L., Papageorghiou, A.T., Noble, J.A.: Self-supervised representation learning for ultrasound video. In: IEEE 17th International Sympo- sium on Biomedical Imaging (ISBI). pp. 1125–1129. IEEE (2020). https://doi.org/ 10.1109/ISBI45749.2020.9098666

work page doi:10.1109/isbi45749.2020.9098666 2020

[17] [17]

Medical image analysis 96, 103202 (2024)

Jiao,J.,Zhou,J.,Li,X.,Xia,M.,Huang,Y.,Huang,L.,Wang,N.,Zhang,X.,Zhou, S., Wang, Y., et al.: Usfm: A universal ultrasound foundation model generalized to tasks and organs towards label efficient image analysis. Medical image analysis 96, 103202 (2024)

2024

[18] [18]

Computer Methods and Programs in Biomedicine226, 107170 (2022)

Lin, Q., Zhou, Y., Shi, S., Zhang, Y., Yin, S., Liu, X., Peng, Q., Huang, S., Jiang, Y., Cui, C., She, R., Xu, J., Dong, F.: How much can ai see in early pregnancy: A multi-center study of fetus head characterization in week 10–14 in ultrasound using deep learning. Computer Methods and Programs in Biomedicine226, 107170 (2022). https://doi.org/https://doi...

work page doi:10.1016/j.cmpb.2022.107170 2022

[19] [19]

Maani,F.,Saeed,N.,Saleem,T.,Farooq,Z.,Alasmawi,H.,Diehl,W.,Mohammad, A., Waring, G., Valappi, S., Bricker, L., Yaqub, M.: Fetalclip: A visual-language foundation model for fetal ultrasound image analysis (2025), https://arxiv.org/ abs/2502.14807

arXiv 2025

[20] [20]

Biomedical Signal Processing and Control122, 110313 (2026)

Megahed, Y., Ducharme, R., Erman, A., Walker, M.C., Hawken, S., Chan, A.D.: Usf-mae: Ultrasound self-supervised foundation model with masked autoencoding. Biomedical Signal Processing and Control122, 110313 (2026). https://doi.org/ https://doi.org/10.1016/j.bspc.2026.110313

[21]

Mishra, D., Saha, P., Zhao, H., Hernandez-Cruz, N., Patey, O., Papageorghiou, A.T., Noble, J.A.: Tier-loc: Visual query-based video clip localization in fetal ultrasound videos with a multi-tier transformer. Medical Image Analysis 103, 103611 (2025). https://doi.org/10.1016/j.media.2025.103611, https://www.sciencedirect.com/science/artic...

[22]

Nehary, E.A., Rajan, S., Rossa, C.: Metric-based frame selection and deep learning model with multi-head self attention for classification of ultrasound lung video images. IEEE Access 12, 79297–79310 (2024). https://doi.org/10.1109/ACCESS.2024.10547274

[23]

Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W., Howes, R., Huang, P.Y., Li, S.W., Misra, I., Rabbat, M., Sharma, V., Synnaeve, G., Xu, H., Jegou, H., Mairal, J., Labatut, P., Joulin, A., Bojanowski, P.: Dinov2: Learning robust visual features without su...

[24]

Persico, N., Molina, F., Azumendi, G., Fedele, L., Nicolaides, K.H.: Nasal bone assessment in fetuses with trisomy 21 at 16–24 weeks of gestation by three-dimensional ultrasound. Prenatal diagnosis 32(3), 240–244 (2012)

[25]

Płotka, S., Pustelnik, K., Szenejko, P., Żebrowska, K., Rzucidło-Szymańska, I., Szymecka-Samaha, N., Łęgowik, T., Kosińska-Kaczyńska, K., Korzeniowski, P., Biliński, P., Khalil, A., Brawura-Biskupski-Samaha, R., Išgum, I., Sánchez, C.I., Sitek, A.: Direct estimation of fetal biometry measurements from ultrasound video scans through deep learning. American...

[26]

Płotka, S.S., Grzeszczyk, M.K., Szenejko, P.I., Żebrowska, K., Szymecka-Samaha, N.A., Łęgowik, T., Lipa, M.A., Kosińska-Kaczyńska, K., Brawura-Biskupski-Samaha, R., Išgum, I., Sánchez, C.I., Sitek, A.: Deep learning for estimation of fetal weight throughout the pregnancy from fetal abdominal ultrasound. American Journal of Obstetrics & Gynecology MFM 5(...

[27]

Ramesh, J., Bacher, V., Eid, M.C., Kalabizadeh, H., Rupprecht, C., Namburete, A.I.L., Yeung, P.H., Wyburd, M.K., Dinsdale, N.K.: Automated fetal biometry assessment with deep ensembles using sparse-sampling of 2d intrapartum ultrasound images. In: Intrapartum Ultrasound. pp. 46–60. Springer Nature Switzerland (2025). https://doi.org/10.1007/978-3-031-96318-6_5

[28]

Ranzinger, M., Heinrich, G., Kautz, J., Molchanov, P.: Am-radio: Agglomerative vision foundation model reduce all domains into one. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 12490–12500 (June 2024)

[29]

Stebler, Y., Sutter, T.M., Ozkan, E., Vogt, J.E.: Temporal representation learning for real-time ultrasound analysis (2025), https://arxiv.org/abs/2509.01433

[30]

Venturini, L., Budd, S., Farruggia, A., Wright, R., Matthew, J., Day, T.G., Kainz, B., Razavi, R., Hajnal, J.V.: Whole-examination ai estimation of fetal biometrics from 20-week ultrasound scans. npj Digital Medicine 8(1), 22 (2025). https://doi.org/10.1038/s41746-024-01406-z

[31]

Yasrab, R., Fu, Z., Drukker, L., Lee, L.H., Zhao, H., Papageorghiou, A.T., Noble, J.A.: End-to-end first trimester fetal ultrasound video automated crl and nt segmentation. In: 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI). pp. 1–5 (2022). https://doi.org/10.1109/ISBI52829.2022.9761400

[32]

Zhang, B., Liu, H., Luo, H., Li, K.: Automatic quality assessment for 2d fetal sonographic standard plane based on multitask learning. Medicine 100(4) (2021)
