Understanding Task Aggregation for Generalizable Ultrasound Foundation Models

Amelia Jiménez-Sánchez; Fangyijie Wang; Guénolé Silvestre; Jieyun Bai; Karim Lekadir; Kathleen M. Curran; Tanya Akumu; Vien Ngoc Dang

arxiv: 2603.18123 · v3 · pith:N7MQOJTQnew · submitted 2026-03-18 · 📡 eess.IV · cs.AI

Fangyijie Wang, Tanya Akumu, Vien Ngoc Dang, Amelia Jiménez-Sánchez, Jieyun Bai, Guénolé Silvestre, Karim Lekadir, Kathleen M. Curran

Pith reviewed 2026-05-25 06:37 UTC · model grok-4.3

keywords: ultrasound · foundation models · task aggregation · multi-task learning · mixture of experts · medical imaging · segmentation · classification

The pith

Task aggregation in ultrasound models must weigh data scale and task type over clinical groupings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines when multiple ultrasound tasks can be trained together in one model without performance loss. It compares task-specific models, clinically grouped training, and all-task unified training across 27 tasks using a new framework called M2DINO. Results show that clinically grouped training boosts results only when data is plentiful but causes clear negative transfer when data is scarce. Unified training across all tasks delivers more stable performance regardless of clinical group. Segmentation tasks prove more sensitive to these choices than regression or classification tasks.
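The three paradigms differ only in how tasks are partitioned into jointly trained sets. The following is a minimal sketch of that partitioning, not the authors' code. The task names, clinical groups, and the `training_sets` helper are hypothetical placeholders and do not reproduce the paper's 27 tasks.

```python
from collections import defaultdict

# Hypothetical task registry: (task name, clinical group, task type).
# Illustrative placeholders only, not the paper's actual task list.
TASKS = [
    ("breast_lesion_seg", "breast", "segmentation"),
    ("breast_lesion_cls", "breast", "classification"),
    ("lung_cls", "lung", "classification"),
    ("fetal_head_reg", "fetal", "regression"),
]


def training_sets(tasks, paradigm):
    """Return the lists of task names that are trained jointly in one model."""
    names = [name for name, _, _ in tasks]
    if paradigm == "task_specific":        # one model per task
        return [[name] for name in names]
    if paradigm == "clinically_grouped":   # one model per clinical group
        groups = defaultdict(list)
        for name, group, _ in tasks:
            groups[group].append(name)
        return list(groups.values())
    if paradigm == "all_task_unified":     # a single model for every task
        return [names]
    raise ValueError(f"unknown paradigm: {paradigm}")
```

Under this framing, the comparison in the paper amounts to training one model per returned list and scoring every task under each of the three partitions.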

Core claim

Aggregation effectiveness depends strongly on training data scale. While clinically-grouped training can improve performance in data-rich settings, it may induce substantial negative transfer in low-data settings. In contrast, all-task unified training exhibits more consistent performance across clinical groups. Task sensitivity varies by task type, with segmentation showing the largest performance drops compared with regression and classification.

What carries the argument

M2DINO, a multi-organ, multi-task framework built on DINOv3 that inserts task-conditioned Mixture-of-Experts blocks to allocate capacity adaptively across tasks.
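To make "task-conditioned Mixture-of-Experts" concrete, here is a toy sketch in dependency-free Python. It rests on several assumptions that the abstract does not confirm: the gate depends only on a task id, the experts are plain linear maps, routing is top-k with a softmax over the selected experts, and the block has a residual connection. The class name `TaskConditionedMoE` and all parameters are hypothetical. The actual M2DINO routing may differ.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class TaskConditionedMoE:
    """Toy MoE layer whose gate depends on the task id (an assumed design)."""

    def __init__(self, dim, num_experts, num_tasks, top_k=2, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # One small linear expert (dim x dim) per expert slot.
        self.experts = [
            [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]
        # One gate-logit vector per task; in a real model these would be learned.
        self.task_gate = [
            [rng.gauss(0, 1.0) for _ in range(num_experts)]
            for _ in range(num_tasks)
        ]

    def gate(self, task_id):
        """Return {expert index: weight} for the top-k experts of this task."""
        logits = self.task_gate[task_id]
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        top = order[: self.top_k]
        weights = softmax([logits[i] for i in top])
        return dict(zip(top, weights))

    def forward(self, x, task_id):
        """Add the weighted outputs of the selected experts to the input."""
        out = list(x)  # residual connection
        for idx, weight in self.gate(task_id).items():
            y = matvec(self.experts[idx], x)
            out = [o + weight * yi for o, yi in zip(out, y)]
        return out
```

A call such as `TaskConditionedMoE(dim=4, num_experts=4, num_tasks=3).forward([1.0, 0.0, 0.0, 0.0], task_id=1)` routes the feature vector through the two experts that task 1 weights most heavily. A practical implementation would use a tensor library and learn the gate parameters jointly with the experts.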

If this is right

Clinically-grouped training improves results only when training data is abundant for each group.
All-task unified training yields more consistent outcomes across different clinical groups and data regimes.
Segmentation tasks suffer larger performance drops from suboptimal aggregation than regression or classification tasks.
Aggregation decisions should jointly factor in data availability and task characteristics instead of clinical taxonomy alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Data-scarce medical imaging domains may favor unified training over expert clinical groupings.
The observed interaction between data scale and aggregation strategy could inform adapter design in other imaging modalities.
Testing whether the same scale-dependent pattern appears in CT or MRI foundation models would clarify generality.

Load-bearing premise

The reported performance differences arise primarily from the choice of task aggregation strategy and its interaction with data scale rather than from unmentioned differences in data preprocessing, hyperparameter tuning, or the Mixture-of-Experts implementation.

What would settle it

Retraining the same models under identical preprocessing and hyperparameter settings while varying only the aggregation strategy and checking whether the performance gaps between grouped and unified training disappear or reverse.
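One way to make such a matched comparison auditable is to generate every run from a single base configuration and verify that only the aggregation strategy (and the random seed) varies. This is a sketch under stated assumptions: `RunConfig`, its field names, and its default values are hypothetical and are not taken from the paper.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class RunConfig:
    # All field names and default values are hypothetical placeholders.
    aggregation: str
    preprocessing: str = "resize_and_normalize"
    optimizer: str = "adamw"
    learning_rate: float = 1e-4
    num_experts: int = 4
    seed: int = 0


PARADIGMS = ("task_specific", "clinically_grouped", "all_task_unified")


def matched_runs(base, seeds=(0, 1, 2)):
    """Expand a base config into runs that vary only aggregation and seed."""
    return [replace(base, aggregation=p, seed=s) for p in PARADIGMS for s in seeds]


def differing_fields(a, b):
    """Names of the fields whose values differ between two configs."""
    da, db = asdict(a), asdict(b)
    return {key for key in da if da[key] != db[key]}
```

Checking that `differing_fields` never returns anything outside `{"aggregation", "seed"}` for any pair of runs gives a mechanical guarantee that preprocessing and optimizer settings were held fixed.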

Figures

Figures reproduced from arXiv: 2603.18123 by Amelia Jiménez-Sánchez, Fangyijie Wang, Guénolé Silvestre, Jieyun Bai, Karim Lekadir, Kathleen M. Curran, Tanya Akumu, Vien Ngoc Dang.

**Figure 1.** Overview of our M2DINO framework. (a) Ultrasound images are processed by a shared DINOv3 encoder augmented with task-conditioned MoE blocks. The unified representation is optimized for segmentation, detection, regression, and classification via task-specific prediction heads. Frozen and trainable components are indicated. (b) A conceptual comparison of the three training paradigms. Although the architectur…

**Figure 2.** Absolute performance of TS, CG, and AU training paradigms across representative tasks: segmentation (DSC ↑), classification (AUC ↑), and regression (MRE ↓). Abd: Abdomen; MO: Multi-organ. multi-organ classification, CG and AU yield small, modest gains. However, the Breast and Lung groups exhibit different trends. AU improves lung classification (AUC: 0.396 → 0.525). In contrast, CG shows large performance…

**Figure 3.** Relative performance change (∆, %) with respect to TS.
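The relative change plotted in Figure 3 can be sketched as a signed percentage against the task-specific (TS) baseline. The sign convention below, where positive always means improvement even for lower-is-better metrics such as MRE, is an assumption and may not match the paper's plotting convention. The function name `relative_change` is hypothetical. The example uses the lung classification AUC quoted in the Figure 2 excerpt (0.396 → 0.525), assuming 0.396 is the TS value, which works out to roughly a 32.6% relative gain.

```python
def relative_change(baseline, value, higher_is_better=True):
    """Signed percentage change versus the task-specific baseline.

    Positive values mean improvement regardless of metric direction
    (an assumed convention, not confirmed by the source).
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    delta = (value - baseline) / abs(baseline) * 100.0
    return delta if higher_is_better else -delta


# Lung classification AUC from the Figure 2 excerpt: 0.396 -> 0.525 under AU.
lung_auc_delta = relative_change(0.396, 0.525)
```

For an error metric, pass `higher_is_better=False` so that a drop in error is reported as a positive change.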

Abstract

Foundation models promise to unify multiple clinical tasks within a single framework, but recent ultrasound studies report that unified models can underperform task-specific baselines. We hypothesize that this degradation arises not from model capacity limitations, but from task aggregation strategies that ignore interactions between task heterogeneity and available training data scale. In this work, we systematically analyze when heterogeneous ultrasound tasks can be jointly learned without performance loss, establishing practical criteria for task aggregation in unified clinical imaging models. We introduce M2DINO, a multi-organ, multi-task framework built on DINOv3 with task-conditioned Mixture-of-Experts blocks for adaptive capacity allocation. We systematically evaluate 27 ultrasound tasks spanning segmentation, classification, detection, and regression under three paradigms: task-specific, clinically-grouped, and all-task unified training. Our results show that aggregation effectiveness depends strongly on training data scale. While clinically-grouped training can improve performance in data-rich settings, it may induce substantial negative transfer in low-data settings. In contrast, all-task unified training exhibits more consistent performance across clinical groups. We further observe that task sensitivity varies by task type in our experiments: segmentation shows the largest performance drops compared with regression and classification. These findings provide practical guidance for ultrasound foundation models, emphasizing that aggregation strategies should jointly consider training data availability and task characteristics rather than relying on clinical taxonomy alone.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper reports that clinically-grouped training helps ultrasound models in data-rich settings but hurts in low-data ones while all-task training is steadier, yet the abstract gives no numbers or controls to assess whether those differences are real.

The main observation is that aggregation effectiveness depends on training data scale. Clinically-grouped training can improve performance when data is abundant but induces negative transfer when data is scarce, while all-task unified training shows more consistent results across groups. Segmentation tasks appear more sensitive than regression or classification. The authors also introduce M2DINO, a DINOv3 model with task-conditioned Mixture-of-Experts blocks, and test it on 27 tasks under three paradigms: task-specific, clinically-grouped, and all-task unified training.

The breadth of that comparison stands out. Most papers in this area examine fewer tasks or stick to one aggregation approach, so running all three side by side on this many tasks supplies a clearer picture of the trade-offs. The practical framing around data scale rather than clinical taxonomy alone is the part that could actually change how people set up training runs.

The soft spot is the complete lack of quantitative evidence in the abstract. There are no performance deltas, error bars, statistical tests, or even basic tables, so the size of the reported effects cannot be judged. The design compares the three paradigms on the same tasks, but nothing states that preprocessing pipelines, optimizer settings, learning-rate choices, or the exact MoE routing were matched across conditions. If any of those were tuned separately, the interaction with data scale could be an artifact of unequal optimization rather than a property of the aggregation strategy itself.

This paper is aimed at researchers building or evaluating multi-task models for ultrasound and similar medical imaging. A reader who needs to decide whether to merge tasks or keep them separate will get some actionable framing, though they will need the actual results to see how large the effects are. It deserves a serious referee because the experimental scale is large and the question is directly relevant to current foundation-model work. Referees can require the missing numbers and a clear statement on matched conditions before the claims can be evaluated.

Referee Report

2 major / 0 minor

Summary. The paper introduces M2DINO, a multi-organ multi-task ultrasound foundation model based on DINOv3 augmented with task-conditioned Mixture-of-Experts blocks. It evaluates 27 tasks (segmentation, classification, detection, regression) across three aggregation paradigms—task-specific, clinically-grouped, and all-task unified training—and concludes that aggregation effectiveness depends strongly on training data scale: clinically-grouped training can improve performance in data-rich regimes but induces negative transfer in low-data regimes, while all-task unified training yields more consistent results; segmentation tasks are most sensitive to aggregation.

Significance. If the reported performance differences are shown to arise from the aggregation strategies themselves rather than confounding factors, the work supplies actionable criteria for designing unified ultrasound models by jointly considering data scale and task type. The M2DINO architecture with adaptive MoE capacity allocation represents a concrete technical contribution that could be adopted in future multi-task imaging frameworks.

major comments (2)

[Abstract / Methods] Abstract and Methods: the central claim that performance differences arise from the choice of task aggregation strategy and its interaction with data scale is not supported by any quantitative results, error bars, statistical tests, or controls in the abstract; the experimental description supplies no explicit statement that data preprocessing pipelines, optimizer schedules, learning-rate searches, or MoE routing/expert allocation were held identical across the three paradigms.
[Experiments] Experimental evaluation: without matched controls on preprocessing, hyperparameter tuning, and MoE implementation details, the observed interaction between clinically-grouped training and data scale (positive in data-rich, negative transfer in low-data) cannot be attributed to aggregation strategy rather than unequal optimization effort; this directly undermines the strongest claim.
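The error bars the referee asks for could be obtained, for example, from a percentile bootstrap over paired per-task differences. This is a sketch of one common choice, not a method the paper is known to use. The `bootstrap_ci` helper is hypothetical, and the example numbers are made up for illustration and are not results from the paper.

```python
import random
import statistics


def bootstrap_ci(deltas, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired per-task differences."""
    rng = random.Random(seed)
    n = len(deltas)
    # Resample the per-task differences with replacement and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(deltas, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical per-task score differences (unified minus task-specific).
# Made-up values for illustration only; not results from the paper.
example_deltas = [0.02, -0.01, 0.03, 0.00, 0.04, -0.02, 0.01, 0.05]
ci_low, ci_high = bootstrap_ci(example_deltas)
```

An interval that excludes zero would indicate that the average difference between two paradigms is unlikely to be resampling noise. With only a handful of tasks per clinical group, such intervals will be wide, which is itself useful information for the reader.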

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments below regarding support for claims and experimental controls.

Referee: [Abstract / Methods] Abstract and Methods: the central claim that performance differences arise from the choice of task aggregation strategy and its interaction with data scale is not supported by any quantitative results, error bars, statistical tests, or controls in the abstract; the experimental description supplies no explicit statement that data preprocessing pipelines, optimizer schedules, learning-rate searches, or MoE routing/expert allocation were held identical across the three paradigms.

Authors: The abstract is a concise summary and does not include detailed quantitative results or statistical tests, which is standard practice. The full manuscript reports performance metrics for all 27 tasks under the three paradigms. We agree an explicit statement on controls is missing from the experimental description and will add it to the Methods section, confirming identical preprocessing pipelines, optimizer schedules, learning-rate searches, and MoE routing/expert allocation across paradigms. revision: yes

Referee: [Experiments] Experimental evaluation: without matched controls on preprocessing, hyperparameter tuning, and MoE implementation details, the observed interaction between clinically-grouped training and data scale (positive in data-rich, negative transfer in low-data) cannot be attributed to aggregation strategy rather than unequal optimization effort; this directly undermines the strongest claim.

Authors: Matched controls were used throughout: identical data preprocessing, hyperparameter tuning procedures, and MoE implementation details were applied to all three training paradigms to isolate the effect of aggregation strategy. We will add an explicit statement documenting these controls in the revised Methods section. revision: yes

Circularity Check

0 steps flagged

No circularity: experimental comparisons rest on independent benchmarks, not self-referential definitions or fitted predictions.

The paper reports empirical results from training and evaluating M2DINO on 27 ultrasound tasks under three paradigms (task-specific, clinically-grouped, all-task unified). No equations, fitted parameters, or derivations are presented that reduce to their own inputs. Performance differences are attributed to data scale and task aggregation via direct experimental comparison. Neither the abstract nor the reader's summary contains self-definitional constructs, self-citations that bear the load of a uniqueness theorem, or known results renamed as novel derivations. The central claims remain falsifiable against external benchmarks and do not collapse by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no information on free parameters, background axioms, or new entities; full manuscript details unavailable for audit.
