Technical report
3 Pith papers cite this work.
Fields: cs.CV (3)
Years: 2026 (3)
Citing papers explorer
- Towards Real-World Ultrasound Understanding: Large Vision-Language Models from Multi-Image Examinations with Long-Form Reports
  A large examination-level ultrasound dataset with long-form reports enables simple LVLM fine-tuning to outperform prior complex methods.
- Look-Closer-Then-Diagnose: Confidence-Aware Ultrasound VQA via Active Zooming
  Introduces the Zoom-then-Diagnose paradigm and an uncertainty-aware reward in GRPO for confidence-aware ultrasound VQA, reporting a 39.3% improvement in lesion localization across liver, breast, and thyroid datasets.
- FADA: Accessible fetal ultrasound interpretation and annotation with a selectively distilled unified vision-language model
  FADA is a selectively distilled unified vision-language model for fetal ultrasound that performs interpretation, classification, detection, and segmentation in a single pipeline, reports strong results, and runs offline on mobile devices.