A large examination-level ultrasound dataset with long-form reports enables simple LVLM fine-tuning to outperform prior complex methods.
UMind-VL: A generalist ultrasound vision-language model for unified grounded perception and comprehensive interpretation.arXiv preprint arXiv:2511.22256, 2025
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Towards Real-World Ultrasound Understanding: Large Vision-Language Models from Multi-Image Examinations with Long-Form Reports
A large examination-level ultrasound dataset with long-form reports enables simple LVLM fine-tuning to outperform prior complex methods.