Larger models generally outperform smaller ones, with Aloe-Vision-72B achieving the highest MCQ scores and GLM-4.5V leading in the open … (Guasch-Martí et al.)

Across all models, performance on MCQ is consistently higher than on open-ended tasks, highlighting ongoing challenges in free-text medical reasoning · arXiv 2757.1443

1 Pith paper cites this work. Polarity classification is still indexing.



representative citing papers

Aloe-Vision: Robust Vision-Language Models for Healthcare

cs.CV · 2026-06-25 · unverdicted · novelty 5.0

Releases open medical LVLMs trained on a quality-filtered multimodal dataset, introduces CareQA-Vision benchmark from exams, reports performance gains over baselines, and flags adversarial vulnerabilities.

citing papers explorer

Showing 1 of 1 citing paper.

Aloe-Vision: Robust Vision-Language Models for Healthcare · cs.CV · 2026-06-25 · unverdicted · none · ref 26

