Overview of the MedVidQA 2022 Shared Task on Medical Video Question-Answering
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.AI (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
Reasoning models detect modifications to their chains of thought with only modest accuracy and cannot reliably identify the nature of those modifications.
citing papers explorer
- Capabilities of Gemini Models in Medicine
Med-Gemini sets new records on 10 of 14 medical benchmarks including 91.1% on MedQA-USMLE, beats GPT-4V by 44.5% on multimodal tasks, and surpasses humans on medical text summarization.
- Can Reasoning Models Detect Changes to their Chains of Thought?
Reasoning models detect modifications to their chains of thought with only modest accuracy and cannot reliably identify the nature of those modifications.