AI-based Clinical Decision Support for Primary Care: A Real-World Study. arXiv preprint.
3 Pith papers cite this work. Citation polarity classification is still being indexed.
Fields: cs.AI (3)
Years: 2026 (3)
Citing papers explorer
- PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments
  PhysicianBench is a new benchmark of 100 physician-reviewed, execution-grounded tasks in live EHR environments where the best LLM agent reaches only 46% success and open-source models reach 19%.
- Towards Conversational Medical AI with Eyes, Ears and a Voice
  AI co-clinician is a multimodal conversational AI that uses live audio-visual data for real-time medical reasoning in simulated telemedicine, approaching primary care physicians in management plans and differentials but lagging in physical exam and disease-specific tasks.
- Case-Specific Rubrics for Clinical AI Evaluation: Methodology, Validation, and LLM-Clinician Agreement Across 823 Encounters
  Case-specific clinician rubrics for clinical AI notes achieve strong discrimination between outputs, high stability, and clinician-LLM agreement matching clinician-clinician levels at far lower cost.