HealthAgentBench is a new benchmark of 54 healthcare agent tasks where even the strongest frontier AI agent reaches only about 42% success rate on end-to-end clinical workflows.
MIMIC-IV Clinical Database Demo.PhysioNet, January 2023
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
MILM fine-tunes LLMs on XML-encoded multimodal irregular time series via a two-stage process that exploits informative sampling patterns to achieve top performance on EHR classification datasets.
citing papers explorer
-
HealthAgentBench: A Unified Benchmark Suite of Realistic Agentic Healthcare Environments for Challenging Frontier AI Agents
HealthAgentBench is a new benchmark of 54 healthcare agent tasks where even the strongest frontier AI agent reaches only about 42% success rate on end-to-end clinical workflows.
-
MILM: Large Language Models for Multimodal Irregular Time Series with Informative Sampling
MILM fine-tunes LLMs on XML-encoded multimodal irregular time series via a two-stage process that exploits informative sampling patterns to achieve top performance on EHR classification datasets.