EHRBench uses an EHR-LLM-KB pipeline to automatically create 960,067 reliable QA items spanning diagnosis, treatment, and prognosis for large-scale LLM evaluation in clinical decision making.
Dorfner, Amin Dada, Felix Busch, Marcus R
2 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 2
representative citing papers
Mistral uses careful lexical simplification to raise readability while keeping BERTScore at 0.91, comparable to humans, whereas QWen improves readability but shows a disconnect with its 0.89 BERTScore in biomedical text simplification.
citing papers explorer
- EHRBench: An Automated and Reliable EHR-based Benchmark for Clinical Decision Making with LLMs
- Making Knowledge Accessible: Divergent Readability-Accuracy Strategies of Mistral and QWen in Biomedical Text Simplification