A temporary CLM phase followed by MLM decay during encoder continued pretraining outperforms standard MLM on biomedical tasks by 0.3-2.8pp across languages and model sizes.
Ehr-r1: A reasoning-enhanced foundational language model for electronic health record analysis
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 9representative citing papers
EHRBench uses an EHR-LLM-KB pipeline to automatically create 960,067 reliable QA items spanning diagnosis, treatment, and prognosis for large-scale LLM evaluation in clinical decision making.
ClinSeekAgent automates active multimodal evidence seeking for clinical reasoning, improving LLM performance on raw EHR and CXR tasks while enabling distillation into smaller models.
CLR-voyance reformulates inpatient reasoning as POMDP with clinician-validated outcome rubrics, yielding an 8B model that outperforms larger frontier models on the authors' new benchmark.
CASCADE enables LLMs to continually adapt at deployment via case-based episodic memory and contextual bandits, improving macro-averaged success by 20.9% over zero-shot on 16 tasks spanning medicine, law, code, and robotics.
C-MIG uses multi-view information gain from retrieved documents and refinements to supervise RAG-RL for clinical diagnosis, claiming top performance on four medical benchmarks.
Autoregressive transformer modeling with missingness-aware contrastive pre-training outperforms baselines on MIMIC-IV and eICU benchmarks and mitigates divergent behavior from removed modalities in clinical trajectories.
LLMs show strong exam performance on medical tasks but exhibit a clear gap in accuracy on authentic clinical decision-making as measured by the new MR-Bench benchmark and unified evaluations.