A temporary CLM phase followed by MLM decay during encoder continued pretraining outperforms standard MLM on biomedical tasks by 0.3-2.8pp across languages and model sizes.
Ehr-r1: A reasoning-enhanced foundational language model for electronic health record analysis
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 9representative citing papers
EHRBench uses an EHR-LLM-KB pipeline to automatically create 960,067 reliable QA items spanning diagnosis, treatment, and prognosis for large-scale LLM evaluation in clinical decision making.
ClinSeekAgent automates active multimodal evidence seeking for clinical reasoning, improving LLM performance on raw EHR and CXR tasks while enabling distillation into smaller models.
CLR-voyance reformulates inpatient reasoning as POMDP with clinician-validated outcome rubrics, yielding an 8B model that outperforms larger frontier models on the authors' new benchmark.
CASCADE enables LLMs to continually adapt at deployment via case-based episodic memory and contextual bandits, improving macro-averaged success by 20.9% over zero-shot on 16 tasks spanning medicine, law, code, and robotics.
C-MIG uses multi-view information gain from retrieved documents and refinements to supervise RAG-RL for clinical diagnosis, claiming top performance on four medical benchmarks.
Autoregressive transformer modeling with missingness-aware contrastive pre-training outperforms baselines on MIMIC-IV and eICU benchmarks and mitigates divergent behavior from removed modalities in clinical trajectories.
LLMs show strong exam performance on medical tasks but exhibit a clear gap in accuracy on authentic clinical decision-making as measured by the new MR-Bench benchmark and unified evaluations.
citing papers explorer
-
A Causal Language Modeling Detour Improves Encoder Continued Pretraining
A temporary CLM phase followed by MLM decay during encoder continued pretraining outperforms standard MLM on biomedical tasks by 0.3-2.8pp across languages and model sizes.
-
EHRBench: An Automated and Reliable EHR-based Benchmark for Clinical Decision Making with LLMs
EHRBench uses an EHR-LLM-KB pipeline to automatically create 960,067 reliable QA items spanning diagnosis, treatment, and prognosis for large-scale LLM evaluation in clinical decision making.
-
ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning
ClinSeekAgent automates active multimodal evidence seeking for clinical reasoning, improving LLM performance on raw EHR and CXR tasks while enabling distillation into smaller models.
-
CLR-voyance: Reinforcing Open-Ended Reasoning for Inpatient Clinical Decision Support with Outcome-Aware Rubrics
CLR-voyance reformulates inpatient reasoning as POMDP with clinician-validated outcome rubrics, yielding an 8B model that outperforms larger frontier models on the authors' new benchmark.
-
CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment
CASCADE enables LLMs to continually adapt at deployment via case-based episodic memory and contextual bandits, improving macro-averaged success by 20.9% over zero-shot on 16 tasks spanning medicine, law, code, and robotics.
-
C-MIG: Multi-view Information Gain-based Retrieval-Augmented Generation for Clinical Diagnosis Reasoning
C-MIG uses multi-view information gain from retrieved documents and refinements to supervise RAG-RL for clinical diagnosis, claiming top performance on four medical benchmarks.
-
Handling and Interpreting Missing Modalities in Patient Clinical Trajectories via Autoregressive Sequence Modeling
Autoregressive transformer modeling with missingness-aware contrastive pre-training outperforms baselines on MIMIC-IV and eICU benchmarks and mitigates divergent behavior from removed modalities in clinical trajectories.
-
Medical Reasoning with Large Language Models: A Survey and MR-Bench
LLMs show strong exam performance on medical tasks but exhibit a clear gap in accuracy on authentic clinical decision-making as measured by the new MR-Bench benchmark and unified evaluations.
- Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories