D2MDT: Department-aware Multidisciplinary Team Consultation with Deliberation for Efficient Clinical Prediction

Chen Li; Chunze Yang; Jiusong Ge; Lei Wu; Ni Zhang; Qidong Liu; Yongqi Liang

arxiv: 2606.03543 · v1 · pith:LT6BKSKUnew · submitted 2026-06-02 · 💻 cs.MA

Pith reviewed 2026-06-28 07:51 UTC · model grok-4.3

classification 💻 cs.MA

keywords clinical prediction · multi-agent systems · electronic health records · mortality prediction · residual deliberation · department-aware · multidisciplinary consultation

The pith

D2MDT assigns patient-specific department perspectives to doctor agents and applies residual deliberation to improve both predictive performance and consultation efficiency in EHR-based mortality prediction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes D2MDT to overcome the limitations of correlation-driven models and single LLMs in clinical prediction from EHRs. It addresses weak evidence differentiation and redundant interactions in current multi-agent systems by assigning department perspectives and using residual deliberation. Experiments demonstrate gains in performance and efficiency on mortality prediction. This approach aims to enable more effective multidisciplinary reasoning in clinical settings.

Core claim

D2MDT constructs structured EHR evidence and consultation-ready semantic evidence. It assigns patient-specific department perspectives to doctor agents and retrieves complementary evidence for collaborative consultation. Residual deliberation updates only unresolved consensus instead of replaying full history. The refined consensus is fused with structured EHR representations for the final prediction.
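
The residual deliberation step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `ConsensusState` structure, the `revise` callback (a stand-in for LLM doctor-agent calls), and the toy stances are all assumptions made for illustration. Only the idea of revisiting unresolved items, and the `max_round` limit named in the figure captions, come from the source.

```python
from dataclasses import dataclass, field


@dataclass
class ConsensusState:
    """Compact state carried between rounds (hypothetical structure)."""
    resolved: dict = field(default_factory=dict)    # issue -> agreed stance
    unresolved: dict = field(default_factory=dict)  # issue -> conflicting stances


def initial_state(opinions):
    """Split issues into agreed and disputed. opinions: {agent: {issue: stance}}."""
    state = ConsensusState()
    issues = {issue for stances in opinions.values() for issue in stances}
    for issue in sorted(issues):
        stances = [s[issue] for s in opinions.values() if issue in s]
        if len(set(stances)) == 1:
            state.resolved[issue] = stances[0]
        else:
            state.unresolved[issue] = stances
    return state


def residual_round(state, revise):
    """Run one round that touches only unresolved issues.

    revise(issue, stances) returns the revised stances; here it stands in
    for the agent calls a real system would make.
    """
    for issue in list(state.unresolved):  # copy keys: the dict is mutated below
        stances = revise(issue, state.unresolved[issue])
        if len(set(stances)) == 1:
            state.resolved[issue] = stances[0]
            del state.unresolved[issue]
        else:
            state.unresolved[issue] = stances
    return state


def deliberate(opinions, revise, max_round=3):
    """Iterate until nothing is unresolved or max_round is reached."""
    state = initial_state(opinions)
    rounds = 0
    while state.unresolved and rounds < max_round:
        state = residual_round(state, revise)
        rounds += 1
    return state, rounds


# Toy example: two agents agree on "shock" and disagree on "renal".
opinions = {
    "critical_care": {"shock": "high", "renal": "low"},
    "respiratory": {"shock": "high", "renal": "high"},
}
# Toy revision rule: every agent adopts the alphabetically first stance.
state, rounds = deliberate(opinions, lambda issue, s: [sorted(s)[0]] * len(s))
```

In this toy run only the disputed `renal` item is revisited, while the already agreed `shock` item is never passed to `revise` again.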

What carries the argument

Residual deliberation mechanism that updates only unresolved consensus, enabled by department-aware assignment of perspectives to doctor agents.

If this is right

D2MDT achieves improved predictive performance on mortality prediction tasks.
D2MDT enhances consultation efficiency by reducing redundant interactions.
The method provides better evidence differentiation across agents.
The code release supports further testing of the approach.
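
The efficiency claim can be made concrete with a back-of-the-envelope cost model. The two functions below are an assumed simplification, not the paper's accounting: full-history replay re-sends the base context plus every earlier round's output, while the residual variant sends the base context once and then only a compact state. All token counts are hypothetical placeholders.

```python
def full_replay_prompt_tokens(base, per_round_output, rounds):
    """Total prompt tokens when round r re-sends the base context
    plus the outputs of all r - 1 earlier rounds."""
    return sum(base + per_round_output * (r - 1) for r in range(1, rounds + 1))


def residual_prompt_tokens(base, compact_state, rounds):
    """Total prompt tokens when round 1 sends the base context and
    each later round sends only a compact unresolved-consensus state."""
    return base + compact_state * (rounds - 1)


# Hypothetical sizes: 2000-token base context, 800 tokens of output per
# round, 300-token compact state, 3 rounds.
full = full_replay_prompt_tokens(2000, 800, 3)   # 2000 + 2800 + 3600
residual = residual_prompt_tokens(2000, 300, 3)  # 2000 + 300 + 300
```

Under these assumptions the first round costs the same in both schemes and the gap opens only in later rounds, which matches the qualitative description in the efficiency figure caption.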

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could extend to other clinical tasks like readmission prediction or diagnosis support.
Residual deliberation may lower token usage in broader multi-agent applications for medical data.
Testing in diverse hospital settings could validate the efficiency claims beyond the reported experiments.

Load-bearing premise

That assigning patient-specific department perspectives to doctor agents and using residual deliberation leads to better evidence differentiation and reduced redundancy compared to existing MAS methods for EHR.

What would settle it

A study that applies D2MDT and baseline methods to the same mortality prediction datasets and finds no gains in performance or efficiency metrics.

Figures

Figures reproduced from arXiv: 2606.03543 by Chen Li, Chunze Yang, Jiusong Ge, Lei Wu, Ni Zhang, Qidong Liu, Yongqi Liang.

**Figure 1.** The illustration of D2MDT and existing MAS.
Excerpt from the surrounding text: "ClinNoteAgents (Zhou et al., 2025a) strengthen EHR grounding and bring collaborative reasoning closer to quantitative patient-level prediction. However, there are still two challenges faced by EHR-aware MAS methods, as illustrated in Figure 1. i) Weak Evidence Differentiation. Multi-agent frameworks can improve apparent diversity by assigning different roles to …"

**Figure 2.** The overview of the proposed D2MDT.
Excerpt from the surrounding text: "… refine disagreement through compact updates instead of replaying the entire interaction history. Finally, D2MDT performs Consensus-Guided Multimodal Prediction by fusing the refined consensus report with structured EHR representations for final risk estimation. In this way, D2MDT turns collaboration into an evidence-grounded and refined reasoning process for clinica…"

**Figure 3.** The results of efficiency study. (Extracted plot residue: panels (a) MIMIC-III AUPRC and (b) MIMIC-IV AUPRC; x-axis max_round, 2 to 5; y-axis AUPRC (%).)

**Figure 4.** Sensitivity analysis of D2MDT with respect to max_round, measured by AUPRC on two datasets.
Excerpt from the surrounding text: "… demonstrates the effectiveness of our Residual Deliberation. Although D2MDT keeps competitive prompt cost in the first round, its main advantage lies in reducing prompt burden in later rounds. Compared with other MDT-based baselines, MDAgents and ColaCare usually require more prompt tokens to continue the consultat…"

**Figure 5.** Representative D2MDT case study. The orange and blue modules denote Doctor Agents and the Leader Agent, respectively, while gray boxes show the initial and final reports during the consultation process.
Excerpt from the surrounding text: "… assigned to critical care/hemodynamic, respiratory/oxygenation, and metabolic/general perspectives, respectively, so that the same EHR trajectory is examined through different clinical roles. As illustrat…"

**Figure 6.** Implementation-faithful prompt and compact-state templates used in D…

**Figure 8.** Sensitivity analysis of min(+P, Se) under dif…

**Figure 9.** Sensitivity analysis of communication cost

**Figure 10.** Sensitivity analysis of predictive metrics
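
The captions above name two metrics, AUPRC and min(+P, Se), without defining them on this page. The sketch below shows one common way to compute them, under stated assumptions: AUPRC is estimated as step-wise average precision, and min(+P, Se) is taken as the best achievable minimum of precision (+P) and sensitivity (Se) over all score thresholds. Whether the paper uses exactly these estimators is not confirmed here, and the labels and scores are toy values.

```python
def pr_points(y_true, y_score):
    """Return (precision, recall) at each distinct score threshold, high to low."""
    pairs = sorted(zip(y_score, y_true), reverse=True)
    total_pos = sum(y_true)
    points, tp, fp, i = [], 0, 0, 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == score:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        points.append((tp / (tp + fp), tp / total_pos))
    return points


def average_precision(points):
    """Step-wise area under the PR curve: sum of recall increments times precision."""
    area, prev_recall = 0.0, 0.0
    for precision, recall in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


def min_p_se(points):
    """Best threshold-wise minimum of precision (+P) and sensitivity (Se)."""
    return max(min(precision, recall) for precision, recall in points)


# Toy labels (1 = died) and predicted risks; not data from the paper.
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
points = pr_points(y_true, y_score)
auprc = average_precision(points)
mps = min_p_se(points)
```

Both metrics depend only on the ranking of the scores, which is why they are often preferred over accuracy for imbalanced outcomes such as mortality.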

Original abstract

Electronic health records (EHRs) are central to clinical prediction, but existing methods either rely on correlation-driven deep models or use single large language models (LLMs), making it difficult to support multidisciplinary clinical reasoning. Recent multi-agent systems (MAS) provide a promising alternative, yet current EHR-grounded MAS methods still suffer from weak evidence differentiation across agents and redundant multi-round interaction. We propose D2MDT, a Department-aware MultiDisciplinary Team Consultation with Deliberation for Efficient clinical prediction. D2MDT first constructs structured EHR evidence and consultation-ready semantic evidence for multi-agent consultation. It then assigns patient-specific department perspectives to doctor agents and retrieves complementary evidence for collaborative consultation. To improve efficiency, D2MDT further introduces residual deliberation, which updates only unresolved consensus rather than replaying the full discussion history. Finally, D2MDT fuses the refined consensus report with structured EHR representations for prediction. Experiments on mortality prediction show that D2MDT improves both predictive performance and consultation efficiency. We release the code online to ease the reproducibility of this paper.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

D2MDT adds department-aware agents plus residual deliberation to multi-agent EHR prediction and releases code, but the abstract gives no numbers or baselines to support the claimed gains in performance and efficiency.

The paper's main move is a multi-agent setup for mortality prediction from EHR data. It first turns records into structured evidence and semantic summaries, then assigns each agent a patient-specific department lens, pulls complementary evidence across agents, and applies residual deliberation so only unresolved consensus gets updated instead of replaying full history. The final prediction fuses the refined report with the structured features.
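
To make the "patient-specific department lens" concrete, here is a minimal sketch of one way such an assignment could work. It is an illustrative assumption, not the paper's procedure: the feature-to-department table, the abnormality scores, and both function names are hypothetical. Only the three perspective names are taken from the case-study figure caption.

```python
# Hypothetical mapping from EHR features to department perspectives.
FEATURE_TO_DEPARTMENT = {
    "mean_arterial_pressure": "critical care/hemodynamic",
    "heart_rate": "critical care/hemodynamic",
    "spo2": "respiratory/oxygenation",
    "respiratory_rate": "respiratory/oxygenation",
    "glucose": "metabolic/general",
    "creatinine": "metabolic/general",
}


def assign_departments(abnormality, num_agents=3):
    """Rank departments by summed abnormality of their features.

    abnormality: {feature: non-negative abnormality score} for one patient.
    Returns up to num_agents department names, most abnormal first.
    """
    scores = {}
    for feature, score in abnormality.items():
        dept = FEATURE_TO_DEPARTMENT.get(feature)
        if dept is not None:
            scores[dept] = scores.get(dept, 0.0) + score
    ranked = sorted(scores, key=lambda d: (-scores[d], d))  # ties broken by name
    return ranked[:num_agents]


def evidence_for(dept, abnormality):
    """Select the features a given department's agent would examine."""
    return sorted(f for f in abnormality if FEATURE_TO_DEPARTMENT.get(f) == dept)


# Toy patient with made-up abnormality scores.
patient = {"mean_arterial_pressure": 2.5, "spo2": 1.0,
           "respiratory_rate": 0.5, "glucose": 0.2}
departments = assign_departments(patient, num_agents=2)
respiratory_evidence = evidence_for("respiratory/oxygenation", patient)
```

Because each agent receives a disjoint slice of the record in this sketch, the agents cannot all cite the same evidence, which is the kind of differentiation the paper says existing systems lack.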

This combination of department perspectives and the residual step looks like a concrete variant on existing MAS work for clinical tasks. The efficiency focus is practical, and releasing the code lets others verify the implementation directly.

The clear limitation is the missing experimental detail. The abstract asserts better predictive performance and consultation efficiency but supplies no metrics, baselines, dataset description, effect sizes, or error bars. That gap makes it impossible to judge whether the improvements are real or meaningful. The reader's soundness score of 2.0 tracks with this absence. No circularity or hidden fitting shows up in the description, and the stress-test correctly notes the claim is empirical rather than derived.

The work is aimed at researchers building multi-agent systems for healthcare applications. A reader already working on MAS or clinical prediction might extract useful design ideas from the evidence construction and deliberation mechanics if the full results are solid.

I would bring it to a reading group as a maybe, mainly to look at the actual experiments. I would not cite it without those results. It should go to peer review because the proposal is specific, the code is public, and the topic is relevant enough for referees to evaluate the empirical support.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes D2MDT, a department-aware multi-agent system for clinical prediction from EHRs. It constructs structured EHR evidence and consultation-ready semantic evidence, assigns patient-specific department perspectives to doctor agents, retrieves complementary evidence for collaborative consultation, introduces residual deliberation to update only unresolved consensus (avoiding full history replay), and fuses the refined consensus report with structured EHR representations for the final prediction. The central empirical claim is that experiments on mortality prediction demonstrate improvements in both predictive performance and consultation efficiency over existing methods; code is released for reproducibility.

Significance. If the empirical gains hold under rigorous evaluation, the work could advance MAS applications in healthcare by addressing weak evidence differentiation across agents and redundant multi-round interactions. The residual deliberation mechanism is a practical efficiency contribution, and the open release of code supports reproducibility and extension.

major comments (1)

[Experiments] §Experiments (or equivalent results section): the central claim of improved predictive performance and consultation efficiency on mortality prediction is presented without any reported details on datasets, baselines, metrics, number of runs, error bars, statistical tests, or data handling. This information is load-bearing for assessing whether the results support the claims.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and for highlighting the need for complete experimental reporting. We agree that the details listed are essential to substantiate the central claims and will revise the manuscript accordingly.

Referee: [Experiments] §Experiments (or equivalent results section): the central claim of improved predictive performance and consultation efficiency on mortality prediction is presented without any reported details on datasets, baselines, metrics, number of runs, error bars, statistical tests, or data handling. This information is load-bearing for assessing whether the results support the claims.

Authors: We acknowledge that the current manuscript version does not provide sufficient detail on the experimental setup. In the revised version we will expand the Experiments section (and any supplementary material) to explicitly report: the specific EHR datasets employed (including size, time span, and access details), the full list of baselines with implementation references, the complete set of evaluation metrics, the number of independent runs, standard deviations or error bars, the statistical tests used for significance, and all data-handling steps (preprocessing, imputation, train/validation/test splits, and any filtering criteria). The code is already released; we will also add a reproducibility checklist linking the reported numbers to the released repository.

Revision: yes
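
The reporting items promised above (independent runs, standard deviations, a significance test) could look like the following sketch. The choice of an exact paired sign-flip permutation test is an assumption made here for illustration, since the rebuttal does not say which test the authors will use, and the per-run scores are made-up placeholders, not results from the paper.

```python
from itertools import product
from statistics import mean, stdev


def summarize(scores):
    """Return the mean and sample standard deviation across independent runs."""
    return mean(scores), stdev(scores)


def sign_flip_pvalue(a, b):
    """Exact two-sided paired sign-flip permutation test.

    a, b: per-run scores of two methods on matched seeds or splits.
    Enumerates all 2^n sign assignments of the paired differences.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits, total = 0, 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # tolerance for float rounding
            hits += 1
    return hits / total


# Placeholder AUPRC values over 5 runs (hypothetical, for illustration only).
method_a = [0.55, 0.56, 0.54, 0.57, 0.55]
method_b = [0.52, 0.53, 0.52, 0.54, 0.53]
mean_a, std_a = summarize(method_a)
p_value = sign_flip_pvalue(method_a, method_b)
```

With five paired runs the smallest attainable two-sided p-value is 2/32, so a claim of significance at the 0.05 level would need more runs than that.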

Circularity Check

0 steps flagged

No significant circularity

The paper presents an empirical MAS architecture for EHR-based mortality prediction. It describes a sequence of engineering steps (EHR structuring, department-aware agent assignment, residual deliberation, consensus fusion) followed by experimental validation on predictive performance and efficiency. No equations, parameter-fitting steps, uniqueness theorems, or self-citations appear in the provided text. The central claims rest on reported experimental outcomes rather than any derivation that reduces to its own inputs by construction. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no specific free parameters, axioms, or invented entities can be identified from the provided text.

pith-pipeline@v0.9.1-grok · 5736 in / 971 out tokens · 19019 ms · 2026-06-28T07:51:06.823875+00:00 · methodology

Reference graph

Works this paper leans on

107 extracted references · 1 canonical work page

[1] Erin P. Balogh. 2015. Improving diagnosis in health care
[2] Savyasachi V. Shah Bds. 2024. Accuracy, consistency, and hallucination of large language models when analyzing unstructured clinical notes in electronic medical records. JAMA Network Open, 7(8):2
[3] Ofir Ben Shoham and Nadav Rappoport. 2024. CPLLM: Clinical prediction with large language models. PLOS Digital Health, 3(12):e0000680
[4] Alexandra Campione Russo, Jean-Luc Tilly, Leah Kaufman, Melissa Danforth, Mark L. Graber, J. Matthew Austin, and Hardeep Singh. 2025. Hospital commitments to address diagnostic errors: An assessment of 95 US hospitals. Journal of Hospital Medicine, 20(2)
[5] Rochana Chaturvedi, Yue Zhou, Andrew Boyd, Brian T. Layden, Mudassir Rashid, Lu Cheng, Ali Cinar, and Barbara Di Eugenio. 2025. Early risk prediction with temporally and contextually grounded clinical language processing
[6] Kai Chen, Xinfeng Li, Tianpei Yang, Hewei Wang, Wei Dong, and Yang Gao. 2025a. MDTeamGPT: A self-evolving LLM-based multi-agent framework for multi-disciplinary team medical consultation
[7] Xi Chen, Huahui Yi, Mingke You, WeiZhi Liu, Li Wang, Hairui Li, Xue Zhang, Yingman Guo, Lei Fan, Gang Chen, et al. 2025b. Enhancing diagnostic capability with multi-agents conversational large language models. NPJ digital medicine, 8(1):159
[8] Edward Choi, Mohammad Taha Bahadori, Andy Schuetz, Walter F. Stewart, and Jimeng Sun. 2016a. RETAIN: Interpretable predictive model in healthcare using reverse time attention mechanism. Curran Associates Inc
[9] Edward Choi, Mohammad Taha Bahadori, Jimeng Sun, Joshua Kulas, Andy Schuetz, and Walter Stewart. 2016b. RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism. Advances in neural information processing systems, 29
[10] Xinsong Du, Zhengyang Zhou, Yifei Wang, Ya-Wen Chuang, Yiming Li, Richard Yang, Wenyu Zhang, Xinyi Wang, Xinyu Chen, Hao Guan, et al. 2026. Testing and evaluation of generative large language models in electronic health record applications: a systematic review. Journal of the American Medical Informatics Association, page ocaf233
[11] Mark L Graber, Diana Rusz, Melissa L Jones, Diana Farm-Franks, Barbara Jones, Jeannine Cyr Gluck, Dana B Thomas, Kelly T Gleason, Kathy Welte, Jennifer Abfalter, et al. 2017. The new diagnostic team. Diagnosis, 4(4):225-238
[12] Paul Hager, Friederike Jungmann, Robbie Holland, Kunal Bhagat, Inga Hubrecht, Manuel Knauer, Jakob Vielhauer, Marcus Makowski, Rickmer Braren, and Georgios Kaissis. 2024. Evaluation and mitigation of the limitations of large language models in clinical decision-making. Nature Medicine, 30(9):26
[13] Hrayr Harutyunyan, Hrant Khachatrian, David C Kale, Greg Ver Steeg, and Aram Galstyan. 2019. Multitask learning and benchmarking with clinical time series data. Scientific data, 6(1):96
[15] Yixing Jiang, Kameron C. Black, Danny Park, James Zou, Andrew Y. Ng, and Jonathan H. Chen. 2025. MedAgentBench: A realistic virtual EHR environment to benchmark medical LLM agents
[16] Alistair EW Johnson, Lucas Bulgarelli, Lu Shen, Alvin Gayles, Ayad Shammout, Steven Horng, Tom J Pollard, Sicheng Hao, Benjamin Moody, Brian Gow, et al. 2023. MIMIC-IV, a freely accessible electronic health record dataset. Scientific data, 10(1):1
[17] Alistair EW Johnson, Tom J Pollard, Lu Shen, Li-wei H Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. 2016. MIMIC-III, a freely accessible critical care database. Scientific data, 3(1):1-9
[18] Misuk Kim and Kyu-Baek Hwang. 2022. An empirical evaluation of sampling methods for the classification of imbalanced data. PloS one, 17(7):e0271260
[19] Yubin Kim, Chanwoo Park, Hyewon Jeong, Yik Siu Chan, Xuhai Xu, Daniel Mcduff, Hyeonhoon Lee, Marzyeh Ghassemi, Cynthia Breazeal, and Hae Won Park. 2024. MDAgents: An adaptive collaboration of LLMs for medical decision-making
[20] Ching Yi Lee, Hung Yi Lai, Ching Hsin Lee, Mi Mi Chen, and Sze Yuen Yau. 2024. Collaborative clinical reasoning: a scoping review. PeerJ
[21] Weibin Liao, Yinghao Zhu, Zhongji Zhang, Yuhang Wang, Zixiang Wang, Xu Chu, Yasha Wang, and Liantao Ma. 2024. Learnable prompt as pseudo-imputation: Rethinking the necessity of traditional EHR data imputation in downstream clinical prediction
[22] Qicai Liu, Zhichao Hu, Tao Huang, Yupeng Niu, Xinche Zhang, Shanwu Ma, Chutong Lin, Goh Kim Huat, Hyeokkoo Eric Kwon, and Feng Gao. 2026. EvoMDT: a self-evolving multi-agent system for structured clinical decision-making in multi-cancer. npj Digital Medicine, 9(1)
[23] Liantao Ma, Junyi Gao, Yasha Wang, Chaohe Zhang, and Xinyu Ma. 2020. AdaCare: Explainable clinical health status representation learning via scale-adaptive feature extraction and recalibration. Proceedings of the AAAI Conference on Artificial Intelligence, 34(1):825-832
[24] Xinyu Ma, Yasha Wang, Xu Chu, Liantao Ma, Wen Tang, Junfeng Zhao, Ye Yuan, and Guoren Wang. 2022. Patient health representation learning via correlational sparse prior of medical features. IEEE Transactions on Knowledge and Data Engineering, 35(11):11769-11783
[25] Matthew B McDermott, Haoran Zhang, Lasse H Hansen, Giovanni Angelotti, and Jack Gallifant. 2024. A closer look at auroc and auprc under class imbalance. Advances in Neural Information Processing Systems, 37:44102-44163
[26] Harsha Nori, Mayank Daswani, Christopher Kelly, Scott Lundberg, Marco Tulio Ribeiro, Marc Wilson, Xiaoxuan Liu, Viknesh Sounderajah, Jonathan Carlson, and Matthew P Lungren. 2025. Sequential diagnosis with language models
[27] Robert S Porter and Justin L Kaplan. 2011. The Merck manual of diagnosis and therapy. Merck Sharp & Dohme Corp
[28] Weijieying Ren, Jingxi Zhu, Zehao Liu, Tianxiang Zhao, and Vasant Honavar. 2025. A comprehensive survey of electronic health record modeling: From deep learning approaches to large language models
[30] Wenqi Shi, Ran Xu, Yuchen Zhuang, Yue Yu, Jieyu Zhang, Hang Wu, Yuanda Zhu, Joyce Ho, Carl Yang, and May D. Wang. 2024. EHRAgent: Code empowers large language models for few-shot complex tabular reasoning on electronic health records
[31] Mukund Sundararajan and Amir Najmi. 2020. The many shapley values for model explanation. In International conference on machine learning, pages 9269-9278. PMLR
[32] Xiangru Tang, Anni Zou, Zhuosheng Zhang, Ziming Li, Yilun Zhao, Xingyao Zhang, Arman Cohan, and Mark Gerstein. 2024. MedAgents: Large language models as collaborators for zero-shot medical reasoning. Findings of the Association for Computational Linguistics ACL 2024, pages 599-621
[33] Janneke E. W. Walraven, Olga L. Van der Hel, J. J. M. Van der Hoeven, Valery E. P. P. Lemmens, Rob H. A. Verhoeven, and Ingrid M. E. Desar. 2022. Factors influencing the quality and functioning of oncological multidisciplinary team meetings: results of a systematic review. BMC Health Services Research, 22(1):1-27
[34] Haochun Wang, Sendong Zhao, Zewen Qiang, Nuwa Xi, Bing Qin, and Ting Liu. 2024. Beyond direct diagnosis: LLM-based multi-specialist agent consultation for automatic diagnosis
[36] Ran Xu, Wenqi Shi, Yue Yu, Yuchen Zhuang, Bowen Jin, May Dongmei Wang, Joyce Ho, and Carl Yang. 2024. RAM-EHR: Retrieval augmentation meets clinical predictions on electronic health records. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 754-765
[37] Sihang Zeng, Lucas Jing Liu, Jun Wen, Meliha Yetisgen, Ruth Etzioni, and Gang Luo. 2025. TrajSurv: Learning continuous latent trajectories from electronic health records for trustworthy survival prediction
[38] Rongjia Zhou, Chengzhuo Li, Carl Yang, and Jiaying Lu. 2025a. ClinNoteAgents: An LLM multi-agent system for predicting and interpreting heart failure 30-day readmission from clinical notes
[39] Yucheng Zhou, Lingran Song, and Jianbing Shen. 2025b. MAM: Modular multi-agent framework for multi-modal medical diagnosis via role-specialized collaboration. In Findings of the Association for Computational Linguistics: ACL 2025, pages 25319-25333
[40] Yinghao Zhu, Ziyi He, Haoran Hu, Xiaochen Zheng, Xichen Zhang, Zixiang Wang, Junyi Gao, Liantao Ma, and Lequan Yu. 2025. MedAgentBoard: Benchmarking multi-agent collaboration with conventional methods for diverse medical tasks
[41] Yinghao Zhu, Changyu Ren, Zixiang Wang, Xiaochen Zheng, Shiyun Xie, Junlan Feng, Xi Zhu, Zhoujun Li, Liantao Ma, and Chengwei Pan. 2024. EMERGE: Enhancing multimodal electronic health records predictive modeling with retrieval-augmented generation. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pages 3549-3559
[42] RETAIN: Interpretable Predictive Model in Healthcare using Reverse Time Attention Mechanism. Curran Associates Inc.
[43] BEHRT: Transformer for Electronic Health Records. Scientific Reports
[44] A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models
[45] Testing and evaluation of generative large language models in electronic health record applications: a systematic review. Journal of the American Medical Informatics Association, 2026
[46] Improving Diagnosis in Health Care
[47] The new diagnostic team. Diagnosis, 2017
[48] Hospital commitments to address diagnostic errors: An assessment of 95 US hospitals. Journal of Hospital Medicine
[49] Interprofessional diagnostic management teams: a scoping review protocol. Systematic Reviews
[50] Factors influencing the quality and functioning of oncological multidisciplinary team meetings: results of a systematic review. BMC Health Services Research
[51] The impact of multidisciplinary team meetings on patient assessment, management and outcomes in oncology settings: A systematic review of the literature. Cancer Treatment Reviews
[52] Implementation of streamlining measures in selecting and prioritising complex cases for the cancer multidisciplinary team meeting: a mini review of the recent developments. Frontiers in Health Services
[53] Accelerating rare disease detection: an experience of multidisciplinary team model in undiagnosed diseases program in a children's hospital. Frontiers in Public Health
[54] LLM-based agentic systems in medicine and healthcare. Nature Machine Intelligence
[55] MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning. Findings of the Association for Computational Linguistics ACL 2024, 2024
[56] MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making
[57] Beyond Direct Diagnosis: LLM-based Multi-Specialist Agent Consultation for Automatic Diagnosis
[58] Sequential Diagnosis with Language Models
[59] EvoMDT: a self-evolving multi-agent system for structured clinical decision-making in multi-cancer. npj Digital Medicine
[60] MDTeamGPT: A Self-Evolving LLM-based Multi-Agent Framework for Multi-Disciplinary Team Medical Consultation
[61] MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks
[62] EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records
[63] EHRFlow: A Large Language Model-Driven Iterative Multi-Agent Electronic Health Record Data Analysis Workflow. KDD'24 Workshop: Artificial Intelligence and Data Science for Healthcare: Bridging Data-Centric AI and People-Centric Healthcare
[64] MedAgentBench: A Realistic Virtual EHR Environment to Benchmark Medical LLM Agents
[65] FHIR-AgentBench: Benchmarking LLM Agents for Realistic Interoperable EHR Question Answering. arXiv preprint arXiv:2509.19319
[66] ColaCare: Enhancing Electronic Health Record Modeling through Large Language Model-Driven Multi-Agent Collaboration. Proceedings of the ACM Web Conference 2025, 2025. doi:10.1145/3696410.3714877
[67] MoMA: A Mixture-of-Multimodal-Agents Architecture for Enhancing Clinical Prediction Modelling
[68] CARE-AD: a multi-agent large language model framework for Alzheimer's disease prediction using longitudinal clinical notes. npj Digital Medicine, 2025
[69] ClinNoteAgents: An LLM Multi-Agent System for Predicting and Interpreting Heart Failure 30-Day Readmission from Clinical Notes
[70] The effects of multidisciplinary team meetings on clinical practice for colorectal, lung, prostate and breast cancer: a systematic review. Cancers, 2021
[71] Implementation of streamlining measures in selecting and prioritising complex cases for the cancer multidisciplinary team meeting: a mini review of the recent developments. Frontiers in Health Services, 2024
[72] Cancer multidisciplinary team meetings: impact of logistical challenges on communication and decision-making. BJS open, 2022
[73] Mapping cognitive biases in multidisciplinary team (MDT) decision-making for cancer care in Scotland: a cognitive ethnography study protocol. BMJ open, 2024
[74] TrajSurv: Learning Continuous Latent Trajectories from Electronic Health Records for Trustworthy Survival Prediction
[75] Early Risk Prediction with Temporally and Contextually Grounded Clinical Language Processing
[76] AdaCare: Explainable Clinical Health Status Representation Learning via Scale-Adaptive Feature Extraction and Recalibration. Proceedings of the AAAI Conference on Artificial Intelligence
[77] Learnable Prompt as Pseudo-Imputation: Rethinking the Necessity of Traditional EHR Data Imputation in Downstream Clinical Prediction
[78] Collaborative clinical reasoning: a scoping review. PeerJ
[79] Evaluation and mitigation of the limitations of large language models in clinical decision-making. Nature Medicine
[80] Accuracy, Consistency, and Hallucination of Large Language Models When Analyzing Unstructured Clinical Notes in Electronic Medical Records. JAMA Network Open
[81] A multi-center study on the adaptability of a shared foundation model for electronic health records. NPJ digital medicine, 2024
[82] Transformer patient embedding using electronic health records enables patient stratification and progression analysis. npj Digital Medicine, 2025
[83] HEART: Learning better representation of EHR data with a heterogeneous relation-aware transformer. Journal of Biomedical Informatics, 2024

Showing first 80 references.
