pith. machine review for the scientific record.

arxiv: 2605.12335 · v1 · submitted 2026-05-12 · 💻 cs.IR · cs.AI · cs.LG

Recognition: 2 theorem links · Lean Theorem

EHR-RAGp: Retrieval-Augmented Prototype-Guided Foundation Model for Electronic Health Records

Dana El Samad, Farah E. Shamout, Mariam Al-Omari, Saeed Shurrab

Pith reviewed 2026-05-13 03:03 UTC · model grok-4.3

classification 💻 cs.IR · cs.AI · cs.LG
keywords electronic health records · retrieval-augmented generation · prototype-guided retrieval · clinical prediction · foundation models · longitudinal data · information retrieval

The pith

A prototype-guided retrieval module lets an EHR foundation model select the most relevant patient history chunks for each prediction task.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Electronic health records contain long, irregular patient histories that are difficult to use effectively for predictions because fixed windows or uniform aggregation can hide important signals. The paper introduces EHR-RAGp, a retrieval-augmented foundation model that adds a prototype-guided module to dynamically retrieve and align the most relevant historical data segments with the current task. This replaces crude summarization with task-specific context selection across heterogeneous clinical events. A sympathetic reader would care because more precise use of past records could raise accuracy on outcome forecasts without processing entire trajectories at once. The reported results show consistent gains over current top EHR models and further improvements when the approach is combined with existing foundation models.

Core claim

EHR-RAGp is a retrieval-augmented foundation model for electronic health records that incorporates a prototype-guided retrieval module. The module functions as an alignment mechanism that estimates the relevance of retrieved historical patient data chunks with respect to a given prediction task and steers the model toward the most informative context. Across multiple clinical prediction tasks, this yields consistent outperformance relative to state-of-the-art EHR foundation models and transformer-based baselines. The same module can be integrated with existing clinical foundation models to produce substantial additional performance gains while providing a scalable way to handle long-range, irregular clinical context.

What carries the argument

The prototype-guided retrieval module, which estimates relevance of historical data chunks to a prediction task and guides the model to the most informative context.
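The core operation described here, scoring history chunks against task prototypes and keeping the most relevant ones, can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the function names, the plain cosine-similarity scoring, and the max-over-prototypes aggregation are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_chunks(chunk_embs, prototypes, top_k=4):
    """Rank history chunks by their best similarity to any task prototype
    and return the indices of the top_k most relevant chunks."""
    relevance = [max(cosine(c, p) for p in prototypes) for c in chunk_embs]
    ranked = sorted(range(len(chunk_embs)), key=lambda i: relevance[i], reverse=True)
    return ranked[:top_k]
```

In this sketch, `select_chunks([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]], [[1.0, 0.0]], top_k=2)` keeps the two chunks closest to the single prototype and discards the orthogonal one; in the full model the selected chunks would then be passed to the encoder as context.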

If this is right

  • EHR-RAGp consistently outperforms state-of-the-art EHR foundation models and transformer-based baselines on multiple clinical prediction tasks.
  • Integrating EHR-RAGp with existing clinical foundation models produces substantial performance gains.
  • The approach supplies a scalable and efficient way to leverage long-range clinical context instead of fixed windows or uniform aggregation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Dynamic relevance estimation may prove useful in other longitudinal datasets where the value of past events varies by task.
  • The modular design suggests the retrieval component could be added to existing clinical models without retraining the entire system from scratch.
  • Focusing computation on selected chunks rather than full histories could lower the cost of processing very long patient trajectories.

Load-bearing premise

The prototype-guided retrieval module must reliably identify and prioritize the most relevant historical chunks for each specific prediction task.

What would settle it

If an ablation study on the same clinical prediction benchmarks shows that removing the prototype guidance or replacing it with fixed-window or random retrieval produces no performance drop relative to the full model.
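The ablation arms named above could be implemented as trivial context selectors swapped in for the prototype-guided ranker. A minimal sketch, with hypothetical function names; neither baseline consults relevance at all, which is exactly what makes a null result damaging to the core claim:

```python
import random

def fixed_window(chunks, k):
    """Fixed-window baseline: keep only the k most recent chunks."""
    return chunks[-k:]

def random_retrieval(chunks, k, seed=0):
    """Random baseline: sample k chunks uniformly, ignoring relevance."""
    rng = random.Random(seed)  # fixed seed for reproducible ablation runs
    return rng.sample(chunks, min(k, len(chunks)))
```

If the full model's downstream metrics match these selectors', the prototype guidance is not doing the work attributed to it.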

Figures

Figures reproduced from arXiv: 2605.12335 by Dana El Samad, Farah E. Shamout, Mariam Al-Omari, Saeed Shurrab.

Figure 1
Figure 1. Overview of EHR-RAGp: chunking of patient trajectories to build the EHR vector database (top panel), masked pre-training of the backbone encoder with chunked trajectories (bottom left), and end-to-end fine-tuning of the final architecture (bottom right). view at source ↗
Figure 2
Figure 2. UMAP projections of history embeddings and prototypes for (a) long length of stay and (b) … view at source ↗
read the original abstract

Electronic Health Records (EHR) contain rich longitudinal patient information and are widely used in predictive modeling applications. However, effectively leveraging historical data remains challenging due to long trajectories, heterogeneous events, temporal irregularity, and the varying relevance of past clinical context. Existing approaches often rely on fixed windows or uniform aggregation, which can obscure clinically important signals. In this work, we introduce EHR-RAGp, a retrieval-augmented foundation model that dynamically integrates the most relevant patient history across diverse clinical event types. We propose a prototype-guided retrieval module that acts as an alignment mechanism and estimates the relevance of retrieved historical chunks with respect to a given prediction task, guiding the model towards the most informative context. Across multiple clinical prediction tasks, EHR-RAGp consistently outperforms state-of-the-art EHR foundation models and transformer-based baselines. Furthermore, integrating EHR-RAGp with existing clinical foundation models yields substantial performance gains. Overall, EHR-RAGp provides a scalable and efficient framework for leveraging long-range clinical context to improve downstream performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces EHR-RAGp, a retrieval-augmented foundation model for electronic health records featuring a prototype-guided retrieval module that dynamically selects relevant historical patient data chunks by estimating their task-specific relevance. The central claims are that EHR-RAGp consistently outperforms state-of-the-art EHR foundation models and transformer-based baselines across multiple clinical prediction tasks, and that integrating EHR-RAGp with existing clinical foundation models produces substantial performance gains.

Significance. If the outperformance and integration benefits are empirically substantiated, the work would advance EHR predictive modeling by offering a scalable mechanism to exploit long, irregular, and heterogeneous patient trajectories without fixed windows or uniform aggregation. The plug-in integration capability could enable incremental improvements to existing clinical foundation models, with potential downstream benefits for healthcare decision support.

major comments (2)
  1. Abstract: The claims of consistent outperformance and integration gains are made without any quantitative results, baseline comparisons, dataset descriptions, statistical tests, or ablation studies. This directly undermines evaluation of the central claim that the prototype-guided module drives the gains rather than factors such as context length or model capacity.
  2. Prototype-guided retrieval module (as described in the abstract and methods): No implementation details are supplied for prototype construction, the similarity function used to score historical chunk relevance, or the training loss for the retrieval component. No retrieval-specific metrics (e.g., precision against task-relevant labels) or ablations versus plain RAG or fixed-window baselines are reported, leaving the load-bearing assumption that the module reliably acts as an alignment mechanism unverified.
minor comments (1)
  1. Abstract: Including one or two concrete performance numbers or task names would help readers immediately gauge the scale of the reported improvements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to incorporate additional details and clarifications where appropriate.

read point-by-point responses
  1. Referee: Abstract: The claims of consistent outperformance and integration gains are made without any quantitative results, baseline comparisons, dataset descriptions, statistical tests, or ablation studies. This directly undermines evaluation of the central claim that the prototype-guided module drives the gains rather than factors such as context length or model capacity.

    Authors: We agree that the abstract, being a concise summary, does not include quantitative results or other specifics. The full manuscript provides these in the Experiments and Results sections, including performance tables, baseline comparisons, dataset descriptions, and ablation studies. To directly address the concern, we have revised the abstract to include key quantitative highlights (e.g., average AUC improvements and dataset names) and references to statistical significance and ablations, while preserving brevity. revision: yes

  2. Referee: Prototype-guided retrieval module (as described in the abstract and methods): No implementation details are supplied for prototype construction, the similarity function used to score historical chunk relevance, or the training loss for the retrieval component. No retrieval-specific metrics (e.g., precision against task-relevant labels) or ablations versus plain RAG or fixed-window baselines are reported, leaving the load-bearing assumption that the module reliably acts as an alignment mechanism unverified.

    Authors: We acknowledge that the original Methods section provided high-level descriptions but insufficient implementation specifics for reproducibility. We have expanded this section to detail prototype construction (via clustering on EHR embeddings), the similarity function (cosine similarity on task-conditioned representations), and the retrieval training loss (contrastive objective combined with the primary task loss). We have also added retrieval-specific metrics (e.g., precision@K on task-relevant chunks) and new ablation experiments versus plain RAG and fixed-window baselines to empirically verify the prototype guidance mechanism. revision: yes
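The precision@K metric the rebuttal promises is standard in retrieval evaluation; a minimal sketch, assuming chunk-level relevance labels are available (which the paper would have to supply):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved chunk ids that carry a task-relevant label.

    retrieved: ranked list of chunk ids, best first.
    relevant:  collection of chunk ids labeled relevant to the task.
    """
    top = retrieved[:k]
    if not top:
        return 0.0
    relevant = set(relevant)
    return sum(1 for c in top if c in relevant) / len(top)
```

For example, `precision_at_k([3, 1, 7, 2], {1, 2}, 2)` evaluates the top two retrieved chunks `[3, 1]`, of which one is relevant, giving 0.5.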

Circularity Check

0 steps flagged

No circularity: model proposal is architectural, not derived from self-referential equations or fits

full rationale

The paper describes an EHR-RAGp architecture with a prototype-guided retrieval module but presents no equations, derivations, or first-principles results. Performance claims are empirical comparisons against baselines rather than predictions that reduce to fitted inputs or self-definitions. No self-citations, uniqueness theorems, or ansatzes are invoked in the provided text to justify core components. The approach builds on existing foundation models without reducing its central claims to its own outputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the model description implies standard deep-learning assumptions but none are stated or quantified.

pith-pipeline@v0.9.0 · 5494 in / 1271 out tokens · 117626 ms · 2026-05-13T03:03:34.801169+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

89 extracted references · 89 canonical work pages · 6 internal anchors

  1. [1]

    A review of deep learning models and online healthcare databases for electronic health records and their use for health prediction.Artificial Intelligence Review, 57(9):249, 2024

    Nurul Athirah Nasarudin, Fatma Al Jasmi, Richard O Sinnott, Nazar Zaki, Hany Al Ashwal, Elfadil A Mohamed, and Mohd Saberi Mohamad. A review of deep learning models and online healthcare databases for electronic health records and their use for health prediction.Artificial Intelligence Review, 57(9):249, 2024

  2. [2]

    Cao Xiao, Edward Choi, and Jimeng Sun. Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review.Journal of the American Medical Informatics Association, 25(10):1419–1428, 2018

  3. [3]

    The shaky foundations of large language models and foundation models for electronic health records.npj digital medicine, 6(1):135, 2023

    Michael Wornow, Yizhe Xu, Rahul Thapa, Birju Patel, Ethan Steinberg, Scott Fleming, Michael A Pfeffer, Jason Fries, and Nigam H Shah. The shaky foundations of large language models and foundation models for electronic health records.npj digital medicine, 6(1):135, 2023

  4. [4]

    A scoping review of using large language models (llms) to investigate electronic health records (ehrs).arXiv preprint arXiv:2405.03066, 2024

    Lingyao Li, Jiayan Zhou, Zhenxiang Gao, Wenyue Hua, Lizhou Fan, Huizi Yu, Loni Hagen, Yongfeng Zhang, Themistocles L Assimes, Libby Hemphill, et al. A scoping review of using large language models (llms) to investigate electronic health records (ehrs).arXiv preprint arXiv:2405.03066, 2024

  5. [5]

    A comprehensive survey of electronic health record modeling: From deep learning approaches to large language models.arXiv preprint arXiv:2507.12774, 2025

    Weijieying Ren, Jingxi Zhu, Zehao Liu, Tianxiang Zhao, and Vasant Honavar. A comprehensive survey of electronic health record modeling: From deep learning approaches to large language models.arXiv preprint arXiv:2507.12774, 2025

  6. [6]

    Context clues: Evaluating long context models for clinical prediction tasks on ehrs.arXiv preprint arXiv:2412.16178, 2024

    Michael Wornow, Suhana Bedi, Miguel Angel Fuentes Hernandez, Ethan Steinberg, Jason Alan Fries, Christopher Ré, Sanmi Koyejo, and Nigam H Shah. Context clues: Evaluating long context models for clinical prediction tasks on ehrs.arXiv preprint arXiv:2412.16178, 2024

  7. [7]

    Shaojie Zhong, Li Rong Wang, Zhuoxuan Zhan, Yih Yng Ng, and Xiuyi Fan. A hybrid approach for irregular-time series prediction using electronic health records: an intensive care unit mortality case study.ACM Transactions on Computing for Healthcare, 6(4):1–33, 2025

  8. [8]

    Ehrshot: An ehr benchmark for few-shot evaluation of foundation models.Advances in Neural Information Processing Systems, 36:67125–67137, 2023

    Michael Wornow, Rahul Thapa, Ethan Steinberg, Jason Fries, and Nigam Shah. Ehrshot: An ehr benchmark for few-shot evaluation of foundation models.Advances in Neural Information Processing Systems, 36:67125–67137, 2023

  9. [9]

    Medmod: Multimodal benchmark for medical prediction tasks with electronic health records and chest x-ray scans.Proceedings of Machine Learning Research, 287:1–23, 2025

    Shaza Elsharief, Saeed Shurrab, Baraa Al Jorf, L Julián Lechuga López, and Farah E Shamout. Medmod: Multimodal benchmark for medical prediction tasks with electronic health records and chest x-ray scans.Proceedings of Machine Learning Research, 287:1–23, 2025

  10. [10]

    Ehrmamba: Towards generalizable and scalable foundation models for electronic health records.arXiv preprint arXiv:2405.14567, 2024

    Adibvafa Fallahpour, Mahshid Alinoori, Wenqian Ye, Xu Cao, Arash Afkanpour, and Amrit Krishnan. Ehrmamba: Towards generalizable and scalable foundation models for electronic health records.arXiv preprint arXiv:2405.14567, 2024

  11. [11]

    Core-behrt: A carefully optimized and rigorously evaluated behrt.arXiv preprint arXiv:2404.15201, 2024

    Mikkel Odgaard, Kiril Vadimovic Klein, Sanne Møller Thysen, Espen Jimenez-Solem, Martin Sillesen, and Mads Nielsen. Core-behrt: A carefully optimized and rigorously evaluated behrt. arXiv preprint arXiv:2404.15201, 2024

  12. [12]

    Unihpf: Universal healthcare predictive framework with zero domain knowledge.arXiv preprint arXiv:2211.08082, 2022

    Kyunghoon Hur, Jungwoo Oh, Junu Kim, Jiyoun Kim, Min Jae Lee, Eunbyeol Cho, Seong-Eun Moon, Young-Hak Kim, and Edward Choi. Unihpf: Universal healthcare predictive framework with zero domain knowledge.arXiv preprint arXiv:2211.08082, 2022. 10

  13. [13]

    Genhpf: General healthcare predictive framework for multi-task multi-source learning.IEEE Journal of Biomedical and Health Informatics, 28(1):502–513, 2023

    Kyunghoon Hur, Jungwoo Oh, Junu Kim, Jiyoun Kim, Min Jae Lee, Eunbyeol Cho, Seong- Eun Moon, Young-Hak Kim, Louis Atallah, and Edward Choi. Genhpf: General healthcare predictive framework for multi-task multi-source learning.IEEE Journal of Biomedical and Health Informatics, 28(1):502–513, 2023

  14. [14]

    Transformehr: transformer-based encoder-decoder generative model to enhance prediction of disease out- comes using electronic health records.Nature communications, 14(1):7857, 2023

    Zhichao Yang, Avijit Mitra, Weisong Liu, Dan Berlowitz, and Hong Yu. Transformehr: transformer-based encoder-decoder generative model to enhance prediction of disease out- comes using electronic health records.Nature communications, 14(1):7857, 2023

  15. [15]

    Predicting clinical states in individual patients.Annals of Internal Medicine, 125(5):406–412, 1996

    Leonard E Braitman and Frank Davidoff. Predicting clinical states in individual patients.Annals of Internal Medicine, 125(5):406–412, 1996

  16. [16]

    On clinical event prediction in patient treatment trajectory using longitudinal electronic health records.IEEE Journal of Biomedical and Health Informatics, 24(7):2053–2063, 2019

    Huilong Duan, Zhoujian Sun, Wei Dong, Kunlun He, and Zhengxing Huang. On clinical event prediction in patient treatment trajectory using longitudinal electronic health records.IEEE Journal of Biomedical and Health Informatics, 24(7):2053–2063, 2019

  17. [17]

    Retrieval-augmented generation for knowledge-intensive nlp tasks.Advances in neural information processing systems, 33:9459–9474, 2020

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks.Advances in neural information processing systems, 33:9459–9474, 2020

  18. [18]

    Retrieval-augmented generation for natural language processing: A survey.arXiv preprint arXiv:2407.13193, 2024

    Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, et al. Retrieval-augmented generation for natural language processing: A survey.arXiv preprint arXiv:2407.13193, 2024

  19. [19]

    Evaluation of retrieval-augmented generation: A survey

    Hao Yu, Aoran Gan, Kai Zhang, Shiwei Tong, Qi Liu, and Zhaofeng Liu. Evaluation of retrieval-augmented generation: A survey. InCCF Conference on Big Data, pages 102–120. Springer, 2024

  20. [20]

    Retrieval-augmented generation for ai-generated content: A survey

    Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, Jie Jiang, and Bin Cui. Retrieval-augmented generation for ai-generated content: A survey.arXiv preprint arXiv:2402.19473, 2024

  21. [21]

    Retrieval augmented generation and understanding in vision: A survey and new outlook.arXiv preprint arXiv:2503.18016, 2025

    Xu Zheng, Ziqiao Weng, Yuanhuiyi Lyu, Lutao Jiang, Haiwei Xue, Bin Ren, Danda Paudel, Nicu Sebe, Luc Van Gool, and Xuming Hu. Retrieval augmented generation and understanding in vision: A survey and new outlook.arXiv preprint arXiv:2503.18016, 2025

  22. [22]

    Long short-term memory.Neural computation, 9(8):1735–1780, 1997

    Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory.Neural computation, 9(8):1735–1780, 1997

  23. [23]

    On the properties of neural machine translation: Encoder-decoder approaches.arXiv preprint arXiv:1409.1259, 2014

    Kyunghyun Cho, Bart Van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. On the properties of neural machine translation: Encoder-decoder approaches.arXiv preprint arXiv:1409.1259, 2014

  24. [24]

    Recurrent neural networks for multivariate time series with missing values.Scientific reports, 8(1):6085, 2018

    Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, and Yan Liu. Recurrent neural networks for multivariate time series with missing values.Scientific reports, 8(1):6085, 2018

  25. [25]

    Multitask learning and benchmarking with clinical time series data.Scientific data, 6(1):96, 2019

    Hrayr Harutyunyan, Hrant Khachatrian, David C Kale, Greg Ver Steeg, and Aram Galstyan. Multitask learning and benchmarking with clinical time series data.Scientific data, 6(1):96, 2019

  26. [26]

    Medfuse: Multi-modal fusion with clini- cal time-series data and chest x-ray images

    Nasir Hayat, Krzysztof J Geras, and Farah E Shamout. Medfuse: Multi-modal fusion with clini- cal time-series data and chest x-ray images. InMachine Learning for Healthcare Conference, pages 479–503. PMLR, 2022

  27. [27]

    Bert: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

  28. [28]

    Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019

    Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019. 11

  29. [29]

    Behrt: transformer for electronic health records.Scientific reports, 10(1):7155, 2020

    Yikuan Li, Shishir Rao, José Roberto Ayala Solares, Abdelaali Hassaine, Rema Ramakrishnan, Dexter Canoy, Yajie Zhu, Kazem Rahimi, and Gholamreza Salimi-Khorshidi. Behrt: transformer for electronic health records.Scientific reports, 10(1):7155, 2020

  30. [30]

    Yikuan Li, Mohammad Mamouei, Gholamreza Salimi-Khorshidi, Shishir Rao, Abdelaali Hassaine, Dexter Canoy, Thomas Lukasiewicz, and Kazem Rahimi. Hi-behrt: hierarchical transformer-based model for accurate prediction of clinical events using multimodal longitudinal electronic health records.IEEE journal of biomedical and health informatics, 27(2):1106–1117, 2022

  31. [31]

    Cehr-bert: Incorporating temporal information from structured ehr data to improve prediction tasks

    Chao Pang, Xinzhuo Jiang, Krishna S Kalluri, Matthew Spotnitz, RuiJun Chen, Adler Perotte, and Karthik Natarajan. Cehr-bert: Incorporating temporal information from structured ehr data to improve prediction tasks. InMachine Learning for Health, pages 239–260. PMLR, 2021

  32. [32]

    Med-bert: pretrained contex- tualized embeddings on large-scale structured electronic health records for disease prediction

    Laila Rasmy, Yang Xiang, Ziqian Xie, Cui Tao, and Degui Zhi. Med-bert: pretrained contex- tualized embeddings on large-scale structured electronic health records for disease prediction. NPJ digital medicine, 4(1):86, 2021

  33. [33]

    Foundation model of electronic medical records for adaptive risk estimation.GigaScience, 14:giaf107, 2025

    Pawel Renc, Michal K Grzeszczyk, Nassim Oufattole, Deirdre Goode, Yugang Jia, Szymon Bieganski, Matthew BA McDermott, Jaroslaw Was, Anthony E Samir, Jonathan W Cunningham, et al. Foundation model of electronic medical records for adaptive risk estimation.GigaScience, 14:giaf107, 2025

  34. [34]

    Cehr-gpt: Generating electronic health records with chronological patient timelines.arXiv preprint arXiv:2402.04400, 2024

    Chao Pang, Xinzhuo Jiang, Nishanth Parameshwar Pavinkurve, Krishna S Kalluri, Elise L Minto, Jason Patterson, Linying Zhang, George Hripcsak, Gamze Gürsoy, Noémie Elhadad, et al. Cehr-gpt: Generating electronic health records with chronological patient timelines.arXiv preprint arXiv:2402.04400, 2024

  35. [35]

    Generative medical event models improve with scale.arXiv preprint arXiv:2508.12104, 2025

    Shane Waxler, Paul Blazek, Davis White, Daniel Sneider, Kevin Chung, Mani Nagarathnam, Patrick Williams, Hank V oeller, Karen Wong, Matthew Swanhorst, et al. Generative medical event models improve with scale.arXiv preprint arXiv:2508.12104, 2025

  36. [36]

    From ehrs to patient pathways: Scalable modeling of longitudinal health trajectories with llms.arXiv preprint arXiv:2506.04831, 2025

    Chantal Pellegrini, Ege Özsoy, David Bani-Harouni, Matthias Keicher, and Nassir Navab. From ehrs to patient pathways: Scalable modeling of longitudinal health trajectories with llms.arXiv preprint arXiv:2506.04831, 2025

  37. [37]

    Clinical decision support using pseudo-notes from multiple streams of ehr data.npj Digital Medicine, 8(1):394, 2025

    Simon A Lee, Sujay Jain, Alex Chen, Kyoka Ono, Arabdha Biswas, Ákos Rudas, Jennifer Fang, and Jeffrey N Chiang. Clinical decision support using pseudo-notes from multiple streams of ehr data.npj Digital Medicine, 8(1):394, 2025

  38. [38]

    Improving language models by retrieving from trillions of tokens

    Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, et al. Improving language models by retrieving from trillions of tokens. InInternational conference on machine learning, pages 2206–2240. PMLR, 2022

  39. [39]

    Active retrieval augmented generation

    Zhengbao Jiang, Frank F Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, and Graham Neubig. Active retrieval augmented generation. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7969–7992, 2023

  40. [40]

    Replug: Retrieval-augmented black-box language models

    Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Richard James, Mike Lewis, Luke Zettlemoyer, and Wen-tau Yih. Replug: Retrieval-augmented black-box language models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 8371–8384, 2024

  41. [41]

    Question-answering based summa- rization of electronic health records using retrieval augmented generation.arXiv preprint arXiv:2401.01469, 2024

    Walid Saba, Suzanne Wendelken, and James Shanahan. Question-answering based summa- rization of electronic health records using retrieval augmented generation.arXiv preprint arXiv:2401.01469, 2024

  42. [42]

    Rationale-guided retrieval augmented generation for medical question answering

    Jiwoong Sohn, Yein Park, Chanwoong Yoon, Sihyeon Park, Hyeon Hwang, Mujeen Sung, Hyunjae Kim, and Jaewoo Kang. Rationale-guided retrieval augmented generation for medical question answering. InProceedings of the 2025 Conference of the Nations of the Americas 12 Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1...

  43. [43]

    Grounding large language models in clinical evidence: A retrieval-augmented generation system for querying uk nice clinical guidelines.arXiv preprint arXiv:2510.02967, 2025

    Matthew Lewis, Samuel Thio, Richard JB Dobson, and Spiros Denaxas. Grounding large language models in clinical evidence: A retrieval-augmented generation system for querying uk nice clinical guidelines.arXiv preprint arXiv:2510.02967, 2025

  44. [44]

    Agentic memory-augmented retrieval and evidence grounding for medical question-answering tasks

    Shuyue Jia, Subhrangshu Bit, Varuna H Jasodanand, Yi Liu, and Vijaya B Kolachalama. Agentic memory-augmented retrieval and evidence grounding for medical question-answering tasks. medRxiv, pages 2025–08, 2025

  45. [45]

    A retrieval-augmented knowledge mining method with deep thinking llms for biomedical research and clinical support

    Yichun Feng, Jiawei Wang, Ruikun He, Lu Zhou, and Yixue Li. A retrieval-augmented knowledge mining method with deep thinking llms for biomedical research and clinical support. GigaScience, 14:giaf109, 2025

  46. [46]

    Ontologyrag: better and faster biomedical code mapping with retrieval-augmented generation (rag) leveraging ontology knowl- edge graphs and large language models

    Hui Feng, Yuntzu Yin, Emiliano Reynares, and Jay Nanavati. Ontologyrag: better and faster biomedical code mapping with retrieval-augmented generation (rag) leveraging ontology knowl- edge graphs and large language models. InInternational Workshop on Knowledge-Enhanced Information Retrieval, pages 71–86. Springer, 2025

  47. [47]

    Generating patient cohorts from electronic health records using two-step retrieval-augmented text-to-sql generation.arXiv preprint arXiv:2502.21107, 2025

    Angelo Ziletti and Leonardo D’Ambrosi. Generating patient cohorts from electronic health records using two-step retrieval-augmented text-to-sql generation.arXiv preprint arXiv:2502.21107, 2025

  48. [48]

    A survey on retrieval-augmentation generation (rag) models for healthcare applications.Neural Computing and Applications, 37(33):28191–28267, 2025

    Mohamed Abo El-Enen, Sally Saad, and Taymoor Nazmy. A survey on retrieval-augmentation generation (rag) models for healthcare applications.Neural Computing and Applications, 37(33):28191–28267, 2025

  49. [49]

    Retrieval augmented generation for large language models in healthcare: A systematic review.PLOS Digital Health, 4(6):e0000877, 2025

    Lameck Mbangula Amugongo, Pietro Mascheroni, Steven Brooks, Stefan Doering, and Jan Seidel. Retrieval augmented generation for large language models in healthcare: A systematic review.PLOS Digital Health, 4(6):e0000877, 2025

  50. [50]

    Emerge: Enhancing multimodal electronic health records predictive modeling with retrieval-augmented generation

    Yinghao Zhu, Changyu Ren, Zixiang Wang, Xiaochen Zheng, Shiyun Xie, Junlan Feng, Xi Zhu, Zhoujun Li, Liantao Ma, and Chengwei Pan. Emerge: Enhancing multimodal electronic health records predictive modeling with retrieval-augmented generation. InProceedings of the 33rd ACM International Conference on Information and Knowledge Management, pages 3549–3559, 2024

  51. [51]

    Ram-ehr: Retrieval augmentation meets clinical predictions on electronic health records

    Ran Xu, Wenqi Shi, Yue Yu, Yuchen Zhuang, Bowen Jin, May Dongmei Wang, Joyce Ho, and Carl Yang. Ram-ehr: Retrieval augmentation meets clinical predictions on electronic health records. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 754–765, 2024

  52. [52]

    Realm: Rag-driven enhancement of multimodal electronic health records analysis via large language models,

    Yinghao Zhu, Changyu Ren, Shiyun Xie, Shukai Liu, Hangyuan Ji, Zixiang Wang, Tao Sun, Long He, Zhoujun Li, Xi Zhu, et al. Realm: Rag-driven enhancement of multimodal electronic health records analysis via large language models.arXiv preprint arXiv:2402.07016, 2024

  53. [53]

    Rituparna Datta, Jiaming Cui, Zihan Guan, Vishal G Reddy, Joshua C Eby, Gregory Madden, Rupesh Silwal, and Anil Vullikanti. Improving hospital risk prediction with knowledge-augmented multimodal ehr modeling. arXiv preprint arXiv:2508.01970, 2025

  54. [54]

    Junu Kim, Chaeeun Shim, Bosco Seong Kyu Yang, Chami Im, Sung Yoon Lim, Han-Gil Jeong, and Edward Choi. General-purpose retrieval-enhanced medical prediction model using near-infinite history. arXiv preprint arXiv:2310.20204, 2023

  55. [55]

    Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Mike Rabbat, and Nicolas Ballas. Masked siamese networks for label-efficient learning. In European Conference on Computer Vision, pages 456–473. Springer, 2022

  56. [56]

    Alistair Johnson, Lucas Bulgarelli, Tom Pollard, Steven Horng, Leo Anthony Celi, and Roger Mark. Mimic-iv. PhysioNet. Available online at: https://physionet.org/content/mimiciv/1.0/ (accessed August 23, 2021), pages 49–55, 2020

  57. [57]

    Alistair EW Johnson, Lucas Bulgarelli, Lu Shen, Alvin Gayles, Ayad Shammout, Steven Horng, Tom J Pollard, Sicheng Hao, Benjamin Moody, Brian Gow, et al. Mimic-iv, a freely accessible electronic health record dataset. Scientific Data, 10(1):1, 2023

  58. [58]

    Kyunghoon Hur, Jiyoung Lee, Jungwoo Oh, Wesley Price, Younghak Kim, and Edward Choi. Unifying heterogeneous electronic health records systems via text-based code embedding. In Conference on Health, Inference, and Learning, pages 183–203. PMLR, 2022

  59. [59]

    Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, et al. Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference. In Proceedings of the 63rd Annual Meeting of the Associati...

  60. [60]

    Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692, 2019

  61. [61]

    Iz Beltagy, Matthew E Peters, and Arman Cohan. Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150, 2020

  62. [62]

    Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, et al. Big bird: Transformers for longer sequences. Advances in Neural Information Processing Systems, 33:17283–17297, 2020

  63. [63]

    Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding. Neurocomputing, 568:127063, 2024

  64. [64]

    Marie-Therese Puth, Markus Neuhäuser, and Graeme D Ruxton. On the variety of methods for calculating confidence intervals by bootstrapping. Journal of Animal Ecology, 84(4):892–897, 2015

  65. [65]

    Qwen2 Technical Report

    An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, Keming Lu, Keqin Chen, Kexin Yang, Mei Li, Mingfeng ...

  66. [66]

    Albert Q Jiang, A Sablayrolles, A Mensch, C Bamford, D Singh Chaplot, Ddl Casas, F Bressand, G Lengyel, G Lample, L Saulnier, et al. Mistral 7B. arXiv preprint arXiv:2310.06825, 2023

  67. [67]

    Andrew Sellergren, Chufan Gao, Fereshteh Mahvar, Timo Kohlberger, Fayaz Jamil, Madeleine Traverse, Alberto Tono, Bashir Sadjad, Lin Yang, Charles Lau, et al. Medgemma 1.5 technical report. arXiv preprint arXiv:2604.05081, 2026

  68. [68]

    Yanis Labrak, Adrien Bazoge, Emmanuel Morin, Pierre-Antoine Gourraud, Mickael Rouvier, and Richard Dufour. Biomistral: A collection of open-source pretrained large language models for medical domains, 2024

  69. [69]

    Bert Arnrich, Edward Choi, Jason Alan Fries, Matthew BA McDermott, Jungwoo Oh, Tom Pollard, Nigam Shah, Ethan Steinberg, Michael Wornow, and Robin van de Water. Medical event data standard (meds): Facilitating machine learning for health. In ICLR 2024 Workshop on Learning from Time Series For Health, pages 03–08, 2024

  70. [70]

    Matthew BA McDermott, Justin Xu, Teya S Bergamaschi, Hyewon Jeong, Simon A Lee, Nassim Oufattole, Patrick Rockenschaub, Kamilė Stankevičiūtė, Ethan Steinberg, Jimeng Sun, et al. Meds: Building models and tools in a reproducible health ai ecosystem. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, pages 6...

  71. [71]

    Seyed Mehran Kazemi, Rishab Goel, Sepehr Eghbali, Janahan Ramanan, Jaspreet Sahota, Sanjay Thakur, Stella Wu, Cathal Smyth, Pascal Poupart, and Marcus Brubaker. Time2vec: Learning a vector representation of time. arXiv preprint arXiv:1907.05321, 2019

  72. [72]

    Hong Meng, Liang Guo, Yucheng Pan, Bin Kong, Wei Shuai, and He Huang. Machine learning based clinical prediction model for 1-year mortality in sepsis patients with atrial fibrillation. Heliyon, 10(21), 2024

  73. [73]

    Beatriz Nistal-Nuño. Developing machine learning models for prediction of mortality in the medical intensive care unit. Computer Methods and Programs in Biomedicine, 216:106663, 2022

  74. [74]

    Marzyeh Ghassemi, Tristan Naumann, Finale Doshi-Velez, Nicole Brimmer, Rohit Joshi, Anna Rumshisky, and Peter Szolovits. Unfolding physiological state: Mortality modelling in intensive care units. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 75–84, 2014

  75. [75]

    Min Zhang and Tsung-Ting Kuo. Early prediction of long hospital stay for intensive care units readmission patients using medication information. Computers in Biology and Medicine, 174:108451, 2024

  76. [76]

    Khalid Alghatani, Nariman Ammar, Abdelmounaam Rezgui, and Arash Shaban-Nejad. Predicting intensive care unit length of stay and mortality using patient vital signs: machine learning model development and validation. JMIR Medical Informatics, 9(5):e21347, 2021

  77. [77]

    Huiling Hu, Jiashuai Li, Hui Ge, Bilin Wu, Tingting Feng, Xue Wu, and Xuanna Wu. Prognostic models for unplanned intensive care unit readmission risk prediction: A systematic review and meta-analysis based on hsroc model. Nursing in Critical Care, 30(2):e13306, 2025

  78. [78]

    Talen Chen, Samaneh Madanian, David Airehrour, and Marianne Cherrington. Machine learning methods for hospital readmission prediction: systematic analysis of literature. Journal of Reliable Intelligent Environments, 8(1):49–66, 2022

  79. [79]

    Alex GC de Sá, Daniel Gould, Anna Fedyukova, Mitchell Nicholas, Lucy Dockrell, Calvin Fletcher, David Pilcher, Daniel Capurro, David B Ascher, Khaled El-Khawas, et al. Explainable machine learning for icu readmission prediction. arXiv preprint arXiv:2309.13781, 2023

  80. [80]

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017

Showing first 80 references.