pith. machine review for the scientific record.

arxiv: 2605.12335 · v1 · submitted 2026-05-12 · 💻 cs.IR · cs.AI · cs.LG

Recognition: 2 theorem links · Lean Theorem

EHR-RAGp: Retrieval-Augmented Prototype-Guided Foundation Model for Electronic Health Records

Dana El Samad, Farah E. Shamout, Mariam Al-Omari, Saeed Shurrab

Pith reviewed 2026-05-13 03:03 UTC · model grok-4.3

classification 💻 cs.IR · cs.AI · cs.LG
keywords electronic health records · retrieval-augmented generation · prototype-guided retrieval · clinical prediction · foundation models · longitudinal data · information retrieval

The pith

A prototype-guided retrieval module lets an EHR foundation model select the most relevant patient history chunks for each prediction task.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Electronic health records contain long, irregular patient histories that are difficult to use effectively for predictions because fixed windows or uniform aggregation can hide important signals. The paper introduces EHR-RAGp, a retrieval-augmented foundation model that adds a prototype-guided module to dynamically retrieve and align the most relevant historical data segments with the current task. This replaces crude summarization with task-specific context selection across heterogeneous clinical events. A sympathetic reader would care because more precise use of past records could raise accuracy on outcome forecasts without processing entire trajectories at once. The reported results show consistent gains over current top EHR models and further improvements when the approach is combined with existing foundation models.

Core claim

EHR-RAGp is a retrieval-augmented foundation model for electronic health records that incorporates a prototype-guided retrieval module. The module functions as an alignment mechanism that estimates the relevance of retrieved historical patient data chunks with respect to a given prediction task and steers the model toward the most informative context. Across multiple clinical prediction tasks, this yields consistent outperformance relative to state-of-the-art EHR foundation models and transformer-based baselines. The same module can be integrated with existing clinical foundation models to produce substantial additional performance gains while providing a scalable way to handle long-range, irregular clinical context.

What carries the argument

The prototype-guided retrieval module, which estimates relevance of historical data chunks to a prediction task and guides the model to the most informative context.
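The core operation described here, scoring history chunks against task prototypes and keeping the most relevant ones, can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the function names, the plain cosine-similarity scoring, and the max-over-prototypes aggregation are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_chunks(chunk_embs, prototypes, top_k=4):
    """Rank history chunks by their best similarity to any task prototype
    and return the indices of the top_k most relevant chunks."""
    relevance = [max(cosine(c, p) for p in prototypes) for c in chunk_embs]
    ranked = sorted(range(len(chunk_embs)), key=lambda i: relevance[i], reverse=True)
    return ranked[:top_k]
```

In this sketch, `select_chunks([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]], [[1.0, 0.0]], top_k=2)` keeps the two chunks closest to the single prototype and discards the orthogonal one; in the full model the selected chunks would then be passed to the encoder as context.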

If this is right

  • EHR-RAGp consistently outperforms state-of-the-art EHR foundation models and transformer-based baselines on multiple clinical prediction tasks.
  • Integrating EHR-RAGp with existing clinical foundation models produces substantial performance gains.
  • The approach supplies a scalable and efficient way to leverage long-range clinical context instead of fixed windows or uniform aggregation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Dynamic relevance estimation may prove useful in other longitudinal datasets where the value of past events varies by task.
  • The modular design suggests the retrieval component could be added to existing clinical models without retraining the entire system from scratch.
  • Focusing computation on selected chunks rather than full histories could lower the cost of processing very long patient trajectories.

Load-bearing premise

The prototype-guided retrieval module must reliably identify and prioritize the most relevant historical chunks for each specific prediction task.

What would settle it

If an ablation study on the same clinical prediction benchmarks shows that removing the prototype guidance or replacing it with fixed-window or random retrieval produces no performance drop relative to the full model.
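The ablation arms named above could be implemented as trivial context selectors swapped in for the prototype-guided ranker. A minimal sketch, with hypothetical function names; neither baseline consults relevance at all, which is exactly what makes a null result damaging to the core claim:

```python
import random

def fixed_window(chunks, k):
    """Fixed-window baseline: keep only the k most recent chunks."""
    return chunks[-k:]

def random_retrieval(chunks, k, seed=0):
    """Random baseline: sample k chunks uniformly, ignoring relevance."""
    rng = random.Random(seed)  # fixed seed for reproducible ablation runs
    return rng.sample(chunks, min(k, len(chunks)))
```

If the full model's downstream metrics match these selectors', the prototype guidance is not doing the work attributed to it.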

Figures

Figures reproduced from arXiv: 2605.12335 by Dana El Samad, Farah E. Shamout, Mariam Al-Omari, Saeed Shurrab.

Figure 1
Figure 1. Overview of EHR-RAGp: chunking of patient trajectories to build the EHR vector database (top panel), masked pre-training of the backbone encoder with chunked trajectories (bottom left), and end-to-end fine-tuning of the final architecture (bottom right). view at source ↗
Figure 2
Figure 2. UMAP projections of history embeddings and prototypes for (a) long length of stay and (b) … view at source ↗
read the original abstract

Electronic Health Records (EHR) contain rich longitudinal patient information and are widely used in predictive modeling applications. However, effectively leveraging historical data remains challenging due to long trajectories, heterogeneous events, temporal irregularity, and the varying relevance of past clinical context. Existing approaches often rely on fixed windows or uniform aggregation, which can obscure clinically important signals. In this work, we introduce EHR-RAGp, a retrieval-augmented foundation model that dynamically integrates the most relevant patient history across diverse clinical event types. We propose a prototype-guided retrieval module that acts as an alignment mechanism and estimates the relevance of retrieved historical chunks with respect to a given prediction task, guiding the model towards the most informative context. Across multiple clinical prediction tasks, EHR-RAGp consistently outperforms state-of-the-art EHR foundation models and transformer-based baselines. Furthermore, integrating EHR-RAGp with existing clinical foundation models yields substantial performance gains. Overall, EHR-RAGp provides a scalable and efficient framework for leveraging long-range clinical context to improve downstream performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces EHR-RAGp, a retrieval-augmented foundation model for electronic health records featuring a prototype-guided retrieval module that dynamically selects relevant historical patient data chunks by estimating their task-specific relevance. The central claims are that EHR-RAGp consistently outperforms state-of-the-art EHR foundation models and transformer-based baselines across multiple clinical prediction tasks, and that integrating EHR-RAGp with existing clinical foundation models produces substantial performance gains.

Significance. If the outperformance and integration benefits are empirically substantiated, the work would advance EHR predictive modeling by offering a scalable mechanism to exploit long, irregular, and heterogeneous patient trajectories without fixed windows or uniform aggregation. The plug-in integration capability could enable incremental improvements to existing clinical foundation models, with potential downstream benefits for healthcare decision support.

major comments (2)
  1. Abstract: The claims of consistent outperformance and integration gains are made without any quantitative results, baseline comparisons, dataset descriptions, statistical tests, or ablation studies. This directly undermines evaluation of the central claim that the prototype-guided module drives the gains rather than factors such as context length or model capacity.
  2. Prototype-guided retrieval module (as described in the abstract and methods): No implementation details are supplied for prototype construction, the similarity function used to score historical chunk relevance, or the training loss for the retrieval component. No retrieval-specific metrics (e.g., precision against task-relevant labels) or ablations versus plain RAG or fixed-window baselines are reported, leaving the load-bearing assumption that the module reliably acts as an alignment mechanism unverified.
minor comments (1)
  1. Abstract: Including one or two concrete performance numbers or task names would help readers immediately gauge the scale of the reported improvements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to incorporate additional details and clarifications where appropriate.

read point-by-point responses
  1. Referee: Abstract: The claims of consistent outperformance and integration gains are made without any quantitative results, baseline comparisons, dataset descriptions, statistical tests, or ablation studies. This directly undermines evaluation of the central claim that the prototype-guided module drives the gains rather than factors such as context length or model capacity.

    Authors: We agree that the abstract, being a concise summary, does not include quantitative results or other specifics. The full manuscript provides these in the Experiments and Results sections, including performance tables, baseline comparisons, dataset descriptions, and ablation studies. To directly address the concern, we have revised the abstract to include key quantitative highlights (e.g., average AUC improvements and dataset names) and references to statistical significance and ablations, while preserving brevity. revision: yes

  2. Referee: Prototype-guided retrieval module (as described in the abstract and methods): No implementation details are supplied for prototype construction, the similarity function used to score historical chunk relevance, or the training loss for the retrieval component. No retrieval-specific metrics (e.g., precision against task-relevant labels) or ablations versus plain RAG or fixed-window baselines are reported, leaving the load-bearing assumption that the module reliably acts as an alignment mechanism unverified.

    Authors: We acknowledge that the original Methods section provided high-level descriptions but insufficient implementation specifics for reproducibility. We have expanded this section to detail prototype construction (via clustering on EHR embeddings), the similarity function (cosine similarity on task-conditioned representations), and the retrieval training loss (contrastive objective combined with the primary task loss). We have also added retrieval-specific metrics (e.g., precision@K on task-relevant chunks) and new ablation experiments versus plain RAG and fixed-window baselines to empirically verify the prototype guidance mechanism. revision: yes
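The precision@K metric the rebuttal promises is standard in retrieval evaluation; a minimal sketch, assuming chunk-level relevance labels are available (which the paper would have to supply):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved chunk ids that carry a task-relevant label.

    retrieved: ranked list of chunk ids, best first.
    relevant:  collection of chunk ids labeled relevant to the task.
    """
    top = retrieved[:k]
    if not top:
        return 0.0
    relevant = set(relevant)
    return sum(1 for c in top if c in relevant) / len(top)
```

For example, `precision_at_k([3, 1, 7, 2], {1, 2}, 2)` evaluates the top two retrieved chunks `[3, 1]`, of which one is relevant, giving 0.5.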

Circularity Check

0 steps flagged

No circularity: model proposal is architectural, not derived from self-referential equations or fits

full rationale

The paper describes an EHR-RAGp architecture with a prototype-guided retrieval module but presents no equations, derivations, or first-principles results. Performance claims are empirical comparisons against baselines rather than predictions that reduce to fitted inputs or self-definitions. No self-citations, uniqueness theorems, or ansatzes are invoked in the provided text to justify core components. The approach builds on existing foundation models without reducing its central claims to its own outputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the model description implies standard deep-learning assumptions but none are stated or quantified.

pith-pipeline@v0.9.0 · 5494 in / 1271 out tokens · 117626 ms · 2026-05-13T03:03:34.801169+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

89 extracted references · 89 canonical work pages · 6 internal anchors

  1. [1]

    A review of deep learning models and online healthcare databases for electronic health records and their use for health prediction.Artificial Intelligence Review, 57(9):249, 2024

    Nurul Athirah Nasarudin, Fatma Al Jasmi, Richard O Sinnott, Nazar Zaki, Hany Al Ashwal, Elfadil A Mohamed, and Mohd Saberi Mohamad. A review of deep learning models and online healthcare databases for electronic health records and their use for health prediction.Artificial Intelligence Review, 57(9):249, 2024

  2. [2]

    Cao Xiao, Edward Choi, and Jimeng Sun. Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review.Journal of the American Medical Informatics Association, 25(10):1419–1428, 2018

  3. [3]

    The shaky foundations of large language models and foundation models for electronic health records.npj digital medicine, 6(1):135, 2023

    Michael Wornow, Yizhe Xu, Rahul Thapa, Birju Patel, Ethan Steinberg, Scott Fleming, Michael A Pfeffer, Jason Fries, and Nigam H Shah. The shaky foundations of large language models and foundation models for electronic health records.npj digital medicine, 6(1):135, 2023

  4. [4]

    A scoping review of using large language models (llms) to investigate electronic health records (ehrs).arXiv preprint arXiv:2405.03066, 2024

    Lingyao Li, Jiayan Zhou, Zhenxiang Gao, Wenyue Hua, Lizhou Fan, Huizi Yu, Loni Hagen, Yongfeng Zhang, Themistocles L Assimes, Libby Hemphill, et al. A scoping review of using large language models (llms) to investigate electronic health records (ehrs).arXiv preprint arXiv:2405.03066, 2024

  5. [5]

    A comprehensive survey of electronic health record modeling: From deep learning approaches to large language models.arXiv preprint arXiv:2507.12774, 2025

    Weijieying Ren, Jingxi Zhu, Zehao Liu, Tianxiang Zhao, and Vasant Honavar. A comprehensive survey of electronic health record modeling: From deep learning approaches to large language models.arXiv preprint arXiv:2507.12774, 2025

  6. [6]

    Context clues: Evaluating long context models for clinical prediction tasks on ehrs.arXiv preprint arXiv:2412.16178, 2024

    Michael Wornow, Suhana Bedi, Miguel Angel Fuentes Hernandez, Ethan Steinberg, Jason Alan Fries, Christopher Ré, Sanmi Koyejo, and Nigam H Shah. Context clues: Evaluating long context models for clinical prediction tasks on ehrs.arXiv preprint arXiv:2412.16178, 2024

  7. [7]

    Shaojie Zhong, Li Rong Wang, Zhuoxuan Zhan, Yih Yng Ng, and Xiuyi Fan. A hybrid approach for irregular-time series prediction using electronic health records: an intensive care unit mortality case study.ACM Transactions on Computing for Healthcare, 6(4):1–33, 2025

  8. [8]

    Ehrshot: An ehr benchmark for few-shot evaluation of foundation models.Advances in Neural Information Processing Systems, 36:67125–67137, 2023

    Michael Wornow, Rahul Thapa, Ethan Steinberg, Jason Fries, and Nigam Shah. Ehrshot: An ehr benchmark for few-shot evaluation of foundation models.Advances in Neural Information Processing Systems, 36:67125–67137, 2023

  9. [9]

    Medmod: Multimodal benchmark for medical prediction tasks with electronic health records and chest x-ray scans.Proceedings of Machine Learning Research, 287:1–23, 2025

    Shaza Elsharief, Saeed Shurrab, Baraa Al Jorf, L Julián Lechuga López, and Farah E Shamout. Medmod: Multimodal benchmark for medical prediction tasks with electronic health records and chest x-ray scans.Proceedings of Machine Learning Research, 287:1–23, 2025

  10. [10]

    Ehrmamba: Towards generalizable and scalable foundation models for electronic health records.arXiv preprint arXiv:2405.14567, 2024

    Adibvafa Fallahpour, Mahshid Alinoori, Wenqian Ye, Xu Cao, Arash Afkanpour, and Amrit Krishnan. Ehrmamba: Towards generalizable and scalable foundation models for electronic health records.arXiv preprint arXiv:2405.14567, 2024

  11. [11]

    Core-behrt: A carefully optimized and rigorously evaluated behrt.arXiv preprint arXiv:2404.15201, 2024

    Mikkel Odgaard, Kiril Vadimovic Klein, Sanne Møller Thysen, Espen Jimenez-Solem, Martin Sillesen, and Mads Nielsen. Core-behrt: A carefully optimized and rigorously evaluated behrt. arXiv preprint arXiv:2404.15201, 2024

  12. [12]

    Unihpf: Universal healthcare predictive framework with zero domain knowledge.arXiv preprint arXiv:2211.08082, 2022

    Kyunghoon Hur, Jungwoo Oh, Junu Kim, Jiyoun Kim, Min Jae Lee, Eunbyeol Cho, Seong-Eun Moon, Young-Hak Kim, and Edward Choi. Unihpf: Universal healthcare predictive framework with zero domain knowledge.arXiv preprint arXiv:2211.08082, 2022. 10

  13. [13]

    Genhpf: General healthcare predictive framework for multi-task multi-source learning.IEEE Journal of Biomedical and Health Informatics, 28(1):502–513, 2023

    Kyunghoon Hur, Jungwoo Oh, Junu Kim, Jiyoun Kim, Min Jae Lee, Eunbyeol Cho, Seong- Eun Moon, Young-Hak Kim, Louis Atallah, and Edward Choi. Genhpf: General healthcare predictive framework for multi-task multi-source learning.IEEE Journal of Biomedical and Health Informatics, 28(1):502–513, 2023

  14. [14]

    Transformehr: transformer-based encoder-decoder generative model to enhance prediction of disease out- comes using electronic health records.Nature communications, 14(1):7857, 2023

    Zhichao Yang, Avijit Mitra, Weisong Liu, Dan Berlowitz, and Hong Yu. Transformehr: transformer-based encoder-decoder generative model to enhance prediction of disease out- comes using electronic health records.Nature communications, 14(1):7857, 2023

  15. [15]

    Predicting clinical states in individual patients.Annals of Internal Medicine, 125(5):406–412, 1996

    Leonard E Braitman and Frank Davidoff. Predicting clinical states in individual patients.Annals of Internal Medicine, 125(5):406–412, 1996

  16. [16]

    On clinical event prediction in patient treatment trajectory using longitudinal electronic health records.IEEE Journal of Biomedical and Health Informatics, 24(7):2053–2063, 2019

    Huilong Duan, Zhoujian Sun, Wei Dong, Kunlun He, and Zhengxing Huang. On clinical event prediction in patient treatment trajectory using longitudinal electronic health records.IEEE Journal of Biomedical and Health Informatics, 24(7):2053–2063, 2019

  17. [17]

    Retrieval-augmented generation for knowledge-intensive nlp tasks.Advances in neural information processing systems, 33:9459–9474, 2020

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks.Advances in neural information processing systems, 33:9459–9474, 2020

  18. [18]

    Retrieval-augmented generation for natural language processing: A survey.arXiv preprint arXiv:2407.13193, 2024

    Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, et al. Retrieval-augmented generation for natural language processing: A survey.arXiv preprint arXiv:2407.13193, 2024

  19. [19]

    Evaluation of retrieval-augmented generation: A survey

    Hao Yu, Aoran Gan, Kai Zhang, Shiwei Tong, Qi Liu, and Zhaofeng Liu. Evaluation of retrieval-augmented generation: A survey. InCCF Conference on Big Data, pages 102–120. Springer, 2024

  20. [20]

    Retrieval-augmented generation for ai-generated content: A survey

    Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, Jie Jiang, and Bin Cui. Retrieval-augmented generation for ai-generated content: A survey.arXiv preprint arXiv:2402.19473, 2024

  21. [21]

    Retrieval augmented generation and understanding in vision: A survey and new outlook.arXiv preprint arXiv:2503.18016, 2025

    Xu Zheng, Ziqiao Weng, Yuanhuiyi Lyu, Lutao Jiang, Haiwei Xue, Bin Ren, Danda Paudel, Nicu Sebe, Luc Van Gool, and Xuming Hu. Retrieval augmented generation and understanding in vision: A survey and new outlook.arXiv preprint arXiv:2503.18016, 2025

  22. [22]

    Long short-term memory.Neural computation, 9(8):1735–1780, 1997

    Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory.Neural computation, 9(8):1735–1780, 1997

  23. [23]

    On the properties of neural machine translation: Encoder-decoder approaches.arXiv preprint arXiv:1409.1259, 2014

    Kyunghyun Cho, Bart Van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. On the properties of neural machine translation: Encoder-decoder approaches.arXiv preprint arXiv:1409.1259, 2014

  24. [24]

    Recurrent neural networks for multivariate time series with missing values.Scientific reports, 8(1):6085, 2018

    Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, and Yan Liu. Recurrent neural networks for multivariate time series with missing values.Scientific reports, 8(1):6085, 2018

  25. [25]

    Multitask learning and benchmarking with clinical time series data.Scientific data, 6(1):96, 2019

    Hrayr Harutyunyan, Hrant Khachatrian, David C Kale, Greg Ver Steeg, and Aram Galstyan. Multitask learning and benchmarking with clinical time series data.Scientific data, 6(1):96, 2019

  26. [26]

    Medfuse: Multi-modal fusion with clini- cal time-series data and chest x-ray images

    Nasir Hayat, Krzysztof J Geras, and Farah E Shamout. Medfuse: Multi-modal fusion with clini- cal time-series data and chest x-ray images. InMachine Learning for Healthcare Conference, pages 479–503. PMLR, 2022

  27. [27]

    Bert: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

  28. [28]

    Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019

    Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019. 11

  29. [29]

    Behrt: transformer for electronic health records.Scientific reports, 10(1):7155, 2020

    Yikuan Li, Shishir Rao, José Roberto Ayala Solares, Abdelaali Hassaine, Rema Ramakrishnan, Dexter Canoy, Yajie Zhu, Kazem Rahimi, and Gholamreza Salimi-Khorshidi. Behrt: transformer for electronic health records.Scientific reports, 10(1):7155, 2020

  30. [30]

    Yikuan Li, Mohammad Mamouei, Gholamreza Salimi-Khorshidi, Shishir Rao, Abdelaali Hassaine, Dexter Canoy, Thomas Lukasiewicz, and Kazem Rahimi. Hi-behrt: hierarchical transformer-based model for accurate prediction of clinical events using multimodal longitudinal electronic health records.IEEE journal of biomedical and health informatics, 27(2):1106–1117, 2022

  31. [31]

    Cehr-bert: Incorporating temporal information from structured ehr data to improve prediction tasks

    Chao Pang, Xinzhuo Jiang, Krishna S Kalluri, Matthew Spotnitz, RuiJun Chen, Adler Perotte, and Karthik Natarajan. Cehr-bert: Incorporating temporal information from structured ehr data to improve prediction tasks. InMachine Learning for Health, pages 239–260. PMLR, 2021

  32. [32]

    Med-bert: pretrained contex- tualized embeddings on large-scale structured electronic health records for disease prediction

    Laila Rasmy, Yang Xiang, Ziqian Xie, Cui Tao, and Degui Zhi. Med-bert: pretrained contex- tualized embeddings on large-scale structured electronic health records for disease prediction. NPJ digital medicine, 4(1):86, 2021

  33. [33]

    Foundation model of electronic medical records for adaptive risk estimation.GigaScience, 14:giaf107, 2025

    Pawel Renc, Michal K Grzeszczyk, Nassim Oufattole, Deirdre Goode, Yugang Jia, Szymon Bieganski, Matthew BA McDermott, Jaroslaw Was, Anthony E Samir, Jonathan W Cunningham, et al. Foundation model of electronic medical records for adaptive risk estimation.GigaScience, 14:giaf107, 2025

  34. [34]

    Cehr-gpt: Generating electronic health records with chronological patient timelines.arXiv preprint arXiv:2402.04400, 2024

    Chao Pang, Xinzhuo Jiang, Nishanth Parameshwar Pavinkurve, Krishna S Kalluri, Elise L Minto, Jason Patterson, Linying Zhang, George Hripcsak, Gamze Gürsoy, Noémie Elhadad, et al. Cehr-gpt: Generating electronic health records with chronological patient timelines.arXiv preprint arXiv:2402.04400, 2024

  35. [35]

    Generative medical event models improve with scale.arXiv preprint arXiv:2508.12104, 2025

    Shane Waxler, Paul Blazek, Davis White, Daniel Sneider, Kevin Chung, Mani Nagarathnam, Patrick Williams, Hank V oeller, Karen Wong, Matthew Swanhorst, et al. Generative medical event models improve with scale.arXiv preprint arXiv:2508.12104, 2025

  36. [36]

    From ehrs to patient pathways: Scalable modeling of longitudinal health trajectories with llms.arXiv preprint arXiv:2506.04831, 2025

    Chantal Pellegrini, Ege Özsoy, David Bani-Harouni, Matthias Keicher, and Nassir Navab. From ehrs to patient pathways: Scalable modeling of longitudinal health trajectories with llms.arXiv preprint arXiv:2506.04831, 2025

  37. [37]

    Clinical decision support using pseudo-notes from multiple streams of ehr data.npj Digital Medicine, 8(1):394, 2025

    Simon A Lee, Sujay Jain, Alex Chen, Kyoka Ono, Arabdha Biswas, Ákos Rudas, Jennifer Fang, and Jeffrey N Chiang. Clinical decision support using pseudo-notes from multiple streams of ehr data.npj Digital Medicine, 8(1):394, 2025

  38. [38]

    Improving language models by retrieving from trillions of tokens

    Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, et al. Improving language models by retrieving from trillions of tokens. InInternational conference on machine learning, pages 2206–2240. PMLR, 2022

  39. [39]

    Active retrieval augmented generation

    Zhengbao Jiang, Frank F Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, and Graham Neubig. Active retrieval augmented generation. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7969–7992, 2023

  40. [40]

    Replug: Retrieval-augmented black-box language models

    Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Richard James, Mike Lewis, Luke Zettlemoyer, and Wen-tau Yih. Replug: Retrieval-augmented black-box language models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 8371–8384, 2024

  41. [41]

    Question-answering based summa- rization of electronic health records using retrieval augmented generation.arXiv preprint arXiv:2401.01469, 2024

    Walid Saba, Suzanne Wendelken, and James Shanahan. Question-answering based summa- rization of electronic health records using retrieval augmented generation.arXiv preprint arXiv:2401.01469, 2024

  42. [42]

    Rationale-guided retrieval augmented generation for medical question answering

    Jiwoong Sohn, Yein Park, Chanwoong Yoon, Sihyeon Park, Hyeon Hwang, Mujeen Sung, Hyunjae Kim, and Jaewoo Kang. Rationale-guided retrieval augmented generation for medical question answering. InProceedings of the 2025 Conference of the Nations of the Americas 12 Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1...

  43. [43]

    Grounding large language models in clinical evidence: A retrieval-augmented generation system for querying uk nice clinical guidelines.arXiv preprint arXiv:2510.02967, 2025

    Matthew Lewis, Samuel Thio, Richard JB Dobson, and Spiros Denaxas. Grounding large language models in clinical evidence: A retrieval-augmented generation system for querying uk nice clinical guidelines.arXiv preprint arXiv:2510.02967, 2025

  44. [44]

    Agentic memory-augmented retrieval and evidence grounding for medical question-answering tasks

    Shuyue Jia, Subhrangshu Bit, Varuna H Jasodanand, Yi Liu, and Vijaya B Kolachalama. Agentic memory-augmented retrieval and evidence grounding for medical question-answering tasks. medRxiv, pages 2025–08, 2025

  45. [45]

    A retrieval-augmented knowledge mining method with deep thinking llms for biomedical research and clinical support

    Yichun Feng, Jiawei Wang, Ruikun He, Lu Zhou, and Yixue Li. A retrieval-augmented knowledge mining method with deep thinking llms for biomedical research and clinical support. GigaScience, 14:giaf109, 2025

  46. [46]

    Ontologyrag: better and faster biomedical code mapping with retrieval-augmented generation (rag) leveraging ontology knowl- edge graphs and large language models

    Hui Feng, Yuntzu Yin, Emiliano Reynares, and Jay Nanavati. Ontologyrag: better and faster biomedical code mapping with retrieval-augmented generation (rag) leveraging ontology knowl- edge graphs and large language models. InInternational Workshop on Knowledge-Enhanced Information Retrieval, pages 71–86. Springer, 2025

  47. [47]

    Generating patient cohorts from electronic health records using two-step retrieval-augmented text-to-sql generation.arXiv preprint arXiv:2502.21107, 2025

    Angelo Ziletti and Leonardo D’Ambrosi. Generating patient cohorts from electronic health records using two-step retrieval-augmented text-to-sql generation.arXiv preprint arXiv:2502.21107, 2025

  48. [48]

    A survey on retrieval-augmentation generation (rag) models for healthcare applications.Neural Computing and Applications, 37(33):28191–28267, 2025

    Mohamed Abo El-Enen, Sally Saad, and Taymoor Nazmy. A survey on retrieval-augmentation generation (rag) models for healthcare applications.Neural Computing and Applications, 37(33):28191–28267, 2025

  49. [49]

    Retrieval augmented generation for large language models in healthcare: A systematic review.PLOS Digital Health, 4(6):e0000877, 2025

    Lameck Mbangula Amugongo, Pietro Mascheroni, Steven Brooks, Stefan Doering, and Jan Seidel. Retrieval augmented generation for large language models in healthcare: A systematic review.PLOS Digital Health, 4(6):e0000877, 2025

  50. [50]

    Emerge: Enhancing multimodal electronic health records predictive modeling with retrieval-augmented generation

    Yinghao Zhu, Changyu Ren, Zixiang Wang, Xiaochen Zheng, Shiyun Xie, Junlan Feng, Xi Zhu, Zhoujun Li, Liantao Ma, and Chengwei Pan. Emerge: Enhancing multimodal electronic health records predictive modeling with retrieval-augmented generation. InProceedings of the 33rd ACM International Conference on Information and Knowledge Management, pages 3549–3559, 2024

  51. [51]

    Ram-ehr: Retrieval augmentation meets clinical predictions on electronic health records

    Ran Xu, Wenqi Shi, Yue Yu, Yuchen Zhuang, Bowen Jin, May Dongmei Wang, Joyce Ho, and Carl Yang. Ram-ehr: Retrieval augmentation meets clinical predictions on electronic health records. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 754–765, 2024

  52. [52]

    Realm: Rag-driven enhancement of multimodal electronic health records analysis via large language models,

    Yinghao Zhu, Changyu Ren, Shiyun Xie, Shukai Liu, Hangyuan Ji, Zixiang Wang, Tao Sun, Long He, Zhoujun Li, Xi Zhu, et al. Realm: Rag-driven enhancement of multimodal electronic health records analysis via large language models.arXiv preprint arXiv:2402.07016, 2024

  53. [53]

    Rituparna Datta, Jiaming Cui, Zihan Guan, Vishal G Reddy, Joshua C Eby, Gregory Madden, Rupesh Silwal, and Anil Vullikanti. Improving hospital risk prediction with knowledge-augmented multimodal ehr modeling. arXiv preprint arXiv:2508.01970, 2025

  54. [54]

    Junu Kim, Chaeeun Shim, Bosco Seong Kyu Yang, Chami Im, Sung Yoon Lim, Han-Gil Jeong, and Edward Choi. General-purpose retrieval-enhanced medical prediction model using near-infinite history. arXiv preprint arXiv:2310.20204, 2023

  55. [55]

    Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Mike Rabbat, and Nicolas Ballas. Masked siamese networks for label-efficient learning. In European Conference on Computer Vision, pages 456–473. Springer, 2022

  56. [56]

    Alistair Johnson, Lucas Bulgarelli, Tom Pollard, Steven Horng, Leo Anthony Celi, and Roger Mark. Mimic-iv. PhysioNet. Available online at: https://physionet.org/content/mimiciv/1.0/ (accessed August 23, 2021), pages 49–55, 2020

  57. [57]

    Alistair EW Johnson, Lucas Bulgarelli, Lu Shen, Alvin Gayles, Ayad Shammout, Steven Horng, Tom J Pollard, Sicheng Hao, Benjamin Moody, Brian Gow, et al. Mimic-iv, a freely accessible electronic health record dataset. Scientific Data, 10(1):1, 2023

  58. [58]

    Kyunghoon Hur, Jiyoung Lee, Jungwoo Oh, Wesley Price, Younghak Kim, and Edward Choi. Unifying heterogeneous electronic health records systems via text-based code embedding. In Conference on Health, Inference, and Learning, pages 183–203. PMLR, 2022

  59. [59]

    Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, et al. Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference. In Proceedings of the 63rd Annual Meeting of the Associati...

  60. [60]

    Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692, 2019

  61. [61]

    Iz Beltagy, Matthew E Peters, and Arman Cohan. Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150, 2020

  62. [62]

    Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, et al. Big bird: Transformers for longer sequences. Advances in Neural Information Processing Systems, 33:17283–17297, 2020

  63. [63]

    Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding. Neurocomputing, 568:127063, 2024

  64. [64]

    Marie-Therese Puth, Markus Neuhäuser, and Graeme D Ruxton. On the variety of methods for calculating confidence intervals by bootstrapping. Journal of Animal Ecology, 84(4):892–897, 2015

  65. [65]

    Qwen2 Technical Report

    An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, Keming Lu, Keqin Chen, Kexin Yang, Mei Li, Mingfeng ...

  66. [66]

    Albert Q Jiang, A Sablayrolles, A Mensch, C Bamford, D Singh Chaplot, Ddl Casas, F Bressand, G Lengyel, G Lample, L Saulnier, et al. Mistral 7B. arXiv preprint arXiv:2310.06825, 2023

  67. [67]

    Andrew Sellergren, Chufan Gao, Fereshteh Mahvar, Timo Kohlberger, Fayaz Jamil, Madeleine Traverse, Alberto Tono, Bashir Sadjad, Lin Yang, Charles Lau, et al. Medgemma 1.5 technical report. arXiv preprint arXiv:2604.05081, 2026

  68. [68]

    Yanis Labrak, Adrien Bazoge, Emmanuel Morin, Pierre-Antoine Gourraud, Mickael Rouvier, and Richard Dufour. Biomistral: A collection of open-source pretrained large language models for medical domains, 2024

  69. [69]

    Bert Arnrich, Edward Choi, Jason Alan Fries, Matthew BA McDermott, Jungwoo Oh, Tom Pollard, Nigam Shah, Ethan Steinberg, Michael Wornow, and Robin van de Water. Medical event data standard (meds): Facilitating machine learning for health. In ICLR 2024 Workshop on Learning from Time Series For Health, pages 03–08, 2024

  70. [70]

    Matthew BA McDermott, Justin Xu, Teya S Bergamaschi, Hyewon Jeong, Simon A Lee, Nassim Oufattole, Patrick Rockenschaub, Kamilė Stankevičiūtė, Ethan Steinberg, Jimeng Sun, et al. Meds: Building models and tools in a reproducible health ai ecosystem. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, pages 6...

  71. [71]

    Seyed Mehran Kazemi, Rishab Goel, Sepehr Eghbali, Janahan Ramanan, Jaspreet Sahota, Sanjay Thakur, Stella Wu, Cathal Smyth, Pascal Poupart, and Marcus Brubaker. Time2vec: Learning a vector representation of time. arXiv preprint arXiv:1907.05321, 2019

  72. [72]

    Hong Meng, Liang Guo, Yucheng Pan, Bin Kong, Wei Shuai, and He Huang. Machine learning based clinical prediction model for 1-year mortality in sepsis patients with atrial fibrillation. Heliyon, 10(21), 2024

  73. [73]

    Beatriz Nistal-Nuño. Developing machine learning models for prediction of mortality in the medical intensive care unit. Computer Methods and Programs in Biomedicine, 216:106663, 2022

  74. [74]

    Marzyeh Ghassemi, Tristan Naumann, Finale Doshi-Velez, Nicole Brimmer, Rohit Joshi, Anna Rumshisky, and Peter Szolovits. Unfolding physiological state: Mortality modelling in intensive care units. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 75–84, 2014

  75. [75]

    Min Zhang and Tsung-Ting Kuo. Early prediction of long hospital stay for intensive care units readmission patients using medication information. Computers in Biology and Medicine, 174:108451, 2024

  76. [76]

    Khalid Alghatani, Nariman Ammar, Abdelmounaam Rezgui, and Arash Shaban-Nejad. Predicting intensive care unit length of stay and mortality using patient vital signs: machine learning model development and validation. JMIR Medical Informatics, 9(5):e21347, 2021

  77. [77]

    Huiling Hu, Jiashuai Li, Hui Ge, Bilin Wu, Tingting Feng, Xue Wu, and Xuanna Wu. Prognostic models for unplanned intensive care unit readmission risk prediction: A systematic review and meta-analysis based on hsroc model. Nursing in Critical Care, 30(2):e13306, 2025

  78. [78]

    Talen Chen, Samaneh Madanian, David Airehrour, and Marianne Cherrington. Machine learning methods for hospital readmission prediction: systematic analysis of literature. Journal of Reliable Intelligent Environments, 8(1):49–66, 2022

  79. [79]

    Alex GC de Sá, Daniel Gould, Anna Fedyukova, Mitchell Nicholas, Lucy Dockrell, Calvin Fletcher, David Pilcher, Daniel Capurro, David B Ascher, Khaled El-Khawas, et al. Explainable machine learning for icu readmission prediction. arXiv preprint arXiv:2309.13781, 2023

  80. [80]

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017

Showing first 80 references.