Sequence models reveal diagnosis accumulation pathways beyond comorbidity burden in population-scale hospital data

Katharina Ledebur; Mitja Devetak; Peter Klimek

arxiv: 2605.30962 · v1 · pith:UMLJMTJTnew · submitted 2026-05-29 · ⚛️ physics.soc-ph

Pith reviewed 2026-06-28 20:13 UTC · model grok-4.3

classification ⚛️ physics.soc-ph

keywords: diagnosis sequences · contrastive transformer · comorbidity burden · longitudinal hospital data · disease prediction · event-free survival · disease accumulation

The pith

Longitudinal hospital diagnosis sequences contain predictive information beyond age, sex, and comorbidity burden.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether the timing, sequence, and pace of diagnoses in hospital records hold information not captured by standard cross-sectional comorbidity measures such as the Elixhauser index. It trains a visit-level contrastive transformer on 13 years of Austrian inpatient data covering 7.4 million patients to produce embeddings that incorporate diagnosis order and inter-admission intervals. In a downstream cohort of 1.7 million individuals, these embeddings yield modest AUC gains over comorbidity-only models for 93 of 131 incident disease-block outcomes, concentrated in mental, musculoskeletal, nervous system, and metabolic disorders. The embeddings also identify patients with shorter event-free survival, linking the added signal to the breadth, recency, and pace of prior disease accumulation.

Core claim

A visit-level contrastive transformer encodes diagnosis sequences and inter-admission timing into patient-history embeddings that improve prediction of 93 of 131 incident ICD-10 disease blocks over Elixhauser-based models. The added signal tracks the breadth, recency, and pace of prior disease accumulation, and it shows up as reduced event-free survival among patients with high residual risk.

What carries the argument

A visit-level contrastive transformer that encodes diagnosis sequences and inter-admission timing into patient-history embeddings.

If this is right

Embeddings improve prediction for 93 of 131 incident disease blocks with a median AUC gain of 0.006.
Gains concentrate in mental, musculoskeletal, nervous system, and metabolic disorders.
Patients with high residual risk have 132-183 fewer event-free days over five years.
Event rates for high-residual-risk patients match those of low-residual-risk patients more than a decade older.
The embedding signal tracks the breadth, recency, and pace of prior disease accumulation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be applied to outpatient or claims data to test whether sequence effects persist outside inpatient settings.
Residual risk scores derived from embeddings might support targeted monitoring for patients showing rapid accumulation patterns.
Shuffling diagnosis order in retraining experiments would isolate the contribution of sequence versus simple count of conditions.
Similar embeddings could be compared across countries to examine whether accumulation pace varies by healthcare system.

Load-bearing premise

The embeddings from the contrastive transformer capture information about diagnosis sequences and timing that is not already contained in age, sex, and the Elixhauser comorbidity index.

What would settle it

The claim would fail if a model that adds the embeddings to a baseline already containing the Elixhauser index, age, and sex showed no AUC improvement, or if randomizing the order of diagnoses within patient histories removed the observed gains.
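The order-randomization control can be sketched on toy data. The block below is a minimal illustration, not the authors' pipeline: the visit representation (a tuple of days since the previous admission and a list of ICD-10 codes) and the helper names `shuffle_visit_order` and `diagnosis_counts` are assumptions made for this example.

```python
import random
from collections import Counter


def shuffle_visit_order(history, rng):
    """Return a copy of one patient's history with the visits in random order.

    `history` is a list of visits; each visit is a hypothetical tuple
    (days_since_previous_admission, [ICD-10 codes]). The codes inside a
    visit stay together; only the order of the visits changes.
    """
    shuffled = list(history)
    rng.shuffle(shuffled)
    return shuffled


def diagnosis_counts(history):
    """Count how often each code appears, ignoring visit order."""
    return Counter(code for _, codes in history for code in codes)


rng = random.Random(0)
patient = [(0, ["E11"]), (120, ["I10", "E11"]), (30, ["F32"]), (400, ["M54"])]
control = shuffle_visit_order(patient, rng)

# The shuffled history carries exactly the same diagnoses, so any drop in
# downstream AUC after retraining on shuffled histories would point to order.
print(diagnosis_counts(control) == diagnosis_counts(patient))
```

In this sketch each inter-admission gap travels with its visit, so order and timing are scrambled together. A variant that permutes only the code lists while leaving the gaps in place would separate the two.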

Figures

Figures reproduced from arXiv: 2605.30962 by Katharina Ledebur, Mitja Devetak, Peter Klimek.

**Figure 1.** Overview of the visit transformer framework with contrastive self-supervised learning and downstream prediction models. (A) Visit-level transformer architecture applied to Austrian nationwide hospital claims data (1997-2009). Each hospital visit is represented by ICD-10 diagnosis embeddings and time since previous admission, processed through a four-layer BERT-style transformer to produce a patient-level e…

**Figure 2.** Prediction of incident disease blocks from learned patient-history embeddings. Each point represents one incident ICD-10 disease-block outcome among patients free of the respective block at the 2010 landmark. The demographic model included age and sex; the comorbidity model included age, sex, and Elixhauser score; the embedding model included age, sex, and the learned patient-history embedding; and the com…

**Figure 3.** Embedding residual risk separates future event-free trajectories among patients with similar comorbidity-model risk. Embedding residual risk was defined as the difference between embedding-model and comorbidity-model predicted risk for second incident ICD-10 disease block or death. Residual-risk quintiles were assigned within strata of age band, sex, and comorbidity-model predicted-risk decile in the held-…

**Figure 4.** Embedding residual risk identifies heterogeneous future morbidity among patients with similar comorbidity-model risk. Embedding residual risk was defined as the difference between embedding-model and comorbidity-model predicted risk for second incident ICD-10 disease block or death. Residual-risk quintiles were assigned within age, sex, and comorbidity-model risk strata in the held-out test set. (A) Observ…

**Figure 5.** Clinical structure of the learned patient-history embedding. Patient-history embeddings were projected onto principal components (PCs) to characterize the structure learned by the self-supervised encoder. (A) Smoothed binned maps of the first two PCs show mean age at landmark (2010), mean Elixhauser comorbidity score, mean hospital visit count, and in-hospital death during follow-up across embedding space.…
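The residual-risk construction described in the captions of Figures 3 and 4 can be illustrated with a short sketch. It assumes each patient already has two predicted risks and a stratum label. The field names (`p_embed`, `p_comorb`, `stratum`), the rank-based quintile rule, and the toy numbers are hypothetical choices for this example, not details taken from the paper.

```python
from collections import defaultdict


def residual_risk_quintiles(patients):
    """Assign residual-risk quintiles (1 = lowest, 5 = highest) within strata.

    Each patient is a dict with hypothetical keys:
      'id'       - patient identifier
      'stratum'  - e.g. (age band, sex, comorbidity-model risk decile)
      'p_embed'  - risk predicted by the embedding model
      'p_comorb' - risk predicted by the comorbidity model
    """
    by_stratum = defaultdict(list)
    for p in patients:
        residual = p["p_embed"] - p["p_comorb"]  # embedding residual risk
        by_stratum[p["stratum"]].append((residual, p["id"]))

    quintile = {}
    for members in by_stratum.values():
        members.sort()  # rank patients by residual risk inside the stratum
        n = len(members)
        for rank, (_, pid) in enumerate(members):
            quintile[pid] = 1 + (5 * rank) // n
    return quintile


stratum_a = ("60-69", "F", 3)
stratum_b = ("70-79", "M", 7)
toy = (
    [{"id": f"a{i}", "stratum": stratum_a, "p_comorb": 0.2, "p_embed": pe}
     for i, pe in enumerate([0.10, 0.15, 0.20, 0.25, 0.30], start=1)]
    + [{"id": f"b{i}", "stratum": stratum_b, "p_comorb": 0.5, "p_embed": pe}
       for i, pe in enumerate([0.60, 0.50, 0.40, 0.55, 0.45], start=1)]
)
quintiles = residual_risk_quintiles(toy)
print(quintiles["a5"], quintiles["b3"])
```

Ranking inside each stratum means a patient in the top quintile is compared only with patients who share that stratum, which is what lets the residual be read as signal beyond age, sex, and comorbidity-model risk.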

Original abstract

Aging trajectories vary among individuals of similar age and disease burden. Comorbidity indices, e.g. the Elixhauser index, summarize conditions cross-sectionally, but discard the timing, sequence, and pace of morbidity accumulation. Here we ask whether longitudinal hospital diagnosis histories contain information beyond age, sex, and comorbidity burden, and where it is concentrated. Using 13 years of Austrian inpatient data covering 7.4 million patients, we trained a visit-level contrastive transformer to encode diagnosis sequences and inter-admission timing into patient-history embeddings. In a downstream cohort of 1.7 million individuals, embeddings improved prediction over the Elixhauser-based comorbidity model for 93 of 131 incident ICD-10 disease-block outcomes, with a modest median AUC gain of 0.006. Gains concentrated in mental, musculoskeletal, nervous system, and metabolic disorders. We then evaluated event-free survival, defined as remaining alive without accumulating a second unrecorded ICD-10 disease block. The embedding model achieved an AUC of 0.726 versus 0.722 for the comorbidity model. However, among patients with similar age, sex, and comorbidity-model risk, those assigned high residual risk had 132-183 fewer event-free days over five years and observed event rates comparable to low-residual-risk patients more than a decade older. Together, these findings link the embedding's signal to the breadth, recency, and pace of prior disease accumulation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Small AUC gains from a contrastive transformer on diagnosis sequences over Elixhauser, but no ablation isolates timing or order from richer comorbidity encoding.

The paper finds modest predictive lifts from visit-level contrastive transformer embeddings of longitudinal hospital diagnoses compared with the Elixhauser index. On 1.7 million patients the embeddings improve AUC for 93 of 131 incident disease-block outcomes with a median gain of 0.006, concentrated in mental, musculoskeletal, nervous-system, and metabolic blocks, and they produce a small survival AUC edge plus a residual-risk split that tracks 132-183 fewer event-free days.

The scale of the Austrian inpatient records (7.4 million patients, 13 years) and the concrete breakdown by disease category are the clearest contributions. The residual-risk survival analysis also gives a direct link between the embedding signal and breadth, recency, and pace of prior accumulation.

The central weakness is the missing ablation. Nothing in the reported work tests whether a permutation-invariant aggregator over the same diagnoses would recover similar gains; without that comparison the source of the extra signal stays ambiguous. The effect sizes themselves are small enough that even statistically reliable differences may not shift practice.

This work is for health-informatics groups that already use administrative data for risk modeling and want category-level diagnostics on where sequence information adds anything. A reader building or refining comorbidity-adjusted predictors will find the numbers and the residual-risk framing useful.

The paper engages the existing comorbidity literature directly and reports falsifiable counts, so it deserves a serious referee. I would send it for review and specifically request the sequence-versus-set ablation plus confidence intervals on the AUC differences.

Referee Report

2 major / 2 minor

Summary. The paper trains a visit-level contrastive transformer on 13 years of Austrian inpatient records (7.4M patients) to produce patient-history embeddings from diagnosis sequences and inter-admission intervals. These embeddings are evaluated in a 1.7M-patient downstream cohort and reported to improve prediction of 93/131 incident ICD-10 disease-block outcomes over an Elixhauser comorbidity baseline (median AUC gain 0.006), with additional gains in event-free survival (AUC 0.726 vs 0.722) that are linked to the breadth, recency, and pace of prior morbidity accumulation.

Significance. If the residual predictive signal is shown to originate from temporal ordering and timing rather than richer cross-sectional encoding of the same diagnoses, the result would demonstrate that sequence models can extract prognostic information beyond standard comorbidity indices in large-scale hospital data. The modest effect sizes and concentration in specific disease categories (mental, musculoskeletal, nervous, metabolic) limit immediate clinical translation but could inform targeted longitudinal risk modeling.

major comments (2)

[Abstract] Abstract and implied Methods: the central claim that embeddings capture information 'beyond' the Elixhauser index requires an ablation that replaces the sequential contrastive transformer with a permutation-invariant aggregator (e.g., mean-pooled diagnosis embeddings or set transformer). Without this control, the reported median AUC lift of 0.006 cannot be attributed to sequence or timing rather than higher-capacity encoding of the identical past diagnoses.
[Abstract] Abstract/Results: the modest median AUC gain (0.006) and survival AUC lift (0.004) are presented without reported confidence intervals, statistical tests for improvement, or assessment of calibration; given the sample size of 1.7M, even small gains may be statistically detectable yet clinically marginal, weakening the link to 'breadth, recency, and pace'.
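The distinction behind the first major comment can be shown in a few lines. This sketch compares a mean-pooled aggregator, which cannot see visit order, with a simple recency-weighted pooling used here only as a stand-in for an order-sensitive encoder. It is not the paper's transformer, and the function names, the `decay` parameter, and the toy vectors are assumptions for the example.

```python
import math


def mean_pool(vectors):
    """Permutation-invariant aggregator: element-wise mean of the visit vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def recency_weighted_pool(vectors, decay=0.5):
    """Order-sensitive stand-in: later visits get exponentially larger weights."""
    n = len(vectors)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(weights)
    return [sum(w * x for w, x in zip(weights, col)) / total
            for col in zip(*vectors)]


# Three toy visit vectors for one patient, then the same visits reordered.
visits = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
reordered = [visits[2], visits[0], visits[1]]

same_under_mean = all(
    math.isclose(a, b) for a, b in zip(mean_pool(visits), mean_pool(reordered))
)
same_under_recency = all(
    math.isclose(a, b)
    for a, b in zip(recency_weighted_pool(visits), recency_weighted_pool(reordered))
)
print(same_under_mean, same_under_recency)
```

If a mean-pooled baseline over the same diagnosis representations matched the reported gains, the extra signal could not be attributed to ordering, which is why the referee asks for this comparison.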

minor comments (2)

Clarify how inter-admission timing is tokenized and whether the contrastive objective explicitly penalizes or rewards temporal order.
Specify the exact train/validation split between the embedding pre-training cohort and the 1.7M downstream cohort to rule out leakage.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive feedback. We address the major comments point-by-point below.

Referee: [Abstract] Abstract and implied Methods: the central claim that embeddings capture information 'beyond' the Elixhauser index requires an ablation that replaces the sequential contrastive transformer with a permutation-invariant aggregator (e.g., mean-pooled diagnosis embeddings or set transformer). Without this control, the reported median AUC lift of 0.006 cannot be attributed to sequence or timing rather than higher-capacity encoding of the identical past diagnoses.

Authors: We agree that demonstrating the specific contribution of sequential information requires an ablation against a permutation-invariant baseline. In the revised version, we will add this control experiment using mean-pooled embeddings of the same diagnosis representations, allowing direct comparison to isolate the effect of ordering and timing. revision: yes
Referee: [Abstract] Abstract/Results: the modest median AUC gain (0.006) and survival AUC lift (0.004) are presented without reported confidence intervals, statistical tests for improvement, or assessment of calibration; given the sample size of 1.7M, even small gains may be statistically detectable yet clinically marginal, weakening the link to 'breadth, recency, and pace'.

Authors: We will include bootstrap-derived confidence intervals for the AUC values and differences, along with p-values from appropriate statistical tests (e.g., DeLong's test for AUC comparison). We will also add calibration metrics and plots to the revised manuscript to provide a more complete evaluation of the model's performance. revision: yes
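A paired bootstrap for the AUC difference, one of the analyses promised in the second response, could look roughly like the sketch below. It uses a rank-based AUC and a percentile interval on synthetic scores. The function names, the number of resamples, and the toy data are assumptions for illustration, and DeLong's test, which the response also mentions, is not implemented here.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative.

    Ties count as one half. Quadratic in sample size, so only for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(labels, scores_a, scores_b,
                             n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC(scores_b) - AUC(scores_a).

    Patients are resampled with replacement and both models are scored on
    the same resample, which keeps the comparison paired.
    """
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # skip resamples that lost one of the classes
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(auc(y, b) - auc(y, a))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic example: model B separates the classes slightly better than model A.
rng = random.Random(1)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(80)]
scores_a = [0.5 * y + rng.gauss(0, 1) for y in labels]
scores_b = [0.8 * y + rng.gauss(0, 1) for y in labels]
lower, upper = bootstrap_auc_difference(labels, scores_a, scores_b, n_boot=200)
print(f"95% CI for AUC difference: [{lower:.3f}, {upper:.3f}]")
```

An interval that excludes zero would indicate a statistically detectable gain. As the referee notes, with 1.7 million patients that alone would not establish clinical relevance.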

Circularity Check

0 steps flagged

No circularity: embeddings trained contrastively on sequences, evaluated on held-out downstream prediction against fixed external baseline

The paper trains a visit-level contrastive transformer on diagnosis sequences and inter-admission timing to produce embeddings, then evaluates those embeddings as features for predicting incident disease blocks and event-free survival in a downstream cohort, reporting modest AUC gains over a fixed Elixhauser comorbidity model plus demographics. No step reduces by the paper's own equations or definitions to a quantity already fitted in the baseline; the contrastive objective operates on sequence order and timing, the baseline is an external non-learned index, and the prediction tasks are on held-out future outcomes. No self-citation chains, ansatzes smuggled via prior work, or fitted parameters renamed as predictions are present in the provided text. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review; model architecture, training objective, and data preprocessing details are not specified, so free parameters and exact assumptions cannot be enumerated exhaustively.

axioms (1)

domain assumption: Contrastive learning on diagnosis sequences produces embeddings that capture temporal structure beyond cross-sectional counts.
Invoked by the choice to train a visit-level contrastive transformer and compare it to Elixhauser.


[1] [1]

Luigi Ferrucci and George A. Kuchel. Heterogeneity of Aging: Individual Risk Factors, Mech- anisms, Patient Priorities, and Outcomes.Journal of the American Geriatrics Society, 69 (3):610–612, March 2021. ISSN 0002-8614, 1532-5415. doi: 10.1111/jgs.17011. URL https://agsjournals.onlinelibrary.wiley.com/doi/10.1111/jgs.17011

work page doi:10.1111/jgs.17011 2021

[2] [2]

Moodie, Marie-France Forget, Philippe Desmarais, Mark R

Quoc Dinh Nguyen, Erica M. Moodie, Marie-France Forget, Philippe Desmarais, Mark R. Keezer, and Christina Wolfson. Health Heterogeneity in Older Adults: Exploration in the Canadian Longitudinal Study on Aging.Journal of the American Geriatrics Society, 69(3): 678–687, March 2021. ISSN 0002-8614, 1532-5415. doi: 10.1111/jgs.16919. URLhttps: //agsjournals.o...

work page doi:10.1111/jgs.16919 2021

[3] [3]

Amaia Calderón-Larrañaga, Xiaonan Hu, Miriam Haaksma, Debora Rizzuto, Laura Fratiglioni, and Davide L. Vetrano. Health trajectories after age 60: the role of individual behaviors and the social context.Aging, 13(15):19186–19206, August 2021. ISSN 1945-4589. doi: 10.18632/agi ng.203407. URLhttps://www.aging-us.com/lookup/doi/10.18632/aging.203407

work page doi:10.18632/agi 2021

[4] [4]

Siebra, Mascha Kurpicz-Briki, and Katarzyna Wac

Clauirton A. Siebra, Mascha Kurpicz-Briki, and Katarzyna Wac. Transformers in health: a systematic review on architectures for longitudinal data analysis.Artificial Intelligence Review, 57(2):32, February 2024. ISSN 1573-7462. doi: 10.1007/s10462-023-10677-z. URL https://link.springer.com/10.1007/s10462-023-10677-z

work page doi:10.1007/s10462-023-10677-z 2024

[5] [5]

BEHRT: Transformer for Electronic Health Records.Scientific Reports, 10(1):7155, April 2020

Yikuan Li, Shishir Rao, José Roberto Ayala Solares, Abdelaali Hassaine, Rema Ramakrish- nan, Dexter Canoy, Yajie Zhu, Kazem Rahimi, and Gholamreza Salimi-Khorshidi. BEHRT: Transformer for Electronic Health Records.Scientific Reports, 10(1):7155, April 2020. ISSN 2045-2322. doi: 10.1038/s41598-020-62922-y. URLhttps://www.nature.com/article s/s41598-020-62922-y

work page doi:10.1038/s41598-020-62922-y 2020

[6] [6]

ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission

Kexin Huang, Jaan Altosaar, and Rajesh Ranganath. ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission, November 2020. URLhttp://arxiv.org/abs/1904 .05342. arXiv:1904.05342 [cs]

work page internal anchor Pith review Pith/arXiv arXiv 2020

[7] [7]

Med-BERT: pretrained contex- tualized embeddings on large-scale structured electronic health records for disease prediction

Laila Rasmy, Yang Xiang, Ziqian Xie, Cui Tao, and Degui Zhi. Med-BERT: pretrained contex- tualized embeddings on large-scale structured electronic health records for disease prediction. npjDigitalMedicine,4(1):86,May2021. ISSN2398-6352. doi: 10.1038/s41746-021-00455-y. URLhttps://www.nature.com/articles/s41746-021-00455-y

work page doi:10.1038/s41746-021-00455-y

[8] [8]

Yikuan Li, Mohammad Mamouei, Gholamreza Salimi-Khorshidi, Shishir Rao, Abdelaali Has- saine, Dexter Canoy, Thomas Lukasiewicz, and Kazem Rahimi. Hi-BEHRT: Hierarchical Transformer-Based Model for Accurate Prediction of Clinical Events Using Multimodal Longi- tudinal Electronic Health Records.IEEE Journal of Biomedical and Health Informatics, 27(2): 1106–1...

work page doi:10.1109/jbhi.2022.3224727 2023

[9]

Zeljko Kraljevic, Dan Bean, Anthony Shek, Rebecca Bendayan, Harry Hemingway, Joshua Au Yeung, Alexander Deng, Alfred Balston, Jack Ross, Esther Idowu, James T Teo, and Richard J B Dobson. Foresight—a generative pretrained transformer for modelling of patient timelines using electronic health records: a retrospective modelling study. The Lancet Digital Heal...

doi: 10.1016/s2589-7500(24)00025-6 (2024)

[10]

Zhichao Yang, Avijit Mitra, Weisong Liu, Dan Berlowitz, and Hong Yu. TransformEHR: transformer-based encoder-decoder generative model to enhance prediction of disease outcomes using electronic health records. Nature Communications, 14(1):7857, November 2023. ISSN 2041-1723. doi: 10.1038/s41467-023-43715-z. URL https://www.nature.com/articles/s41467-023-43715-z

[11]

Hans Moen, Vishnu Raj, Andrius Vabalas, Markus Perola, Samuel Kaski, Andrea Ganna, and Pekka Marttinen. Towards modeling evolving longitudinal health trajectories with a transformer-based deep learning model. Annals of Epidemiology, 111:30–43, November 2025. ISSN 1047-2797. doi: 10.1016/j.annepidem.2025.08.025. URL https://linkinghub.elsevier.com/retrieve/pii/S1047...

[12]

Norah Hamad Alhumaidi, Doni Dermawan, Hanin Farhana Kamaruzaman, and Nasser Alotaiq. The Use of Machine Learning for Analyzing Real-World Data in Disease Prediction and Management: Systematic Review. JMIR Medical Informatics, 13:e68898, June 2025. ISSN 2291-9694. doi: 10.2196/68898. URL https://medinform.jmir.org/2025/1/e68898

[13]

Germans Savcisens, Tina Eliassi-Rad, Lars Kai Hansen, Laust Hvas Mortensen, Lau Lilleholt, Anna Rogers, Ingo Zettler, and Sune Lehmann. Using sequences of life-events to predict human lives. Nature Computational Science, 4(1):43–56, January 2024. ISSN 2662-8457. doi: 10.1038/s43588-023-00573-5. URL https://doi.org/10.1038/s43588-023-00573-5

[14]

Charles Gadd, Krishna Gokhale, Aditya Acharya, Jennifer Cooper, Leah Fitzsimmons, Thomas Jackson, Krishnarajah Nirantharakumar, and Christopher Yau. SurvivEHR: a competing risks, time-to-event foundation model for multiple long-term conditions from primary care electronic health records

[15]

Artem Shmatko, Alexander Wolfgang Jung, Kumar Gaurav, Søren Brunak, Laust Mortensen, Ewan Birney, Tom Fitzgerald, and Moritz Gerstung. Learning the natural history of human disease with generative transformers, June 2024. URL http://medrxiv.org/lookup/doi/10.1101/2024.06.07.24308553

[16]

Davide Placido, Bo Yuan, Jessica X. Hjaltelin, Chunlei Zheng, Amalie D. Haue, Piotr J. Chmura, Chen Yuan, Jihye Kim, Renato Umeton, Gregory Antell, Alexander Chowdhury, Alexandra Franz, Lauren Brais, Elizabeth Andrews, Debora S. Marks, Aviv Regev, Siamack Ayandeh, Mary T. Brophy, Nhan V. Do, Peter Kraft, Brian M. Wolpin, Michael H. Rosenthal, Nathanael R. Fillmore, Søren Brunak, and Chri...

doi: 10.1038/s41591-023-02332-5 (2023)

[17]

Kai Wang, Fei Liu, Wei Wu, Changxi Hu, Xian Shen, Meihao Wang, Gen Li, Fanxin Zeng, Li Liu, Io Nam Wong, Sian Liu, Zixing Zou, Bingzhou Li, Jinghang Li, Xiaoying Huang, Shengwei Jin, Zhuomin Li, Hui Xu, Gang Chen, Xiaodong Chen, Ying Zhu, Ping Li, Zhe Feng, Winston Wang, Linling Cheng, Mingqi Yang, Qiang Hou, Wenyang Lu, Yiwen Sun, Kun Li, Tian Zhong, Zhuo Sun, Yun Yin, Alexandre Lo...

doi: 10.1038/s41591-025-04006-w. URL https://www.nature.com/articles/s41591-025-04006-w

[18]

Anne Elixhauser, Claudia Steiner, D. Robert Harris, and Rosanna M. Coffey. Comorbidity Measures for Use with Administrative Data. Medical Care, 36(1), 1998. ISSN 0025-7079. URL https://journals.lww.com/lww-medicalcare/fulltext/1998/01000/comorbidity_measures_for_use_with_administrative.4.aspx

[19]

Mansour T. A. Sharabiani, Paul Aylin, and Alex Bottle. Systematic Review of Comorbidity Indices for Administrative Data. Medical Care, 50(12), 2012. ISSN 0025-7079. URL https://journals.lww.com/lww-medicalcare/fulltext/2012/12000/systematic_review_of_comorbidity_indices_for.14.aspx

[20]

Steven R. Austin, Yu-Ning Wong, Robert G. Uzzo, J. Robert Beck, and Brian L. Egleston. Why Summary Comorbidity Measures Such As the Charlson Comorbidity Index and Elixhauser Score Work. Medical Care, 53(9), 2015. ISSN 0025-7079. URL https://journals.lww.com/lww-medicalcare/fulltext/2015/09000/why_summary_comorbidity_measures_such_as_the.14.aspx

[21]

Carl van Walraven, Peter C. Austin, Alison Jennings, Hude Quan, and Alan J. Forster. A Modification of the Elixhauser Comorbidity Measures Into a Point System for Hospital Death Using Administrative Data. Medical Care, 47(6), 2009. ISSN 0025-7079. URL https://journals.lww.com/lww-medicalcare/fulltext/2009/06000/a_modification_of_the_elixhauser_comorbidity.4.aspx

[22]

Yanjun Li, Qi Huang, Jin Jiang, Xusheng Du, Wenxin Xiang, Shiqi Zhang, Zean Pan, Liyuan Zhao, Yuyan Cui, Limei Ke, Bo Yin, Linfeng Liu, Guoqing Feng, Shouyi Yan, Liangcai Gao, Yang Liu, Yujuan Yuan, Yanying Guo, Yuqing Yang, Weizhi Ma, Yining Yang, and Qian Di. Large language model-based biological age prediction in large-scale populations. Nature Medicine, 31(9):2977–2990, September 2025...


[23]

Amaia Calderón-Larrañaga, Elisa Fabbri, Ana Isabel González, Rafael Perera-Salazar, Nina Grede, Bruce Guthrie, José M Valderas, Caterina Gregorio, Christiane Muth, Davide L Vetrano, Gabriele Meyer, Luigi Ferrucci, Jeanet W Blom, Kerstin Bernartz, Lara Schürmann, Maria Hanf, Martin Scherer, Michael A Steinman, Mieke Rijken, Sharon Straus, Susan M Smith, Victor M Montori, Svet...

Understanding changes in complex care needs over time: key research insights into multimorbidity trajectories. The Lancet Healthy Longevity, 6(11):100790, November 2025. doi: 10.1016/j.lanhl.2025.100790

[24]

Genevieve Cezard, Calum Thomas McHale, Frank Sullivan, Juliana Kuster Filipe Bowles, and Katherine Keenan. Studying trajectories of multimorbidity: a systematic scoping review of longitudinal approaches and evidence. BMJ Open, 11(11):e048485, November 2021. ISSN 2044-6055, 2044-6055. doi: 10.1136/bmjopen-2020-048485. URL https://bmjopen.bmj.com/lookup/doi/...

[25]

Michael G Newman, Christina A Porucznik, Ankita P Date, Samir Abdelrahman, Karen C Schliep, James A VanDerslice, Ken R Smith, and Heidi A Hanson. Generating Older Adult Multimorbidity Trajectories Using Various Comorbidity Indices and Calculation Methods. Innovation in Aging, 7(3):igad023, April 2023. ISSN 2399-5300. doi: 10.1093/geroni/igad023. URL https...

[26]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Jill Burstein, Christy Doran, and Thamar Solorio, editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long a...

doi: 10.18653/v1/n19-1423 (2019)

[27]

Narayan Sharma, René Schwendimann, Olga Endrich, Dietmar Ausserhofer, and Michael Simon. Comparing Charlson and Elixhauser comorbidity indices with different weightings to predict in-hospital mortality: an analysis of national inpatient data. BMC Health Services Research, 21(1):13, December 2021. ISSN 1472-6963. doi: 10.1186/s12913-020-05999-5. URL https://bmchealthservres.b...

[28]

Hude Quan, Vijaya Sundararajan, Patricia Halfon, Andrew Fong, Bernard Burnand, Jean-Christophe Luthi, L Duncan Saunders, Cynthia A. Beck, Thomas E. Feasby, and William A. Ghali. Coding Algorithms for Defining Comorbidities in ICD-9-CM and ICD-10 Administrative Data. Medical Care, 43(11), 2005. ISSN 0025-7079. URL https://journals.lww.com/lww-medicalcare/fulltext/2...

[29]

comorbidipy: Python package for calculating comorbidity and clinical risk scores, 2026. URL https://github.com/ellessenne/comorbidity/.