pith.

arxiv: 2606.10725 · v2 · pith:MN5RHHKM · submitted 2026-06-09 · 💻 cs.LG · cs.CL

Pre-AF 13: An Interpretable Atrial Fibrillation Risk Score Mined from Discharge Reports

Pith reviewed 2026-06-27 14:06 UTC · model grok-4.3

classification 💻 cs.LG cs.CL
keywords: atrial fibrillation · risk prediction · machine learning · electronic health records · interpretable models · NLP · discharge reports · cardiovascular disease

The pith

Machine learning models from routine discharge reports predict 24-month atrial fibrillation risk in cardiovascular patients more accurately than established clinical scores.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops machine learning models that predict the risk of developing atrial fibrillation within 24 months, or over the entire follow-up, in patients with cardiovascular disease but no prior AF. A custom NLP pipeline turns unstructured discharge reports into 73 structured features, from which LightAutoML builds full, reduced, and linear models. The reduced 13-feature model reaches a 24-month ROC AUC of 0.725 and outperforms four clinical scores whose AUCs range from 0.53 to 0.64. SHAP analysis shows age and left atrial volume as the strongest predictors, while a linear score stratifies observed 24-month incidence from roughly 7 percent to 36 percent. The work shows how routinely collected hospital data can support better medium-term risk stratification than existing tools, which rely on factors such as age and hypertension that are nearly universal in this population.

Core claim

The central claim is that interpretable ML models built from features extracted from discharge reports via NLP outperform established clinical risk scores in predicting AF incidence over a 24-month horizon and entire follow-up among CVD patients without pre-existing AF. The full 73-feature model reaches ROC AUC 0.735 for 24 months while the simple 13-feature model reaches 0.725; both exceed the clinical scores (CHARGE-AF, C2HEST, MHS, HAVOC) whose AUCs fall between 0.53 and 0.64. SHAP identifies age and left atrial volume as dominant predictors, and the linear Pre-AF 9 score stratifies observed 24-month AF incidence from approximately 7 percent to 36 percent.
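ROC AUC, the metric behind this comparison, can be read as the probability that a randomly chosen patient who develops AF receives a higher risk score than one who does not, with ties counting one half. A minimal sketch of that rank-based definition follows; the labels and scores are made-up toy values, not study data. Coarse integer scores, typical of bedside tools, create many ties, which pull the AUC toward 0.5.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # O(len(pos) * len(neg)): fine for illustration; large cohorts would use a rank-sum formula
    return wins / (len(pos) * len(neg))


# Toy example: 1 = developed AF within the horizon (hypothetical data)
labels = [0, 0, 1, 1]
model_scores = [0.10, 0.40, 0.35, 0.80]  # continuous model output
clinical_points = [1, 2, 1, 2]           # coarse integer score with many ties

print(roc_auc(labels, model_scores))     # 0.75
print(roc_auc(labels, clinical_points))  # 0.5
```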

What carries the argument

The Pre-AF 13 model: a reduced model built on 13 features selected from the NLP-processed discharge reports and interpreted with SHAP. It carries the performance comparison against the clinical scores.

If this is right

  • Non-linear models from discharge-report features achieve ROC AUC values near 0.73 for 24-month AF prediction while clinical scores remain in the 0.53-0.64 range.
  • Age and left atrial volume emerge as the dominant predictors when SHAP is applied to the models.
  • A linear risk score derived from the same features can divide observed 24-month AF incidence into strata ranging from about 7 percent to 36 percent.
  • Interpretable models built on routinely collected EHR data can identify high-AF-risk CVD patients more effectively than the four compared clinical scores.
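The 7 to 36 percent stratification amounts to grouping patients by a linear score and reporting the fraction who develop AF in each group. The sketch below assumes hypothetical feature names, weights, and cutpoints; the published Pre-AF 9 coefficients and cutoffs are not given in this summary.

```python
from bisect import bisect_right


def linear_score(features, weights, intercept=0.0):
    """Weighted sum of feature values; feature and weight names are hypothetical."""
    return intercept + sum(weights[name] * value for name, value in features.items())


def incidence_by_stratum(scores, labels, cutpoints):
    """Observed event rate per stratum.

    With sorted cutpoints c_0 < c_1 < ..., stratum k holds scores s with
    c_{k-1} <= s < c_k (the first and last strata are open-ended).
    """
    n_strata = len(cutpoints) + 1
    events = [0] * n_strata
    totals = [0] * n_strata
    for s, y in zip(scores, labels):
        k = bisect_right(cutpoints, s)  # index of the stratum containing s
        totals[k] += 1
        events[k] += y
    return [e / t if t else float("nan") for e, t in zip(events, totals)]


# Hypothetical weights: NOT the published Pre-AF 9 coefficients
weights = {"age_decades": 1.0, "la_volume_10ml": 0.5}
print(linear_score({"age_decades": 6.5, "la_volume_10ml": 8.0}, weights))  # 10.5

# Toy cohort of eight patients (hypothetical), split into three strata at 3 and 6
scores = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(incidence_by_stratum(scores, labels, cutpoints=[3, 6]))  # [0.0, 0.3333333333333333, 1.0]
```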

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the NLP extraction proves reliable across sites, the same pipeline could be reused to derive medium-term risk scores for other cardiac outcomes from existing discharge text.
  • Targeted monitoring or preventive interventions could be directed at the upper risk strata identified by Pre-AF 9 within the 24-month window.
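  • The performance gap may shrink if the clinical scores are refit on the same single-center population that supplied the training data; simple recalibration would not close it, because a monotone rescaling of a score leaves its ROC AUC unchanged.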
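Because ROC AUC depends only on how a score ranks patients, any strictly increasing recalibration of a clinical score (for example, a logistic mapping with a refitted intercept and a positive slope) leaves its AUC unchanged; narrowing the gap would require refitting the score's component weights or adding predictors. A small self-contained check with hypothetical data:

```python
import math


def roc_auc(labels, scores):
    """Rank-based ROC AUC; ties count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def logistic_recalibration(score, intercept, slope):
    """Map a raw score to a probability; strictly increasing when slope > 0."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


labels = [0, 1, 0, 1, 1, 0]   # hypothetical outcomes
raw = [2, 5, 4, 3, 6, 1]      # hypothetical clinical-score points
recalibrated = [logistic_recalibration(s, intercept=-3.0, slope=0.8) for s in raw]

print(roc_auc(labels, raw))           # 0.8888888888888888 (8 of 9 pairs correctly ordered)
print(roc_auc(labels, recalibrated))  # identical, because the ranking is unchanged
```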

Load-bearing premise

The custom NLP pipeline converts unstructured discharge reports into 73 reliable structured features without substantial extraction errors or biases.
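Checking this premise would mean comparing extracted values with a manually annotated sample of reports, feature by feature. A minimal sketch for one numeric feature follows, assuming a dictionary-per-report format and an absolute tolerance (both hypothetical choices, not the authors' protocol). A wrong value counts against precision and recall at once.

```python
def extraction_metrics(gold, predicted, tol=0.0):
    """Precision, recall, and F1 for one numeric feature extracted by an NLP pipeline.

    gold / predicted: dict mapping report_id -> value; a missing key means the value
    was absent (gold) or not extracted (predicted). A prediction counts as correct
    only if the report has a gold value within +/- tol of it.
    """
    tp = sum(1 for rid, v in predicted.items() if rid in gold and abs(gold[rid] - v) <= tol)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical manual annotation vs. pipeline output for left atrial volume (ml)
gold = {"r1": 62.0, "r2": 48.0, "r3": 75.0, "r4": 55.0}
predicted = {"r1": 62.0, "r2": 84.0, "r3": 75.5, "r5": 40.0}  # r2: transposed digits
print(extraction_metrics(gold, predicted, tol=1.0))  # (0.5, 0.5, 0.5)
```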

What would settle it

An external validation dataset or prospective cohort in which the Pre-AF 13 model shows a ROC AUC no higher than that of the clinical scores, or in which manual review of the extracted features reveals error rates high enough to change the model rankings.

Figures

Figures reproduced from arXiv: 2606.10725 by Alexander Zolotarev, Artem Shelmanov, Daniil Larionov, Dmitrii Kriukov, Dmitry V. Dylov, Ekaterina Ivanova, Elizaveta Panchenko, Iaroslav Bespalov, Ilya Sochenkov, Kirill Grishchenkov, Miron Kuznetsov, Nikita Khromov, Olga Shakhmatova.

Figure 1. Illustration of the target variable construction principle. Black circles show the …
Figure 2. Results of the feature selection procedures used for constructing the “simple” model.
Figure 3. Receiver Operating Characteristic (ROC) curves comparing different models for …
Figure 4. SHAP summary plots for feature importance in predicting risk at 24 months
Figure 5. Shapley additive explanations illustrating feature contributions to the 24-month AF …
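Figures 4 and 5 rest on Shapley values, which split one prediction into additive per-feature contributions. The sketch below computes them exactly by enumerating feature coalitions and replacing features outside a coalition with a single baseline value. This simplified variant is chosen for clarity; the SHAP library typically averages over a background dataset and uses faster algorithms such as TreeSHAP for tree ensembles. The toy model and inputs are hypothetical. Exact enumeration needs 2^(n-1) coalitions per feature, which is still tractable for n = 13.

```python
from itertools import combinations
from math import factorial


def exact_shapley(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single baseline input.

    Features outside a coalition are set to their baseline values; the values
    satisfy sum(phi) == model(x) - model(baseline) (efficiency).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical risk model with an interaction between its two inputs
def toy_model(z):
    age, la_volume = z
    return 0.3 * age + 0.2 * la_volume + 0.1 * age * la_volume


x, baseline = [2.0, 3.0], [0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
# Each value is a 0.6 main effect plus half of the 0.6 interaction term
print([round(p, 6) for p in phi])  # [0.9, 0.9]
```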
Original abstract

Background. Atrial fibrillation (AF) is the most prevalent cardiac arrhythmia and a major determinant of prognosis. Established AF risk scores rely on factors (older age, hypertension) nearly ubiquitous among patients with cardiovascular disease (CVD), offering limited stratification in this high-risk group. Most target long-term (5-10 year) rather than medium-term prediction. We developed interpretable ML models predicting AF risk over a 24-month and entire follow-up horizon in CVD patients using routinely collected hospital data.

Methods. Single-center retrospective study of electronic health records from the National Research Cardiology Center (Russia) for patients aged ≥18 with CVD but without pre-existing AF, hospitalized more than once between January 2012 and May 2019. A custom NLP pipeline transformed unstructured discharge reports into 73 structured features, combining a rule-based parser with transformer-based NER. Using LightAutoML we built a full model (73 features), a simple model (reduced subset), and a linear model for a bedside risk score. Performance was assessed by ROC AUC, compared with CHARGE-AF, C2HEST, MHS, and HAVOC, and interpreted via SHAP.

Results. Of 80,576 records from 45,000 patients, 17,562 met inclusion criteria; 1,438 (8.19%) developed AF. The full model reached ROC AUC 0.735 (24-month) and 0.696 (entire follow-up); the simple model was nearly identical (0.725, 0.696). All non-linear models outperformed the four clinical risk scores (ROC AUC 0.53-0.64). The simple model uses 13 features and is named Pre-AF 13. SHAP identified age and left atrial volume as dominant predictors. A linear risk score (Pre-AF 9) stratified observed 24-month AF incidence from ~7% to 36%.

Conclusion. Interpretable ML models built from routinely collected EHR data identify high-AF-risk CVD patients, outperforming established clinical risk scores.
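The abstract defines two targets, AF within 24 months and AF over the entire follow-up, in patients hospitalized more than once, but it does not spell out how they are constructed. The sketch below shows one plausible scheme, assuming an index discharge date, a last follow-up date, and the date AF is first recorded. The 730-day horizon and the exclusion of AF-free patients followed for less than 24 months are assumptions, not the authors' stated method.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

# Assumption: "24 months" approximated as 730 days after the index discharge.
HORIZON = timedelta(days=730)


def af_labels(index_date: date, last_followup: date,
              first_af: Optional[date]) -> Tuple[Optional[int], int]:
    """Return (label_24m, label_entire) for one patient under a hypothetical scheme.

    label_24m is None when the patient is followed for less than 24 months
    without developing AF, so the 24-month outcome is unknown (censored).
    """
    if first_af is not None and first_af <= index_date:
        raise ValueError("AF on or before the index date: prevalent AF, not eligible")
    label_entire = int(first_af is not None)
    if first_af is not None and first_af - index_date <= HORIZON:
        label_24m = 1
    elif last_followup - index_date >= HORIZON:
        label_24m = 0
    else:
        label_24m = None
    return label_24m, label_entire


idx = date(2013, 1, 10)
print(af_labels(idx, date(2016, 5, 1), date(2014, 3, 2)))  # (1, 1): AF within 24 months
print(af_labels(idx, date(2016, 5, 1), date(2016, 2, 1)))  # (0, 1): AF only after 24 months
print(af_labels(idx, date(2014, 1, 10), None))             # (None, 0): censored before 24 months
print(af_labels(idx, date(2018, 1, 10), None))             # (0, 0): AF-free for the full horizon
```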

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper reports a single-center retrospective EHR study of 17,562 CVD patients without prior AF, in which a custom NLP pipeline (rule-based parser + transformer NER) extracts 73 structured features from discharge reports. LightAutoML is used to train non-linear models (full 73-feature and reduced 13-feature versions) that achieve 24-month ROC AUC 0.725–0.735, outperforming four clinical scores (CHARGE-AF, C2HEST, MHS, HAVOC; AUC 0.53–0.64). SHAP identifies age and left atrial volume as dominant; a linear Pre-AF 9 score stratifies observed 24-month AF incidence from ~7% to 36%.

Significance. If the extracted features prove reliable and the performance generalizes, Pre-AF 13/Pre-AF 9 would supply a practical, interpretable medium-term risk stratification tool for a population in which existing scores have limited dynamic range.

major comments (2)
  1. [Methods] Methods (NLP pipeline paragraph): no held-out annotation study, inter-annotator agreement, precision/recall, or error analysis is reported for the 73 features produced by the rule-based parser plus transformer NER. Because left atrial volume is both a top SHAP driver and a key input to the Pre-AF scores, systematic extraction error would directly undermine the reported AUC superiority and the 7–36% stratification claim.
  2. [Methods] Methods/Results (model evaluation): the manuscript supplies no description of cross-validation procedure, temporal train/test split, class-imbalance handling (8.19% event rate), or external validation. Without these, the claim that non-linear models outperform the clinical scores cannot be assessed for robustness or overfitting.
minor comments (2)
  1. [Results] Abstract and Results: the exact number of patients and records after each inclusion/exclusion step should be shown in a CONSORT-style flow diagram rather than stated only in text.
  2. Table/Figure legends: clarify whether the reported AUCs are from internal cross-validation or a single held-out test set.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments, which highlight important aspects of methodological transparency. We address each major comment below and commit to revisions that improve the manuscript without misrepresenting the study design or results.

Point-by-point responses
  1. Referee: [Methods] Methods (NLP pipeline paragraph): no held-out annotation study, inter-annotator agreement, precision/recall, or error analysis is reported for the 73 features produced by the rule-based parser plus transformer NER. Because left atrial volume is both a top SHAP driver and a key input to the Pre-AF scores, systematic extraction error would directly undermine the reported AUC superiority and the 7–36% stratification claim.

    Authors: We agree that the current manuscript lacks a quantitative validation study for the NLP pipeline, including held-out annotation, inter-annotator agreement, precision/recall, or systematic error analysis. This is a genuine limitation, particularly given the importance of left atrial volume. In the revised version we will expand the Methods to describe the development process for the rule-based parser and transformer NER, report any internal checks performed during feature extraction, and add a limitations paragraph that explicitly discusses the absence of formal validation metrics and their potential implications for the reported performance and risk stratification. We will also note plans for prospective validation of the extraction pipeline in future work. revision: yes

  2. Referee: [Methods] Methods/Results (model evaluation): the manuscript supplies no description of cross-validation procedure, temporal train/test split, class-imbalance handling (8.19% event rate), or external validation. Without these, the claim that non-linear models outperform the clinical scores cannot be assessed for robustness or overfitting.

    Authors: The referee is correct that the manuscript does not describe the cross-validation procedure, any temporal splitting, class-imbalance handling, or external validation. We will revise the Methods section to provide these details: the specific cross-validation strategy used (including whether it was stratified to preserve the 8.19% event rate), how imbalance was addressed during model training and evaluation with LightAutoML, and the rationale for the single-center design precluding external validation at this stage. Corresponding clarifications will be added to the Results to allow readers to assess robustness. These additions address the concern directly while preserving the reported performance comparisons. revision: yes
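The authors commit to describing a cross-validation strategy stratified to preserve the 8.19% event rate. A minimal sketch of stratified k-fold assignment, written in plain Python rather than with LightAutoML's own splitting, shows what that preservation means. It does not address the temporal split or external validation the referee also requests, and it assumes one row per patient.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k folds that each preserve the overall event rate.

    Assumes one row per patient; with several records per patient, folds should
    be assigned per patient to avoid leakage between training and test data.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for j, i in enumerate(members):  # deal each class round-robin across folds
            folds[j % k].append(i)
    return folds


def train_test_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices), holding out each fold once."""
    folds = stratified_kfold(labels, k, seed)
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(i for f, fold in enumerate(folds) if f != held_out for i in fold)
        yield train, test


# Toy cohort with an event rate close to the study's 8.19% (82 events in 1,000)
labels = [1] * 82 + [0] * 918
for train, test in train_test_splits(labels):
    events = sum(labels[i] for i in test)
    print(len(test), events, round(events / len(test), 4))
```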

Circularity Check

0 steps flagged

No circularity: standard ML pipeline with independent evaluation

Full rationale

The derivation chain consists of (1) applying a custom NLP pipeline to produce 73 features from discharge reports, (2) training LightAutoML models (full, simple, linear) on those features, and (3) evaluating ROC AUC on held-out patient data or cross-validation, then comparing to external clinical scores (CHARGE-AF etc.). No step reduces a claimed prediction to its own inputs by construction, renames a fit as a prediction, or relies on a self-citation chain for a uniqueness theorem. The reported AUC values (0.725–0.735) are ordinary empirical performance numbers, not algebraically forced by model definition. The NLP pipeline accuracy is an unvalidated assumption but does not create circularity in the derivation itself.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claim rests on the accuracy of the NLP extraction step that generates the 73 features and on the assumption that a single-center retrospective cohort will yield models that generalize to new patients and settings.

free parameters (1)
  • LightAutoML hyperparameters and feature selection
    Automated ML tunes many internal parameters and the choice of which 13 features to retain is data-driven.
axioms (2)
  • domain assumption: The NLP pipeline correctly identifies and structures the 73 clinical features from free-text discharge reports
    All downstream model performance depends on this extraction step being reliable.
  • domain assumption: The single-center retrospective cohort is representative of the target population without major temporal or site-specific shifts
    The study design assumes the 2012-2019 data will support predictions for future patients.

pith-pipeline@v0.9.1-grok · 5987 in / 1746 out tokens · 37096 ms · 2026-06-27T14:06:00.117742+00:00 · methodology

