pith. machine review for the scientific record. sign in

arxiv: 2604.07085 · v1 · submitted 2026-04-08 · 💻 cs.LG

Recognition: 2 theorem links

· Lean Theorem

Mining Electronic Health Records to Investigate Effectiveness of Ensemble Deep Clustering

Manar D. Samad, Shrabani Ghosh, Yina Hou

Authors on Pith no claims yet

Pith reviewed 2026-05-10 18:37 UTC · model grok-4.3

classification 💻 cs.LG
keywords electronic health recordsdeep clusteringensemble clusteringheart failurepatient clusteringtabular dataautoencodersK-means
0
0 comments X

The pith

An ensemble deep clustering method combined with traditional techniques achieves the highest performance in grouping heart failure patients from electronic health records.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how well different clustering methods work on electronic health records to group patients and identify disease subtypes in heart failure cases. Traditional clustering approaches prove more robust on this tabular data than deep learning methods, which were built for images. The authors propose a new ensemble deep clustering technique that combines cluster assignments from several embedding dimensions. When this is merged with traditional clustering in a framework, it ranks best overall among 14 methods tested on real patient data from multiple cohorts. The work also stresses the need for separate analysis by biological sex.

Core claim

The paper establishes that traditional clustering methods perform robustly on tabular EHR data while deep learning approaches underperform due to their design for image clustering. It introduces an ensemble-based deep clustering approach that aggregates cluster assignments from multiple embedding dimensions. When combined with traditional clustering in a novel ensemble framework, this method delivers the best overall performance ranking across 14 diverse clustering methods and multiple patient cohorts. The findings highlight advantages of combining approaches and the importance of biological sex-specific clustering of EHR data.

What carries the argument

Ensemble embedding for deep clustering that aggregates cluster assignments obtained from multiple embedding dimensions rather than a single fixed embedding space, integrated with traditional clustering methods.

Load-bearing premise

Deep learning methods designed for image data inherently underperform on tabular EHR data, and aggregating assignments from multiple embedding dimensions reliably improves clustering quality without overfitting or selection bias.

What would settle it

A direct comparison showing that a single deep embedding space achieves equal or better clustering quality than the ensemble aggregation on the same heart failure EHR cohorts would falsify the advantage of the proposed method.

Figures

Figures reproduced from arXiv: 2604.07085 by Manar D. Samad, Shrabani Ghosh, Yina Hou.

Figure 1
Figure 1. Figure 1: A general framework for deep embedding clustering. [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Patient cluster labels color-coded on t-SNE visualization embeddings. t-SNE is applied to G-CEALS with [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Comparison of UMAP visualizations for G-CEALS (latent dimension = 10) and K-means on raw data across [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗
read the original abstract

In electronic health records (EHRs), clustering patients and distinguishing disease subtypes are key tasks to elucidate pathophysiology and aid clinical decision-making. However, clustering in healthcare informatics is still based on traditional methods, especially K-means, and has achieved limited success when applied to embedding representations learned by autoencoders as hybrid methods. This paper investigates the effectiveness of traditional, hybrid, and deep learning methods in heart failure patient cohorts using real EHR data from the All of Us Research Program. Traditional clustering methods perform robustly because deep learning approaches are specifically designed for image clustering, a task that differs substantially from the tabular EHR data setting. To address the shortcomings of deep clustering, we introduce an ensemble-based deep clustering approach that aggregates cluster assignments obtained from multiple embedding dimensions, rather than relying on a single fixed embedding space. When combined with traditional clustering in a novel ensemble framework, the proposed ensemble embedding for deep clustering delivers the best overall performance ranking across 14 diverse clustering methods and multiple patient cohorts. This paper underscores the importance of biological sex-specific clustering of EHR data and the advantages of combining traditional and deep clustering approaches over a single method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that traditional clustering methods perform robustly on tabular EHR data for heart failure patient cohorts from the All of Us program, while deep learning methods designed for images underperform. It introduces an ensemble deep clustering approach that aggregates cluster assignments from multiple embedding dimensions rather than a single fixed space. When combined with traditional clustering in a novel ensemble framework, this method is asserted to deliver the best overall performance ranking across 14 diverse clustering methods and multiple patient cohorts, while also highlighting the importance of biological sex-specific clustering.

Significance. If the empirical ranking holds under rigorous validation, the work could advance healthcare informatics by demonstrating practical benefits of hybrid ensemble strategies for patient subtyping in tabular EHR data, where pure deep clustering has seen limited success. It provides a concrete example of adapting embedding-based methods to non-image domains and emphasizes sex-specific analysis, which may inform more accurate pathophysiology studies and clinical decision support.

major comments (2)
  1. Abstract: The assertion that the proposed ensemble embedding for deep clustering 'delivers the best overall performance ranking' is presented without any quantitative metrics (e.g., ARI, NMI, silhouette scores), statistical tests, error bars, cohort sizes, or implementation details, leaving the central empirical claim unsupported by verifiable evidence.
  2. Introduction and Methods: The foundational assumption that deep learning methods 'are specifically designed for image clustering' and thus inherently limited on tabular EHR data requires explicit ablation studies or direct comparisons to confirm that multi-dimension aggregation improves quality without introducing selection bias or overfitting, as this premise drives the need for the ensemble framework.
minor comments (1)
  1. The 14 clustering methods should be explicitly enumerated in the methods section, and any tables reporting performance rankings should include full metric values and cohort descriptions for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each major comment point by point below, indicating the revisions we will incorporate.

read point-by-point responses
  1. Referee: Abstract: The assertion that the proposed ensemble embedding for deep clustering 'delivers the best overall performance ranking' is presented without any quantitative metrics (e.g., ARI, NMI, silhouette scores), statistical tests, error bars, cohort sizes, or implementation details, leaving the central empirical claim unsupported by verifiable evidence.

    Authors: We agree that the abstract would be strengthened by including supporting quantitative evidence. In the revised manuscript, we will update the abstract to report key metrics such as the overall performance ranking across the 14 methods, average ARI and NMI values, cohort sizes (number of heart failure patients per All of Us cohort), and references to statistical significance testing. Full details including error bars from repeated runs and implementation specifics remain in the Methods and Results sections. revision: yes

  2. Referee: Introduction and Methods: The foundational assumption that deep learning methods 'are specifically designed for image clustering' and thus inherently limited on tabular EHR data requires explicit ablation studies or direct comparisons to confirm that multi-dimension aggregation improves quality without introducing selection bias or overfitting, as this premise drives the need for the ensemble framework.

    Authors: The manuscript already contains direct empirical comparisons demonstrating that standard deep clustering methods underperform relative to traditional methods on this tabular EHR data. We also report results from the multi-dimension aggregation approach versus single-embedding baselines. To further validate the aggregation step and address concerns about selection bias or overfitting, we will add explicit ablation experiments in the revised version, including performance sensitivity to the number of embedding dimensions and consistency checks across independent cohorts. revision: partial

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's central claim is an empirical performance ranking of clustering methods (including a proposed ensemble deep clustering approach) on real EHR data from the All of Us program across multiple cohorts and 14 baselines. No derivation chain, theorem, or first-principles result is presented that reduces to its own inputs by construction, self-definition, or fitted-parameter renaming. The abstract and described framework treat the ensemble aggregation as a methodological proposal whose quality is assessed via external data experiments rather than any self-referential equation or self-citation load-bearing premise. This is the expected non-circular outcome for an applied empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim depends on standard assumptions from clustering literature that performance metrics reflect true subtype structure and that the All of Us dataset is representative; no explicit free parameters or new entities are introduced beyond the ensemble method itself.

axioms (2)
  • domain assumption Deep learning clustering methods optimized for images are unsuitable for tabular EHR data without modification
    Directly stated in the abstract as the reason traditional methods perform robustly.
  • ad hoc to paper Aggregating cluster assignments from multiple embedding dimensions improves overall clustering quality
    Core premise of the proposed ensemble approach without independent justification in the abstract.

pith-pipeline@v0.9.0 · 5497 in / 1215 out tokens · 53892 ms · 2026-05-10T18:37:50.035665+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

37 extracted references · 14 canonical work pages · 1 internal anchor

  1. [1]

    M. D. Samad, A. Ulloa, G. J. Wehner, L. Jing, D. Hartzel, C. W. Good, B. A. Williams, C. M. Haggerty, B. K. Fornwalt, Predicting Survival From Large Echocardiography and Electronic Health Record Datasets, JACC: Cardiovascular Imaging 12 (4) (2018) 681–689. doi:10.1016/j.jcmg.2018.04.026

  2. [2]

    Y . Hu, H. Yan, M. Liu, J. Gao, L. Xie, C. Zhang, L. Wei, Y . Ding, H. Jiang, Detecting cardiovascular diseases using unsupervised machine learning clustering based on electronic medical records, BMC Medical Research Methodology 24 (1) (2024) 309

  3. [3]

    S. R. Bhutto, M. Zeng, K. Niu, S. Khoso, M. Umar, G. Lalley, M. Li, Automatic icd-10-cm coding via lambda- scaled attention based deep learning model, Methods 222 (2024) 19–27

  4. [4]

    J. H. B. Masud, C.-C. Kuo, C.-Y . Yeh, H.-C. Yang, M.-C. Lin, Applying deep learning model to predict diagnosis code of medical records, Diagnostics 13 (13) (2023) 2297

  5. [5]

    S. B. Rabbani, I. V . Medri, M. D. Samad, Deep clustering of tabular data by weighted gaussian distribution learning, Neurocomputing 623 (2025) 129359

  6. [6]

    Y . Wang, Y . Zhao, T. M. Therneau, E. J. Atkinson, A. P. Tafti, N. Zhang, S. Amin, A. H. Limper, S. Khosla, H. Liu, Unsupervised machine learning for the discovery of latent disease clusters and patient subgroups using electronic health records, Journal of biomedical informatics 102 (2020) 103364

  7. [7]

    A. Aljohani, Optimizing patient stratification in healthcare: A comparative analysis of clustering algorithms for ehr data, International Journal of Computational Intelligence Systems 17 (1) (2024) 173

  8. [8]

    Karac ¸am, B

    M. Karac ¸am, B. K ¨ult¨ursay, D. Mutlu, S. Tanyeri, A. Kaya, S. C. Efe, C. Do ˘gan, G. S. Halil, ¨O. Y . Akbal, K. KIRAL˙I, et al., From patterns to prognosis: machine learning–derived clusters in advanced heart failure, Frontiers in Cardiovascular Medicine 12 (2025) 1669538

  9. [9]

    Nichols, T

    L. Nichols, T. Taverner, F. Crowe, S. Richardson, C. Yau, S. Kiddle, P. Kirk, J. Barrett, K. Nirantharakumar, S. Griffin, et al., In simulated data and health records, latent class analysis was the optimum multimorbidity clustering algorithm, Journal of clinical epidemiology 152 (2022) 164–175

  10. [10]

    Manzini, B

    E. Manzini, B. Vlacho, J. Franch-Nadal, J. Escudero, A. G´enova, E. Reixach, E. Andr´es, I. Pizarro, J.-L. Portero, D. Mauricio, et al., Longitudinal deep learning clustering of type 2 diabetes mellitus trajectories using routinely collected health records, Journal of biomedical informatics 135 (2022) 104218

  11. [11]

    Bampa, I

    M. Bampa, I. Miliou, B. Jovanovic, P. Papapetrou, M-clustehr: A multimodal clustering approach for electronic health records, Artificial Intelligence in Medicine 154 (2024) 102905. 11 APREPRINT- APRIL9, 2026

  12. [12]

    J. Xie, R. Girshick, A. Farhadi, Unsupervised deep embedding for clustering analysis, in: 33rd International Conference on Machine Learning, ICML 2016, V ol. 1, 2016, pp. 740–749. arXiv:1511.06335

  13. [13]

    W. Shao, X. Luo, Z. Zhang, Z. Han, V . Chandrasekaran, V . Turzhitsky, V . Bali, A. R. Roberts, M. Metzger, J. Baker, et al., Application of unsupervised deep learning algorithms for identification of specific clusters of chronic cough patients from emr data, BMC bioinformatics 23 (Suppl 3) (2022) 140

  14. [14]

    J. Qiu, Y . Hu, L. Li, A. M. Erzurumluoglu, I. Braenne, C. Whitehurst, J. Schmitz, J. Arora, B. A. Bartholdy, S. Gandhi, et al., Deep representation learning for clustering longitudinal survival data from electronic health records, Nature Communications 16 (1) (2025) 2534

  15. [15]

    S. Zhou, H. Xu, Z. Zheng, J. Chen, J. Bu, J. Wu, X. Wang, W. Zhu, M. Ester, et al., A comprehensive survey on deep clustering: Taxonomy, challenges, and future directions, arXiv preprint arXiv:2206.07579 (2022)

  16. [16]

    X. Guo, L. Gao, X. Liu, J. Yin, Improved deep embedded clustering with local structure preservation, IJCAI International Joint Conference on Artificial Intelligence 0 (2017) 1753–1759. doi:10.24963/ijcai.2017/243

  17. [17]

    K. G. Dizaji, A. Herandi, C. Deng, W. Cai, H. Huang, Deep Clustering via Joint Convolutional Autoencoder Em- bedding and Relative Entropy Minimization, in: Proceedings of the IEEE International Conference on Computer Vision, V ol. 2017-Octob, 2017, pp. 5747–5756. arXiv:1704.06327, doi:10.1109/ICCV .2017.612

  18. [18]

    M. M. Fard, T. Thonet, E. Gaussier, Deep k-means: Jointly clustering with k-means and learning representations, Pattern Recognition Letters 138 (2020) 185–192

  19. [19]

    Boubekki, M

    A. Boubekki, M. Kampffmeyer, U. Brefeld, R. Jenssen, Joint optimization of an autoencoder for clustering and embedding, Machine learning 110 (7) (2021) 1901–1937

  20. [20]

    Mrabah, N

    N. Mrabah, N. M. Khan, R. Ksantini, Z. Lachiri, Deep clustering with a dynamic autoencoder: From reconstruc- tion towards centroids construction, Neural Networks 130 (2020) 206–228. doi:10.1016/j.neunet.2020.07.005. URLhttps://doi.org/10.1016%2Fj.neunet.2020.07.005

  21. [21]

    Abrar, A

    S. Abrar, A. Sekmen, M. D. Samad, Effectiveness of deep image embedding clustering methods on tabular data, in: 2023 15th International Conference on Advanced Computational Intelligence (ICACI), IEEE, 2023, pp. 1–7

  22. [22]

    Kowsar, S

    I. Kowsar, S. B. Rabbani, K. F. B. Akhter, M. D. Samad, Deep clustering of electronic health records tabular data for clinical interpretation, in: 2023 IEEE International Conference on Telecommunications and Photonics (ICTP), IEEE, 2023, pp. 01–05

  23. [23]

    N. I. Kuo, M. Polizzotto, S. Finfer, L. Jorm, S. Barbieri, et al., Synthetic acute hypotension and sepsis datasets based on MIMIC-III and published as part of the health gym project, arXiv preprint arXiv:2112.03914 (2021)

  24. [24]

    H. W. Kuhn, The hungarian method for the assignment problem, Naval Research Logistics Quarterly 2 (1955) 83–97. doi:10.1002/nav.3800020109. URLhttps://onlinelibrary.wiley.com/doi/10.1002/nav.3800020109

  25. [25]

    P. A. Est ´evez, M. Tesmer, C. A. Perez, J. M. Zurada, Normalized mutual information feature selection, IEEE Transactions on neural networks 20 (2) (2009) 189–201

  26. [26]

    J. M. Santos, M. Embrechts, On the use of the adjusted rand index as a metric for evaluating supervised classifi- cation, in: International conference on artificial neural networks, Springer, 2009, pp. 175–184

  27. [27]

    of Us Research Program Investigators, J

    A. of Us Research Program Investigators, J. C. Denny, J. L. Rutter, D. B. Goldstein, A. Philippakis, J. W. Smoller, G. Jenkins, E. Dishman, The ”all of us” research program, New England Journal of Medicine 381 (2019) 668–

  28. [28]

    doi:10.1056/NEJMsr1809937

  29. [29]

    P. L. Sankar, L. S. Parker, The precision medicine initiative’s all of us research program: an agenda for research on its ethical, legal, and social issues, Genetics in Medicine 19 (7) (2017) 743–750

  30. [30]

    Griffiths, A

    C. Griffiths, A. Brock, C. Rooney, The impact of introducing icd-10 on trends in mortality from circulatory diseases in england and wales, Health Statistics Quarterly (22) (2004) 14–20

  31. [31]

    C. Luo, Y . Zhu, Z. Zhu, R. Li, G. Chen, Z. Wang, A machine learning-based risk stratification tool for in-hospital mortality of intensive care unit patients with heart failure, Journal of Translational Medicine 20 (1) (2022) 136. doi:10.1186/s12967-022-03340-8. 12 APREPRINT- APRIL9, 2026

  32. [32]

    O. Carr, B. McCollum, J. Collomosse, M. H. Fischer, Deep semi-supervised embedded cluster- ing (dsec) for stratification of heart failure patients, arXiv preprint abs/2012.13233, available at: https://arxiv.org/abs/2012.13233(12 2020)

  33. [33]

    J. Zhu, L. Hong, S. Yuan, X. Xu, J. Wei, H. Yin, Association between glucocorticoid use and all-cause mor- tality in critically ill patients with heart failure: A cohort study based on the mimic-iii database, Frontiers in Pharmacology 14 (2023) 1118551. doi:10.3389/fphar.2023.1118551

  34. [34]

    A. A. Huang, S. Y . Huang, Dendrogram of transparent feature importance machine learning statistics to classify associations for heart failure: A reanalysis of a retrospective cohort study of the medical information mart for intensive care iii (mimic-iii) database, PLOS ONE 18 (7) (2023) e0288819. doi:10.1371/journal.pone.0288819

  35. [35]

    C. W. Tsao, A. W. Aday, Z. I. Almarzooq, C. A. Anderson, P. Arora, C. L. Avery, C. M. Baker-Smith, A. Z. Beaton, A. K. Boehme, A. E. Buxton, et al., Heart disease and stroke statistics—2023 update: a report from the american heart association, Circulation 147 (8) (2023) e93–e621

  36. [36]

    L. v. d. Maaten, G. Hinton, Visualizing data using t-sne, Journal of machine learning research 9 (Nov) (2008) 2579–2605

  37. [37]

    UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

    L. McInnes, J. Healy, J. Melville, Umap: Uniform manifold approximation and projection for dimension reduc- tion, arXiv preprint arXiv:1802.03426 (2018). 13