
arxiv: 2606.11570 · v1 · pith:OPM7XAYBnew · submitted 2026-06-10 · stat.ML · cs.LG · stat.ME

Enhancing Spectral Embedding through Robust and Flexible Knowledge Transfer in Electronic Health Records

classification: stat.ML · cs.LG · stat.ME
keywords: knowledge · matrix · data · method · approaches · cohort · components · electronic

We propose a spectral-based, unsupervised representation learning framework to derive low-dimensional embeddings for clinical concepts and patients in rare disease cohorts from electronic health records, where data are high-dimensional but sample sizes are limited. To overcome this challenge, we incorporate a knowledge matrix extracted from a broader population that shares a partially overlapping subspace with the rare-disease cohort. Our method departs from existing approaches by relaxing restrictive one-to-one signal-alignment assumptions between the latent data matrix and knowledge matrix, allowing more flexible and realistic forms of structured sharing. We introduce a novel two-step spectral embedding procedure: first, we identify and remove irrelevant components from the knowledge matrix; then, we apply a projection-based method to separately recover shared and heterogeneous components. Simulations and an analysis of a real-world multiple sclerosis cohort show that the proposed method outperforms competing approaches, particularly in challenging scenarios where shared signals are weak and only partially aligned, as is common in rare-disease data.
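The two-step idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the screening rule or the projection estimator, so everything below is an assumption made for illustration. It assumes both the target-cohort matrix `A` and the knowledge matrix `K` are symmetric, scores each knowledge direction by the norm of `A @ u`, and uses a hypothetical relative threshold `rel_tol`. The function names `top_eigvecs` and `two_step_embedding` are also hypothetical.

```python
import numpy as np


def top_eigvecs(M, r):
    """Return the r eigenvectors of symmetric M with the largest |eigenvalue|."""
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(-np.abs(vals))[:r]
    return vecs[:, order]


def two_step_embedding(A, K, r_knowledge, r_het, rel_tol=0.25):
    """Illustrative two-step spectral embedding (assumed rules, not the paper's)."""
    # Step 1 (assumed screening rule): keep only the knowledge directions
    # that carry visible signal in the target matrix A.
    U_k = top_eigvecs(K, r_knowledge)
    scores = np.linalg.norm(A @ U_k, axis=0)
    keep = scores >= rel_tol * scores.max()
    U_shared = U_k[:, keep]

    # Step 2: project out the retained shared subspace, then recover the
    # cohort-specific (heterogeneous) directions from the residual.
    P = U_shared @ U_shared.T
    I = np.eye(A.shape[0])
    residual = (I - P) @ A @ (I - P)
    U_het = top_eigvecs(residual, r_het)
    return U_shared, U_het


# Synthetic check: directions 0-1 are shared, 2-3 exist only in the
# knowledge matrix, and direction 4 exists only in the target cohort.
rng = np.random.default_rng(0)
p = 30
Q, _ = np.linalg.qr(rng.normal(size=(p, p)))

K = Q[:, :4] @ np.diag([10.0, 9.0, 8.0, 7.0]) @ Q[:, :4].T

target_dirs = Q[:, [0, 1, 4]]
noise = rng.normal(scale=0.01, size=(p, p))
noise = (noise + noise.T) / 2  # keep the noisy target matrix symmetric
A = target_dirs @ np.diag([5.0, 4.0, 3.0]) @ target_dirs.T + noise

U_shared, U_het = two_step_embedding(A, K, r_knowledge=4, r_het=1)
embedding = np.hstack([U_shared, U_het])  # one row per clinical concept
```

In this toy setting the screening step drops the two knowledge-only directions, and the residual step recovers the direction that appears only in the target cohort. A real analysis would need a data-driven threshold and rank selection, which the abstract does not specify.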
