From Trajectories to Phenotypes: Disease Progression as Structural Priors for Multi-organ Imaging Representation Learning
Pith reviewed 2026-05-13 04:51 UTC · model grok-4.3
The pith
Disease progression patterns from health records act as structural priors for learning representations from multi-organ medical images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Training a generative Transformer on population-scale longitudinal diagnosis sequences produces embeddings that can be aligned with those from an organ-wise IDP encoder; this alignment transfers structural disease knowledge and yields imaging representations that improve discrimination and time-to-onset prediction for 159 diseases.
What carries the argument
Geometry-preserving alignment between subject-level embeddings from a disease trajectory Transformer and an organ-wise IDP encoder in a distillation framework.
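The review does not spell out the alignment objective; one common realization of geometry-preserving distillation is to match the batch-wise pairwise-similarity matrix of the student (IDP) embeddings to that of the frozen teacher (trajectory) embeddings. A minimal PyTorch sketch under that assumption (function name and dimensions are illustrative, not from the paper):

```python
import torch
import torch.nn.functional as F

def geometry_alignment_loss(idp_emb: torch.Tensor, traj_emb: torch.Tensor) -> torch.Tensor:
    """Match the pairwise cosine-similarity structure of the student (IDP)
    embeddings to that of the teacher (trajectory) embeddings in a batch.
    The teacher geometry is detached so gradients only flow to the student."""
    sim_idp = F.normalize(idp_emb, dim=1) @ F.normalize(idp_emb, dim=1).T          # (B, B)
    sim_traj = (F.normalize(traj_emb, dim=1) @ F.normalize(traj_emb, dim=1).T).detach()
    return F.mse_loss(sim_idp, sim_traj)

# embedding dimensions need not match: only the (B, B) geometries are compared
loss = geometry_alignment_loss(torch.randn(32, 128), torch.randn(32, 256))
```

Because only similarity matrices are compared, the two encoders can live in spaces of different dimensionality, which fits the asymmetry between a trajectory Transformer and an organ-wise IDP encoder.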
If this is right
- Pretraining with trajectory information increases AUC scores for disease discrimination using IDPs.
- Time-to-onset prediction error decreases as measured by MAE.
- Low-prevalence diseases see the largest performance lifts.
- Similarity relationships among IDP embeddings become more consistent with those in the trajectory embedding space.
- Cross-attention fusion of the two representations can be used at prediction time.
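The last point, cross-attention fusion at prediction time, could look like the following sketch, in which organ-level IDP tokens attend to trajectory tokens before a per-disease risk head; the module layout and dimensions are our assumption, not the paper's stated architecture:

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Organ-level IDP tokens (queries) attend to trajectory tokens
    (keys/values); the fused representation feeds a per-disease head."""
    def __init__(self, dim: int = 128, n_heads: int = 4, n_diseases: int = 159):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.head = nn.Linear(dim, n_diseases)

    def forward(self, idp_tokens: torch.Tensor, traj_tokens: torch.Tensor) -> torch.Tensor:
        # idp_tokens: (B, n_organs, dim); traj_tokens: (B, seq_len, dim)
        fused, _ = self.attn(query=idp_tokens, key=traj_tokens, value=traj_tokens)
        return self.head(fused.mean(dim=1))  # pool over organ tokens -> (B, n_diseases)

model = CrossAttentionFusion()
logits = model(torch.randn(2, 8, 128), torch.randn(2, 40, 128))
```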
Where Pith is reading between the lines
- Applying the same priors might improve imaging models even when labeled data for a specific disease is scarce.
- Clinical tools could use imaging alone to estimate progression risk by implicitly drawing on EHR trajectory knowledge.
- Extending the alignment to include other data types like lab results could further enrich the priors.
- Checking whether the alignment holds in different populations would test the generality of the shared structure hypothesis.
Load-bearing premise
The structure relevant to disease in imaging phenotypes overlaps sufficiently with the structure in diagnosis trajectories for alignment to transfer useful knowledge.
What would settle it
The claim would fail if the proposed pretraining does not increase AUC or decrease MAE on held-out UK Biobank participants, or if pairwise similarities in the IDP embedding space do not correspond to those in the trajectory embedding space.
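The embedding-similarity criterion can be made operational by correlating the pairwise cosine-similarity structures of the two spaces over the same subjects; this particular metric is our illustration, not necessarily the one the paper uses:

```python
import numpy as np

def geometry_agreement(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Correlation between the pairwise cosine-similarity structures of two
    embedding spaces computed over the same subjects (off-diagonal pairs)."""
    def pairwise_cos(e):
        e = e / np.linalg.norm(e, axis=1, keepdims=True)
        return e @ e.T
    iu = np.triu_indices(len(emb_a), k=1)
    return float(np.corrcoef(pairwise_cos(emb_a)[iu], pairwise_cos(emb_b)[iu])[0, 1])

# a rotation preserves cosine geometry exactly, so agreement should be 1.0
rng = np.random.default_rng(0)
e = rng.standard_normal((20, 8))
q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
score = geometry_agreement(e, e @ q)
```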
Original abstract
Imaging-derived phenotypes (IDPs) summarize multi-organ physiology but provide only static snapshots of diseases that evolve over time. In contrast, longitudinal electronic health records encode disease trajectories through temporal dependencies among past diagnosis events and comorbidity structure. We hypothesize that IDPs and disease trajectories contain partially shared disease-relevant structure. We propose a trajectory-aware distillation framework that transfers structural knowledge from a generative disease trajectory Transformer into an organ-wise IDP encoder. A population-scale trajectory model trained on longitudinal diagnosis sequences produces subject-level embeddings that supervise IDP representation learning via geometry-preserving alignment. During downstream prediction, trajectory and imaging representations can also be fused via cross-attention. Across 159 diseases in the UK Biobank cohort, trajectory-aware pretraining consistently improves both discrimination (AUC) and time-to-onset prediction (MAE), with the largest gains for low-prevalence diseases. Similarity relationships in IDP embedding space also align with those in trajectory space, providing supportive evidence for partially aligned representation geometry. These results suggest that population-scale generative disease models can serve as structural priors for data-limited imaging modalities, improving robustness under realistic cohort constraints.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a trajectory-aware distillation framework that trains a generative Transformer on longitudinal diagnosis sequences from EHR data to produce subject-level embeddings. These embeddings supervise an organ-wise IDP encoder via geometry-preserving alignment, with optional cross-attention fusion at inference. Evaluated on 159 diseases in UK Biobank, the approach claims consistent gains in AUC for disease discrimination and MAE for time-to-onset prediction (largest for low-prevalence diseases), plus alignment of similarity structures between IDP and trajectory embedding spaces.
Significance. If the central claim holds without label leakage, the work would be significant for using population-scale EHR trajectory models as structural priors to improve representation learning from data-limited imaging modalities. The large-scale evaluation across 159 diseases and emphasis on low-prevalence cases, combined with the geometry-preserving alignment mechanism, represent a concrete strength in demonstrating potential knowledge transfer from longitudinal records to static imaging phenotypes.
major comments (1)
- [Abstract and Methods] Abstract and Methods (trajectory model and alignment): The generative trajectory Transformer is trained on full longitudinal diagnosis sequences, after which subject-level embeddings supervise the IDP encoder. No details are provided on masking target disease codes, truncating sequences at diagnosis time, or applying temporal hold-outs before embedding extraction for subjects who receive a target diagnosis. This leaves open the possibility that performance gains arise from direct label leakage rather than transfer of shared structural priors, which is load-bearing for the hypothesis that IDPs and trajectories contain only partially shared disease-relevant structure.
minor comments (2)
- [Abstract] Abstract: The claim of 'consistent improvements' would be strengthened by including at least one key quantitative result (e.g., average AUC delta or range) alongside the qualitative statement.
- [Results] Results: Ensure all reported AUC and MAE values include error bars, statistical tests against baselines, and explicit data exclusion criteria as referenced in the soundness assessment.
Simulated Author's Rebuttal
We thank the referee for the careful review and for recognizing the potential significance of trajectory-aware distillation for improving IDP representations, particularly for low-prevalence diseases. We address the major comment below.
Point-by-point responses
Referee: [Abstract and Methods] Abstract and Methods (trajectory model and alignment): The generative trajectory Transformer is trained on full longitudinal diagnosis sequences, after which subject-level embeddings supervise the IDP encoder. No details are provided on masking target disease codes, truncating sequences at diagnosis time, or applying temporal hold-outs before embedding extraction for subjects who receive a target diagnosis. This leaves open the possibility that performance gains arise from direct label leakage rather than transfer of shared structural priors, which is load-bearing for the hypothesis that IDPs and trajectories contain only partially shared disease-relevant structure.
Authors: We agree that the manuscript does not currently provide explicit details on these safeguards, which is a substantive omission that could raise legitimate questions about label leakage. In the revised manuscript we will expand the Methods section with a dedicated paragraph (and accompanying figure) clarifying the following protocol: (1) all diagnosis sequences used to extract subject-level embeddings are strictly truncated at the date of the UK Biobank imaging visit, so that only pre-imaging history is visible to the trajectory model; (2) any ICD-10 codes corresponding to the 159 target phenotypes are masked during embedding extraction for the supervision loss; and (3) the generative Transformer itself is trained under a temporal hold-out regime in which embeddings for downstream subjects are produced by a model whose training data ends before the subject’s imaging date. We will also add a supplementary sensitivity experiment that repeats the main results using only trajectories that end at least one year before imaging. These clarifications should remove the ambiguity and allow readers to evaluate whether the reported gains reflect genuine structural alignment rather than leakage. revision: yes
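The three safeguards the authors describe reduce to a small preprocessing step over each subject's diagnosis sequence. A sketch with hypothetical field names (`events` as (ICD-10 code, date) pairs; nothing here is from the paper's code):

```python
from datetime import date

def prepare_trajectory(events, imaging_date, target_codes):
    """Leakage safeguards sketched from the rebuttal:
    (1) truncate: keep only diagnoses recorded before the imaging visit;
    (2) mask: hide codes for the target phenotypes in the supervision signal.
    `events` is a list of (icd10_code, diagnosis_date) pairs."""
    history = [(code, d) for code, d in events if d < imaging_date]
    return [("[MASK]" if code in target_codes else code, d) for code, d in history]

events = [("I10", date(2010, 3, 1)), ("E11", date(2014, 6, 2)), ("I25", date(2019, 1, 5))]
seq = prepare_trajectory(events, imaging_date=date(2016, 1, 1), target_codes={"E11"})
# the 2019 event is dropped (post-imaging) and "E11" is masked
```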
Circularity Check
No significant circularity detected in derivation chain
full rationale
The paper's core method trains a generative trajectory Transformer independently on longitudinal diagnosis sequences from EHR data to produce subject-level embeddings, then applies a separate geometry-preserving alignment step to supervise an organ-wise IDP encoder. Downstream tasks (discrimination and time-to-onset prediction) use the resulting representations, possibly with fusion. No quoted equations, definitions, or steps reduce the claimed performance gains to the inputs by construction, nor rename fitted parameters as predictions, import uniqueness via self-citation, or smuggle ansatzes. The derivation remains self-contained with independent training phases and external evaluation on UK Biobank metrics.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: IDPs and disease trajectories contain partially shared disease-relevant structure
Reference graph
Works this paper leans on
- [1] Bycroft, C., Freeman, C., Petkova, D., et al.: The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018). https://doi.org/10.1038/s41586-018-0579-z
- [2] Shmatko, A., Jung, A.W., Gaurav, K., Brunak, S., Mortensen, L.H., Birney, E., Fitzgerald, T., Gerstung, M.: Learning the natural history of human disease with generative transformers. Nature 647, 248–256 (2025). https://doi.org/10.1038/s41586-025-09529-3
- [3] Miller, K.L., Alfaro-Almagro, F., Bangerter, N.K., et al.: Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nature Neuroscience 19, 1523–1536 (2016). https://doi.org/10.1038/nn.4393
- [4] Li, Y., Rao, S., Solares, J.R.A., et al.: BEHRT: Transformer for electronic health records. Scientific Reports 10, 7155 (2020). https://doi.org/10.1038/s41598-020-62922-y
- [5] Rasmy, L., Xiang, Y., Xie, Z., et al.: Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. npj Digital Medicine 4, 86 (2021). https://doi.org/10.1038/s41746-021-00455-y
- [6] Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(8), 1798–1828 (2013). https://doi.org/10.1109/TPAMI.2013.50
- [7] Radford, A., Kim, J.W., Hallacy, C., et al.: Learning transferable visual models from natural language supervision. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 8748–8763 (2021)
- [8] Liu, C., Ye, F.: A review of multimodal medical data fusion techniques for personalized medicine. In: Proceedings of the 4th International Conference on Biomedical and Intelligent Systems (IC-BIS), pp. 338–347 (2025). https://doi.org/10.1145/3745034.3745088
- [9] Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
- [10] Oord, A.v.d., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018)
- [11] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 1597–1607 (2020)
- [12] Tian, Y., Krishnan, D., Isola, P.: Contrastive representation distillation. In: Proceedings of the International Conference on Learning Representations (ICLR) (2020)