pith. machine review for the scientific record.

arxiv: 2604.18757 · v1 · submitted 2026-04-20 · 💻 cs.CV · cs.AI


REVEAL: Multimodal Vision-Language Alignment of Retinal Morphometry and Clinical Risks for Incident AD and Dementia Prediction

Chenyu You, Kuang Gong, Lin Gu, Ruogu Fang, Seowung Leem

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 05:28 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords retinal imaging · Alzheimer prediction · dementia · vision language models · multimodal learning · contrastive learning · early risk prediction

The pith

REVEAL aligns retinal images with risk narratives to predict Alzheimer's and dementia eight years before diagnosis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces REVEAL to align color fundus photographs with individualized risk profiles translated from questionnaires for forecasting incident AD and dementia. It uses group-aware contrastive learning to cluster patients with matching retinal and clinical features, creating stronger cross-modal links than separate analysis allows. A reader would care if this holds because it turns routine eye exams into tools for spotting disease susceptibility long in advance, potentially shifting care toward prevention. The unified model beats prior retinal-text combinations and general vision-language systems on predictions averaging eight years early.

Core claim

REVEAL translates real-world risk factors from structured questionnaires into clinical narratives, aligns them with retinal images using group-aware contrastive learning to group similar patients as positives, and substantially outperforms state-of-the-art methods for predicting incident Alzheimer's disease and dementia on average eight years before diagnosis.
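To make the translation step concrete, here is a minimal sketch of turning structured questionnaire fields into a clinical narrative a pretrained VLM text encoder can consume. The field names, phrasings, and template are hypothetical illustrations, not the paper's actual prompt design:

```python
# Hypothetical template: structured risk factors -> clinical narrative.
# Field names and phrasing are illustrative, not the paper's templates.

def risk_factors_to_narrative(rf: dict) -> str:
    parts = [f"A {rf['age']}-year-old {rf['sex']} patient."]
    if rf.get("smoker"):
        parts.append("The patient is a current smoker.")
    if rf.get("hypertension"):
        parts.append("History of hypertension is reported.")
    parts.append(f"Body mass index is {rf['bmi']:.1f}.")
    if rf.get("poor_sleep"):
        parts.append("The patient reports chronically poor sleep.")
    return " ".join(parts)

example = {"age": 67, "sex": "female", "smoker": True,
           "hypertension": False, "bmi": 27.4, "poor_sleep": True}
print(risk_factors_to_narrative(example))
```

The point of such a template is only compatibility: the downstream VLM expects free text, and whether this lossy re-encoding preserves predictive signal is exactly the load-bearing premise flagged below.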

What carries the argument

Group-aware contrastive learning that clusters patients with similar retinal morphometry and risk factors to strengthen multimodal alignment.
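One plausible shape for such a group-aware objective, in the spirit of supervised contrastive learning, is an InfoNCE loss whose positive set is the whole patient group rather than a single matched pair. This is a sketch under assumptions; the paper's exact GACL loss is not reproduced here:

```python
import numpy as np

def group_aware_contrastive_loss(img_emb, txt_emb, groups, tau=0.1):
    """Sketch of a group-aware InfoNCE: image i treats the text
    embeddings of ALL patients in its group as positives, not just
    its own. Not the paper's exact GACL objective."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sim = img @ txt.T / tau                           # (N, N) logits
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = groups[:, None] == groups[None, :]          # same-group mask
    per_row = -(logp * pos).sum(axis=1) / pos.sum(axis=1)
    return float(per_row.mean())

rng = np.random.default_rng(0)
img = rng.normal(size=(6, 8))
txt = img + 0.05 * rng.normal(size=(6, 8))           # roughly aligned pairs
groups = np.array([0, 0, 1, 1, 2, 2])                # 3 patient groups
loss = group_aware_contrastive_loss(img, txt, groups)
```

The design choice being probed by the referee is visible here: the `groups` array does the alignment work, so whatever the clustering latches onto (biology or confounder) is what the loss amplifies.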

Load-bearing premise

Translating structured risk factors into narratives retains their full predictive value, and the patient clustering strengthens genuine associations rather than introducing selection bias.

What would settle it

An ablation on independent data: disable the group-aware clustering step and check whether the full REVEAL model still beats simpler image-plus-text baselines. If the edge vanishes, the clustering is doing the work; if it survives, the central GACL claim weakens.

Figures

Figures reproduced from arXiv: 2604.18757 by Chenyu You, Kuang Gong, Lin Gu, Ruogu Fang, Seowung Leem.

Figure 1: Schematic of clinical scenario and proposed method.
Figure 2: Schematic overview of how a synthetic clinical report is generated.
Figure 3: Schematic overview of how GACL is performed.
Figure 4: The years until onset of Alzheimer's Disease and dementia. IQR denotes in…
Figure 5: Effect (% difference) of varying thresholds on the incident AD and dementia…
original abstract

The retina provides a unique, noninvasive window into Alzheimer's disease (AD) and dementia, capturing early structural changes through morphometric features, while systemic and lifestyle risk factors reflect well-established contributors to disease susceptibility long before clinical symptom onset. However, current retinal analysis frameworks typically model imaging and risk factors separately, limiting their ability to capture joint multimodal patterns critical for early risk prediction. Moreover, existing methods rarely incorporate mechanisms to organize or align patients with similar retinal and clinical characteristics, constraining the learning of coherent cross-modal associations. To address these limitations, we introduce REVEAL (REtinal-risk Vision-Language Early Alzheimer's Learning), a framework that aligns color fundus photographs with individualized disease-specific risk profiles for predicting incident AD and dementia, on average 8 years before diagnosis (range: 1-11 years). Because real-world risk factors are structured questionnaire data, we translate them into clinically interpretable narratives compatible with pretrained vision-language models (VLMs). We further propose a group-aware contrastive learning (GACL) strategy that clusters patients with similar retinal morphometry and risk factors as positive pairs, strengthening multimodal alignment. This unified representation learning framework substantially outperforms state-of-the-art retinal imaging models paired with clinical text encoders, as well as general-purpose VLMs, demonstrating the value of jointly modeling retinal biomarkers and clinical risk factors. By providing a generalizable and noninvasive approach for early AD and dementia risk stratification, REVEAL has the potential to enable earlier intervention and improve preventive care at the population level.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces REVEAL, a multimodal vision-language framework that translates structured clinical risk-factor questionnaires into narratives, aligns them with color fundus photographs via pretrained VLMs and a proposed group-aware contrastive learning (GACL) strategy, and uses the resulting representations to predict incident AD and dementia on average 8 years before diagnosis (range 1-11 years). It claims this unified approach substantially outperforms state-of-the-art retinal imaging models paired with clinical text encoders as well as general-purpose VLMs.

Significance. If the performance claims hold under rigorous validation, the work could enable earlier, noninvasive population-level risk stratification for AD and dementia by demonstrating the benefit of joint retinal morphometry and clinical risk modeling inside a VLM. The use of narrative translation to leverage pretrained VLMs and the GACL clustering mechanism represent concrete technical contributions that may generalize beyond this application.

major comments (2)
  1. [Methods (risk-factor narrative translation)] The central claim of improved multimodal integration rests on the assumption that converting structured questionnaires into clinically interpretable narratives preserves all predictive information. However, no ablation is reported that compares narrative text against direct structured numeric input (e.g., exact BMI, blood pressure, or genetic scores), leaving open the possibility that performance gains arise from information loss or generation artifacts rather than true cross-modal alignment.
  2. [Methods (GACL)] The group-aware contrastive learning is presented as strengthening cross-modal associations by treating patients with similar retinal+text pairs as positives, yet no analysis (e.g., cluster composition by demographics, site, or known confounders, or controlled experiments) is supplied to demonstrate that clusters form on AD-relevant biology rather than spurious factors. This directly affects whether the reported outperformance over retinal-only and general VLM baselines reflects genuine joint modeling.
minor comments (2)
  1. [Abstract] The summary asserts outperformance and an 8-year lead time but supplies no quantitative metrics, baselines, validation details, or error bars; while the full manuscript presumably contains these, the abstract should at minimum reference key results (e.g., AUC or hazard ratios) to allow readers to assess the claim without reading the entire paper.
  2. [Results / Experiments] The manuscript would benefit from explicit discussion of dataset characteristics (size, demographics, imaging sites, follow-up duration) and any steps taken to ensure the held-out test set is independent of the GACL clustering process.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which highlight important aspects of our multimodal framework. We address each major comment below and commit to revisions that will strengthen the manuscript's claims regarding the narrative translation and GACL components.

point-by-point responses
  1. Referee: Methods (risk-factor narrative translation): The central claim of improved multimodal integration rests on the assumption that converting structured questionnaires into clinically interpretable narratives preserves all predictive information; however, no ablation is reported that compares narrative text against direct structured numeric input (e.g., exact BMI, blood pressure, or genetic scores), leaving open the possibility that performance gains arise from information loss or generation artifacts rather than true cross-modal alignment.

    Authors: We agree that a direct ablation comparing narrative translation to structured numeric inputs is needed to isolate the contribution of the narrative format. The translation step was introduced specifically to enable compatibility with pretrained VLMs, which expect natural language rather than tabular data. In the revised manuscript, we will add an ablation where structured risk factors are encoded directly (via a dedicated tabular encoder or MLP) and compared against the narrative-based pipeline on the same downstream prediction tasks. This will quantify any information loss or artifacts and clarify whether the observed gains arise from true cross-modal alignment. revision: yes
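The tabular branch of the proposed ablation could look like the following minimal sketch. The encoder architecture and dimensions are hypothetical; the paper commits to no particular tabular encoder:

```python
import numpy as np

def tabular_encoder(x, w1, b1, w2, b2):
    """Hypothetical MLP baseline for the proposed ablation: encode
    structured risk factors directly, bypassing narrative translation,
    and project into the same unit-norm embedding space as the VLM."""
    h = np.maximum(x @ w1 + b1, 0.0)   # ReLU hidden layer
    z = h @ w2 + b2                    # embedding in the shared space
    return z / np.linalg.norm(z, axis=1, keepdims=True)

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 10))           # 4 patients, 10 numeric risk factors
w1, b1 = rng.normal(size=(10, 32)), np.zeros(32)
w2, b2 = rng.normal(size=(32, 16)), np.zeros(16)
emb = tabular_encoder(x, w1, b1, w2, b2)
print(emb.shape)                       # embedding per patient
```

Comparing this branch against the narrative pipeline on identical splits is what would isolate the contribution of the text format itself.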

  2. Referee: Methods (GACL): The group-aware contrastive learning is presented as strengthening cross-modal associations by treating patients with similar retinal+text pairs as positives, yet no analysis (e.g., cluster composition by demographics, site, or known confounders, or controlled experiments) is supplied to demonstrate that clusters form on AD-relevant biology rather than spurious factors; this directly affects whether the reported outperformance over retinal-only and general VLM baselines reflects genuine joint modeling.

    Authors: We acknowledge that without explicit analysis of cluster composition, it remains possible that GACL groupings reflect confounders rather than AD-relevant biology. The current manuscript describes the GACL objective and its integration with the contrastive loss but does not include post-hoc validation of the resulting clusters. In the revision, we will add: (1) breakdowns of cluster membership by demographics, site, and other covariates; (2) controlled experiments comparing GACL to random or demographic-matched grouping; and (3) correlation analyses between cluster assignments and known AD risk markers. These additions will demonstrate that the performance improvements stem from biologically meaningful joint modeling. revision: yes
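The promised cluster-composition breakdown amounts to a simple contingency audit. A sketch with hypothetical toy values (covariate names and data are illustrative only):

```python
import numpy as np

def cluster_composition(cluster_ids, covariate):
    """Contingency table of GACL cluster membership against a covariate
    (e.g., imaging site or sex). Large imbalances suggest the clusters
    may track confounders rather than AD-relevant biology."""
    clusters = np.unique(cluster_ids)
    levels = np.unique(covariate)
    table = np.zeros((len(clusters), len(levels)), dtype=int)
    for i, c in enumerate(clusters):
        for j, v in enumerate(levels):
            table[i, j] = int(np.sum((cluster_ids == c) & (covariate == v)))
    return clusters, levels, table

# Hypothetical toy audit: two clusters, covariate = imaging site
cluster_ids = np.array([0, 0, 0, 1, 1, 1])
site = np.array(["A", "A", "B", "B", "B", "A"])
_, _, table = cluster_composition(cluster_ids, site)
print(table)  # rows: clusters, cols: sites
```

A chi-square test or normalized mutual information on such a table would make the "biology vs. confounder" question quantitative.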

Circularity Check

0 steps flagged

No significant circularity; REVEAL is an empirical ML framework evaluated on held-out data.

full rationale

The paper describes a standard multimodal learning pipeline: translating structured risk-factor data into text narratives, applying group-aware contrastive learning (GACL) to align retinal images with the resulting text, and training a VLM-based model. Performance is asserted via comparison against baselines on held-out test sets for incident AD/dementia prediction. No equations, uniqueness theorems, or derivations are presented that reduce by construction to fitted parameters or self-citations. The central claims rest on empirical results rather than any self-referential mathematical identity. This is the expected outcome for a trained predictive model whose validity is assessed externally.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the assumption that pretrained VLMs can handle translated clinical text without major information loss and that contrastive clustering on similarity groups improves alignment; no new physical entities or free parameters are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: Structured risk factors can be translated into narratives that retain all clinically relevant predictive signal for VLMs.
    Required for the vision-language alignment step described in the abstract.

pith-pipeline@v0.9.0 · 5587 in / 1119 out tokens · 38076 ms · 2026-05-10T05:28:57.940866+00:00 · methodology

discussion (0)

