REVEAL: Multimodal Vision-Language Alignment of Retinal Morphometry and Clinical Risks for Incident AD and Dementia Prediction
Pith reviewed 2026-05-10 05:28 UTC · model grok-4.3
The pith
REVEAL aligns retinal images with risk narratives to predict Alzheimer's and dementia eight years before diagnosis.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
REVEAL translates real-world risk factors from structured questionnaires into clinical narratives, aligns those narratives with retinal images via group-aware contrastive learning that treats similar patients as positive pairs, and substantially outperforms state-of-the-art methods at predicting incident Alzheimer's disease and dementia, on average eight years before diagnosis.
What carries the argument
Group-aware contrastive learning that clusters patients with similar retinal morphometry and risk factors to strengthen multimodal alignment.
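The grouping idea can be sketched as a supervised-contrastive variant of InfoNCE in which all image-text pairs drawn from the same patient cluster, not only the matched pair, count as positives. A minimal NumPy sketch, assuming unit-normalized cosine similarities and a precomputed cluster label per patient (the paper's actual loss and clustering procedure may differ):

```python
import numpy as np

def group_contrastive_loss(img_emb, txt_emb, groups, temperature=0.1):
    """Group-aware InfoNCE sketch: image-text pairs from the same
    patient cluster count as positives, not just the diagonal."""
    groups = np.asarray(groups)
    # Normalize embeddings so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature              # (N, N) image-to-text similarities
    positives = groups[:, None] == groups[None, :]  # same-cluster pairs are positives
    # Log-softmax over each image's row of text candidates.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average negative log-likelihood over all positives per anchor.
    loss = -(log_prob * positives).sum(axis=1) / positives.sum(axis=1)
    return loss.mean()
```

With all-distinct group labels the positive mask collapses to the diagonal and this reduces to a standard one-positive-per-row CLIP-style loss, which is what makes it a clean ablation target.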
Load-bearing premise
Translating structured risk factors into narratives retains their predictive value, and the patient clustering strengthens genuine associations rather than introducing selection bias.
What would settle it
A test on independent data where the full REVEAL model loses its edge over simpler image-plus-text baselines when the group-aware clustering step is disabled.
Original abstract
The retina provides a unique, noninvasive window into Alzheimer's disease (AD) and dementia, capturing early structural changes through morphometric features, while systemic and lifestyle risk factors reflect well-established contributors to disease susceptibility long before clinical symptom onset. However, current retinal analysis frameworks typically model imaging and risk factors separately, limiting their ability to capture joint multimodal patterns critical for early risk prediction. Moreover, existing methods rarely incorporate mechanisms to organize or align patients with similar retinal and clinical characteristics, constraining the learning of coherent cross-modal associations. To address these limitations, we introduce REVEAL (REtinal-risk Vision-Language Early Alzheimer's Learning), a framework that aligns color fundus photographs with individualized disease-specific risk profiles for predicting incident AD and dementia, on average 8 years before diagnosis (range: 1-11 years). Because real-world risk factors are structured questionnaire data, we translate them into clinically interpretable narratives compatible with pretrained vision-language models (VLMs). We further propose a group-aware contrastive learning (GACL) strategy that clusters patients with similar retinal morphometry and risk factors as positive pairs, strengthening multimodal alignment. This unified representation learning framework substantially outperforms state-of-the-art retinal imaging models paired with clinical text encoders, as well as general-purpose VLMs, demonstrating the value of jointly modeling retinal biomarkers and clinical risk factors. By providing a generalizable and noninvasive approach for early AD and dementia risk stratification, REVEAL has the potential to enable earlier intervention and improve preventive care at the population level.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces REVEAL, a multimodal vision-language framework that translates structured clinical risk-factor questionnaires into narratives, aligns them with color fundus photographs via pretrained VLMs and a proposed group-aware contrastive learning (GACL) strategy, and uses the resulting representations to predict incident AD and dementia on average 8 years before diagnosis (range 1-11 years). It claims this unified approach substantially outperforms state-of-the-art retinal imaging models paired with clinical text encoders as well as general-purpose VLMs.
Significance. If the performance claims hold under rigorous validation, the work could enable earlier, noninvasive population-level risk stratification for AD and dementia by demonstrating the benefit of joint retinal morphometry and clinical risk modeling inside a VLM. The use of narrative translation to leverage pretrained VLMs and the GACL clustering mechanism represent concrete technical contributions that may generalize beyond this application.
Major comments (2)
- Methods (risk-factor narrative translation): The central claim of improved multimodal integration rests on the assumption that converting structured questionnaires into clinically interpretable narratives preserves all predictive information; however, no ablation is reported that compares narrative text against direct structured numeric input (e.g., exact BMI, blood pressure, or genetic scores), leaving open the possibility that performance gains arise from information loss or generation artifacts rather than true cross-modal alignment.
- Methods (GACL): The group-aware contrastive learning is presented as strengthening cross-modal associations by treating patients with similar retinal+text pairs as positives, yet no analysis (e.g., cluster composition by demographics, site, or known confounders, or controlled experiments) is supplied to demonstrate that clusters form on AD-relevant biology rather than spurious factors; this directly affects whether the reported outperformance over retinal-only and general VLM baselines reflects genuine joint modeling.
Minor comments (2)
- Abstract: The summary asserts outperformance and an 8-year lead time but supplies no quantitative metrics, baselines, validation details, or error bars; while the full manuscript presumably contains these, the abstract should at minimum reference key results (e.g., AUC or hazard ratios) to allow readers to assess the claim without reading the entire paper.
- Results / Experiments: The manuscript would benefit from explicit discussion of dataset characteristics (size, demographics, imaging sites, follow-up duration) and any steps taken to ensure the held-out test set is independent of the GACL clustering process.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which highlight important aspects of our multimodal framework. We address each major comment below and commit to revisions that will strengthen the manuscript's claims regarding the narrative translation and GACL components.
Point-by-point responses
Referee: Methods (risk-factor narrative translation): The central claim of improved multimodal integration rests on the assumption that converting structured questionnaires into clinically interpretable narratives preserves all predictive information; however, no ablation is reported that compares narrative text against direct structured numeric input (e.g., exact BMI, blood pressure, or genetic scores), leaving open the possibility that performance gains arise from information loss or generation artifacts rather than true cross-modal alignment.
Authors: We agree that a direct ablation comparing narrative translation to structured numeric inputs is needed to isolate the contribution of the narrative format. The translation step was introduced specifically to enable compatibility with pretrained VLMs, which expect natural language rather than tabular data. In the revised manuscript, we will add an ablation where structured risk factors are encoded directly (via a dedicated tabular encoder or MLP) and compared against the narrative-based pipeline on the same downstream prediction tasks. This will quantify any information loss or artifacts and clarify whether the observed gains arise from true cross-modal alignment. Revision: yes.
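The ablation would contrast this narrative pathway with direct tabular encoding. For concreteness, the narrative-translation step might look like the following sketch; the field names (`age`, `sex`, `smoker`, `bmi`, `hypertension`) and phrasing are hypothetical illustrations, not the paper's actual template:

```python
def risk_factors_to_narrative(record):
    """Hypothetical sketch: render a structured questionnaire row as a
    short clinical narrative a text encoder can consume. Fields and
    wording are illustrative, not the paper's actual template."""
    parts = [f"A {record['age']}-year-old {record['sex']} patient."]
    if record.get("smoker"):
        parts.append("The patient is a current smoker.")
    bmi = record.get("bmi")
    if bmi is not None:
        # Standard WHO-style BMI categories, used here only for wording.
        status = "obese" if bmi >= 30 else "overweight" if bmi >= 25 else "normal-weight"
        parts.append(f"BMI is {bmi:.1f} ({status}).")
    if record.get("hypertension"):
        parts.append("There is a history of hypertension.")
    return " ".join(parts)
```

A record like `{"age": 62, "sex": "female", "smoker": True, "bmi": 31.2}` then renders as a short free-text paragraph, while the proposed tabular-encoder arm would consume the same fields as raw numbers, making any information lost in templating directly measurable.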
Referee: Methods (GACL): The group-aware contrastive learning is presented as strengthening cross-modal associations by treating patients with similar retinal+text pairs as positives, yet no analysis (e.g., cluster composition by demographics, site, or known confounders, or controlled experiments) is supplied to demonstrate that clusters form on AD-relevant biology rather than spurious factors; this directly affects whether the reported outperformance over retinal-only and general VLM baselines reflects genuine joint modeling.
Authors: We acknowledge that without explicit analysis of cluster composition, it remains possible that GACL groupings reflect confounders rather than AD-relevant biology. The current manuscript describes the GACL objective and its integration with the contrastive loss but does not include post-hoc validation of the resulting clusters. In the revision, we will add: (1) breakdowns of cluster membership by demographics, site, and other covariates; (2) controlled experiments comparing GACL to random or demographic-matched grouping; and (3) correlation analyses between cluster assignments and known AD risk markers. These additions will demonstrate that the performance improvements stem from biologically meaningful joint modeling. Revision: yes.
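A cluster-composition breakdown of this kind can be made concrete with a simple dependence measure, for example the mutual information between cluster assignments and a categorical covariate such as imaging site. This is an illustrative sketch, not from the paper; in practice a permutation test or chi-squared statistic would accompany it:

```python
import numpy as np

def label_covariate_mi(clusters, covariate):
    """Mutual information (in nats) between cluster assignments and a
    categorical covariate. Values near zero suggest the clusters are
    not simply tracking that covariate; the maximum is the smaller of
    the two label entropies."""
    clusters = np.asarray(clusters)
    covariate = np.asarray(covariate)
    mi = 0.0
    for c in np.unique(clusters):
        p_c = np.mean(clusters == c)          # marginal P(cluster = c)
        for v in np.unique(covariate):
            p_v = np.mean(covariate == v)     # marginal P(covariate = v)
            p_cv = np.mean((clusters == c) & (covariate == v))
            if p_cv > 0:
                mi += p_cv * np.log(p_cv / (p_c * p_v))
    return mi
```

If clusters perfectly mirror a two-level site variable the MI equals ln 2; if the two labelings are independent it is zero, which is the pattern one would want to see against known confounders.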
Circularity Check
No significant circularity; REVEAL is an empirical ML framework evaluated on held-out data
full rationale
The paper describes a standard multimodal learning pipeline: translating structured risk-factor data into text narratives, applying group-aware contrastive learning (GACL) to align retinal images with the resulting text, and training a VLM-based model. Performance is asserted via comparison against baselines on held-out test sets for incident AD/dementia prediction. No equations, uniqueness theorems, or derivations are presented that reduce by construction to fitted parameters or self-citations. The central claims rest on empirical results rather than any self-referential mathematical identity. This is the expected outcome for a trained predictive model whose validity is assessed externally.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Structured risk factors can be translated into narratives that retain all clinically relevant predictive signal for VLMs.