A Dual Edge Spatial Jacobian Image Graph for Interpretable Diabetic Retinopathy Grading

Imran Razzak; Inam Ullah; Shoaib Jameel

arxiv: 2606.24168 · v1 · pith:O25XGJESnew · submitted 2026-06-23 · eess.IV · cs.CV · stat.ML

Pith reviewed 2026-06-25 22:22 UTC · model grok-4.3

keywords: diabetic retinopathy, fundus photography, image graph, interpretability, lesion detection, vascular biomarkers, contrastive embedding, grading

The pith

A dual-edge spatial-Jacobian graph fuses four aligned evidence streams to grade diabetic retinopathy while linking lesions to vascular structure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs each fundus image as a graph node carrying four matched streams: vessel maps, lesion evidence, a contrastive embedding, and morphometric biomarkers. A spatial edge branch captures how lesions sit relative to vessels while a Jacobian branch tracks how the embedding responds to biomarker changes. Lightweight two-token attention merges the two edge families into one image-level graph for grading. The construction is presented as a way to generate testable lesion-biomarker relations rather than a ready-to-deploy classifier.

Core claim

Each fundus image is represented as a graph node with four aligned evidence streams whose spatial and Jacobian edge relations are fused by two-token attention, yielding an interpretable representation that supports both grading and hypothesis generation about lesion-biomarker geometry.

What carries the argument

The dual-edge spatial-Jacobian image graph, where the spatial branch encodes vessel-lesion geometry and the Jacobian branch models embedding-biomarker sensitivity before two-token attention fusion.
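The fusion step can be illustrated with a minimal sketch. The abstract does not give the exact form of the two-token attention, so everything below is an assumption: the function name two_token_attention, the single query vector, and the scaled dot-product scoring are hypothetical stand-ins, not the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def two_token_attention(spatial_token, jacobian_token, query):
    """Fuse two edge tokens into one vector (hypothetical formulation).

    Each token is scored against a query vector with a scaled dot product,
    the two scores are normalised with softmax, and the fused vector is the
    weighted sum of the tokens. In a trained model the query would be learned.
    """
    d = len(query)
    tokens = [spatial_token, jacobian_token]
    scores = [sum(q * t for q, t in zip(query, tok)) / math.sqrt(d) for tok in tokens]
    weights = softmax(scores)
    fused = [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)]
    return fused, weights

# A zero query scores both tokens equally, so the fused vector is their average.
fused, weights = two_token_attention([1.0, 0.0, 0.0, 0.0],
                                     [0.0, 1.0, 0.0, 0.0],
                                     [0.0, 0.0, 0.0, 0.0])
print(weights)  # [0.5, 0.5]
```

With only two tokens the attention weights reduce to a single mixing proportion per image, which is one reason such a fusion can be inspected directly: the weight on the spatial token says how much the fused representation leans on vessel-lesion geometry versus embedding-biomarker sensitivity.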

If this is right

On 2,910 matched non-augmented APTOS images the full graph reaches 0.8076 accuracy, 0.8312 quadratic weighted kappa, 0.5915 macro-F1 and 0.9330 adjacent-grade accuracy.
Referable DR detection reaches 0.9055 accuracy and 0.9711 AUROC.
The resulting graph supplies an explainable representation for generating lesion-biomarker hypotheses rather than serving as a deployment classifier.
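For readers less familiar with the ordinal metrics quoted above, the following sketch shows how quadratic weighted kappa, adjacent-grade accuracy and macro-F1 are commonly computed for five DR grades (0 to 4). The function names are illustrative, and the definition of adjacent-grade accuracy as "prediction within one grade of the label" is an assumption; the paper's evaluation code may differ in detail.

```python
def confusion(y_true, y_pred, k):
    """k x k confusion matrix with rows = true grade, columns = predicted grade."""
    m = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

def quadratic_weighted_kappa(y_true, y_pred, k=5):
    """1 - (weighted observed disagreement) / (weighted chance disagreement)."""
    n = len(y_true)
    obs = confusion(y_true, y_pred, k)
    row = [sum(r) for r in obs]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2   # quadratic penalty on grade distance
            num += w * obs[i][j]
            den += w * row[i] * col[j] / n    # expected count under independence
    return 1.0 - num / den                    # undefined if den == 0 (single-class case)

def adjacent_grade_accuracy(y_true, y_pred):
    """Fraction of predictions within one grade of the label (assumed definition)."""
    return sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, k=5):
    """Unweighted mean of per-class F1; a class with no support or predictions scores 0."""
    obs = confusion(y_true, y_pred, k)
    scores = []
    for c in range(k):
        tp = obs[c][c]
        fp = sum(obs[i][c] for i in range(k)) - tp
        fn = sum(obs[c]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / k

labels = [0, 1, 2, 3, 4]
preds = [0, 2, 2, 1, 4]
print(adjacent_grade_accuracy(labels, preds))  # 0.8
```

The gap between the reported macro-F1 (0.5915) and adjacent-grade accuracy (0.9330) is consistent with a model that often misses the exact grade of minority classes but rarely misses by more than one step; the two metrics penalise those errors very differently.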

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same dual-branch construction could be tested on other retinal conditions that combine vascular and focal lesion data.
Systematic ablation of individual streams would quantify how much each contributes to the reported metrics.
Re-running the pipeline on images acquired under different cameras or resolutions would test whether the alignment assumption holds outside the APTOS set.

Load-bearing premise

The four evidence streams are spatially aligned and carry complementary information that the spatial and Jacobian branches can fuse without introducing spurious correlations.

What would settle it

Performance would be expected to drop sharply if the four streams were deliberately spatially misaligned before fusion or if any single stream were removed while keeping the rest fixed.
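A misalignment probe of this kind could be sketched as follows. This is a toy illustration only: the binary masks, the mean lesion-to-nearest-vessel distance used as a stand-in spatial feature, and the helper names are all hypothetical and are not taken from the paper.

```python
import math

def pixels(mask):
    """Coordinates (row, col) of all non-zero entries in a 2D list mask."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]

def mean_lesion_to_vessel_distance(vessel_mask, lesion_mask):
    """Mean Euclidean distance from each lesion pixel to its nearest vessel pixel."""
    vessels = pixels(vessel_mask)
    lesions = pixels(lesion_mask)
    if not vessels or not lesions:
        return float("nan")
    total = sum(min(math.hypot(lr - vr, lc - vc) for vr, vc in vessels)
                for lr, lc in lesions)
    return total / len(lesions)

def shift_right(mask, dx):
    """Shift a mask dx columns to the right, padding with zeros (simulated misalignment)."""
    width = len(mask[0])
    return [[0] * dx + row[:width - dx] for row in mask]

# Toy 5x5 example: a vertical vessel in column 1 and a single lesion pixel at (2, 2).
vessel = [[0, 1, 0, 0, 0] for _ in range(5)]
lesion = [[0] * 5 for _ in range(5)]
lesion[2][2] = 1

print(mean_lesion_to_vessel_distance(vessel, lesion))                  # 1.0
print(mean_lesion_to_vessel_distance(vessel, shift_right(lesion, 2)))  # 3.0
```

If the spatial branch really depends on alignment, features of this sort should change under the shift and grading performance should degrade with it; if performance is unchanged, the spatial edges are probably not carrying the claimed geometry.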

Figures

Figures reproduced from arXiv: 2606.24168 by Imran Razzak, Inam Ullah, Shoaib Jameel.

Original abstract

Automated diabetic retinopathy (DR) grading from colour fundus photographs can achieve strong predictive performance, but clinical interpretation requires more than an image-level label. It requires understanding how lesion evidence is distributed around retinal vessels and how this evidence relates to quantitative vascular biomarkers. We present a dual-edge spatial-Jacobian image graph for interpretable DR grading. Each fundus image is represented as a graph node with four aligned evidence streams: AutoMorph vessel information ($X_1$), DR-XAI-style lesion evidence maps ($X_2$), a 128-dimensional lesion-based contrastive image embedding ($X_3$), and AutoMorph morphometric biomarkers ($X_4$). The spatial edge branch ($X_{12}$) encodes vessel-lesion geometry, while the Jacobian branch ($X_{34}$) models embedding-biomarker sensitivity. Lightweight two-token attention fuses both edge families into a final image graph. On 2,910 matched non-augmented APTOS images, the full graph achieves 0.8076 accuracy, 0.8312 quadratic weighted kappa, 0.5915 macro-F1, and 0.9330 adjacent-grade accuracy; referable DR reaches 0.9055 accuracy and 0.9711 AUROC. The framework is positioned as an explainable representation-learning tool for lesion-biomarker hypothesis generation, rather than as a deployment-ready clinical classifier. The code is available at https://github.com/Inamullah-Colab/dual-edge-dr-graph-xai.
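The abstract does not say how the Jacobian branch ($X_{34}$) is computed. One plausible reading, offered here only as an assumption, is that it measures how each embedding coordinate changes when a biomarker is perturbed. The sketch below estimates such a sensitivity matrix by central finite differences; toy_embedding is a hypothetical linear map standing in for whatever relates $X_4$ to $X_3$ in the actual model.

```python
def finite_difference_jacobian(f, x, eps=1e-6):
    """Central-difference estimate of J[i][j] = d f_i / d x_j at the point x."""
    base = f(x)
    jac = [[0.0] * len(x) for _ in base]
    for j in range(len(x)):
        hi, lo = list(x), list(x)
        hi[j] += eps
        lo[j] -= eps
        f_hi, f_lo = f(hi), f(lo)
        for i in range(len(base)):
            jac[i][j] = (f_hi[i] - f_lo[i]) / (2 * eps)
    return jac

# Hypothetical stand-in: 3 biomarkers mapped linearly to a 2-dimensional embedding.
W = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]

def toy_embedding(biomarkers):
    return [sum(w * b for w, b in zip(row, biomarkers)) for row in W]

J = finite_difference_jacobian(toy_embedding, [0.5, -1.0, 2.0])
# For a linear map the Jacobian equals W (up to floating-point error) at every point.
```

In the paper's setting the matrix would have 128 rows (embedding dimensions) and one column per morphometric biomarker; large entries would flag biomarkers to which the embedding is most sensitive, which is the kind of relation the framework proposes as a hypothesis rather than a finding.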

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The dual-edge spatial-Jacobian graph fuses four evidence streams in a specific way that looks new, but the abstract gives no baselines or ablations so the interpretability gain is still unproven.

The main new element is the dual-edge construction: a spatial branch linking vessel and lesion geometry plus a Jacobian branch linking the contrastive embedding to morphometric biomarkers, fused by two-token attention on top of the four aligned streams from AutoMorph and DR-XAI tools.

It does a reasonable job of describing a concrete architecture, reporting numbers on a clean 2,910-image non-augmented APTOS split (0.8076 accuracy, 0.8312 QWK, 0.9711 AUROC for referable DR), and releasing code. Framing the work as a hypothesis-generation tool rather than a deployable classifier is also fair.

The soft spots are the missing comparisons. No baselines appear, no ablations test whether removing one edge branch hurts, and there are no error bars or checks that the Jacobian edges actually surface biologically useful relations instead of spurious ones. The assumption that the four streams stay spatially aligned and complementary is plausible but untested in what we have. If the full paper supplies those pieces the soundness improves; otherwise the central claim rests on the architecture description alone.

This is for researchers already working on graph models or explainable retinal imaging who want a structured way to tie lesions to vascular features. A serious referee could evaluate the methods and any added validation. I would send it to peer review rather than desk reject because the construction is specific enough to be worth proper scrutiny.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a dual-edge spatial-Jacobian image graph to represent fundus photographs for interpretable diabetic retinopathy grading. Each image node integrates four aligned evidence streams (AutoMorph vessel information X1, DR-XAI-style lesion maps X2, 128-dimensional contrastive lesion embedding X3, and AutoMorph morphometric biomarkers X4). A spatial edge branch encodes vessel-lesion geometry while a Jacobian branch models embedding-biomarker sensitivity; these are fused via lightweight two-token attention. On a fixed split of 2,910 non-augmented APTOS images the full graph reports 0.8076 accuracy, 0.8312 quadratic weighted kappa, 0.5915 macro-F1 and 0.9330 adjacent-grade accuracy (referable DR: 0.9055 accuracy, 0.9711 AUROC). The work is framed as an explainable representation-learning tool for lesion-biomarker hypothesis generation rather than a deployment classifier, with code released at the cited GitHub repository.

Significance. If the dual-edge construction demonstrably surfaces biologically meaningful lesion-vessel relations without introducing spurious correlations, the framework could supply a useful bridge between lesion detection and quantitative vascular biomarkers for hypothesis generation in DR research. Public code availability is a clear strength that supports reproducibility and further experimentation.

major comments (2)

[Abstract / Evaluation] Abstract and evaluation section: performance numbers (accuracy 0.8076, QWK 0.8312, etc.) are stated without any baselines, error bars, ablation studies, or training-protocol details. Because the central claim is that the dual-edge graph yields useful interpretability, the absence of these controls leaves open whether the reported figures arise from the proposed architecture or from the underlying streams alone.
[Methods] Methods / model description: the design rests on the assumption that the four evidence streams are spatially aligned and carry complementary information that the spatial and Jacobian branches can fuse meaningfully. No quantitative check of alignment quality, cross-stream correlation analysis, or ablation removing one stream is supplied, which directly affects the load-bearing interpretability claim.

minor comments (2)

[Abstract] The phrase 'DR-XAI-style lesion evidence maps' is used without a precise citation or implementation detail for the lesion maps employed.
The manuscript would benefit from a short table or figure caption that explicitly lists the four input streams and the two edge families for quick reference.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive review of our manuscript. We address each major comment below and commit to revisions that directly respond to the concerns raised.

Referee: [Abstract / Evaluation] Abstract and evaluation section: performance numbers (accuracy 0.8076, QWK 0.8312, etc.) are stated without any baselines, error bars, ablation studies, or training-protocol details. Because the central claim is that the dual-edge graph yields useful interpretability, the absence of these controls leaves open whether the reported figures arise from the proposed architecture or from the underlying streams alone.

Authors: We agree that the absence of baselines and ablations leaves the contribution of the dual-edge construction unclear. In the revised version we will add a results table comparing the full model against (i) each individual stream used as a standalone classifier and (ii) ablated graph variants that disable the spatial or Jacobian branch. Expanded Methods text will detail the training protocol (optimizer, learning rate schedule, batch size, epochs, and data split). Where compute permits, we will rerun training with three random seeds and report mean ± std for all metrics. (revision: yes)
Referee: [Methods] Methods / model description: the design rests on the assumption that the four evidence streams are spatially aligned and carry complementary information that the spatial and Jacobian branches can fuse meaningfully. No quantitative check of alignment quality, cross-stream correlation analysis, or ablation removing one stream is supplied, which directly affects the load-bearing interpretability claim.

Authors: The four streams are produced from the same APTOS images via AutoMorph (X1, X4) and a lesion model (X2, X3), so pixel-level spatial alignment follows from the shared coordinate system. We nevertheless accept that explicit verification is needed. The revision will include a supplementary section reporting (a) pairwise Pearson correlations between the four feature maps and (b) performance ablations that successively remove each stream, thereby quantifying complementarity and supporting the interpretability rationale. (revision: yes)
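The pairwise correlation check promised in the second response could look roughly like the sketch below. Because the four streams have different shapes (maps, a 128-dimensional vector, a biomarker vector), the sketch assumes each stream has first been reduced to one scalar summary per image; that reduction, the stream values and the function names are all hypothetical.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)

def pairwise_correlations(streams):
    """Correlation for every unordered pair of named streams."""
    return {(p, q): pearson(streams[p], streams[q])
            for p, q in combinations(sorted(streams), 2)}

# Made-up per-image scalar summaries for four images (illustrative values only).
streams = {
    "X1": [0.10, 0.20, 0.30, 0.40],
    "X2": [0.05, 0.10, 0.15, 0.20],
    "X3": [1.00, 0.80, 0.90, 0.70],
    "X4": [2.00, 2.50, 2.20, 2.90],
}
correlations = pairwise_correlations(streams)  # 6 pairs for 4 streams
```

Correlations near ±1 between two streams would suggest redundancy rather than complementarity, so a table of these six values, read alongside the per-stream ablations, would speak directly to the referee's concern.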

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper constructs a dual-edge graph model from four aligned evidence streams (vessel info, lesion maps, embeddings, biomarkers) fused by lightweight attention, then reports standard classification metrics on a fixed public APTOS split. No equations are shown that define a target quantity in terms of itself or rename a fitted parameter as a prediction. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work appear in the abstract or description. The central claim is an empirical performance number obtained by end-to-end training on external data, which remains independent of the model definition itself.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 1 invented entity

The approach rests on standard machine-learning assumptions about data alignment and complementarity plus several fitted components typical of graph attention models.

free parameters (2)

128-dimensional lesion embedding
Dimension chosen for the contrastive image embedding; value is a modeling choice that affects downstream fusion.
two-token attention weights
Learnable parameters of the lightweight attention that fuses the edge families; fitted during training.

axioms (2)

domain assumption: The four evidence streams X1-X4 are spatially aligned and carry complementary information
Invoked when constructing the image graph node from AutoMorph, lesion maps, embedding, and biomarkers.
domain assumption: The spatial edge branch meaningfully encodes vessel-lesion geometry
Core premise for the X12 branch.

invented entities (1)

Dual-edge spatial-Jacobian image graph (no independent evidence)
purpose: To fuse geometric and sensitivity information for interpretable grading
New graph structure introduced by the paper; no independent evidence outside the model itself.

Reference graph

Works this paper leans on

10 extracted references · 7 canonical work pages

[1] Asia Pacific Tele-Ophthalmology Society. 2019. APTOS 2019 Blindness Detection. Kaggle competition dataset. https://www.kaggle.com/c/aptos2019-blindness-detection

[2] Yoav Benjamini and Yosef Hochberg. 1995. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society: Series B 57, 1 (1995), 289–300. doi:10.1111/j.2517-6161.1995.tb02031.x

[3] Emily Y. Chew, Stephen A. Burns, Alison G. Abraham, Michael F. Bakhoum, Joshua A. Beckman, Toco Y. P. Chui, Robert P. Finger, Alejandro F. Frangi, Rebecca F. Gottesman, Maria B. Grant, Henner Hanssen, Cecilia S. Lee, Michelle L. Meyer, Damiano Rizzoni, Alicja R. Rudnicka, Joel S. Schuman, Sara B. Seidelmann, W. H. Wilson Tang, B. B. Adhikari, N. Danthi,... doi:10.1038/s41569-024-01060-8. 2025.

[4] Rishab Gargeya and Theodore Leng. 2017. Automated Identification of Diabetic Retinopathy Using Deep Learning. Ophthalmology 124, 7 (2017), 962–969. doi:10.1016/j.ophtha.2017.02.008

[5] Varun Gulshan, Lily Peng, Marc Coram, Martin C. Stumpe, Derek Wu, Arunachalam Narayanaswamy, Subhashini Venugopalan, Kasumi Widner, Tom Madams, Jorge Cuadros, Ramasamy Kim, Rajiv Raman, Philip C. Nelson, Jessica L. Mega, and Dale R. Webster. 2016. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fu... doi:10.1001/jama.2016.17216

[6] Yijin Huang, Li Lin, Pujin Cheng, Junyan Lyu, and Xiaoying Tang. 2021. Lesion-Based Contrastive Learning for Diabetic Retinopathy Grading from Fundus Images. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2021 (Lecture Notes in Computer Science, Vol. 12902). Springer, 113–123. doi:10.1007/978-3-030-87196-3_11

[7] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 2017. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the IEEE International Conference on Computer Vision. 618–626. doi:10.1109/ICCV.2017.74

[8] Zhengwei Zhang, Callie Deng, and Yannis M. Paulus. 2024. Advances in Structural and Functional Retinal Imaging and Biomarkers for Early Detection of Diabetic Retinopathy. Biomedicines 12, 7 (2024), 1405. doi:10.3390/biomedicines12071405

[9] Yukun Zhou, Siegfried K. Wagner, Mark A. Chia, An Zhao, Peter Woodward-Court, Moucheng Xu, Robbert R. Struyven, Daniel C. Alexander, and Pearse A. Keane. 2022. AutoMorph: Automated Retinal Vascular Morphology Quantification Via a Deep Learning Pipeline. Translational Vision Science & Technology 11, 7 (2022), 12. doi:10.1167/tvst.11.7.12
