pith. machine review for the scientific record.

arxiv: 2605.02589 · v1 · submitted 2026-05-04 · 💻 cs.CV · cs.LG

Recognition: 2 Lean theorem links

Representation learning from OCT images

Hedi Tabia, Désiré Sidibé, Nawres Khlifa, Ahmed Tabia, Ines Rahmany, Noura Aboudi, Zainab Haddad, Hajer Khachnaoui, Hsouna Zgolli

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 18:27 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords optical coherence tomography · representation learning · retinal imaging · deep learning · self-supervised learning · foundation models · medical image analysis · vision-language models

The pith

Representation learning for retinal OCT images advances from supervised CNNs and transformers through self-supervised and generative methods to foundation models and vision-language systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey organizes the literature on representation learning for Optical Coherence Tomography images of the retina along a taxonomy of learning paradigms. It starts with early supervised deep networks, moves through self-supervised, semi-supervised, and generative techniques, then covers 3D volumetric modeling, multimodal approaches, and recent large pretrained foundation models. The aim is to reduce reliance on expert annotations while achieving more consistent analysis across devices and patient populations. A sympathetic reader cares because OCT produces high volumes of detailed scans whose manual review is impractical, so better automated representations could support faster and more reliable diagnosis of eye conditions.
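The self-supervised branch of the taxonomy can be made concrete with a minimal contrastive objective. The sketch below is a generic SimCLR-style NT-Xent loss in NumPy, not the formulation from any specific surveyed paper; the pairing of two augmented views of the same B-scan is the assumed pretext task.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss over a batch of paired embeddings.

    z1, z2: (N, D) embeddings of two augmented views of the same N images
    (for OCT, e.g. two random crops of the same B-scan).
    """
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine-similarity space
    sim = z @ z.T / temperature                        # (2N, 2N) similarity matrix
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    n = len(z1)
    # the positive for row i is the other view of the same image
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    row_max = sim.max(axis=1, keepdims=True)           # stabilized log-softmax
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

Minimizing this loss pulls the two views of each scan together while pushing apart all other scans in the batch — no expert labels required, which is exactly the annotation-reduction motivation the survey emphasizes.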

Core claim

The survey reviews representation learning for retinal OCT images by covering supervised CNN-based and transformer architectures, self-supervised and semi-supervised methods, generative approaches, 3D volumetric modeling, multimodal representation learning, and large-scale pretrained foundation models. For each paradigm it examines core methodological contributions, identifies persistent limitations, and traces connections between successive approaches. It supplies a structured list of public OCT datasets, discusses evaluation protocol issues, presents a unified mathematical formulation that places every paradigm inside the same problem setup, and outlines pressing open directions, including volumetric foundation-model pretraining, uncertainty-aware representation learning, federated and privacy-preserving training, fairness and bias mitigation, and concept-based interpretability.

What carries the argument

The principled taxonomy of learning paradigms together with the unified mathematical formulation that situates supervised, self-supervised, generative, multimodal, and foundation-model methods inside one common framework for OCT image representation.
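The survey's exact notation is not reproduced on this page, but a common way such a unified formulation is written treats every paradigm as a choice of supervision signal for one shared encoder:

```latex
% One encoder f_\theta maps an OCT scan x to a representation z = f_\theta(x),
% with a paradigm-specific head g_\phi and loss \mathcal{L}:
\min_{\theta,\,\phi} \; \mathbb{E}_{x \sim \mathcal{D}}
  \left[ \mathcal{L}\big(g_\phi(f_\theta(x)),\, a(x)\big) \right]
% The paradigms differ only in the target a(x):
%   supervised:        a(x) = y             (expert label)
%   self-supervised:   a(x) = f_\theta(x')  (augmented view x' of x)
%   generative:        a(x) = x             (reconstruction)
%   vision-language:   a(x) = t             (paired clinical text)
```

Under this reading, the taxonomy's progression is a progression in how cheaply the target a(x) can be obtained.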

If this is right

  • The taxonomy reveals a clear progression from label-heavy supervised methods toward approaches that need far less manual annotation.
  • The unified formulation lets researchers compare different paradigms directly on the same mathematical footing.
  • Listing public datasets and evaluation considerations supports more standardized benchmarking across studies.
  • The highlighted open directions, such as volumetric foundation-model pretraining and fairness mitigation, follow directly from the limitations identified in earlier paradigms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same taxonomy could be applied to representation learning in other high-volume medical scans such as ultrasound or MRI to spot reusable techniques.
  • Vision-language models listed in the open directions may eventually supply natural-language explanations that help clinicians trust the outputs.
  • Federated and privacy-preserving training highlighted in the survey could become necessary for real-world deployment where patient data cannot leave local hospitals.
  • An implicit next step is to test whether models pretrained on the public OCT datasets listed in the survey generalize to new scanner vendors without additional labels.
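That last implicit next step is commonly operationalized as a frozen-encoder probe: extract embeddings for scans from an unseen scanner vendor and fit only a trivial classifier on top. The sketch below uses a nearest-centroid probe (a minimal linear classifier) with synthetic Gaussian embeddings standing in for real encoder outputs; the `vendor_shift` offset is a hypothetical stand-in for scanner domain shift.

```python
import numpy as np

rng = np.random.default_rng(0)

def centroid_probe(train_feats, train_labels, test_feats, test_labels):
    """Nearest-centroid probe (a minimal linear classifier) on frozen embeddings."""
    classes = np.unique(train_labels)
    centroids = np.stack([train_feats[train_labels == c].mean(axis=0) for c in classes])
    dists = ((test_feats[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    preds = classes[np.argmin(dists, axis=1)]
    return (preds == test_labels).mean()

def synthetic_embeddings(n, vendor_shift):
    """Stand-in for encoder outputs: class-separated Gaussians plus a
    uniform offset simulating a scanner-vendor domain shift."""
    labels = rng.integers(0, 2, size=n)
    feats = rng.normal(size=(n, 16)) + 2.0 * labels[:, None] + vendor_shift
    return feats, labels

train_X, train_y = synthetic_embeddings(200, vendor_shift=0.0)  # source vendor
test_X, test_y = synthetic_embeddings(200, vendor_shift=0.3)    # unseen vendor
acc = centroid_probe(train_X, train_y, test_X, test_y)
print(f"cross-vendor probe accuracy: {acc:.2f}")
```

A real study would replace `synthetic_embeddings` with embeddings from a pretrained OCT encoder and split by actual scanner vendor; the probe accuracy gap between source and unseen vendors then measures label-free generalization.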

Load-bearing premise

The selected papers and the chosen taxonomy of paradigms give an accurate and reasonably complete picture of the field without major omissions or systematic bias in coverage.

What would settle it

A major OCT representation learning paper or method published before the survey cutoff that fits none of the described paradigms and is absent from the review would show the taxonomy and coverage are incomplete.

Figures

Figures reproduced from arXiv: 2605.02589 by Ahmed Tabia, Désiré Sidibé, Hajer Khachnaoui, Hedi Tabia, Hsouna Zgolli, Ines Rahmany, Nawres Khlifa, Noura Aboudi, Zainab Haddad.

Figure 1: From left to right, four OCT B-scans corresponding to: drusen, choroidal neovascularization, diabetic macular …
Figure 2: Hierarchical taxonomy of representation learning methods for Optical Coherence Tomography (OCT) images.
read the original abstract

Optical Coherence Tomography (OCT) has become one of the most widely used imaging modalities in ophthalmology. It provides high-resolution, non-invasive visualization of retinal microarchitecture. The automated analysis of OCT images through representation learning has emerged as a central research frontier. This has mainly been driven by the clinical need to process large acquisition volumes. The objective is to reduce the reliance on expert annotation and improve diagnostic consistency across devices and populations. This survey provides a comprehensive and structured review of representation learning methods for retinal OCT image analysis. It covers the period from early deep learning approaches to the most recent developments in foundation models and vision-language systems. We organize the literature along a principled taxonomy of learning paradigms, encompassing supervised learning with CNN-based and transformer-based architectures, self-supervised and semi-supervised methods, generative approaches, as well as 3D volumetric modeling, multimodal representation learning, and large-scale pretrained foundation models. For each paradigm, we analyze the core methodological contributions, identify persistent limitations, and trace the connections between successive approaches. We further provide a structured overview of publicly available OCT datasets, discuss evaluation protocol considerations, and present a unified problem formulation that situates each learning paradigm within a common mathematical framework. Building on this analysis, we identify and discuss the most pressing open research directions emerging in the literature. This includes volumetric foundation model pretraining, uncertainty-aware representation learning, federated and privacy-preserving training, fairness and bias mitigation, concept-based interpretability,...

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript is a survey reviewing representation learning methods for retinal OCT image analysis in ophthalmology. It organizes prior work according to a taxonomy of paradigms (supervised CNN/transformer architectures, self-supervised/semi-supervised learning, generative models, 3D volumetric modeling, multimodal learning, and large-scale foundation models), provides a unified mathematical formulation, surveys public datasets and evaluation protocols, and outlines open challenges such as volumetric pretraining, uncertainty modeling, federated learning, fairness, and interpretability.

Significance. If the coverage proves complete and the taxonomy reproducible, the survey would offer a timely consolidation of a fast-moving area at the intersection of computer vision and ophthalmic imaging, particularly by tracing the shift toward foundation and vision-language models. The unified formulation and dataset overview add practical value for new researchers. However, the absence of a documented literature selection process substantially weakens its authority as a field-wide reference.

major comments (1)
  1. [Abstract and Introduction] The central claim that the work delivers a 'comprehensive and structured review' organized along a 'principled taxonomy' is not supported by any description of the literature search methodology (databases, keywords, date ranges, inclusion/exclusion criteria, or screening process). Without this, it is impossible to verify that the selected papers form an exhaustive or unbiased map of the field rather than reflecting author curation or recency bias, directly undermining the survey's core contribution.
minor comments (1)
  1. [Abstract] The abstract sentence on open directions is truncated after 'concept-based interpretability,...'; this should be completed for clarity.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on our survey of representation learning methods for retinal OCT image analysis. We address the major comment regarding the literature search methodology below.

read point-by-point responses
  1. Referee: [Abstract and Introduction] The central claim that the work delivers a 'comprehensive and structured review' organized along a 'principled taxonomy' is not supported by any description of the literature search methodology (databases, keywords, date ranges, inclusion/exclusion criteria, or screening process). Without this, it is impossible to verify that the selected papers form an exhaustive or unbiased map of the field rather than reflecting author curation or recency bias, directly undermining the survey's core contribution.

    Authors: We agree that explicitly documenting the literature selection process would strengthen the transparency and authority of the survey. Although the taxonomy is organized by learning paradigms (supervised, self-supervised, generative, multimodal, and foundation models) rather than an exhaustive enumeration of every paper, the selection was informed by a broad review of influential works tracing the field's evolution. In the revised manuscript, we will add a dedicated subsection in the Introduction (or a new 'Survey Methodology' section) that details the process: databases searched (PubMed, IEEE Xplore, arXiv, Google Scholar), primary keywords and combinations (e.g., 'OCT' OR 'optical coherence tomography' AND ('representation learning' OR 'self-supervised' OR 'foundation model' OR 'vision-language')), date range (2012–2024 to cover the deep learning era onward), inclusion criteria (peer-reviewed works focused on representation learning for retinal OCT, with emphasis on methodological novelty), and screening (initial search followed by title/abstract review and full-text assessment for relevance to the taxonomy categories). This addition will clarify scope and potential biases while preserving the paper's structure and contributions. revision: yes

Circularity Check

0 steps flagged

No circularity: survey paper with no derivations or predictions

full rationale

This is a literature review that organizes prior work under a taxonomy and presents a unified problem formulation as an organizational device. No original derivations, equations, fitted parameters, or predictions are claimed. The patterns for circularity (self-definitional claims, fitted inputs renamed as predictions, load-bearing self-citations, uniqueness theorems, ansatz smuggling, or renaming known results) do not apply because there is no derivation chain to inspect. Literature selection and taxonomy choices may reflect author judgment, but this is not circularity under the specified criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a literature survey the paper does not introduce new free parameters, axioms, or invented entities. It relies on standard concepts from machine learning and computer vision already established in the cited literature.

pith-pipeline@v0.9.0 · 5598 in / 1054 out tokens · 38143 ms · 2026-05-08T18:27:48.591607+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith reviews without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

185 extracted references · 16 canonical work pages · 1 internal anchor

  1. [1]

    Optical coherence tomography.science, 254(5035):1178–1181, 1991

    David Huang, Eric A Swanson, Charles P Lin, Joel S Schuman, William G Stinson, Warren Chang, Michael R Hee, Thomas Flotte, Kenton Gregory, Carmen A Puliafito, et al. Optical coherence tomography.science, 254(5035):1178–1181, 1991

  2. [2]

    CRC Press, 2024

    Joel S Schuman, James G Fujimoto, Jay Duker, and Hiroshi Ishikawa.Optical coherence tomography of ocular diseases. CRC Press, 2024

  3. [3]

    Optical coherence tomography of age-related macular degeneration and choroidal neovascularization.Ophthalmology, 103(8):1260–1270, 1996

    Michael R Hee, Caroline R Baumal, Carmen A Puliafito, Jay S Duker, Elias Reichel, Jason R Wilkins, Jeffery G Coker, Joel S Schuman, Eric A Swanson, and James G Fujimoto. Optical coherence tomography of age-related macular degeneration and choroidal neovascularization.Ophthalmology, 103(8):1260–1270, 1996

  4. [4]

    Comparison of the clinical diagnosis of diabetic macular edema with diagnosis by optical coherence tomography.Ophthalmology, 111(4):712–715, 2004

    David J Browning, Michael D McOwen, Robert M Bowen Jr, and Tisha L O’Marah. Comparison of the clinical diagnosis of diabetic macular edema with diagnosis by optical coherence tomography.Ophthalmology, 111(4):712–715, 2004

  5. [5]

    Optical coherence tomography to detect and manage retinal disease and glaucoma.American journal of ophthalmology, 137(1):156–169, 2004

    Glenn J Jaffe and Joseph Caprioli. Optical coherence tomography to detect and manage retinal disease and glaucoma.American journal of ophthalmology, 137(1):156–169, 2004

  6. [6]

    Optical coherence tomography findings after an intravitreal injection of bevacizumab (avastin®) for macular edema from central retinal vein occlusion, 2005

    Philip J Rosenfeld, Anne E Fung, and Carmen A Puliafito. Optical coherence tomography findings after an intravitreal injection of bevacizumab (avastin®) for macular edema from central retinal vein occlusion, 2005

  7. [7]

    Accuracy of spectral-domain oct of the macula for detection of complete posterior vitreous detachment.Ophthalmology Retina, 4(2):148–153, 2020

    Eileen S Hwang, Jessica A Kraker, Kim J Griffin, J Sebag, David V Weinberg, and Judy E Kim. Accuracy of spectral-domain oct of the macula for detection of complete posterior vitreous detachment.Ophthalmology Retina, 4(2):148–153, 2020

  8. [8]

    Anterior segment optical coherence tomography with angiography for the cornea and ocular surface.Journal of Clinical Medicine, 15(6):2402, 2026

    Qiu Ying Wong, Sim Ralene, and Marcus Ang. Anterior segment optical coherence tomography with angiography for the cornea and ocular surface.Journal of Clinical Medicine, 15(6):2402, 2026

  9. [9]

    Speckle in optical coherence tomography.Journal of biomedical optics, 4(1):95–105, 1999

    Joseph M Schmitt, SH Xiang, and Kin Man Yung. Speckle in optical coherence tomography.Journal of biomedical optics, 4(1):95–105, 1999

  10. [10]

    Efficient reduction of speckle noise in optical coherence tomography.Optics express, 20(2):1337– 1359, 2012

    Maciej Szkulmowski, Iwona Gorczynska, Daniel Szlag, Marcin Sylwestrzak, Andrzej Kowalczyk, and Maciej Wojtkowski. Efficient reduction of speckle noise in optical coherence tomography.Optics express, 20(2):1337– 1359, 2012

  11. [11]

    Agreement between spectral-domain and time-domain oct for measuring rnfl thickness.British Journal of Ophthalmology, 93(6):775–781, 2009

    Gianmarco Vizzeri, Robert N Weinreb, Alberto O Gonzalez-Garcia, Christopher Bowd, Felipe A Medeiros, Pamela A Sample, and Linda M Zangwill. Agreement between spectral-domain and time-domain oct for measuring rnfl thickness.British Journal of Ophthalmology, 93(6):775–781, 2009

  12. [12]

    Clinical factors associated with long-term oct variability in glaucoma.American journal of ophthalmology, 255:98–106, 2023

    Jo-Hsuan Wu, Sasan Moghimi, Evan Walker, Takashi Nishida, Jeffrey M Liebmann, Massimo Fazio, Christo- pher A Girkin, Linda M Zangwill, and Robert N Weinreb. Clinical factors associated with long-term oct variability in glaucoma.American journal of ophthalmology, 255:98–106, 2023

  13. [13]

    Performance evaluation of retinal oct fluid segmentation, detection, and generalization over variations of data sources.IEEE Access, 12:31719–31735, 2024

    Nchongmaje Ndipenoch, Alina Miron, and Yongmin Li. Performance evaluation of retinal oct fluid segmentation, detection, and generalization over variations of data sources.IEEE Access, 12:31719–31735, 2024

  14. [14]

    Assessment of artifacts and reproducibility across spectral-and time-domain optical coherence tomography devices.Ophthalmology, 116(10):1960–1970, 2009

    Joseph Ho, Alan C Sull, Laurel N Vuong, Yueli Chen, Jonathan Liu, James G Fujimoto, Joel S Schuman, and Jay S Duker. Assessment of artifacts and reproducibility across spectral-and time-domain optical coherence tomography devices.Ophthalmology, 116(10):1960–1970, 2009

  15. [15]

    Gradient-based learning applied to document recognition.Proceedings of the IEEE, 86(11):2278–2324, 2002

    Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition.Proceedings of the IEEE, 86(11):2278–2324, 2002

  16. [16]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale.arXiv preprint arXiv:2010.11929, 2020

  17. [17]

    Transforming auto-encoders

    Geoffrey E Hinton, Alex Krizhevsky, and Sida D Wang. Transforming auto-encoders. InInternational conference on artificial neural networks, pages 44–51. Springer, 2011

  18. [18]

    Generative adversarial networks.Communications of the ACM, 63(11):139–144, 2020

    Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks.Communications of the ACM, 63(11):139–144, 2020. 20

  19. [19]

    Self-supervised learning: Generative or contrastive.IEEE transactions on knowledge and data engineering, 35(1):857–876, 2021

    Xiao Liu, Fanjin Zhang, Zhenyu Hou, Li Mian, Zhaoyu Wang, Jing Zhang, and Jie Tang. Self-supervised learning: Generative or contrastive.IEEE transactions on knowledge and data engineering, 35(1):857–876, 2021

  20. [20]

    Kermany, Michael Goldbaum, Wenjia Cai, Carolina C.S

    Daniel S. Kermany, Michael Goldbaum, Wenjia Cai, Carolina C.S. Valentim, Huiying Liang, Sally L. Baxter, Alex McKeown, Ge Yang, Xiaokang Wu, Fangbing Yan, et al. Identifying medical diagnoses and treatable diseases by image-based deep learning.Cell, 172(5):1122–1131, 2018

  21. [21]

    Statistical model for oct image denoising.Biomedical optics express, 8(9):3903–3917, 2017

    Muxingzi Li, Ramzi Idoughi, Biswarup Choudhury, and Wolfgang Heidrich. Statistical model for oct image denoising.Biomedical optics express, 8(9):3903–3917, 2017

  22. [22]

    A review of state-of-the-art speckle reduction techniques for optical coherence tomography fingertip scans

    Luke Nicholas Darlow, Sharat Saurabh Akhoury, and James Connan. A review of state-of-the-art speckle reduction techniques for optical coherence tomography fingertip scans. InSeventh International Conference on Machine Vision (ICMV 2014), volume 9445, pages 418–426. SPIE, 2015

  23. [23]

    Retinal imaging and image analysis.IEEE reviews in biomedical engineering, 3:169–208, 2010

    Michael D Abràmoff, Mona K Garvin, and Milan Sonka. Retinal imaging and image analysis.IEEE reviews in biomedical engineering, 3:169–208, 2010

  24. [24]

    State-of-the-art in retinal optical coherence tomography image analysis.Quantitative imaging in medicine and surgery, 5(4):603, 2015

    Ahmadreza Baghaie, Zeyun Yu, and Roshan M D’Souza. State-of-the-art in retinal optical coherence tomography image analysis.Quantitative imaging in medicine and surgery, 5(4):603, 2015

  25. [25]

    Machine learning techniques for diabetic macular edema (dme) classification on sd-oct images.Biomedical engineering online, 16(1):68, 2017

    Khaled Alsaih, Guillaume Lemaitre, Mojdeh Rastgoo, Joan Massich, Désiré Sidibé, and Fabrice Meriaudeau. Machine learning techniques for diabetic macular edema (dme) classification on sd-oct images.Biomedical engineering online, 16(1):68, 2017

  26. [26]

    Octdl: Optical coherence tomography dataset for image-based deep learning methods.Scientific data, 11(1):365, 2024

    Mikhail Kulyabin, Aleksei Zhdanov, Anastasia Nikiforova, Andrey Stepichev, Anna Kuznetsova, Mikhail Ronkin, Vasilii Borisov, Alexander Bogachev, Sergey Korotkich, Paul A Constable, et al. Octdl: Optical coherence tomography dataset for image-based deep learning methods.Scientific data, 11(1):365, 2024

  27. [27]

    Classifica- tion of retinal oct images using deep learning

    Malliga Subramanian, Kogilavani Shanmugavadivel, Obuli Sai Naren, K Premkumar, and K Rankish. Classifica- tion of retinal oct images using deep learning. In2022 international conference on computer communication and informatics (ICCCI), pages 1–7. IEEE, 2022

  28. [28]

    Chiu, Michael J

    Stephanie J. Chiu, Michael J. Allingham, Priyatham S. Mettu, Scott W. Cousins, Joseph A. Izatt, and Sina Farsiu. Kernel regression based segmentation of optical coherence tomography images with diabetic macular edema. Biomedical Optics Express, 6(4):1172–1194, 2015

  29. [29]

    Waldstein, José Ignacio Orlando, Matteo Baroni, Ciarán N

    Hrvoje Bogunovi´c, Freya Maintau-Sanchez, Sebastian M. Waldstein, José Ignacio Orlando, Matteo Baroni, Ciarán N. Bhreartaigh, et al. RETOUCH: The retinal OCT fluid detection and segmentation benchmark and challenge.IEEE Transactions on Medical Imaging, 38(8):1858–1874, 2019

  30. [30]

    Annotated retinal optical coherence tomography images (AROI) database for joint retinal layer and fluid segmentation.Automatika, 62(3):375–385, 2021

    Matej Melinšˇcak, Marin Radmilovi´c, Zoran Vatavuk, and Sven Lonˇcari´c. Annotated retinal optical coherence tomography images (AROI) database for joint retinal layer and fluid segmentation.Automatika, 62(3):375–385, 2021

  31. [31]

    Oimhs: An optical coherence tomography image dataset based on macular hole manual segmentation.Scientific Data, 10(1):769, 2023

    Xin Ye, Shucheng He, Xiaxing Zhong, Jiafeng Yu, Shangchao Yang, Yingjiao Shen, Yiqi Chen, Yaqi Wang, Xingru Huang, and Lijun Shen. Oimhs: An optical coherence tomography image dataset based on macular hole manual segmentation.Scientific Data, 10(1):769, 2023

  32. [32]

    OCT5k: A dataset of multi-disease and multi-graded annotations for retinal layers.Scientific Data, 12(1):267, 2025

    Murat Arikan, James Willoughby, Serhat Ongun, et al. OCT5k: A dataset of multi-disease and multi-graded annotations for retinal layers.Scientific Data, 12(1):267, 2025

  33. [33]

    Gamma challenge: glaucoma grading from multi-modality images.Medical Image Analysis, 90:102938, 2023

    Junde Wu, Huihui Fang, Fei Li, Huazhu Fu, Fengbin Lin, Jiongcheng Li, Yue Huang, Qinji Yu, Sifan Song, Xinxing Xu, et al. Gamma challenge: glaucoma grading from multi-modality images.Medical Image Analysis, 90:102938, 2023

  34. [34]

    MultiEYE: Dataset and benchmark for OCT-enhanced retinal disease recognition from fundus images.IEEE Transactions on Medical Imaging, 44(4):1711–1722, 2025

    Lehan Wang, Chongchong Qi, Chubin Ou, Lin An, Mei Jin, Xiangbin Kong, and Xiaomeng Li. MultiEYE: Dataset and benchmark for OCT-enhanced retinal disease recognition from fundus images.IEEE Transactions on Medical Imaging, 44(4):1711–1722, 2025

  35. [35]

    Octa-500: a retinal dataset for optical coherence tomography angiography study

    Mingchao Li, Kun Huang, Qiuzhuo Xu, Jiadong Yang, Yuhan Zhang, Zexuan Ji, Keren Xie, Songtao Yuan, Qinghuai Liu, and Qiang Chen. Octa-500: a retinal dataset for optical coherence tomography angiography study. Medical image analysis, 93:103092, 2024

  36. [36]

    Srinivasan, Leo A

    Pratul P. Srinivasan, Leo A. Kim, Priyatham S. Mettu, Scott W. Cousins, Grant M. Comer, Joseph A. Izatt, and Sina Farsiu. Fully automated detection of diabetic macular edema and dry age-related macular degeneration from optical coherence tomography images.Biomedical Optics Express, 5(10):3568–3577, 2014

  37. [37]

    OCTID: Optical coherence tomography image database.Computers & Electrical Engineering, 81:106532, 2020

    Peyman Gholami, Priyanka Roy, Mohana Kuppuswamy Parthasarathy, and Vasudevan Lakshminarayanan. OCTID: Optical coherence tomography image database.Computers & Electrical Engineering, 81:106532, 2020. 21

  38. [38]

    Jedynak, Sharon D

    Yufan He, Aaron Carass, Yihao Liu, Bruno M. Jedynak, Sharon D. Solomon, Shiv Saidha, Peter A. Calabresi, and Jerry L. Prince. Retinal layer parcellation of optical coherence tomography images: Data resource for multiple sclerosis and healthy controls.Data in Brief, 22:601–604, 2019

  39. [39]

    Fully-automated segmentation of fluid regions in exudative age-related macular degeneration subjects: Kernel graph cut in neutrosophic domain.PloS one, 12(10):e0186949, 2017

    Abdolreza Rashno, Behzad Nazari, Dara D Koozekanani, Paul M Drayna, Saeed Sadri, Hossein Rabbani, and Keshab K Parhi. Fully-automated segmentation of fluid regions in exudative age-related macular degeneration subjects: Kernel graph cut in neutrosophic domain.PloS one, 12(10):e0186949, 2017

  40. [40]

    A composite retinal fundus and oct dataset to grade macular and glaucomatous disorders

    Taimur Hassan, Hina Raja, Bilal Hassan, Muhammad Usman Akram, Jorge Dias, and Naoufel Werghi. A composite retinal fundus and oct dataset to grade macular and glaucomatous disorders. In2022 2nd International Conference on Digital Futures and Transformative Technologies (ICoDT2), pages 1–6. IEEE, 2022

  41. [41]

    Learning two-stream cnn for multi-modal age-related macular degeneration categorization.IEEE Journal of Biomedical and Health Informatics, 26(8):4111–4122, 2022

    Weisen Wang, Xirong Li, Zhiyan Xu, Weihong Yu, Jianchun Zhao, Dayong Ding, and Youxin Chen. Learning two-stream cnn for multi-modal age-related macular degeneration categorization.IEEE Journal of Biomedical and Health Informatics, 26(8):4111–4122, 2022

  42. [42]

    A retinal oct- angiography and cardiovascular status (rasta) dataset of swept-source microvascular imaging for cardiovascular risk assessment.Data, 8(10):147, 2023

    Clément Germanèse, Fabrice Meriaudeau, Pétra Eid, Ramin Tadayoni, Dominique Ginhac, Atif Anwer, Steinberg Laure-Anne, Charles Guenancia, Catherine Creuzot-Garcher, Pierre-Henry Gabrielle, et al. A retinal oct- angiography and cardiovascular status (rasta) dataset of swept-source microvascular imaging for cardiovascular risk assessment.Data, 8(10):147, 2023

  43. [43]

    Rose: a retinal oct-angiography vessel segmentation dataset and new model.IEEE transactions on medical imaging, 40(3):928–939, 2020

    Yuhui Ma, Huaying Hao, Jianyang Xie, Huazhu Fu, Jiong Zhang, Jianlong Yang, Zhen Wang, Jiang Liu, Yalin Zheng, and Yitian Zhao. Rose: a retinal oct-angiography vessel segmentation dataset and new model.IEEE transactions on medical imaging, 40(3):928–939, 2020

  44. [44]

    Syn-oct: A synthetic dataset of ocular optical coherence tomography images from healthy and glaucoma eyes.Scientific Data, 2026

    Damon Wong, Ashish Jith Sreejith Kumar, Rachel S Chong, Monisha E Nongpiur, Rahat Husain, Tina Wong, Shamira Perera, Tin Aung, Bingyao Tan, Ching-Yu Cheng, et al. Syn-oct: A synthetic dataset of ocular optical coherence tomography images from healthy and glaucoma eyes.Scientific Data, 2026

  45. [45]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

  46. [46]

    Densely connected convolutional networks

    Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017

  47. [47]

    U-net: Convolutional networks for biomedical image segmentation

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. InInternational Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015

  48. [48]

    A simple framework for contrastive learning of visual representations

    Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. InInternational conference on machine learning, pages 1597–1607. PmLR, 2020

  49. [49]

    Momentum contrast for unsupervised visual representation learning

    Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9729–9738, 2020

  50. [50]

    Bootstrap your own latent-a new approach to self-supervised learning.Advances in neural information processing systems, 33:21271–21284, 2020

    Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, et al. Bootstrap your own latent-a new approach to self-supervised learning.Advances in neural information processing systems, 33:21271–21284, 2020

  51. [51]

    Octnet: A lightweight cnn for retinal disease classification from optical coherence tomography images.Computer methods and programs in biomedicine, 200:105877, 2021

    AP Sunija, Saikat Kar, S Gayathri, Varun P Gopi, and Ponnusamy Palanisamy. Octnet: A lightweight cnn for retinal disease classification from optical coherence tomography images.Computer methods and programs in biomedicine, 200:105877, 2021

  52. [52]

    Ali Mohammad Alqudah. Aoct-net: a convolutional network automated classification of multiclass retinal diseases using spectral-domain optical coherence tomography images.Medical & biological engineering & computing, 58(1):41–53, 2020

  53. [53]

    Muhammad Awais, Henning Müller, Tong B Tang, and Fabrice Meriaudeau. Classification of SD-OCT images using a deep learning approach. In 2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), pages 489–492. IEEE, 2017.

  54. [54]

    Sinda Hosni, Hajer Khachnaoui, Hsouna Mehdi Zgolli, Sonya Mabrouk, Désiré Sidibé, Hedi Tabia, and Nawres Khlifa. Prediction of postoperative visual acuity in rhegmatogenous retinal detachment using OCT images. IEEE Access, 11:135435–135448, 2023.

  55. [55]

    Yibiao Rong, Dehui Xiang, Weifang Zhu, Kai Yu, Fei Shi, Zhun Fan, and Xinjian Chen. Surrogate-assisted retinal OCT image classification based on convolutional neural networks. IEEE Journal of Biomedical and Health Informatics, 23(1):253–263, 2018.

  56. [56]

    Mohamed Elkholy and Marwa A Marzouk. Deep learning-based classification of eye diseases using convolutional neural network for OCT images. Frontiers in Computer Science, 5:1252295, 2024.

  57. [57]

    Zainab Haddad, Sirine Elhoula, Désiré Sidibé, Hedi Tabia, Imen Zeghal, and Nawres Khlifa. Segmentation-enhanced deep learning for AMD detection from OCT images. In 2025 IEEE International Conference on Advances in Data-Driven Analytics And Intelligent Systems (ADACIS), pages 1–6. IEEE, 2025.

  58. [58]

    Arij Mlaouhi, Zainab Haddad, Hsouna Zgolli, Hedi Tabia, Désiré Sidibé, and Nawres Khlifa. Advanced deep learning techniques for evaluating OCT image quality and detecting retinal pathologies. In 2025 IEEE/ACS 22nd International Conference on Computer Systems and Applications (AICCSA), pages 1–2. IEEE, 2025.

  59. [59]

    Hina Raja, M Usman Akram, Arslan Shaukat, Shoab Ahmed Khan, Norah Alghamdi, Sajid Gul Khawaja, and Noman Nazir. Extraction of retinal layers through convolution neural network (CNN) in an OCT image for glaucoma diagnosis. Journal of Digital Imaging, 33(6):1428–1442, 2020.

  60. [60]

    Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.

  61. [61]

    Christian Etmann, Sebastian Lunz, Peter Maass, and Carola-Bibiane Schönlieb. On the connection between adversarial robustness and saliency map interpretability. arXiv preprint arXiv:1905.04172, 2019.

  62. [62]

    Zainab Haddad, Hsouna Zgolli, Désiré Sidibé, Hedi Tabia, and Nawres Khlifa. Explainable AI for retinal pathology detection in OCT images. In 2024 10th International Conference on Control, Decision and Information Technologies (CoDIT), pages 2401–2406. IEEE, 2024.

  63. [63]

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.

  64. [64]

    Bo Zhao, Boya Wu, Muyang He, and Tiejun Huang. SVIT: Scaling up visual instruction tuning. arXiv preprint arXiv:2307.04087, 2023.

  65. [65]

    GR Hemalakshmi, M Murugappan, Mohamed Yacin Sikkandar, S Sabarunisha Begum, and NB Prakash. Automated retinal disease classification using hybrid transformer model (SViT) using optical coherence tomography images. Neural Computing and Applications, 36(16):9171–9188, 2024.

  66. [66]

    Haoran Wang, Xinyu Guo, Kaiwen Song, Mingyang Sun, Yanbin Shao, Songfeng Xue, Hongwei Zhang, and Tianyu Zhang. Octformer: an efficient hierarchical transformer network specialized for retinal optical coherence tomography image recognition. IEEE Transactions on Instrumentation and Measurement, 72:1–17, 2023.

  67. [67]

    Mingming Yang, Junhui Du, and Ruichan Lv. Crat: advanced transformer-based deep learning algorithms in OCT image classification. Biomedical Signal Processing and Control, 104:107544, 2025.

  68. [68]

    Mohamed Elsharkawy, Ibrahim Abdelhalim, Mohammed Ghazal, Ali Mahmoud, Harpal S Sandhu, Aristomenis Thanos, and Ayman El-Baz. Oct-trans: A novel transformer backbone with multimodal feature extraction in OCT-based retinal disease classification. In 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI), pages 1–4. IEEE, 2025.

  69. [69]

    Liwei Cai, Chi Wen, Jingwen Jiang, Congbi Liang, Hongmei Zheng, Yu Su, and Changzheng Chen. Classification of diabetic maculopathy based on optical coherence tomography images using a vision transformer model. BMJ Open Ophthalmology, 8(1), 2023.

  70. [70]

    Yuka Kihara, Mengxi Shen, Yingying Shi, Xiaoshuang Jiang, Liang Wang, Rita Laiginhas, Cancan Lyu, Jin Yang, Jeremy Liu, Rosalyn Morin, et al. Detection of nonexudative macular neovascularization on structural OCT images using vision transformers. Ophthalmology Science, 2(4):100197, 2022.

  71. [71]

    Daniel Philippi, Kai Rothaus, and Mauro Castelli. A vision transformer architecture for the automated segmentation of retinal lesions in spectral domain optical coherence tomography images. Scientific Reports, 13(1):517, 2023.

  72. [72]

    Jingzhen He, Junxia Wang, Zeyu Han, Jun Ma, Chongjing Wang, and Meng Qi. An interpretable transformer network for the retinal disease classification using optical coherence tomography. Scientific Reports, 13(1):3637, 2023.

  73. [73]

    Badr Ait Hammou, Fares Antaki, Marie-Carole Boucher, and Renaud Duval. MBT: Model-based transformer for retinal optical coherence tomography image and video multi-classification. International Journal of Medical Informatics, 178:105178, 2023.

  74. [74]

    Yiheng Zhang, Zhongliang Li, Nan Nan, and Xiangzhao Wang. Transegnet: hybrid CNN-vision transformers encoder for retina segmentation of optical coherence tomography. Life, 13(4):976, 2023.

  75. [75]

    Qingxin Jiang, Ying Fan, Menghan Li, Sheng Fang, Weifang Zhu, Dehui Xiang, Tao Peng, Xinjian Chen, Xun Xu, and Fei Shi. Hyformer: a hybrid transformer-CNN architecture for retinal OCT image segmentation. Biomedical Optics Express, 15(11):6156–6170, 2024.

  76. [76]

    Zongqing Ma, Qiaoxue Xie, Pinxue Xie, Fan Fan, Xinxiao Gao, and Jiang Zhu. Hctnet: a hybrid convnet-transformer network for retinal optical coherence tomography image classification. Biosensors, 12(7):542, 2022.

  77. [77]

    Hai Yang, Li Chen, Junyang Cao, and Juan Wang. Hrs-net: A hybrid multi-scale network model based on convolution and transformers for multi-class retinal disease classification. IEEE Access, 12:144219–144229, 2024.

  78. [78]

    Ayoub Laouarem, Chafia Kara-Mohamed, El-Bay Bourennane, and Aboubekeur Hamdi-Cherif. Htc-retina: a hybrid retinal diseases classification model using transformer-convolutional neural network from optical coherence tomography images. Computers in Biology and Medicine, 178:108726, 2024.

  79. [79]

    DV Ashoka et al. Effivit: Hybrid CNN-transformer for retinal imaging. Computers in Biology and Medicine, 191:110164, 2025.

  80. [80]

    Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning. Advances in Neural Information Processing Systems, 33:18661–18673, 2020.
