pith. machine review for the scientific record.

arxiv: 2605.02589 · v1 · submitted 2026-05-04 · 💻 cs.CV · cs.LG

Recognition: 2 Lean theorem links

Representation learning from OCT images

Hedi Tabia, Désiré Sidibé, Nawres Khlifa, Ahmed Tabia, Ines Rahmany, Noura Aboudi, Zainab Haddad, Hajer Khachnaoui, Hsouna Zgolli

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 18:27 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords optical coherence tomography · representation learning · retinal imaging · deep learning · self-supervised learning · foundation models · medical image analysis · vision-language models

The pith

Representation learning for retinal OCT images advances from supervised CNNs and transformers through self-supervised and generative methods to foundation models and vision-language systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey organizes the literature on representation learning for Optical Coherence Tomography images of the retina along a taxonomy of learning paradigms. It starts with early supervised deep networks, moves through self-supervised, semi-supervised, and generative techniques, then covers 3D volumetric modeling, multimodal approaches, and recent large pretrained foundation models. The aim is to reduce reliance on expert annotations while achieving more consistent analysis across devices and patient populations. A sympathetic reader cares because OCT produces high volumes of detailed scans whose manual review is impractical, so better automated representations could support faster and more reliable diagnosis of eye conditions.
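The self-supervised branch of the taxonomy can be made concrete with a minimal contrastive objective. The sketch below is a generic SimCLR-style NT-Xent loss in NumPy, not the formulation from any specific surveyed paper; the pairing of two augmented views of the same B-scan is the assumed pretext task.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss over a batch of paired embeddings.

    z1, z2: (N, D) embeddings of two augmented views of the same N images
    (for OCT, e.g. two random crops of the same B-scan).
    """
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine-similarity space
    sim = z @ z.T / temperature                        # (2N, 2N) similarity matrix
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    n = len(z1)
    # the positive for row i is the other view of the same image
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    row_max = sim.max(axis=1, keepdims=True)           # stabilized log-softmax
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

Minimizing this loss pulls the two views of each scan together while pushing apart all other scans in the batch — no expert labels required, which is exactly the annotation-reduction motivation the survey emphasizes.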

Core claim

The survey reviews representation learning for retinal OCT images by covering supervised CNN-based and transformer architectures, self-supervised and semi-supervised methods, generative approaches, 3D volumetric modeling, multimodal representation learning, and large-scale pretrained foundation models. For each paradigm it examines core methodological contributions, identifies persistent limitations, and traces connections between successive approaches. It supplies a structured list of public OCT datasets, discusses evaluation protocol issues, presents a unified mathematical formulation that places every paradigm inside the same problem setup, and outlines pressing open directions, including volumetric foundation-model pretraining, uncertainty-aware representation learning, federated and privacy-preserving training, fairness and bias mitigation, and concept-based interpretability.

What carries the argument

The principled taxonomy of learning paradigms together with the unified mathematical formulation that situates supervised, self-supervised, generative, multimodal, and foundation-model methods inside one common framework for OCT image representation.
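The survey's exact notation is not reproduced on this page, but a common way such a unified formulation is written treats every paradigm as a choice of supervision signal for one shared encoder:

```latex
% One encoder f_\theta maps an OCT scan x to a representation z = f_\theta(x),
% with a paradigm-specific head g_\phi and loss \mathcal{L}:
\min_{\theta,\,\phi} \; \mathbb{E}_{x \sim \mathcal{D}}
  \left[ \mathcal{L}\big(g_\phi(f_\theta(x)),\, a(x)\big) \right]
% The paradigms differ only in the target a(x):
%   supervised:        a(x) = y             (expert label)
%   self-supervised:   a(x) = f_\theta(x')  (augmented view x' of x)
%   generative:        a(x) = x             (reconstruction)
%   vision-language:   a(x) = t             (paired clinical text)
```

Under this reading, the taxonomy's progression is a progression in how cheaply the target a(x) can be obtained.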

If this is right

  • The taxonomy reveals a clear progression from label-heavy supervised methods toward approaches that need far less manual annotation.
  • The unified formulation lets researchers compare different paradigms directly on the same mathematical footing.
  • Listing public datasets and evaluation considerations supports more standardized benchmarking across studies.
  • The highlighted open directions, such as volumetric foundation-model pretraining and fairness mitigation, follow directly from the limitations identified in earlier paradigms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same taxonomy could be applied to representation learning in other high-volume medical scans such as ultrasound or MRI to spot reusable techniques.
  • Vision-language models listed in the open directions may eventually supply natural-language explanations that help clinicians trust the outputs.
  • Federated and privacy-preserving training highlighted in the survey could become necessary for real-world deployment where patient data cannot leave local hospitals.
  • An implicit next step is to test whether models pretrained on the public OCT datasets listed in the survey generalize to new scanner vendors without additional labels.
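That last implicit next step is commonly operationalized as a frozen-encoder probe: extract embeddings for scans from an unseen scanner vendor and fit only a trivial classifier on top. The sketch below uses a nearest-centroid probe (a minimal linear classifier) with synthetic Gaussian embeddings standing in for real encoder outputs; the `vendor_shift` offset is a hypothetical stand-in for scanner domain shift.

```python
import numpy as np

rng = np.random.default_rng(0)

def centroid_probe(train_feats, train_labels, test_feats, test_labels):
    """Nearest-centroid probe (a minimal linear classifier) on frozen embeddings."""
    classes = np.unique(train_labels)
    centroids = np.stack([train_feats[train_labels == c].mean(axis=0) for c in classes])
    dists = ((test_feats[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    preds = classes[np.argmin(dists, axis=1)]
    return (preds == test_labels).mean()

def synthetic_embeddings(n, vendor_shift):
    """Stand-in for encoder outputs: class-separated Gaussians plus a
    uniform offset simulating a scanner-vendor domain shift."""
    labels = rng.integers(0, 2, size=n)
    feats = rng.normal(size=(n, 16)) + 2.0 * labels[:, None] + vendor_shift
    return feats, labels

train_X, train_y = synthetic_embeddings(200, vendor_shift=0.0)  # source vendor
test_X, test_y = synthetic_embeddings(200, vendor_shift=0.3)    # unseen vendor
acc = centroid_probe(train_X, train_y, test_X, test_y)
print(f"cross-vendor probe accuracy: {acc:.2f}")
```

A real study would replace `synthetic_embeddings` with embeddings from a pretrained OCT encoder and split by actual scanner vendor; the probe accuracy gap between source and unseen vendors then measures label-free generalization.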

Load-bearing premise

The selected papers and the chosen taxonomy of paradigms give an accurate and reasonably complete picture of the field without major omissions or systematic bias in coverage.

What would settle it

A major OCT representation learning paper or method published before the survey cutoff that fits none of the described paradigms and is absent from the review would show the taxonomy and coverage are incomplete.

Figures

Figures reproduced from arXiv: 2605.02589 by Ahmed Tabia, Désiré Sidibé, Hajer Khachnaoui, Hedi Tabia, Hsouna Zgolli, Ines Rahmany, Nawres Khlifa, Noura Aboudi, Zainab Haddad.

Figure 1: From left to right, four OCT B-scans corresponding to: drusen, choroidal neovascularization, diabetic macular …
Figure 2: Hierarchical taxonomy of representation learning methods for Optical Coherence Tomography (OCT) images.
read the original abstract

Optical Coherence Tomography (OCT) has become one of the most widely used imaging modalities in ophthalmology. It provides high-resolution, non-invasive visualization of retinal microarchitecture. The automated analysis of OCT images through representation learning has emerged as a central research frontier. This has mainly been driven by the clinical need to process large acquisition volumes. The objective is to reduce the reliance on expert annotation and improve diagnostic consistency across devices and populations. This survey provides a comprehensive and structured review of representation learning methods for retinal OCT image analysis. It covers the period from early deep learning approaches to the most recent developments in foundation models and vision-language systems. We organize the literature along a principled taxonomy of learning paradigms, encompassing supervised learning with CNN-based and transformer-based architectures, self-supervised and semi-supervised methods, generative approaches, as well as 3D volumetric modeling, multimodal representation learning, and large-scale pretrained foundation models. For each paradigm, we analyze the core methodological contributions, identify persistent limitations, and trace the connections between successive approaches. We further provide a structured overview of publicly available OCT datasets, discuss evaluation protocol considerations, and present a unified problem formulation that situates each learning paradigm within a common mathematical framework. Building on this analysis, we identify and discuss the most pressing open research directions emerging in the literature. This includes volumetric foundation model pretraining, uncertainty-aware representation learning, federated and privacy-preserving training, fairness and bias mitigation, concept-based interpretability,...

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript is a survey reviewing representation learning methods for retinal OCT image analysis in ophthalmology. It organizes prior work according to a taxonomy of paradigms (supervised CNN/transformer architectures, self-supervised/semi-supervised learning, generative models, 3D volumetric modeling, multimodal learning, and large-scale foundation models), provides a unified mathematical formulation, surveys public datasets and evaluation protocols, and outlines open challenges such as volumetric pretraining, uncertainty modeling, federated learning, fairness, and interpretability.

Significance. If the coverage proves complete and the taxonomy reproducible, the survey would offer a timely consolidation of a fast-moving area at the intersection of computer vision and ophthalmic imaging, particularly by tracing the shift toward foundation and vision-language models. The unified formulation and dataset overview add practical value for new researchers. However, the absence of a documented literature selection process substantially weakens its authority as a field-wide reference.

major comments (1)
  1. [Abstract and Introduction] The central claim that the work delivers a 'comprehensive and structured review' organized along a 'principled taxonomy' is not supported by any description of the literature search methodology (databases, keywords, date ranges, inclusion/exclusion criteria, or screening process). Without this, it is impossible to verify that the selected papers form an exhaustive or unbiased map of the field rather than reflecting author curation or recency bias, directly undermining the survey's core contribution.
minor comments (1)
  1. [Abstract] The abstract sentence on open directions is truncated after 'concept-based interpretability,...'; this should be completed for clarity.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on our survey of representation learning methods for retinal OCT image analysis. We address the major comment regarding the literature search methodology below.

read point-by-point responses
  1. Referee: [Abstract and Introduction] The central claim that the work delivers a 'comprehensive and structured review' organized along a 'principled taxonomy' is not supported by any description of the literature search methodology (databases, keywords, date ranges, inclusion/exclusion criteria, or screening process). Without this, it is impossible to verify that the selected papers form an exhaustive or unbiased map of the field rather than reflecting author curation or recency bias, directly undermining the survey's core contribution.

    Authors: We agree that explicitly documenting the literature selection process would strengthen the transparency and authority of the survey. Although the taxonomy is organized by learning paradigms (supervised, self-supervised, generative, multimodal, and foundation models) rather than an exhaustive enumeration of every paper, the selection was informed by a broad review of influential works tracing the field's evolution. In the revised manuscript, we will add a dedicated subsection in the Introduction (or a new 'Survey Methodology' section) that details the process: databases searched (PubMed, IEEE Xplore, arXiv, Google Scholar), primary keywords and combinations (e.g., 'OCT' OR 'optical coherence tomography' AND ('representation learning' OR 'self-supervised' OR 'foundation model' OR 'vision-language')), date range (2012–2024 to cover the deep learning era onward), inclusion criteria (peer-reviewed works focused on representation learning for retinal OCT, with emphasis on methodological novelty), and screening (initial search followed by title/abstract review and full-text assessment for relevance to the taxonomy categories). This addition will clarify scope and potential biases while preserving the paper's structure and contributions. revision: yes

Circularity Check

0 steps flagged

No circularity: survey paper with no derivations or predictions

full rationale

This is a literature review that organizes prior work under a taxonomy and presents a unified problem formulation as an organizational device. No original derivations, equations, fitted parameters, or predictions are claimed. The patterns for circularity (self-definitional claims, fitted inputs renamed as predictions, load-bearing self-citations, uniqueness theorems, ansatz smuggling, or renaming known results) do not apply because there is no derivation chain to inspect. Literature selection and taxonomy choices may reflect author judgment, but this is not circularity under the specified criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a literature survey the paper does not introduce new free parameters, axioms, or invented entities. It relies on standard concepts from machine learning and computer vision already established in the cited literature.

pith-pipeline@v0.9.0 · 5598 in / 1054 out tokens · 38143 ms · 2026-05-08T18:27:48.591607+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith reviews without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

185 extracted references · 16 canonical work pages · 1 internal anchor

  1. [1]

    Optical coherence tomography.science, 254(5035):1178–1181, 1991

    David Huang, Eric A Swanson, Charles P Lin, Joel S Schuman, William G Stinson, Warren Chang, Michael R Hee, Thomas Flotte, Kenton Gregory, Carmen A Puliafito, et al. Optical coherence tomography.science, 254(5035):1178–1181, 1991

  2. [2]

    CRC Press, 2024

    Joel S Schuman, James G Fujimoto, Jay Duker, and Hiroshi Ishikawa.Optical coherence tomography of ocular diseases. CRC Press, 2024

  3. [3]

    Optical coherence tomography of age-related macular degeneration and choroidal neovascularization.Ophthalmology, 103(8):1260–1270, 1996

    Michael R Hee, Caroline R Baumal, Carmen A Puliafito, Jay S Duker, Elias Reichel, Jason R Wilkins, Jeffery G Coker, Joel S Schuman, Eric A Swanson, and James G Fujimoto. Optical coherence tomography of age-related macular degeneration and choroidal neovascularization.Ophthalmology, 103(8):1260–1270, 1996

  4. [4]

    Comparison of the clinical diagnosis of diabetic macular edema with diagnosis by optical coherence tomography.Ophthalmology, 111(4):712–715, 2004

    David J Browning, Michael D McOwen, Robert M Bowen Jr, and Tisha L O’Marah. Comparison of the clinical diagnosis of diabetic macular edema with diagnosis by optical coherence tomography.Ophthalmology, 111(4):712–715, 2004

  5. [5]

    Optical coherence tomography to detect and manage retinal disease and glaucoma.American journal of ophthalmology, 137(1):156–169, 2004

    Glenn J Jaffe and Joseph Caprioli. Optical coherence tomography to detect and manage retinal disease and glaucoma.American journal of ophthalmology, 137(1):156–169, 2004

  6. [6]

    Optical coherence tomography findings after an intravitreal injection of bevacizumab (avastin®) for macular edema from central retinal vein occlusion, 2005

    Philip J Rosenfeld, Anne E Fung, and Carmen A Puliafito. Optical coherence tomography findings after an intravitreal injection of bevacizumab (avastin®) for macular edema from central retinal vein occlusion, 2005

  7. [7]

    Accuracy of spectral-domain oct of the macula for detection of complete posterior vitreous detachment.Ophthalmology Retina, 4(2):148–153, 2020

    Eileen S Hwang, Jessica A Kraker, Kim J Griffin, J Sebag, David V Weinberg, and Judy E Kim. Accuracy of spectral-domain oct of the macula for detection of complete posterior vitreous detachment.Ophthalmology Retina, 4(2):148–153, 2020

  8. [8]

    Anterior segment optical coherence tomography with angiography for the cornea and ocular surface.Journal of Clinical Medicine, 15(6):2402, 2026

    Qiu Ying Wong, Sim Ralene, and Marcus Ang. Anterior segment optical coherence tomography with angiography for the cornea and ocular surface.Journal of Clinical Medicine, 15(6):2402, 2026

  9. [9]

    Speckle in optical coherence tomography.Journal of biomedical optics, 4(1):95–105, 1999

    Joseph M Schmitt, SH Xiang, and Kin Man Yung. Speckle in optical coherence tomography.Journal of biomedical optics, 4(1):95–105, 1999

  10. [10]

    Efficient reduction of speckle noise in optical coherence tomography.Optics express, 20(2):1337– 1359, 2012

    Maciej Szkulmowski, Iwona Gorczynska, Daniel Szlag, Marcin Sylwestrzak, Andrzej Kowalczyk, and Maciej Wojtkowski. Efficient reduction of speckle noise in optical coherence tomography.Optics express, 20(2):1337– 1359, 2012

  11. [11]

    Agreement between spectral-domain and time-domain oct for measuring rnfl thickness.British Journal of Ophthalmology, 93(6):775–781, 2009

    Gianmarco Vizzeri, Robert N Weinreb, Alberto O Gonzalez-Garcia, Christopher Bowd, Felipe A Medeiros, Pamela A Sample, and Linda M Zangwill. Agreement between spectral-domain and time-domain oct for measuring rnfl thickness.British Journal of Ophthalmology, 93(6):775–781, 2009

  12. [12]

    Clinical factors associated with long-term oct variability in glaucoma.American journal of ophthalmology, 255:98–106, 2023

    Jo-Hsuan Wu, Sasan Moghimi, Evan Walker, Takashi Nishida, Jeffrey M Liebmann, Massimo Fazio, Christo- pher A Girkin, Linda M Zangwill, and Robert N Weinreb. Clinical factors associated with long-term oct variability in glaucoma.American journal of ophthalmology, 255:98–106, 2023

  13. [13]

    Performance evaluation of retinal oct fluid segmentation, detection, and generalization over variations of data sources.IEEE Access, 12:31719–31735, 2024

    Nchongmaje Ndipenoch, Alina Miron, and Yongmin Li. Performance evaluation of retinal oct fluid segmentation, detection, and generalization over variations of data sources.IEEE Access, 12:31719–31735, 2024

  14. [14]

    Assessment of artifacts and reproducibility across spectral-and time-domain optical coherence tomography devices.Ophthalmology, 116(10):1960–1970, 2009

    Joseph Ho, Alan C Sull, Laurel N Vuong, Yueli Chen, Jonathan Liu, James G Fujimoto, Joel S Schuman, and Jay S Duker. Assessment of artifacts and reproducibility across spectral-and time-domain optical coherence tomography devices.Ophthalmology, 116(10):1960–1970, 2009

  15. [15]

    Gradient-based learning applied to document recognition.Proceedings of the IEEE, 86(11):2278–2324, 2002

    Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition.Proceedings of the IEEE, 86(11):2278–2324, 2002

  16. [16]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale.arXiv preprint arXiv:2010.11929, 2020

  17. [17]

    Transforming auto-encoders

    Geoffrey E Hinton, Alex Krizhevsky, and Sida D Wang. Transforming auto-encoders. InInternational conference on artificial neural networks, pages 44–51. Springer, 2011

  18. [18]

    Generative adversarial networks.Communications of the ACM, 63(11):139–144, 2020

    Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks.Communications of the ACM, 63(11):139–144, 2020. 20

  19. [19]

    Self-supervised learning: Generative or contrastive.IEEE transactions on knowledge and data engineering, 35(1):857–876, 2021

    Xiao Liu, Fanjin Zhang, Zhenyu Hou, Li Mian, Zhaoyu Wang, Jing Zhang, and Jie Tang. Self-supervised learning: Generative or contrastive.IEEE transactions on knowledge and data engineering, 35(1):857–876, 2021

  20. [20]

    Kermany, Michael Goldbaum, Wenjia Cai, Carolina C.S

    Daniel S. Kermany, Michael Goldbaum, Wenjia Cai, Carolina C.S. Valentim, Huiying Liang, Sally L. Baxter, Alex McKeown, Ge Yang, Xiaokang Wu, Fangbing Yan, et al. Identifying medical diagnoses and treatable diseases by image-based deep learning.Cell, 172(5):1122–1131, 2018

  21. [21]

    Statistical model for oct image denoising.Biomedical optics express, 8(9):3903–3917, 2017

    Muxingzi Li, Ramzi Idoughi, Biswarup Choudhury, and Wolfgang Heidrich. Statistical model for oct image denoising.Biomedical optics express, 8(9):3903–3917, 2017

  22. [22]

    A review of state-of-the-art speckle reduction techniques for optical coherence tomography fingertip scans

    Luke Nicholas Darlow, Sharat Saurabh Akhoury, and James Connan. A review of state-of-the-art speckle reduction techniques for optical coherence tomography fingertip scans. InSeventh International Conference on Machine Vision (ICMV 2014), volume 9445, pages 418–426. SPIE, 2015

  23. [23]

    Retinal imaging and image analysis.IEEE reviews in biomedical engineering, 3:169–208, 2010

    Michael D Abràmoff, Mona K Garvin, and Milan Sonka. Retinal imaging and image analysis.IEEE reviews in biomedical engineering, 3:169–208, 2010

  24. [24]

    State-of-the-art in retinal optical coherence tomography image analysis.Quantitative imaging in medicine and surgery, 5(4):603, 2015

    Ahmadreza Baghaie, Zeyun Yu, and Roshan M D’Souza. State-of-the-art in retinal optical coherence tomography image analysis.Quantitative imaging in medicine and surgery, 5(4):603, 2015

  25. [25]

    Machine learning techniques for diabetic macular edema (dme) classification on sd-oct images.Biomedical engineering online, 16(1):68, 2017

    Khaled Alsaih, Guillaume Lemaitre, Mojdeh Rastgoo, Joan Massich, Désiré Sidibé, and Fabrice Meriaudeau. Machine learning techniques for diabetic macular edema (dme) classification on sd-oct images.Biomedical engineering online, 16(1):68, 2017

  26. [26]

    Octdl: Optical coherence tomography dataset for image-based deep learning methods.Scientific data, 11(1):365, 2024

    Mikhail Kulyabin, Aleksei Zhdanov, Anastasia Nikiforova, Andrey Stepichev, Anna Kuznetsova, Mikhail Ronkin, Vasilii Borisov, Alexander Bogachev, Sergey Korotkich, Paul A Constable, et al. Octdl: Optical coherence tomography dataset for image-based deep learning methods.Scientific data, 11(1):365, 2024

  27. [27]

    Classifica- tion of retinal oct images using deep learning

    Malliga Subramanian, Kogilavani Shanmugavadivel, Obuli Sai Naren, K Premkumar, and K Rankish. Classifica- tion of retinal oct images using deep learning. In2022 international conference on computer communication and informatics (ICCCI), pages 1–7. IEEE, 2022

  28. [28]

    Chiu, Michael J

    Stephanie J. Chiu, Michael J. Allingham, Priyatham S. Mettu, Scott W. Cousins, Joseph A. Izatt, and Sina Farsiu. Kernel regression based segmentation of optical coherence tomography images with diabetic macular edema. Biomedical Optics Express, 6(4):1172–1194, 2015

  29. [29]

    Waldstein, José Ignacio Orlando, Matteo Baroni, Ciarán N

    Hrvoje Bogunovi´c, Freya Maintau-Sanchez, Sebastian M. Waldstein, José Ignacio Orlando, Matteo Baroni, Ciarán N. Bhreartaigh, et al. RETOUCH: The retinal OCT fluid detection and segmentation benchmark and challenge.IEEE Transactions on Medical Imaging, 38(8):1858–1874, 2019

  30. [30]

    Annotated retinal optical coherence tomography images (AROI) database for joint retinal layer and fluid segmentation.Automatika, 62(3):375–385, 2021

    Matej Melinšˇcak, Marin Radmilovi´c, Zoran Vatavuk, and Sven Lonˇcari´c. Annotated retinal optical coherence tomography images (AROI) database for joint retinal layer and fluid segmentation.Automatika, 62(3):375–385, 2021

  31. [31]

    Oimhs: An optical coherence tomography image dataset based on macular hole manual segmentation.Scientific Data, 10(1):769, 2023

    Xin Ye, Shucheng He, Xiaxing Zhong, Jiafeng Yu, Shangchao Yang, Yingjiao Shen, Yiqi Chen, Yaqi Wang, Xingru Huang, and Lijun Shen. Oimhs: An optical coherence tomography image dataset based on macular hole manual segmentation.Scientific Data, 10(1):769, 2023

  32. [32]

    OCT5k: A dataset of multi-disease and multi-graded annotations for retinal layers.Scientific Data, 12(1):267, 2025

    Murat Arikan, James Willoughby, Serhat Ongun, et al. OCT5k: A dataset of multi-disease and multi-graded annotations for retinal layers.Scientific Data, 12(1):267, 2025

  33. [33]

    Gamma challenge: glaucoma grading from multi-modality images.Medical Image Analysis, 90:102938, 2023

    Junde Wu, Huihui Fang, Fei Li, Huazhu Fu, Fengbin Lin, Jiongcheng Li, Yue Huang, Qinji Yu, Sifan Song, Xinxing Xu, et al. Gamma challenge: glaucoma grading from multi-modality images.Medical Image Analysis, 90:102938, 2023

  34. [34]

    MultiEYE: Dataset and benchmark for OCT-enhanced retinal disease recognition from fundus images.IEEE Transactions on Medical Imaging, 44(4):1711–1722, 2025

    Lehan Wang, Chongchong Qi, Chubin Ou, Lin An, Mei Jin, Xiangbin Kong, and Xiaomeng Li. MultiEYE: Dataset and benchmark for OCT-enhanced retinal disease recognition from fundus images.IEEE Transactions on Medical Imaging, 44(4):1711–1722, 2025

  35. [35]

    Octa-500: a retinal dataset for optical coherence tomography angiography study

    Mingchao Li, Kun Huang, Qiuzhuo Xu, Jiadong Yang, Yuhan Zhang, Zexuan Ji, Keren Xie, Songtao Yuan, Qinghuai Liu, and Qiang Chen. Octa-500: a retinal dataset for optical coherence tomography angiography study. Medical image analysis, 93:103092, 2024

  36. [36]

    Srinivasan, Leo A

    Pratul P. Srinivasan, Leo A. Kim, Priyatham S. Mettu, Scott W. Cousins, Grant M. Comer, Joseph A. Izatt, and Sina Farsiu. Fully automated detection of diabetic macular edema and dry age-related macular degeneration from optical coherence tomography images.Biomedical Optics Express, 5(10):3568–3577, 2014

  37. [37]

    OCTID: Optical coherence tomography image database.Computers & Electrical Engineering, 81:106532, 2020

    Peyman Gholami, Priyanka Roy, Mohana Kuppuswamy Parthasarathy, and Vasudevan Lakshminarayanan. OCTID: Optical coherence tomography image database.Computers & Electrical Engineering, 81:106532, 2020. 21

  38. [38]

    Jedynak, Sharon D

    Yufan He, Aaron Carass, Yihao Liu, Bruno M. Jedynak, Sharon D. Solomon, Shiv Saidha, Peter A. Calabresi, and Jerry L. Prince. Retinal layer parcellation of optical coherence tomography images: Data resource for multiple sclerosis and healthy controls.Data in Brief, 22:601–604, 2019

  39. [39]

    Fully-automated segmentation of fluid regions in exudative age-related macular degeneration subjects: Kernel graph cut in neutrosophic domain.PloS one, 12(10):e0186949, 2017

    Abdolreza Rashno, Behzad Nazari, Dara D Koozekanani, Paul M Drayna, Saeed Sadri, Hossein Rabbani, and Keshab K Parhi. Fully-automated segmentation of fluid regions in exudative age-related macular degeneration subjects: Kernel graph cut in neutrosophic domain.PloS one, 12(10):e0186949, 2017

  40. [40]

    A composite retinal fundus and oct dataset to grade macular and glaucomatous disorders

    Taimur Hassan, Hina Raja, Bilal Hassan, Muhammad Usman Akram, Jorge Dias, and Naoufel Werghi. A composite retinal fundus and oct dataset to grade macular and glaucomatous disorders. In2022 2nd International Conference on Digital Futures and Transformative Technologies (ICoDT2), pages 1–6. IEEE, 2022

  41. [41]

    Learning two-stream cnn for multi-modal age-related macular degeneration categorization.IEEE Journal of Biomedical and Health Informatics, 26(8):4111–4122, 2022

    Weisen Wang, Xirong Li, Zhiyan Xu, Weihong Yu, Jianchun Zhao, Dayong Ding, and Youxin Chen. Learning two-stream cnn for multi-modal age-related macular degeneration categorization.IEEE Journal of Biomedical and Health Informatics, 26(8):4111–4122, 2022

  42. [42]

    A retinal oct- angiography and cardiovascular status (rasta) dataset of swept-source microvascular imaging for cardiovascular risk assessment.Data, 8(10):147, 2023

    Clément Germanèse, Fabrice Meriaudeau, Pétra Eid, Ramin Tadayoni, Dominique Ginhac, Atif Anwer, Steinberg Laure-Anne, Charles Guenancia, Catherine Creuzot-Garcher, Pierre-Henry Gabrielle, et al. A retinal oct- angiography and cardiovascular status (rasta) dataset of swept-source microvascular imaging for cardiovascular risk assessment.Data, 8(10):147, 2023

  43. [43]

    Rose: a retinal oct-angiography vessel segmentation dataset and new model.IEEE transactions on medical imaging, 40(3):928–939, 2020

    Yuhui Ma, Huaying Hao, Jianyang Xie, Huazhu Fu, Jiong Zhang, Jianlong Yang, Zhen Wang, Jiang Liu, Yalin Zheng, and Yitian Zhao. Rose: a retinal oct-angiography vessel segmentation dataset and new model.IEEE transactions on medical imaging, 40(3):928–939, 2020

  44. [44]

    Syn-oct: A synthetic dataset of ocular optical coherence tomography images from healthy and glaucoma eyes.Scientific Data, 2026

    Damon Wong, Ashish Jith Sreejith Kumar, Rachel S Chong, Monisha E Nongpiur, Rahat Husain, Tina Wong, Shamira Perera, Tin Aung, Bingyao Tan, Ching-Yu Cheng, et al. Syn-oct: A synthetic dataset of ocular optical coherence tomography images from healthy and glaucoma eyes.Scientific Data, 2026

  45. [45]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

  46. [46]

    Densely connected convolutional networks

    Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017

  47. [47]

    U-net: Convolutional networks for biomedical image segmentation

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. InInternational Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015

  48. [48]

    A simple framework for contrastive learning of visual representations

    Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. InInternational conference on machine learning, pages 1597–1607. PmLR, 2020

  49. [49]

    Momentum contrast for unsupervised visual representation learning

    Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9729–9738, 2020

  50. [50]

    Bootstrap your own latent-a new approach to self-supervised learning.Advances in neural information processing systems, 33:21271–21284, 2020

    Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, et al. Bootstrap your own latent-a new approach to self-supervised learning.Advances in neural information processing systems, 33:21271–21284, 2020

  51. [51]

    Octnet: A lightweight cnn for retinal disease classification from optical coherence tomography images.Computer methods and programs in biomedicine, 200:105877, 2021

    AP Sunija, Saikat Kar, S Gayathri, Varun P Gopi, and Ponnusamy Palanisamy. Octnet: A lightweight cnn for retinal disease classification from optical coherence tomography images.Computer methods and programs in biomedicine, 200:105877, 2021

  52. [52]

    Ali Mohammad Alqudah. Aoct-net: a convolutional network automated classification of multiclass retinal diseases using spectral-domain optical coherence tomography images.Medical & biological engineering & computing, 58(1):41–53, 2020

  53. [53]

    Muhammad Awais, Henning Müller, Tong B Tang, and Fabrice Meriaudeau. Classification of SD-OCT images using a deep learning approach. In 2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), pages 489–492. IEEE, 2017.

  54. [54]

    Sinda Hosni, Hajer Khachnaoui, Hsouna Mehdi Zgolli, Sonya Mabrouk, Désiré Sidibé, Hedi Tabia, and Nawres Khlifa. Prediction of postoperative visual acuity in rhegmatogenous retinal detachment using OCT images. IEEE Access, 11:135435–135448, 2023.

  55. [55]

    Yibiao Rong, Dehui Xiang, Weifang Zhu, Kai Yu, Fei Shi, Zhun Fan, and Xinjian Chen. Surrogate-assisted retinal OCT image classification based on convolutional neural networks. IEEE Journal of Biomedical and Health Informatics, 23(1):253–263, 2018.

  56. [56]

    Mohamed Elkholy and Marwa A Marzouk. Deep learning-based classification of eye diseases using convolutional neural network for OCT images. Frontiers in Computer Science, 5:1252295, 2024.

  57. [57]

    Zainab Haddad, Sirine Elhoula, Désiré Sidibé, Hedi Tabia, Imen Zeghal, and Nawres Khlifa. Segmentation-enhanced deep learning for AMD detection from OCT images. In 2025 IEEE International Conference on Advances in Data-Driven Analytics And Intelligent Systems (ADACIS), pages 1–6. IEEE, 2025.

  58. [58]

    Arij Mlaouhi, Zainab Haddad, Hsouna Zgolli, Hedi Tabia, Désiré Sidibé, and Nawres Khlifa. Advanced deep learning techniques for evaluating OCT image quality and detecting retinal pathologies. In 2025 IEEE/ACS 22nd International Conference on Computer Systems and Applications (AICCSA), pages 1–2. IEEE, 2025.

  59. [59]

    Hina Raja, M Usman Akram, Arslan Shaukat, Shoab Ahmed Khan, Norah Alghamdi, Sajid Gul Khawaja, and Noman Nazir. Extraction of retinal layers through convolution neural network (CNN) in an OCT image for glaucoma diagnosis. Journal of Digital Imaging, 33(6):1428–1442, 2020.

  60. [60]

    Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.

  61. [61]

    Christian Etmann, Sebastian Lunz, Peter Maass, and Carola-Bibiane Schönlieb. On the connection between adversarial robustness and saliency map interpretability. arXiv preprint arXiv:1905.04172, 2019.

  62. [62]

    Zainab Haddad, Hsouna Zgolli, Désiré Sidibé, Hedi Tabia, and Nawres Khlifa. Explainable AI for retinal pathology detection in OCT images. In 2024 10th International Conference on Control, Decision and Information Technologies (CoDIT), pages 2401–2406. IEEE, 2024.

  63. [63]

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.

  64. [64]

    Bo Zhao, Boya Wu, Muyang He, and Tiejun Huang. SVIT: Scaling up visual instruction tuning. arXiv preprint arXiv:2307.04087, 2023.

  65. [65]

    GR Hemalakshmi, M Murugappan, Mohamed Yacin Sikkandar, S Sabarunisha Begum, and NB Prakash. Automated retinal disease classification using hybrid transformer model (SViT) using optical coherence tomography images. Neural Computing and Applications, 36(16):9171–9188, 2024.

  66. [66]

    Haoran Wang, Xinyu Guo, Kaiwen Song, Mingyang Sun, Yanbin Shao, Songfeng Xue, Hongwei Zhang, and Tianyu Zhang. Octformer: an efficient hierarchical transformer network specialized for retinal optical coherence tomography image recognition. IEEE Transactions on Instrumentation and Measurement, 72:1–17, 2023.

  67. [67]

    Mingming Yang, Junhui Du, and Ruichan Lv. Crat: advanced transformer-based deep learning algorithms in OCT image classification. Biomedical Signal Processing and Control, 104:107544, 2025.

  68. [68]

    Mohamed Elsharkawy, Ibrahim Abdelhalim, Mohammed Ghazal, Ali Mahmoud, Harpal S Sandhu, Aristomenis Thanos, and Ayman El-Baz. Oct-trans: A novel transformer backbone with multimodal feature extraction in OCT-based retinal disease classification. In 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI), pages 1–4. IEEE, 2025.

  69. [69]

    Liwei Cai, Chi Wen, Jingwen Jiang, Congbi Liang, Hongmei Zheng, Yu Su, and Changzheng Chen. Classification of diabetic maculopathy based on optical coherence tomography images using a vision transformer model. BMJ Open Ophthalmology, 8(1), 2023.

  70. [70]

    Yuka Kihara, Mengxi Shen, Yingying Shi, Xiaoshuang Jiang, Liang Wang, Rita Laiginhas, Cancan Lyu, Jin Yang, Jeremy Liu, Rosalyn Morin, et al. Detection of nonexudative macular neovascularization on structural OCT images using vision transformers. Ophthalmology Science, 2(4):100197, 2022.

  71. [71]

    Daniel Philippi, Kai Rothaus, and Mauro Castelli. A vision transformer architecture for the automated segmentation of retinal lesions in spectral domain optical coherence tomography images. Scientific Reports, 13(1):517, 2023.

  72. [72]

    Jingzhen He, Junxia Wang, Zeyu Han, Jun Ma, Chongjing Wang, and Meng Qi. An interpretable transformer network for the retinal disease classification using optical coherence tomography. Scientific Reports, 13(1):3637, 2023.

  73. [73]

    Badr Ait Hammou, Fares Antaki, Marie-Carole Boucher, and Renaud Duval. MBT: Model-based transformer for retinal optical coherence tomography image and video multi-classification. International Journal of Medical Informatics, 178:105178, 2023.

  74. [74]

    Yiheng Zhang, Zhongliang Li, Nan Nan, and Xiangzhao Wang. Transegnet: hybrid CNN-vision transformers encoder for retina segmentation of optical coherence tomography. Life, 13(4):976, 2023.

  75. [75]

    Qingxin Jiang, Ying Fan, Menghan Li, Sheng Fang, Weifang Zhu, Dehui Xiang, Tao Peng, Xinjian Chen, Xun Xu, and Fei Shi. Hyformer: a hybrid transformer-CNN architecture for retinal OCT image segmentation. Biomedical Optics Express, 15(11):6156–6170, 2024.

  76. [76]

    Zongqing Ma, Qiaoxue Xie, Pinxue Xie, Fan Fan, Xinxiao Gao, and Jiang Zhu. Hctnet: a hybrid convnet-transformer network for retinal optical coherence tomography image classification. Biosensors, 12(7):542, 2022.

  77. [77]

    Hai Yang, Li Chen, Junyang Cao, and Juan Wang. Hrs-net: A hybrid multi-scale network model based on convolution and transformers for multi-class retinal disease classification. IEEE Access, 12:144219–144229, 2024.

  78. [78]

    Ayoub Laouarem, Chafia Kara-Mohamed, El-Bay Bourennane, and Aboubekeur Hamdi-Cherif. Htc-retina: a hybrid retinal diseases classification model using transformer-convolutional neural network from optical coherence tomography images. Computers in Biology and Medicine, 178:108726, 2024.

  79. [79]

    DV Ashoka et al. Effivit: Hybrid CNN-transformer for retinal imaging. Computers in Biology and Medicine, 191:110164, 2025.

  80. [80]

    Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning. Advances in Neural Information Processing Systems, 33:18661–18673, 2020.
