OPTED: Open Preprocessed Trachoma Eye Dataset Using Zero-Shot SAM 3 Segmentation
Pith reviewed 2026-05-15 14:29 UTC · model grok-4.3
The pith
Zero-shot SAM 3 segmentation with a chosen text prompt extracts the tarsal conjunctiva from 2,832 raw trachoma photos to create a clean open dataset.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that text-prompt-based zero-shot segmentation with SAM 3 reliably isolates the tarsal conjunctiva from diverse clinical photographs, and that the resulting preprocessed dataset, released in both cropped and standardized formats, removes background noise while preserving diagnostic information for downstream automated classification.
What carries the argument
Text-prompt zero-shot segmentation with SAM 3 guided by the prompt 'inner surface of eyelid with red tissue', which isolates the region of interest before cropping and resizing.
If this is right
- The released 224x224 images can be fed directly into standard pre-trained image classifiers without further resizing or cropping steps.
- The open code allows exact reproduction of the preprocessing on new sets of raw trachoma photographs from the same or similar sources.
- Researchers gain access to a preprocessed dataset from a high-burden region; no previously public preprocessed trachoma dataset originated there.
- The two output formats support both aspect-ratio-preserving analysis and direct use with architectures expecting fixed square inputs.
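The two output formats can be illustrated with a minimal NumPy sketch: the cropped image is kept at its native aspect ratio, and the standardized copy is produced by separable Lanczos resampling to 224x224. This is an illustration under stated assumptions, not the OPTED code: the kernel size a=3, the edge-replication rule, and the function names are hypothetical, and the released pipeline may instead call a library resampler such as Pillow's LANCZOS filter.

```python
import math

import numpy as np


def lanczos_kernel(x: np.ndarray, a: int = 3) -> np.ndarray:
    """Lanczos window L(x) = sinc(x) * sinc(x / a) for |x| < a, and 0 elsewhere."""
    x = np.asarray(x, dtype=np.float64)
    out = np.sinc(x) * np.sinc(x / a)  # np.sinc is the normalized sinc, sin(pi x) / (pi x)
    out[np.abs(x) >= a] = 0.0
    return out


def resample_matrix(n_in: int, n_out: int, a: int = 3) -> np.ndarray:
    """Return an (n_out, n_in) matrix whose rows hold normalized Lanczos weights."""
    scale = n_in / n_out
    stretch = max(scale, 1.0)  # widen the kernel when downsampling to suppress aliasing
    support = a * stretch
    weights = np.zeros((n_out, n_in))
    for i in range(n_out):
        center = (i + 0.5) * scale - 0.5  # source coordinate of this output pixel's center
        taps = np.arange(math.floor(center - support) + 1, math.floor(center + support) + 1)
        w = lanczos_kernel((taps - center) / stretch, a)
        np.add.at(weights[i], np.clip(taps, 0, n_in - 1), w)  # replicate edge pixels
        weights[i] /= weights[i].sum()  # rows sum to 1, so flat regions are preserved
    return weights


def lanczos_resize(img: np.ndarray, size: tuple = (224, 224), a: int = 3) -> np.ndarray:
    """Separable Lanczos resize of an (H, W) or (H, W, C) array to size = (height, width).

    Returns float64; Lanczos can overshoot, so clip to [0, 255] before casting to uint8.
    """
    rows = resample_matrix(img.shape[0], size[0], a)
    cols = resample_matrix(img.shape[1], size[1], a)
    out = np.tensordot(rows, img.astype(np.float64), axes=(1, 0))  # (H_out, W, ...)
    out = np.tensordot(cols, out, axes=(1, 1))  # (W_out, H_out, ...)
    return np.moveaxis(out, 0, 1)  # (H_out, W_out, ...)
```

In use, the cropped array would be saved unchanged for the aspect-preserving format, and `np.clip(lanczos_resize(crop), 0, 255).astype(np.uint8)` would give the 224x224 copy. Note that resizing a non-square crop straight to 224x224 distorts its aspect ratio, which is exactly the trade-off the two formats are meant to cover.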
Where Pith is reading between the lines
- The same prompt-driven segmentation approach could be tested on images of other eyelid or conjunctival conditions to reduce manual annotation effort in medical imaging datasets.
- Mobile apps for trachoma screening could incorporate this preprocessing step to improve input quality before running on-device classifiers.
- If the method generalizes, it lowers the barrier for creating training data in low-resource settings where expert labeling time is scarce.
Load-bearing premise
The chosen text prompt will guide SAM 3 to extract only the tarsal conjunctiva without systematic inclusion of irrelevant tissue or exclusion of relevant areas across all variations in the photographs.
What would settle it
A manual audit finding that more than a few percent of the 2,832 outputs either crop away conjunctival tissue or retain substantial non-conjunctival background would falsify the claim that the pipeline is reliable.
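"More than a few percent" is not a fixed number, so any concrete version of this audit needs an assumed threshold. The sketch below assumes 3% and treats an audit of a random sample as a binomial estimate, using the Wilson score interval to decide whether the flagged fraction clearly exceeds the threshold, clearly stays below it, or straddles it. The threshold, function names, and verdict labels are illustrative assumptions, not part of the paper.

```python
import math


def wilson_interval(flagged: int, audited: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95% coverage)."""
    if audited <= 0:
        raise ValueError("audited must be positive")
    p = flagged / audited
    denom = 1 + z * z / audited
    center = (p + z * z / (2 * audited)) / denom
    half = z * math.sqrt(p * (1 - p) / audited + z * z / (4 * audited * audited)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def audit_verdict(flagged: int, audited: int, threshold: float = 0.03) -> str:
    """Compare an audited failure rate against a hypothetical 'few percent' threshold."""
    lo, hi = wilson_interval(flagged, audited)
    if lo > threshold:
        return "falsified"  # even the optimistic bound exceeds the threshold
    if hi <= threshold:
        return "supported"  # even the pessimistic bound stays within the threshold
    return "inconclusive"  # the interval straddles the threshold; audit more images
```

The interval matters for small audits: zero failures in 100 reviewed images still leaves an upper bound near 3.7%, so such an audit cannot by itself confirm a failure rate below 3%.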
Original abstract
Trachoma remains the leading infectious cause of blindness worldwide, with Sub-Saharan Africa bearing over 85% of the global burden and Ethiopia alone accounting for more than half of all cases. Yet publicly available preprocessed datasets for automated trachoma classification are scarce, and none originate from the most affected region. Raw clinical photographs of eyelids contain significant background noise that hinders direct use in machine learning pipelines. We present OPTED, an open-source preprocessed trachoma eye dataset constructed using the Segment Anything Model 3 (SAM 3) for automated region-of-interest extraction. We describe a reproducible four-step pipeline: (1) text-prompt-based zero-shot segmentation of the tarsal conjunctiva using SAM 3, (2) background removal and bounding-box cropping with alignment, (3) quality filtering based on confidence scores, and (4) Lanczos resizing to 224x224 pixels. A separate prompt-selection stage identifies the optimal text prompt, and manual quality assurance verifies outputs. Through comparison of five candidate prompts on all 2,832 known-label images, we identify "inner surface of eyelid with red tissue" as optimal, achieving a mean confidence of 0.872 (std 0.070) and 99.5% detection rate (the remaining 13 images are recovered via fallback prompts). The pipeline produces outputs in two formats: cropped and aligned images preserving the original aspect ratio, and standardized 224x224 images ready for pre-trained architectures. The OPTED dataset, preprocessing code, and all experimental artifacts are released as open source to facilitate reproducible trachoma classification research.
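Steps (2) and (3) of the pipeline can be sketched with NumPy on a binary mask. This is a minimal illustration, not the released OPTED code. It assumes the segmenter has already produced a boolean mask the size of the image plus a confidence score, and it omits the alignment step, which the abstract does not specify. The 0.5 score threshold and the optional pixel margin are hypothetical defaults, since the abstract does not state the values used.

```python
from typing import Optional

import numpy as np


def crop_to_mask(image: np.ndarray, mask: np.ndarray, margin: int = 0) -> np.ndarray:
    """Zero out pixels outside a binary mask, then crop to the mask's bounding box."""
    if image.shape[:2] != mask.shape:
        raise ValueError("mask must match the image's height and width")
    mask = mask.astype(bool)
    if not mask.any():
        raise ValueError("empty mask: nothing to crop")
    # Background removal: pixels outside the mask become 0 (black).
    keep = mask[..., None] if image.ndim == 3 else mask
    cleaned = np.where(keep, image, 0)
    # Bounding box: first and last rows/columns that contain any mask pixel.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    top = max(rows[0] - margin, 0)
    bottom = min(rows[-1] + margin + 1, mask.shape[0])
    left = max(cols[0] - margin, 0)
    right = min(cols[-1] + margin + 1, mask.shape[1])
    return cleaned[top:bottom, left:right]


def passes_quality_filter(score: Optional[float], min_score: float = 0.5) -> bool:
    """Keep an image only if a detection exists and its confidence meets the threshold."""
    return score is not None and score >= min_score
```

Zeroing the background before cropping means that any corner of the bounding box not covered by the mask is black rather than sclera or skin, which is what makes the cropped format "background-free" rather than merely tighter.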
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the OPTED dataset of preprocessed clinical eyelid photographs for trachoma research, constructed via a four-step pipeline that applies zero-shot SAM 3 segmentation using text prompts to isolate the tarsal conjunctiva, followed by background removal, bounding-box cropping with alignment, confidence-based quality filtering, and Lanczos resizing to 224x224. Prompt comparison across all 2,832 labeled images selects 'inner surface of eyelid with red tissue' as optimal (mean confidence 0.872, std 0.070, 99.5% detection rate), with the remaining 13 images recovered by fallback prompts; the dataset, code, and artifacts are released openly.
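The prompt-selection statistics reduce to simple per-prompt aggregates. The sketch below assumes each prompt's run is recorded as one score per image, with None where SAM 3 returned no detection, and ranks prompts by mean confidence with detection rate as a tie-breaker. The ranking rule and the use of the population standard deviation are assumptions, since the summary does not say which the authors used. The 99.5% figure checks out arithmetically: 2,819 detections out of 2,832 images is a rate of about 0.9954.

```python
from statistics import fmean, pstdev
from typing import Dict, Optional, Sequence


def prompt_stats(scores: Sequence[Optional[float]]) -> dict:
    """Summarize one prompt's run; None marks an image with no detection."""
    detected = [s for s in scores if s is not None]
    return {
        "detection_rate": len(detected) / len(scores),
        "mean_conf": fmean(detected) if detected else 0.0,
        "std_conf": pstdev(detected) if len(detected) > 1 else 0.0,
    }


def select_prompt(results: Dict[str, Sequence[Optional[float]]]) -> str:
    """Pick the prompt with the highest mean confidence; break ties by detection rate."""
    def rank(prompt: str) -> tuple:
        s = prompt_stats(results[prompt])
        return (s["mean_conf"], s["detection_rate"])

    return max(results, key=rank)
```

Ranking on mean confidence alone can favour a prompt that fires only on easier images: with the toy inputs `{"a": [0.8, 0.9, None], "b": [0.95, None, None]}`, prompt `b` wins despite detecting one image instead of two, which is why reporting detection rate alongside mean confidence matters.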
Significance. If the extracted regions prove anatomically accurate, OPTED fills a clear gap by supplying the first openly available preprocessed trachoma dataset from a high-burden region (Ethiopia), directly supporting reproducible machine-learning pipelines for automated classification. The open release of code and data is a concrete strength that lowers barriers for follow-on work.
major comments (2)
- [Prompt Selection Stage and Quality Filtering] The central claim that the pipeline reliably extracts the tarsal conjunctiva (99.5% detection, mean confidence 0.872) rests only on SAM 3's internal confidence scores plus an unspecified manual QA step. No overlap metrics (IoU, Dice, or boundary-error statistics) against expert-annotated ground-truth masks are reported for the 2,832 images, leaving potential systematic errors (e.g., consistent sclera inclusion or exclusion of marginal tissue) unquantified.
- [Pipeline description (step 3)] The fallback mechanism for the 13 non-detected images is mentioned but not detailed (which prompts were used, how many images each fallback recovered, and the resulting confidence distribution), so the completeness and uniformity of the released dataset cannot be fully assessed from the reported statistics alone.
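If expert masks were available, the overlap metrics requested here are straightforward to compute. The sketch below assumes binary NumPy masks of equal shape and also splits disagreement into over-segmentation (predicted pixels outside the reference, as with sclera inclusion) and under-segmentation (reference pixels missed, as with marginal-tissue exclusion). The function names are illustrative.

```python
import numpy as np


def overlap_metrics(pred: np.ndarray, truth: np.ndarray) -> tuple:
    """Return (IoU, Dice) for two binary masks of the same shape."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    if pred.shape != truth.shape:
        raise ValueError("masks must have the same shape")
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    if union == 0:  # both masks empty: treat as perfect agreement
        return 1.0, 1.0
    return float(inter / union), float(2 * inter / (pred.sum() + truth.sum()))


def error_split(pred: np.ndarray, truth: np.ndarray) -> tuple:
    """Return (over, under): share of predicted pixels outside truth, share of truth missed."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    over = np.logical_and(pred, ~truth).sum() / max(pred.sum(), 1)
    under = np.logical_and(truth, ~pred).sum() / max(truth.sum(), 1)
    return float(over), float(under)
```

Averaging `over` and `under` separately across images is what would expose a systematic bias; a single mean IoU can look acceptable while hiding a consistent one-sided error.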
minor comments (2)
- [Methods] The manuscript states that 'manual quality assurance verifies outputs' but provides no quantitative summary (e.g., number of images flagged, inter-rater agreement if multiple reviewers) or decision criteria, which would improve reproducibility.
- [Abstract and Data Release] Figure captions and the release statement could explicitly list all released artifacts (e.g., the prompt-comparison table, raw SAM masks, final cropped images) to match the abstract claim of 'all experimental artifacts'.
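When QA is split across two reviewers, a chance-corrected agreement statistic such as Cohen's kappa is the usual summary of inter-rater agreement. The sketch below assumes each reviewer assigns one label per image (for example "pass" or "fail"); the label names and the convention of returning 1.0 when both reviewers use a single identical label are assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two raters corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability that both raters independently pick label k, summed over k.
    expected = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # kappa is undefined here (one shared label); report 1.0 by convention
    return (observed - expected) / (1 - expected)
```

For example, if one reviewer passes 8 of 10 images and the other passes the same 7 of those plus nothing else, raw agreement is 0.9 but kappa is only about 0.74, because most agreement on a mostly-"pass" dataset is expected by chance.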
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have revised the manuscript to improve transparency where feasible.
Point-by-point responses
Referee: [Prompt Selection Stage and Quality Filtering] The central claim that the pipeline reliably extracts the tarsal conjunctiva (99.5% detection, mean confidence 0.872) rests only on SAM 3's internal confidence scores plus an unspecified manual QA step. No overlap metrics (IoU, Dice, or boundary-error statistics) against expert-annotated ground-truth masks are reported for the 2,832 images, leaving potential systematic errors (e.g., consistent sclera inclusion or exclusion of marginal tissue) unquantified.
Authors: We acknowledge that the validation relies on SAM 3 confidence scores and manual QA rather than on quantitative overlap metrics. No expert-annotated segmentation masks were available for the 2,832 images, because the source data consist of clinical photographs labeled only for trachoma grade (TF, TI, etc.), not pixel-level masks. Creating such ground truth would have required substantial additional expert annotation, which is outside the scope of this dataset-release paper. Manual QA was performed by two ophthalmologists who reviewed the outputs for anatomical fidelity, specifically checking for inclusion of sclera and exclusion of marginal conjunctival tissue. We have revised the manuscript to describe the QA protocol, the number of images reviewed, and the decision criteria in detail. While we agree that IoU/Dice would be ideal, they are not feasible without new annotations. revision: partial
Referee: [Pipeline description (step 3)] The fallback mechanism for the 13 non-detected images is mentioned but not detailed (which prompts were used, how many images each fallback recovered, and the resulting confidence distribution), so the completeness and uniformity of the released dataset cannot be fully assessed from the reported statistics alone.
Authors: We agree that additional detail on the fallback is needed for full reproducibility. In the revised manuscript we have expanded the pipeline description to specify the exact fallback prompt used ('tarsal conjunctiva'), confirm it was applied uniformly to all 13 images, and report the resulting per-image confidence scores and their distribution. This information is now included in the main text and supplementary materials to allow readers to assess dataset uniformity. revision: yes
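The fallback described in this response amounts to trying prompts in order and recording which one succeeded for each image. The sketch below is a hypothetical illustration: the `segment(image, prompt)` callable stands in for a SAM 3 call and is not SAM 3's actual API, and the prompt list uses the single fallback named in this simulated response, although the abstract speaks of fallback prompts in the plural, so the tuple is kept extensible.

```python
from collections import Counter
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical segmenter interface: segment(image, prompt) returns (mask, score),
# or None when the model detects nothing for that prompt.
Segmenter = Callable[[object, str], Optional[Tuple[object, float]]]

PRIMARY_PROMPT = "inner surface of eyelid with red tissue"
FALLBACK_PROMPTS = ("tarsal conjunctiva",)  # the single fallback named in the response
DEFAULT_PROMPTS = (PRIMARY_PROMPT, *FALLBACK_PROMPTS)


def segment_with_fallback(image, segment: Segmenter, prompts: Sequence[str] = DEFAULT_PROMPTS):
    """Try prompts in order; return (prompt_used, mask, score), or None if every prompt fails."""
    for prompt in prompts:
        result = segment(image, prompt)
        if result is not None:
            mask, score = result
            return prompt, mask, score
    return None


def fallback_report(images: Dict[str, object], segment: Segmenter,
                    prompts: Sequence[str] = DEFAULT_PROMPTS):
    """Count which prompt recovered each image and keep its score for later inspection."""
    counts: Counter = Counter()
    scores: Dict[str, float] = {}
    for name, image in images.items():
        found = segment_with_fallback(image, segment, prompts)
        if found is None:
            counts["undetected"] += 1
            continue
        prompt, _mask, score = found
        counts[prompt] += 1
        scores[name] = score
    return counts, scores
```

The per-prompt counts and per-image scores are exactly the quantities the referee asked for (images per fallback and the resulting confidence distribution), so logging them during preprocessing makes that part of the release self-documenting.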
Circularity Check
No circularity: empirical pipeline using external SAM 3 with direct prompt evaluation
Full rationale
The paper presents a four-step preprocessing pipeline that applies the external pre-trained SAM 3 model in zero-shot mode to 2,832 images. Prompt selection is performed by direct comparison of mean confidence scores across five candidate prompts on the full set of images, followed by quality filtering and manual QA. No equations, fitted parameters, or self-citations are used to derive the reported metrics (mean confidence 0.872, 99.5% detection rate); these are presented as observed empirical results. The central claims rest on the external SAM 3 model and on direct prompt comparison rather than on any self-referential definition or on a prediction that reduces to its inputs by construction. Because the prompt was selected and its statistics reported on the same 2,832 images, however, these figures describe in-sample rather than held-out performance.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: SAM 3 performs reliable zero-shot text-prompt segmentation on clinical eye photographs.