Clinical Validation of the Melanoscope AI Mobile Dermoscopy Clinical Decision Support System
Pith reviewed 2026-06-29 18:24 UTC · model grok-4.3
The pith
Cascade AI dermoscopy system shows no false negatives and 88.3 percent specificity in 176-patient validation
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The integrated cascade classification, attention map visualisation with IoU assessment, and three-zone routing provide reproducible, interpretable clinical decision support, demonstrated by no false negatives observed, 88.3 percent specificity, and 88.6 percent agreement with experts in prospective validation on 176 patients.
What carries the argument
Two-stage cascade deep learning classification of dermoscopic images, with attention rollout or Grad-CAM maps evaluated by mean IoU against expert annotations, plus probability thresholds routing patients into three zones.
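The IoU agreement between a model's attention map and an expert annotation can be sketched as below. This is a minimal illustration, not the authors' implementation: the binarisation threshold of 0.5, the function names `binarize`, `iou` and `mean_iou`, and the empty-mask convention are all assumptions, since the page does not say how activation maps were turned into masks.

```python
def binarize(heatmap, threshold=0.5):
    """Turn a 2-D heatmap with values in [0, 1] into a boolean mask.

    The 0.5 threshold is a hypothetical choice for illustration.
    """
    return [[value >= threshold for value in row] for row in heatmap]


def iou(mask_a, mask_b):
    """Intersection over union of two equally sized boolean masks."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += int(bool(a) and bool(b))
            union += int(bool(a) or bool(b))
    # Assumed convention: two empty masks count as perfect agreement.
    return intersection / union if union else 1.0


def mean_iou(scores):
    """Arithmetic mean of per-image IoU scores."""
    return sum(scores) / len(scores)


# Toy 3x3 example: the model highlights 3 pixels, the expert marks 4,
# and all 3 highlighted pixels fall inside the expert region.
heatmap = [[0.9, 0.8, 0.1],
           [0.7, 0.2, 0.0],
           [0.1, 0.0, 0.0]]
expert = [[True, True, False],
          [True, True, False],
          [False, False, False]]
score = iou(binarize(heatmap), expert)  # 3 / 4
```

A per-model mean such as the reported 0.69 for ViT would then be the average of such per-image scores over the annotated set.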
If this is right
- The system can support screening use where dermatologist shortages limit coverage.
- Higher IoU values for certain models indicate better alignment with expert visual assessment of lesion features.
- Three-zone routing based on malignancy probability thresholds allows adaptation to varying resource levels.
- All five malignant lesions were flagged, and six dysplastic naevi were routed to follow-up.
- Mean IoU scores ranged from 0.69 for ViT down to 0.51 for EfficientNetV2, showing model-specific differences in map quality.
Where Pith is reading between the lines
- Multi-center testing would be needed to check whether the zero false-negative result generalizes beyond the single-site sample.
- The IoU agreement metric offers a concrete way to compare interpretability across different imaging AI systems.
- Mobile integration of the routing thresholds could increase screening reach in regions with few specialists.
- Further tuning of the cascade stages might raise specificity while preserving the observed sensitivity.
Load-bearing premise
That the single-centre prospective validation across four Melanoma Day sessions is representative enough to support claims of reproducible clinical decision support, and that the reported IoU agreement sufficiently validates model interpretability for clinical use.
What would settle it
A confirmed false negative malignant lesion in an independent multi-center study with a larger number of malignant cases would show the screening reliability claim does not hold.
Original abstract
Introduction. Early detection of malignant skin lesions is critical for prognosis, yet dermatologist shortages in Russian regions limit screening coverage. Mobile dermoscopy clinical decision support systems (CDSS) offer a promising approach, with model interpretability and standardised patient routing remaining key barriers to adoption. Aim. To develop a quantitative interpretability assessment method for cascade deep learning models and a three-zone patient routing algorithm, and to conduct a preliminary single-centre prospective clinical validation of the Melanoscope AI CDSS in Russian outpatient practice. Material and methods. Two-stage cascade classification of dermoscopic images; attention map visualisation (attention rollout for ViT and Swin; Grad-CAM for ConvNeXt and EfficientNetV2); quantitative IoU-based agreement assessment between activation maps and expert annotations; prospective single-centre validation across four "Melanoma Day" sessions (Orel, Russia, June 2025 - April 2026). Results. On 176 patients: agreement with expert assessment 88.6%; no false negatives among 5 malignant lesions (95% CI: 47.8-100.0%); specificity 88.3%. Three melanomas and two basal cell carcinomas were histologically confirmed; six dysplastic naevi placed under follow-up. Mean IoU (n=180): ViT - 0.69; Swin - 0.64; ConvNeXt - 0.53; EfficientNetV2 - 0.51. Routing thresholds: P<0.15 / 0.15-0.50 / >=0.50. Conclusion. No false negatives were observed; specificity was 88.3%, supporting screening use. The integrated cascade classification, attention map visualisation with IoU assessment, and three-zone routing provide reproducible, interpretable clinical decision support adaptable to varying resource levels.
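The three-zone routing rule quoted in the abstract (P<0.15 / 0.15-0.50 / >=0.50) can be written as a small function. The sketch below uses only the two thresholds given in the abstract; the zone labels `low`, `intermediate` and `high` and the function name `route` are hypothetical, and the abstract does not state which clinical action each zone triggers.

```python
LOW_THRESHOLD = 0.15   # from the abstract: P < 0.15
HIGH_THRESHOLD = 0.50  # from the abstract: P >= 0.50


def route(p_malignant):
    """Map a malignancy probability to one of three zones.

    Zone labels are illustrative placeholders, not the paper's terms.
    """
    if not 0.0 <= p_malignant <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p_malignant < LOW_THRESHOLD:
        return "low"
    if p_malignant < HIGH_THRESHOLD:
        return "intermediate"
    return "high"


zones = [route(p) for p in (0.05, 0.15, 0.49, 0.50, 0.97)]
```

Keeping the thresholds as named constants reflects the page's point that they are free parameters that a site could adjust.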
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents the development of a two-stage cascade deep learning model for dermoscopic image classification, with attention map visualization (using attention rollout or Grad-CAM) and quantitative IoU assessment against expert annotations for interpretability, plus a three-zone routing algorithm based on malignancy probability thresholds. In a prospective single-centre validation on 176 patients from four Melanoma Day sessions in Orel, Russia, it reports 88.6% agreement with experts, zero false negatives among 5 histologically confirmed malignant lesions (3 melanomas, 2 BCCs; 95% CI 47.8-100%), 88.3% specificity, mean IoU scores of 0.51-0.69 across models, and concludes that the system supports screening use with reproducible, interpretable CDSS adaptable to resource levels.
Significance. If the performance holds in larger multi-centre studies, the work could aid early detection of skin malignancies in dermatologist-shortage regions by offering a mobile, interpretable CDSS with explicit routing. The quantitative IoU-based interpretability metric and three-zone routing (P<0.15 / 0.15-0.50 / >=0.50) are constructive contributions for clinical translation. However, the small malignant sample size substantially limits the strength of the sensitivity claim and thus the immediate significance for screening applications.
major comments (2)
- [Results] Results section (and Abstract/Conclusion): The central claim that 'no false negatives were observed; specificity was 88.3%, supporting screening use' is not adequately supported by the data. With only n=5 malignant lesions, the 95% CI of 47.8-100% for sensitivity remains compatible with values as low as ~48%, which is insufficient for screening claims that require high sensitivity with narrow precision. Specificity on ~171 benign cases is better powered but does not offset this for the headline conclusion.
- [Methods] Methods section: The single-centre prospective design limited to four Melanoma Day sessions does not provide sufficient evidence for the claim of 'reproducible clinical decision support adaptable to varying resource levels,' as generalizability across centres, populations, and settings remains untested.
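The width of the sensitivity interval discussed in the first major comment can be checked with a short calculation. When all n cases are detected, the lower limit of a two-sided exact (Clopper-Pearson) interval has the closed form (alpha/2)^(1/n). The sketch below assumes that this is the interval type behind the reported 47.8-100%; the page does not name the method, and the function names are hypothetical.

```python
import math


def exact_lower_bound_all_detected(n, confidence=0.95):
    """Lower limit of the two-sided Clopper-Pearson interval
    for the special case of n successes out of n trials."""
    alpha = 1.0 - confidence
    return (alpha / 2.0) ** (1.0 / n)


def cases_needed(target_lower_bound, confidence=0.95):
    """Smallest n for which n/n detections give a lower limit
    at or above the target (still assuming zero misses)."""
    alpha = 1.0 - confidence
    return math.ceil(math.log(alpha / 2.0) / math.log(target_lower_bound))


lower_5_of_5 = exact_lower_bound_all_detected(5)  # roughly 0.478
n_for_95 = cases_needed(0.95)
```

Under these assumptions, 5 of 5 detections give a lower limit of about 47.8%, matching the reported figure, and on the order of 70 malignant cases with no misses would be needed before the lower limit reaches 95%.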
minor comments (2)
- [Abstract] Abstract: Detailed exclusion criteria for the 176 patients and the full study protocol are not described, which would improve assessment of the validation cohort and potential biases.
- [Results] Results: Consider reporting 95% CI for specificity alongside the point estimate for completeness and consistency with the sensitivity reporting.
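A confidence interval for specificity, as requested in the last minor comment, could be computed along the following lines. This is a sketch using the Wilson score interval, which is one common choice and not necessarily what the authors would use. The counts in the usage line are assumptions: 171 benign cases follows the referee's approximate figure, and 151 true negatives is chosen only because 151/171 rounds to the reported 88.3%; neither count is stated in the source.

```python
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * ((p * (1.0 - p) / n + z * z / (4.0 * n * n)) ** 0.5) / denom
    return centre - half, centre + half


# Hypothetical counts consistent with the reported 88.3% specificity.
spec_low, spec_high = wilson_interval(151, 171)
```

With these assumed counts the interval spans roughly 83% to 92%, which illustrates why the referee calls specificity better powered than sensitivity.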
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We agree that the small number of malignant cases and single-centre design limit the strength of certain claims in the original manuscript. We have revised the Abstract, Results, and Conclusion sections to present the findings as preliminary, prominently report the confidence interval, and remove unsupported claims about screening utility and generalizability. Below we respond point by point.
- Referee: [Results] Results section (and Abstract/Conclusion): The central claim that 'no false negatives were observed; specificity was 88.3%, supporting screening use' is not adequately supported by the data. With only n=5 malignant lesions, the 95% CI of 47.8-100% for sensitivity remains compatible with values as low as ~48%, which is insufficient for screening claims that require high sensitivity with narrow precision. Specificity on ~171 benign cases is better powered but does not offset this for the headline conclusion.
  Authors: We agree that the wide 95% CI for sensitivity precludes strong screening claims. The manuscript has been revised to (1) state the observed sensitivity as 100% (5/5) with the explicit CI 47.8–100%, (2) remove the phrase 'supporting screening use' from the Abstract and Conclusion, and (3) reframe the work as a preliminary single-centre validation whose primary contributions are the IoU-based interpretability metric and the three-zone routing algorithm. These changes ensure the headline claims match the statistical power of the data.
  Revision: yes
- Referee: [Methods] Methods section: The single-centre prospective design limited to four Melanoma Day sessions does not provide sufficient evidence for the claim of 'reproducible clinical decision support adaptable to varying resource levels,' as generalizability across centres, populations, and settings remains untested.
  Authors: We accept that the single-centre design does not demonstrate reproducibility or adaptability across settings. The original phrasing referred to the adjustable routing thresholds, but we agree this is an untested hypothesis. The Conclusion has been revised to state that the routing framework 'provides a template that may be adapted to local resources once validated in multi-centre studies,' removing any implication of current generalizability.
  Revision: yes
Circularity Check
No circularity: empirical validation study with direct performance reporting
The paper is a single-centre prospective clinical validation of a dermoscopy CDSS. It reports observed agreement (88.6%), sensitivity (0/5 false negatives), specificity (88.3%), and mean IoU values on a cohort of 176 patients. No equations, parameter fitting to the target metrics, self-definitional loops, or load-bearing self-citations are present. All reported figures are direct empirical counts or averages from the validation sessions; the derivation chain is simply data collection followed by standard metric computation. This matches the default expectation of no circularity for an observational validation study.
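The "standard metric computation" mentioned above amounts to simple ratios over a confusion table. The sketch below illustrates this. The counts are assumptions, not reported data: 5 true positives and 0 false negatives come from the abstract, while 151 true negatives and 20 false positives are inferred so that the ratios round to the reported percentages. Treating "agreement with expert assessment" as overall accuracy is also an assumption.

```python
def screening_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and overall agreement from raw counts."""
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / positives,
        "specificity": tn / negatives,
        "agreement": (tp + tn) / (positives + negatives),
    }


# Hypothetical table: tn and fp are inferred, not taken from the paper.
metrics = screening_metrics(tp=5, fn=0, tn=151, fp=20)
```

Under these assumed counts the three ratios round to 100%, 88.3% and 88.6% on 176 cases, so the headline figures are at least mutually consistent with a single confusion table.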
Axiom & Free-Parameter Ledger
free parameters (1)
- routing probability thresholds
axioms (1)
- domain assumption: Attention map overlap with expert annotations (IoU) is a valid quantitative measure of model interpretability
Forward citations
Cited by 1 Pith paper
- Cascade Classification of Dermoscopic Images of Skin Neoplasms with Controllable Sensitivity and External Clinical Validation
  Cascade classification improves macro F1 over single-stage for some models by allowing sensitivity control but reveals a large generalization gap on external clinical data.