arxiv: 2605.27561 · v1 · pith:RC2VKD6Dnew · submitted 2026-05-26 · 💻 cs.CV · cs.AI

Clinical Validation of the Melanoscope AI Mobile Dermoscopy Clinical Decision Support System

Pith reviewed 2026-06-29 18:24 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords mobile dermoscopy · AI clinical decision support · melanoma screening · attention maps · cascade classification · prospective validation · skin lesion detection

The pith

Cascade AI dermoscopy system shows no false negatives and 88.3 percent specificity in 176-patient validation

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a mobile dermoscopy clinical decision support system that combines two-stage cascade deep learning classification with attention map visualization and a three-zone routing algorithm. It validates the system prospectively in a single Russian center across four screening sessions on 176 patients, reporting 88.6 percent agreement with expert assessment, zero false negatives among five malignant lesions, and 88.3 percent specificity. The authors argue that the quantitative IoU evaluation of attention maps against expert annotations plus probability-based routing makes the output reproducible and adaptable to different levels of clinical resources. This approach targets early detection of skin malignancies where dermatologist access is limited.

Core claim

The integrated cascade classification, attention map visualisation with IoU assessment, and three-zone routing provide reproducible, interpretable clinical decision support, demonstrated by no false negatives observed, 88.3 percent specificity, and 88.6 percent agreement with experts in prospective validation on 176 patients.

What carries the argument

Two-stage cascade deep learning classification of dermoscopic images, with attention rollout or Grad-CAM maps evaluated by mean IoU against expert annotations, plus probability thresholds routing patients into three zones.
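
A minimal sketch of the IoU agreement step, assuming the attention map is a 2-D grid of scores in [0, 1] that is binarised at a fixed cut-off before being compared with a binary expert mask. The 0.5 cut-off, the function names and the toy 3x3 data are illustrative assumptions; this page does not state how the paper binarises its maps.

```python
def binarize(heatmap, threshold=0.5):
    """Turn a 2-D heatmap into a boolean mask (threshold is an assumed value)."""
    return [[value >= threshold for value in row] for row in heatmap]


def iou(pred_mask, expert_mask):
    """Intersection over union of two equally sized binary masks."""
    intersection = 0
    union = 0
    for pred_row, expert_row in zip(pred_mask, expert_mask):
        for p, e in zip(pred_row, expert_row):
            intersection += int(bool(p) and bool(e))
            union += int(bool(p) or bool(e))
    # Convention chosen here: two empty masks count as perfect agreement.
    return intersection / union if union else 1.0


def mean_iou(scores):
    """Arithmetic mean of per-image IoU scores."""
    return sum(scores) / len(scores)


# Toy data, not taken from the paper.
toy_heatmap = [
    [0.9, 0.8, 0.1],
    [0.7, 0.2, 0.0],
    [0.1, 0.0, 0.0],
]
toy_expert_mask = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
]

score = iou(binarize(toy_heatmap), toy_expert_mask)
print(score)  # 3 overlapping cells out of 4 in the union
```

Averaging such per-image scores with `mean_iou` over an annotated set would give a figure comparable in form to the per-architecture means reported above, under the stated assumptions.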

If this is right

  • The system can support screening use where dermatologist shortages limit coverage.
  • Higher IoU values for certain models indicate better alignment with expert visual assessment of lesion features.
  • Three-zone routing based on malignancy probability thresholds allows adaptation to varying resource levels.
  • All five malignant lesions were correctly flagged without misses, while six dysplastic naevi were routed to follow-up.
  • Mean IoU scores ranged from 0.69 for ViT down to 0.51 for EfficientNetV2, showing model-specific differences in map quality.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Multi-center testing would be needed to check whether the zero false-negative result generalizes beyond the single-site sample.
  • The IoU agreement metric offers a concrete way to compare interpretability across different imaging AI systems.
  • Mobile integration of the routing thresholds could increase screening reach in regions with few specialists.
  • Further tuning of the cascade stages might raise specificity while preserving the observed sensitivity.

Load-bearing premise

The single-centre prospective validation across four Melanoma Day sessions is representative enough to support claims of reproducible clinical decision support and that the reported IoU agreement sufficiently validates model interpretability for clinical use.

What would settle it

A confirmed false negative malignant lesion in an independent multi-center study with a larger number of malignant cases would show the screening reliability claim does not hold.
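
To give a sense of scale for "a larger number of malignant cases", the sketch below uses the closed form of the exact (Clopper-Pearson) lower bound when every malignant case is detected: for n out of n, the two-sided lower limit is (alpha/2)^(1/n). The target sensitivities of 0.90 and 0.95 are hypothetical planning values, not thresholds taken from the paper.

```python
from math import ceil, log


def lower_bound_all_detected(n, alpha=0.05):
    """Exact two-sided lower confidence limit for sensitivity when n of n cases are detected."""
    return (alpha / 2) ** (1 / n)


def min_cases_for_lower_bound(target, alpha=0.05):
    """Smallest n for which n-of-n detections push the lower limit to at least `target`."""
    return ceil(log(alpha / 2) / log(target))


print(round(lower_bound_all_detected(5), 3))  # matches the 47.8% limit reported for 5/5
print(min_cases_for_lower_bound(0.90))        # hypothetical target
print(min_cases_for_lower_bound(0.95))        # hypothetical target
```

Under these assumptions, several dozen malignant lesions with no misses would be needed before the lower limit reaches 90%, which is why five cases cannot settle the question either way.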

Figures

Figures reproduced from arXiv: 2605.27561 by Elena Sergeevna Kozachok, Sergey Sergeevich Seregin.

Figure 1: Architecture of the Melanoscope AI CDSS: mobile image-acquisition application, server-side cascade inference subsystem and attention-map visualisation module. Arrows show data flows during a standard examination cycle.
Figure 2: Mean IoU values for four architectures and four nosological classes.
Figure 3: Attention map examples for three nosological groups (rows: …
Figure 4: Confusion matrix of the automatic classification (…
Figure 5: Flowchart of the three-zone patient routing algorithm.
Figure 6: Distribution of 176 validation-sample patients across the routing zones. The …
Original abstract

Introduction. Early detection of malignant skin lesions is critical for prognosis, yet dermatologist shortages in Russian regions limit screening coverage. Mobile dermoscopy clinical decision support systems (CDSS) offer a promising approach, with model interpretability and standardised patient routing remaining key barriers to adoption.

Aim. To develop a quantitative interpretability assessment method for cascade deep learning models and a three-zone patient routing algorithm, and to conduct a preliminary single-centre prospective clinical validation of the Melanoscope AI CDSS in Russian outpatient practice.

Material and methods. Two-stage cascade classification of dermoscopic images; attention map visualisation (attention rollout for ViT and Swin; Grad-CAM for ConvNeXt and EfficientNetV2); quantitative IoU-based agreement assessment between activation maps and expert annotations; prospective single-centre validation across four "Melanoma Day" sessions (Orel, Russia, June 2025 - April 2026).

Results. On 176 patients: agreement with expert assessment 88.6%; no false negatives among 5 malignant lesions (95% CI: 47.8-100.0%); specificity 88.3%. Three melanomas and two basal cell carcinomas were histologically confirmed; six dysplastic naevi placed under follow-up. Mean IoU (n=180): ViT - 0.69; Swin - 0.64; ConvNeXt - 0.53; EfficientNetV2 - 0.51. Routing thresholds: P<0.15 / 0.15-0.50 / >=0.50.

Conclusion. No false negatives were observed; specificity was 88.3%, supporting screening use. The integrated cascade classification, attention map visualisation with IoU assessment, and three-zone routing provide reproducible, interpretable clinical decision support adaptable to varying resource levels.
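
The routing thresholds quoted in the abstract can be read as a simple piecewise rule on the predicted malignancy probability. The sketch below is one possible reading: the boundary handling follows the abstract (P<0.15, 0.15-0.50, >=0.50), while the zone labels and the function name are hypothetical, since this page does not name the zones or the clinical action attached to each.

```python
LOW_THRESHOLD = 0.15   # from the abstract: P < 0.15
HIGH_THRESHOLD = 0.50  # from the abstract: P >= 0.50


def route(p_malignant):
    """Map a malignancy probability to one of three routing zones (labels are illustrative)."""
    if not 0.0 <= p_malignant <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p_malignant < LOW_THRESHOLD:
        return "low"
    if p_malignant < HIGH_THRESHOLD:
        return "intermediate"
    return "high"


for p in (0.05, 0.15, 0.49, 0.50, 0.93):
    print(p, route(p))
```

Keeping the two cut-offs as named constants reflects the abstract's point that the thresholds are the adjustable part of the scheme; any site-specific values would still need their own validation.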

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents the development of a two-stage cascade deep learning model for dermoscopic image classification, with attention map visualization (using attention rollout or Grad-CAM) and quantitative IoU assessment against expert annotations for interpretability, plus a three-zone routing algorithm based on malignancy probability thresholds. In a prospective single-centre validation on 176 patients from four Melanoma Day sessions in Orel, Russia, it reports 88.6% agreement with experts, zero false negatives among 5 histologically confirmed malignant lesions (3 melanomas, 2 BCCs; 95% CI 47.8-100%), 88.3% specificity, mean IoU scores of 0.51-0.69 across models, and concludes that the system supports screening use with reproducible, interpretable CDSS adaptable to resource levels.

Significance. If the performance holds in larger multi-centre studies, the work could aid early detection of skin malignancies in dermatologist-shortage regions by offering a mobile, interpretable CDSS with explicit routing. The quantitative IoU-based interpretability metric and three-zone routing (P<0.15 / 0.15-0.50 / >=0.50) are constructive contributions for clinical translation. However, the small malignant sample size substantially limits the strength of the sensitivity claim and thus the immediate significance for screening applications.

major comments (2)
  1. [Results] Results section (and Abstract/Conclusion): The central claim that 'no false negatives were observed; specificity was 88.3%, supporting screening use' is not adequately supported by the data. With only n=5 malignant lesions, the 95% CI of 47.8-100% for sensitivity remains compatible with values as low as ~48%, which is insufficient for screening claims that require high sensitivity with narrow precision. Specificity on ~171 benign cases is better powered but does not offset this for the headline conclusion.
  2. [Methods] Methods section: The single-centre prospective design limited to four Melanoma Day sessions does not provide sufficient evidence for the claim of 'reproducible clinical decision support adaptable to varying resource levels,' as generalizability across centres, populations, and settings remains untested.
minor comments (2)
  1. [Abstract] Abstract: Detailed exclusion criteria for the 176 patients and the full study protocol are not described, which would improve assessment of the validation cohort and potential biases.
  2. [Results] Results: Consider reporting 95% CI for specificity alongside the point estimate for completeness and consistency with the sensitivity reporting.
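
The interval discussed in the major comments can be reproduced with an exact (Clopper-Pearson) binomial calculation. The sketch below is a standard-library implementation that finds the limits by bisection on the binomial tail; it assumes the paper used the exact method, which the reported 47.8-100.0% interval is consistent with but which this page does not state. Function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(func, target, iterations=60):
    """Bisection on [0, 1] for a function that increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for a proportion k/n."""
    if k == 0:
        lower = 0.0
    else:
        # Lower limit: P(X >= k | p) = alpha / 2
        lower = _solve_increasing(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    if k == n:
        upper = 1.0
    else:
        # Upper limit: P(X <= k | p) = alpha / 2
        upper = _solve_increasing(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


low, high = clopper_pearson(5, 5)
print(f"sensitivity 5/5: {low:.1%} to {high:.1%}")
```

The same function could be applied to the specificity counts, as the second minor comment suggests, once the exact numerator and denominator are taken from the paper.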

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We agree that the small number of malignant cases and single-centre design limit the strength of certain claims in the original manuscript. We have revised the Abstract, Results, and Conclusion sections to present the findings as preliminary, prominently report the confidence interval, and remove unsupported claims about screening utility and generalizability. Below we respond point by point.

  1. Referee: [Results] Results section (and Abstract/Conclusion): The central claim that 'no false negatives were observed; specificity was 88.3%, supporting screening use' is not adequately supported by the data. With only n=5 malignant lesions, the 95% CI of 47.8-100% for sensitivity remains compatible with values as low as ~48%, which is insufficient for screening claims that require high sensitivity with narrow precision. Specificity on ~171 benign cases is better powered but does not offset this for the headline conclusion.

    Authors: We agree that the wide 95% CI for sensitivity precludes strong screening claims. The manuscript has been revised to (1) state the observed sensitivity as 100% (5/5) with the explicit CI 47.8–100%, (2) remove the phrase 'supporting screening use' from the Abstract and Conclusion, and (3) reframe the work as a preliminary single-centre validation whose primary contributions are the IoU-based interpretability metric and the three-zone routing algorithm. These changes ensure the headline claims match the statistical power of the data. revision: yes

  2. Referee: [Methods] Methods section: The single-centre prospective design limited to four Melanoma Day sessions does not provide sufficient evidence for the claim of 'reproducible clinical decision support adaptable to varying resource levels,' as generalizability across centres, populations, and settings remains untested.

    Authors: We accept that the single-centre design does not demonstrate reproducibility or adaptability across settings. The original phrasing referred to the adjustable routing thresholds, but we agree this is an untested hypothesis. The Conclusion has been revised to state that the routing framework 'provides a template that may be adapted to local resources once validated in multi-centre studies,' removing any implication of current generalizability. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical validation study with direct performance reporting

The paper is a single-centre prospective clinical validation of a dermoscopy CDSS. It reports observed agreement (88.6%), sensitivity (5 of 5 malignant lesions detected, i.e. no false negatives), specificity (88.3%), and mean IoU values on a cohort of 176 patients. No equations, parameter fitting to the target metrics, self-definitional loops, or load-bearing self-citations are present. All reported figures are direct empirical counts or averages from the validation sessions; the derivation chain is simply data collection followed by standard metric computation. This matches the default expectation of no circularity for an observational validation study.
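
The "standard metric computation" referred to above can be sketched as follows. The counts are back-calculated assumptions, not figures quoted from the paper: they assume a binary malignant versus non-malignant reading, 5 malignant and 171 non-malignant cases out of 176, and that "agreement" means overall accuracy. They were chosen only because they reproduce the reported percentages.

```python
def screening_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and overall agreement from binary confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "agreement": (tp + tn) / total,
    }


# Assumed counts (not stated on this page): 5 TP, 0 FN, 151 TN, 20 FP.
metrics = screening_metrics(tp=5, fn=0, tn=151, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

If the paper's confusion matrix is multi-class or lesion-level rather than patient-level, the counts would differ and this reconstruction would not apply.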

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The abstract provides limited detail on background assumptions; the three-zone routing thresholds appear chosen rather than derived, and the claim that IoU agreement validates interpretability rests on an unstated domain assumption that expert annotations are the correct reference standard.

free parameters (1)
  • routing probability thresholds
    P<0.15 / 0.15-0.50 / >=0.50 chosen to define the three patient routing zones
axioms (1)
  • domain assumption: Attention map overlap with expert annotations (IoU) is a valid quantitative measure of model interpretability
    Invoked when reporting mean IoU values for ViT, Swin, ConvNeXt and EfficientNetV2

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Cascade Classification of Dermoscopic Images of Skin Neoplasms with Controllable Sensitivity and External Clinical Validation

    cs.CV 2026-06 unverdicted novelty 4.0

    Cascade classification improves macro F1 over single-stage for some models by allowing sensitivity control but reveals a large generalization gap on external clinical data.
