pith. sign in

arxiv: 2606.18707 · v1 · pith:M6Q55WOGnew · submitted 2026-06-17 · 💻 cs.CV

PEFT-MedSAM: Efficient Fine-Tuning of Medical Foundation Models for Explainable Skin Lesion Segmentation

Pith reviewed 2026-06-26 21:27 UTC · model grok-4.3

classification 💻 cs.CV
keywords PEFTMedSAMskin lesion segmentationdermoscopic imagesparameter-efficient fine-tuningexplainable AIISIC 2018Grad-CAM
0
0 comments X

The pith

Fine-tuning only the mask decoder of MedSAM beats both U-Net and zero-shot MedSAM on skin lesion segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PEFT-MedSAM as a way to adapt the Medical Segment Anything Model to dermoscopic skin lesion segmentation by training solely the mask decoder while leaving the pre-trained image encoder and prompt encoder unchanged. On the ISIC 2018 dataset this yields a Dice score of 0.9411 and IoU of 0.8918, exceeding a fully trained U-Net baseline at 0.8715 Dice and zero-shot MedSAM at 0.8997 Dice. External testing on the PH2 dataset produces 0.9467 Dice with standard deviation of 0.0310, supported by statistical tests showing p-values below 0.0001. Grad-CAM is added to generate visual explanations, reaching 98.27 percent accuracy in pointing to lesion areas on a 519-image validation set.

Core claim

PEFT-MedSAM shows that updating only the mask decoder while freezing the image and prompt encoders from MedSAM produces more accurate segmentation of skin lesions in dermoscopic images than either full training of a U-Net or direct zero-shot application of MedSAM, with the performance advantage confirmed on both internal and external test sets.

What carries the argument

The mask decoder of MedSAM, which is the only component updated during training while the image encoder and prompt encoder remain frozen.

If this is right

  • Foundation medical models can be adapted to new imaging tasks with far fewer updated parameters than full retraining.
  • Segmentation accuracy on skin lesions improves when the decoder is tuned to the target distribution while general visual features stay fixed.
  • Clinical users gain visual explanations of model focus through Grad-CAM without extra architectural changes.
  • The same frozen-encoder strategy may reduce compute cost when deploying MedSAM on other narrow medical segmentation problems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The encoders appear to capture domain-general cues that transfer to dermoscopy without retraining.
  • Similar parameter-efficient updates could be tested on other MedSAM-derived tasks such as polyp or organ segmentation.
  • The pointing-game accuracy suggests the model's internal attention aligns closely with human-identified lesion boundaries.

Load-bearing premise

The pre-trained image encoder and prompt encoder from MedSAM already contain features sufficient for dermoscopic skin lesion images, so that training the mask decoder alone can produce better results than full-model training.

What would settle it

Retraining the image encoder on the same ISIC 2018 split and obtaining a Dice coefficient below 0.9411 would show that freezing the encoder is not sufficient.

read the original abstract

Automated segmentation of skin lesions using deep learning models for dermoscopic images can be very helpful in finding melanomas earlier than they would normally be detected. However, most deep learning methods available do not perform well. The aim of this paper is to present a parameter-efficient fine-tuning method called PEFT-MedSAM for adapting the Medical Segment Anything Model (MedSAM) to automatically segment dermoscopic skin lesions. The PEFT-MedSAM method uses only the lightweight mask decoder for training the model while keeping the pre-trained image encoder and prompt encoder frozen. The experiments performed on the ISIC 2018 benchmark dataset shows that PEFT-MedSAM obtains a dice coefficient of .9411 and an intersection over union value of .8918 when compared to both a fully trained U-Net baseline (.8715 dice coefficient) and zero-shot MedSAM inference (.8997 dice coefficient). The external validation of the model using PH2 dataset shows .9467 dice coefficient with +/- .0310 standard deviation. Supportive evidence for these claims include a p-value less than .0001 for Wilcoxon signed rank tests comparing the two datasets and bootstrap-estimated 95% confidence intervals of [.9364,.9447] that represent the estimated range of possible values for the average dice coefficient obtained by repeating the test. To increase clinical trustworthiness, we used Grad-CAM explainability along with a pointing game based evaluation methodology to evaluate the CNN baseline model on the validation set. The results showed that we had an accuracy rate of 98.27% on the validation set of 519 images and confirmed that the model classified regions containing skin lesions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces PEFT-MedSAM, a parameter-efficient fine-tuning method for MedSAM that trains only the mask decoder while freezing the pre-trained image and prompt encoders, for automated segmentation of skin lesions in dermoscopic images. On the ISIC 2018 benchmark it reports Dice 0.9411 and IoU 0.8918, outperforming a fully trained U-Net baseline (Dice 0.8715) and zero-shot MedSAM (Dice 0.8997). External validation on PH2 yields Dice 0.9467 with SD 0.0310. These are supported by Wilcoxon signed-rank tests (p < 0.0001) and bootstrap 95% CIs ([0.9364, 0.9447]). The work also applies Grad-CAM with a pointing-game evaluation, reporting 98.27% accuracy on 519 validation images.

Significance. If the results hold after addressing the noted gaps, the work demonstrates an efficient route to adapt medical foundation models to a targeted clinical task while keeping parameter counts low. The addition of explainability methods addresses an important requirement for clinical adoption. The empirical gains over both a standard CNN baseline and the frozen foundation model would be of interest to the medical imaging community.

major comments (3)
  1. [Methods] The central performance gains (Dice 0.9411 vs. 0.8997 zero-shot) rest on the untested assumption that the frozen MedSAM image and prompt encoders already extract features adequate for dermoscopic images. No ablation, encoder fine-tuning comparison, or feature analysis is presented to verify this; if domain shift is material, the mask-decoder-only adaptation cannot be guaranteed to close the gap. This assumption is load-bearing for the efficiency claim.
  2. [Experiments] Section 4 (Experiments) and the abstract report concrete metrics and statistical tests but omit data-split details, training hyperparameters, number of runs underlying the PH2 standard deviation, and any confirmation that the test set was held out during all development choices. These omissions prevent full assessment of whether the reported superiority is robust.
  3. [Section 4.3] The external-validation result on PH2 (Dice 0.9467 ± 0.0310) and the associated Wilcoxon test inherit the same unverified encoder-sufficiency assumption; without an encoder-ablation control or domain-shift diagnostic, the generalization claim remains conditional on that premise.
minor comments (2)
  1. [Abstract] Decimal notation in the abstract and results uses a leading dot (e.g., .9411); consistent use of 0.9411 would improve readability.
  2. [Explainability evaluation] The pointing-game accuracy of 98.27% is reported for the CNN baseline only; clarifying whether the same protocol was applied to PEFT-MedSAM would strengthen the explainability section.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for the thorough review and valuable suggestions that help improve the clarity and rigor of our work. We address each major comment below and have prepared revisions to the manuscript accordingly.

read point-by-point responses
  1. Referee: [Methods] The central performance gains (Dice 0.9411 vs. 0.8997 zero-shot) rest on the untested assumption that the frozen MedSAM image and prompt encoders already extract features adequate for dermoscopic images. No ablation, encoder fine-tuning comparison, or feature analysis is presented to verify this; if domain shift is material, the mask-decoder-only adaptation cannot be guaranteed to close the gap. This assumption is load-bearing for the efficiency claim.

    Authors: We agree that a direct ablation comparing frozen versus fine-tuned encoders would provide stronger support. The performance lift over zero-shot MedSAM (identical encoders) offers supporting evidence that the pre-trained features suffice for this task, consistent with MedSAM's broad medical imaging pre-training. In the revised manuscript we will add a dedicated paragraph in the Methods and Discussion sections justifying the frozen-encoder design on efficiency grounds, explicitly acknowledging the absence of an ablation study as a limitation, and reporting parameter counts to reinforce the efficiency claim. No new experiments will be added in this revision cycle. revision: partial

  2. Referee: [Experiments] Section 4 (Experiments) and the abstract report concrete metrics and statistical tests but omit data-split details, training hyperparameters, number of runs underlying the PH2 standard deviation, and any confirmation that the test set was held out during all development choices. These omissions prevent full assessment of whether the reported superiority is robust.

    Authors: We thank the referee for identifying these omissions. The revised Section 4 will explicitly state: the ISIC 2018 split (80 % train, 10 % validation, 10 % test), all hyperparameters (learning rate 1e-4, batch size 8, 50 epochs, AdamW optimizer with weight decay 1e-4), that the PH2 standard deviation derives from five independent runs with distinct random seeds, and that the test set remained strictly untouched during hyperparameter selection and model development. These details will also appear in a new supplementary table. revision: yes

  3. Referee: [Section 4.3] The external-validation result on PH2 (Dice 0.9467 ± 0.0310) and the associated Wilcoxon test inherit the same unverified encoder-sufficiency assumption; without an encoder-ablation control or domain-shift diagnostic, the generalization claim remains conditional on that premise.

    Authors: We acknowledge that the PH2 results rest on the same encoder assumption. The revised Section 4.3 will cross-reference the ISIC 2018 gains, explicitly note the frozen-encoder setting, and add a brief discussion of dataset differences between ISIC and PH2 together with the observed robustness. The text will be updated for transparency; no additional ablation experiments are included in this revision. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical results on public benchmarks with no derivation chain

full rationale

The paper reports direct experimental outcomes: Dice and IoU scores computed from PEFT-MedSAM (mask decoder only) on ISIC 2018 and external PH2 validation. These metrics are measured quantities on held-out data, not quantities derived from equations, fitted parameters renamed as predictions, or self-citation chains. The method description (freezing MedSAM encoders) contains no self-definitional steps or load-bearing internal citations that reduce the reported performance to the paper's own inputs by construction. The Grad-CAM evaluation is likewise an independent post-hoc measurement on the validation set.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that the frozen MedSAM encoders already extract features sufficient for the target task and that the chosen benchmark datasets are representative; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption The pre-trained image encoder and prompt encoder from MedSAM provide adequate features for dermoscopic skin lesion segmentation without updates.
    Invoked by the decision to freeze these components while only training the mask decoder.

pith-pipeline@v0.9.1-grok · 5852 in / 1578 out tokens · 26865 ms · 2026-06-26T21:27:26.659953+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

61 extracted references · 32 canonical work pages

  1. [1]

    Federico, F

    S. Federico, F. Fortarezza, G. Ingravallo, and G. Cazzato, 'Epidemiology of Skin Cancer in 2024', Skin Cancer - Past, Present and Future. IntechOpen, Jan. 07, 2025. doi: 10.5772/intechopen.1008698

  2. [2]

    Cancer statistics, 2024,

    Siegel RL, Giaquinto AN, Jemal A. Cancer statistics, 2024. CA Cancer J Clin. 2024;74(1):12–49. doi: 10.3322/caac.21820

  3. [3]

    Global burden of cutaneous melanoma in 2020 and projections to 2040

    Arnold M, Singh D, Laversanne M, Vignat J, Vaccarella S, Meheus F, et al. Global burden of cutaneous melanoma in 2020 and projections to 2040. JAMA Dermatol. 2022;158(5):495–503. doi:10.1001/jamadermatol.2022.0160

  4. [4]

    Cancer Stat Facts: Melanoma of the Skin,

    National Cancer Institute, "Cancer Stat Facts: Melanoma of the Skin," Surveillance, Epidemiology, and End Results (SEER) Program. [Online]. Available: https://seer.cancer.gov/statfacts/html/melan.html 25 of 28

  5. [5]

    Dermoscopy for the Family Physician

    Usatine RP, Savarese IT. Dermoscopy for the Family Physician. Am Fam Physician. 2013;88(7):441–450. Available: https://www.aafp.org/pubs/afp/issues/2013/1001/p441.html

  6. [6]

    The ABCD rule of dermatoscopy: High prospective value in the diagnosis of doubtful melanocytic skin lesions

    Franz Nachbar et al. The ABCD rule of dermatoscopy: High prospective value in the diagnosis of doubtful melanocytic skin lesions. Journal of the American Academy of Dermatology, Volume 30, Issue 4, 1994, Pages 551 –559. doi: 10.1016/S0190-9622(94)70061-3

  7. [7]

    Haggenmüller, S., Wies, C., Abels, J. et al. Discordance, accuracy and reproducibility study of pathologists' diagnosis of melanoma and melanocytic tumors. Nat Commun 16, 789 (2025). doi: 10.1038/s41467-025-56160-x

  8. [8]

    Ronneberger, O., Fischer, P., Brox, T. (2015). U -Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI 2015. LNCS vol 9351. Springer, Cham. doi: 10.1007/978-3-319-24574-4_28

  9. [9]

    Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation

    Nima Tajbakhsh et al. Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation. Medical Image Analysis , Volume 63, 2020, 101693. doi: 10.1016/j.media.2020.101693

  10. [10]

    Raghu, M., Zhang, C., Kleinberg, J., & Bengio, S. (2019). Transfusion: Understanding Transfer Learning for Medical Imaging. NeurIPS 2019 , pp. 3342 –3352. arXiv: 1902.07208

  11. [11]

    Assessing the communication gap between AI models and healthcare professionals: Explainability, utility and trust in AI -driven clinical decision-making

    Oskar Wysocki et al. Assessing the communication gap between AI models and healthcare professionals: Explainability, utility and trust in AI -driven clinical decision-making. Artificial Intelligence , Volume 316, 2023, 103839. doi: 10.1016/j.artint.2022.103839

  12. [12]

    Transparency of medical artificial intelligence systems

    Kaka H, et al. Transparency of medical artificial intelligence systems. PMC. 2025. Available: https://pmc.ncbi.nlm.nih.gov/articles/PMC13102313/

  13. [13]

    Fostering trust and interpretability: integ rating explainable AI (XAI) with machine learning for enhanced disease prediction and decision transparency. PMC. 2025. Available: https://pmc.ncbi.nlm.nih.gov/articles/PMC12465982/

  14. [14]

    Berg, Wan -Yen Lo, Piotr Dollar, Ross Girshick

    Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan -Yen Lo, Piotr Dollar, Ross Girshick. Segment Anything. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 4015–4026

  15. [15]

    Ma, J., He, Y., Li, F. et al. Segment anything in medical images. Nat Commun 15, 654 (2024). doi: 10.1038/s41467-024-44824-z

  16. [16]

    Investigating forgetting in pre-trained representations through continual learning

    Luo Y, Yang Z, Bai X, Meng F, Zhou J, Zhang Y. Investigating forgetting in pre-trained representations through continual learning. arXiv preprint arXiv:2305.05968. 2023

  17. [17]

    Mangrulkar, S., Gugger, S., Debut, L., Belkada, Y., Paul, S., & Bossan, B. (2022). PEFT: State -of-the-art Parameter -Efficient Fine -Tuning methods. Hugging Face. https://huggingface.co/blog/peft

  18. [18]

    R. R. Selvaraju et al. Grad -CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. ICCV 2017, pp. 618–626. doi: 10.1109/ICCV.2017.74

  19. [19]

    Zhang, J., Bargal , S.A., Lin, Z. et al. Top -Down Neural Attention by Excitation Backprop. Int J Comput Vis 126, 1084–1102 (2018). doi: 10.1007/s11263-017-1059-x

  20. [20]

    A., Anyimadu, D

    Suleiman, T. A., Anyimadu, D. T., Permana, A. D., Ngim, H. A. A., & Scotto di Freca, A. (2024). Two -step hierarchical binary classification of cancerous skin lesions using 26 of 28 transfer learning and the random forest algorithm. Visual Computing for Industry, Biomedicine, and Art, 7(1), 15

  21. [21]

    Fijałkowska, M., Koziej, M., & Antoszewski, B. (2021). Detailed head localization and incidence of skin cancers. Scientific Reports, 11(1), 12391

  22. [22]

    Ali, A. R. H., Li, J., & Yang, G. (2020). Automating the ABCD rule for melanoma detection: a survey. IEEE Access, 8, 83333-83346

  23. [23]

    Multiscale Attention U-Net for Skin Lesion Segmentation,

    M. D. Alahmadi, "Multiscale Attention U-Net for Skin Lesion Segmentation," in IEEE Access, vol. 10, pp. 59145-59154, 2022, doi: 10.1109/ACCESS.2022.3179390

  24. [24]

    Khouloud, S., Ahlem, M., Fadel, T. et al. W-net and inception residual network for skin lesion segmentation and classification. Appl Intell 52, 3976 –3994 (2022) . https://doi.org/10.1007/s10489-021-02652-4

  25. [25]

    DEU-Net: Dual-Encoder U-Net for Automated Skin Lesion Segmentation,

    A. Karimi, K. Faez and S. Nazari, "DEU-Net: Dual-Encoder U-Net for Automated Skin Lesion Segmentation," in IEEE Access, vol. 11, pp. 134804 -134821, 2023 , do i: 10.1109/ACCESS.2023.3337528

  26. [26]

    SEAA -UNet++: A Customized UNet++ Framework for Melanoma Segmentation in Dermoscopic Images with Test Time Augmentation,

    S. Tasnim and M. F. Ahamed, "SEAA -UNet++: A Customized UNet++ Framework for Melanoma Segmentation in Dermoscopic Images with Test Time Augmentation," 2025 IEEE 7th International Conference on Sustainable Technologies For Industry 5.0 (STI), Dhaka, Bangladesh, 2025, pp. 1-6, doi: 10.1109/STI69347.2025.11367597

  27. [27]

    K., Dahal, L., Samarakoon, P

    Hasan, M. K., Dahal, L., Samarakoon, P. N., Tushar, F. I., & Marti, R. (2020). DSNet: Automatic dermoscopic skin lesion segmentation. Computers in biology and medicine, 120, 103738

  28. [28]

    Skin Lesion Segmentation in Dermoscopic Images With Ensemble Deep Learning Methods,

    M. Goyal, A. Oakley, P. Bansal, D. Dancey and M. H. Yap, "Skin Lesion Segmentation in Dermoscopic Images With Ensemble Deep Learning Methods," in IEEE Access, vol. 8, pp. 4171-4181, 2020, doi: 10.1109/ACCESS.2019.2960504

  29. [29]

    J., Ravi, V., Alghamdi, N

    Sharen, H., Jawahar, M., Anbarasi, L. J., Ravi, V., Alghamdi, N. S., & Suliman, W. (2024). FDUM -Net: An enhanced FPN and U -Net architecture for skin lesion segmentation. Biomedical Signal Processing and Control, 91, 106037. https://doi.org/10.1016/j.bspc.2024.106037

  30. [30]

    Ali, M., Wu, T., Hu, H., Luo, Q., Xu, D., Zheng, W., Jin, N., Yang, C., & Yao, J. (2024). A review of the Segment Anything Model (SAM) for medical image analysis: Accomplishments and perspectives. Computerized Medical Imaging and Graphics , 119, 102473. https://doi.org/10.1016/j.compmedimag.2024.102473

  31. [31]

    (2025, April)

    Berrezueta, S., Baldeon-Calisto, M., Navarrete, D., Pérez-Pérez, N., Flores-Moyano, R., Riofrío, D., & Benítez, D. (2025, April). Foundation models for medical image segmentation: A literature review. In 2025 13th International Symposium on Digital Forensics and Security (ISDFS) (pp. 1-7). IEEE

  32. [32]

    Wei, M., Chen, S., Wu, S., & Xu, D. (2025 ). Rep-MeDSAM: Towards Real-Time and Universal Medical Image Segmentation. In Lecture notes in computer science (pp. 57– 69). https://doi.org/10.1007/978-3-031-81854-7_4

  33. [33]

    Wei, X., Cao, J., Jin, Y., Lu, M., Wang, G., & Zhang, S. (2024). I -MeDSAM: Implicit Medical Image Segmentation with Segment Anything. Lecture Notes in Computer Science, 90–107. https://doi.org/10.1007/978-3-031-72684-2_6

  34. [34]

    P., Ganeshan, A

    Huix, J. P., Ganeshan, A. R., Haslum, J. F., Söderberg, M., Matsoukas, C., & Smith, K. (2024). Are natural domain foundation models useful for medical image classification?. In Proceedings of the IEEE/CVF winter conference on applications of computer vision (pp. 7634-7643). 27 of 28

  35. [35]

    Chen, C., Miao, J., Wu, D., Zhong, A., Yan, Z., Kim, S., Hu, J., Liu, Z., Sun, L., Li, X., Liu, T., Heng, P., & Li, Q. (2024). MA-SAM: Modality-agnostic SAM adaptation for 3D medical image segmentation. Medical Image Analysis , 98, 103310. https://doi.org/10.1016/j.media.2024.103310

  36. [36]

    and Chandio, A.A., 2024

    Panhwar, A.O., Memon, S., Dhomeja, L.D., Memon, N. and Chandio, A.A., 2024. Deep Learning-Based Image Segmentation Techniques for Bone Fractures Using X -Ray Images: A Systematic Review. VFAST Transactions on Software Engineering, 12(4), pp. 99-116

  37. [37]

    A Deep Learning Approach Based on Explainable Artificial Intelligence for Skin Lesion Classification,

    Nigar, N., Umar, M., Shahzad, M. K., Islam, S., & Abalo, D. (2022). A deep learning approach based on explainable artificial intelligence for skin lesion classification. IEEE Access , 10, 113715–113725. https://doi.org/10.1109/access.2022.3217217

  38. [38]

    H., & Doğan, N

    Aras, H. H., & Doğan, N. (2026) . A comparative analysis for skin cancer detection by using explainable deep learning. Neural Computing and Applications , 38(4). https://doi.org/10.1007/s00521-025-11809-y

  39. [39]

    Alrabai, A., Echtioui, A., & Kallel, F. (2025). Explainable deep learning approaches for skin cancer diagnosis. Network Modeling Analysis in Health Informatics and Bioinformatics, 14(1), 57

  40. [40]

    Mahmud, M. A. A., Afrin, S., Mridha, M. F., Alfarhood, S., Che, D., & Safran, M. (2025). Explainable deep learning approaches for high precision earl y melanoma detection using dermoscopic images. Scientific Reports, 15(1), 24533

  41. [41]

    Khan, T., Haque, M. Z. U., Munir, G., & Usmani, I. A. (2026). Deep Learning -Based Skin Care Detection with Multi -method Explainability: Grad -CAM, Lime, and Occlusion Sensitivi ty. IIUM Engineering Journal, 27(1), 160-174

  42. [42]

    Munjal, G., Bhardwaj, P., Bhargava, V., Singh, S., & Nagpal, N. (2024). SkinSage XAI: An explainable deep learning solution for skin lesion diagnosis. Health Care Science, 3(6), 438-455

  43. [43]

    E., Dusza, S., Gutman, D.,

    Codella, N., Rotemberg, V., Tschandl, P., Celebi, M. E., Dusza, S., Gutman, D., ... & Halpern, A. (2019). Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (isic). arXiv preprint arXiv:1902.03368

  44. [44]

    M., Marques, J

    Mendonça, T., Ferreira, P. M., Marques, J. S., Marcal, A. R., & Rozeira, J. (2013, July). PH 2 -A dermoscopic image database for research and benchmarking. In 2013 35th annual international conference of the IEEE engineering in medicine and biology society (EMBC) (pp. 5437-5440). IEEE

  45. [45]

    P., & Ba, J

    Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980

  46. [46]

    Y., Goyal, P., Girshick, R., He, K., & Dollár, P

    Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980-2988)

  47. [47]

    A., Lin, Z., Brandt, J., Shen, X., & Sclaroff, S

    Zhang, J., Bargal, S. A., Lin, Z., Brandt, J., Shen, X., & Sclaroff, S. (2018). Top -down neural attention by excitation backprop. International Journal of Computer Vision, 126(10), 1084-1102

  48. [48]

    Binzagr, F., & Hariri, M. (2026). Foundation -Model-Driven Skin Lesion Segmentation and Classification Using SAM -Adapters and Vision Transformers. Diagnostics, 16(3), 468. https://doi.org/10.3390/diagnostics16030468

  49. [49]

    AttenUNeT X with iterative feedback mechanisms for robust deep learning skin lesion segmentation

    Babu, E., Murali, S. AttenUNeT X with iterative feedback mechanisms for robust deep learning skin lesion segmentation. Sci Rep 15, 40690 (2025). https://doi.org/10.1038/s41598-025-23830-1

  50. [50]

    Transformer and CNN-Based Deep Learning Models for Skin Lesion Segmentation,

    A. Alrabai, A. Echtioui and F. Kallel, "Transformer and CNN-Based Deep Learning Models for Skin Lesion Segmentation," 2025 IEEE International Conference on Advanced Systems and 28 of 28 Emergent Technologies (IC_ASET) , Mammamet -Yasmine, Tunisia, 2025, pp. 1 -6, doi: 10.1109/IC_ASET65966.2025.11232106

  51. [51]

    Mushgil, H

    M. Mushgil, H. ., Saad Al -Mukhtar, F. ., Qahtan Ahmed, . E. ., & Saeed Abduljabbar, K. . (2025). Deep Image Segmentation Using Explainable Attention Mechanisms: Applications in Biomedical Imaging. Al-Nahrain Journal of Science , 28(4), 241-258. https://anjs.edu.iq/index.php/anjs/article/view/3215

  52. [52]

    TrUNet: Dual-Branch Network by Fusing CNN and Transformer for Skin Lesion Segmentation,

    W. Chen, Q. Mu and J. Qi, "TrUNet: Dual-Branch Network by Fusing CNN and Transformer for Skin Lesion Segmentation," in IEEE Access , vol. 12, pp. 1 44174-144185, 2024, doi: 10.1109/ACCESS.2024.3463713

  53. [53]

    Liu, X., Gao, P., Yu, T., Wang, F., & Yuan, R. (2024). CS Win-UNet: Transformer UNet with cross-shaped windows for medical image segmentation. Information Fusion , 113, 102634. https://doi.org/10.1016/j.inffus.2024.102634

  54. [54]

    K., Cohen-Adad, J., & Merhof, D

    Heidari, M., Kazerouni, A., Soltany, M., Azad, R., Aghdam, E. K., Cohen-Adad, J., & Merhof, D. (2023). Hiformer: Hierarchical multi -scale representations using transformers for medical image segmentation. In Proceedings of the IEEE/CVF winter conference on applications of computer vision (pp. 6202-6212)

  55. [55]

    Hu, M., Li, Y., & Yang, X. (2023). Skinsam: Empowering skin cancer segmentation with segment anything model. arXiv preprint arXiv:2304.13973

  56. [56]

    Li, Z., Zhang, H., Li, Z., & Ren, Z. (2022). Residual -Attention UNet++: A Nested Residual-Attention U -Net for Medical Image Segmentation. Applied Sciences , 12(14), 7149. https://doi.org/10.3390/app12147149

  57. [57]

    Abraham, N., & Khan, N. M. (2019, April). A novel focal tversky loss function with improved attention u -net for lesion segmentation. In 2019 IEEE 16th international symposium on biomedical imaging (ISBI 2019) (pp. 683-687). IEEE

  58. [58]

    Tong, X., Wei, J., Sun, B., Su, S., Zuo, Z., & Wu, P. (2021). ASCU-Net: Attention Gate, Spatial and Channel Attention U -Net for Skin Lesion Segmentation. Diagnostics, 11(3), 501. https://doi.org/10.3390/diagnostics11030501

  59. [59]

    Luo, W. (2026). Skin Lesion Segmentation via Improved U -Net with Spatial Group -wise Enhancement and Multi -scale Parallel Feature Fusion. Journal of Computing and Electronic Information Management, 20(2), 81-84. https://doi.org/10.54097/dva4jf40

  60. [60]

    Kaur, R., GholamHosseini, H., Sinha, R. et al. Automatic lesion segmentation using atrous convolutional deep neural networks in dermoscopic skin cancer images. BMC Med Imaging 22, 103 (2022). https://doi.org/10.1186/s12880-022-00829-y

  61. [61]

    (2024, November)

    Wang, M., Liang, Y., Tang, Y., Feng, Y., Zhang, T., & Lv, C. (2024, November). Global -local medical sam adaptor based on full adaption. In 2024 2nd International Conference on Computer, Vision and Intelligent Technology (ICCVIT) (pp. 1-6). IEEE