pith. machine review for the scientific record.

arxiv: 2605.04447 · v1 · submitted 2026-05-06 · 💻 cs.CV

Recognition: 2 Lean theorem links

Deep Reprogramming Distillation for Medical Foundation Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 18:26 UTC · model grok-4.3

classification 💻 cs.CV
keywords medical foundation models · knowledge distillation · reprogramming · parameter-efficient fine-tuning · domain adaptation · medical imaging · CKA distillation · lightweight models

The pith

Deep Reprogramming Distillation adapts medical foundation models to lightweight students by using a reprogramming module to close domain gaps and CKA loss for stable transfer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Medical foundation models excel on large pre-training data but face gaps in domain, task, and compute when moved to specific medical scenarios such as 2D or 3D imaging. The paper introduces Deep Reprogramming Distillation (DRD) to address this by adding a reprogramming module that aligns the foundation model's inputs with downstream needs while creating a pathway for efficient distillation to smaller student models. It further applies centered kernel alignment (CKA) distillation to reduce sensitivity to training variations. If the method works as described, practitioners could deploy high-performing medical models on modest hardware across many different tasks without retraining from scratch or forcing identical architectures between teacher and student.
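Taking the abstract's description at face value, the training setup can be sketched roughly as follows. This is a minimal illustrative PyTorch mock-up, not the paper's code: the toy `teacher`, `student`, `reprogram`, `head`, and the loss weight are all stand-ins chosen here for concreteness.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the actual architectures, which the abstract does not specify.
teacher = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())  # "foundation" features
student = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())  # lightweight features
reprogram = nn.Conv2d(3, 3, 3, padding=1)  # trainable input transform (assumed form)
head = nn.Linear(16, 2)                    # downstream classification head

for p in teacher.parameters():
    p.requires_grad = False                # the foundation model stays frozen

def cka_loss(s, t):
    # 1 - linear CKA between student and teacher batch features; CKA compares
    # Gram structure over the batch, so feature widths (16 vs. 64) may differ.
    s = s - s.mean(0)
    t = t - t.mean(0)
    num = (t.T @ s).norm() ** 2
    den = (s.T @ s).norm() * (t.T @ t).norm()
    return 1 - num / (den + 1e-8)

opt = torch.optim.AdamW(list(student.parameters()) + list(reprogram.parameters())
                        + list(head.parameters()), lr=1e-4)

images, labels = torch.randn(8, 3, 64, 64), torch.randint(0, 2, (8,))
t_feat = teacher(reprogram(images))        # gradients reach `reprogram` only
s_feat = student(images)
loss = F.cross_entropy(head(s_feat), labels) + 0.1 * cka_loss(s_feat, t_feat)
loss.backward()
opt.step()
```

Note the one structural point this sketch does capture from the abstract: the foundation model is frozen, and only the reprogramming module plus the lightweight student (and its head) receive gradients.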

Core claim

DRD introduces a reprogramming module that overcomes the domain and task discrepancy between pre-training and downstream scenarios while enabling student-friendly, efficient distillation from foundation models to lightweight downstream models. It pairs this with centered kernel alignment (CKA) distillation to promote robust knowledge transfer, and empirical results show it surpasses previous PEFT and KD methods across 18 medical downstream tasks covering 2D/3D classification and segmentation under different foundation models.

What carries the argument

The reprogramming module, which transforms inputs or intermediate representations to reconcile pre-training and downstream domains and thereby enables efficient, structure-agnostic distillation.

If this is right

  • DRD can be applied to multiple foundation models without requiring matching architectures or training strategies between teacher and student.
  • The method supports both classification and segmentation in 2D and 3D medical imaging while remaining computationally lighter than full fine-tuning.
  • CKA distillation reduces performance variability when training conditions such as random seeds or data splits change.
  • Lightweight student models produced by DRD achieve higher task performance than those from standard PEFT or KD alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Hospitals with limited GPUs could run customized medical analysis tools derived from public foundation models without full retraining.
  • The reprogramming idea might transfer to other high-stakes domains such as satellite imagery or industrial inspection where large pre-trained models must be specialized quickly.
  • Further work could test whether the same module design reduces the number of labeled examples needed for the downstream medical tasks.

Load-bearing premise

The reprogramming module can reliably close domain and task gaps without discarding critical medical features or introducing distortions that block effective knowledge transfer to the student.

What would settle it

A head-to-head evaluation on the same 18 tasks and foundation models where DRD shows no consistent gains in accuracy or speed over the best prior PEFT-plus-KD baselines.

Figures

Figures reproduced from arXiv: 2605.04447 by Haishuai Wang, Haolin Li, Hui Lin, Jiangchao Yao, Siyuan Du, Yanfeng Wang, Ya Zhang, Yuhang Zhou.

Figure 1. Real-world considerations for the adaptation of medical foundation models: different inconsistencies must be overcome… view at source ↗
Figure 2. The overall framework of Deep Reprogramming Distillation. During training, the foundation model is frozen. The deep… view at source ↗
Figure 3. Comparison with PEFT methods in classification experiments. Subfigures (a)-(d) present the results of four datasets… view at source ↗
Figure 4. Subfigures (a) and (b) compare the effects of varying depths on KD and DRD in classification and segmentation tasks… view at source ↗
Figure 5. Post-training cosine similarity matrices between teacher and student coarse stages. view at source ↗
Figure 6. Comparison of segmentation results. Randomly selected images were input into different models for nuclear and… view at source ↗
Figure 7. Comparison of methods in handling inconsistencies. view at source ↗
Figure 8. Comparison of decision boundaries of different methods. view at source ↗
Figure 9. Convergence curves of DRD under different reprogramming depths N. Left: training total loss; right: validation Dice. view at source ↗
read the original abstract

Medical foundation models pre-trained on large-scale datasets have shown powerful versatile performance. However, when adapting medical foundation models for specific medical scenarios, it remains the inevitable challenge due to the gap induced by the discrepancy between pre-training and downstream tasks, the real-world computation, and speed constraints. Relevant techniques that probably handle this challenge more or less suffer from some intrinsic limitations. For example, knowledge distillation (KD) assumes that teacher and student models share the same task, training strategy, and model structure family, while prevalent parameter-efficient fine-tuning (PEFT) fails to achieve personalized and lightweight deployment. Even the combination of PEFT and KD still struggles to resolve model structures and training strategies inconsistencies between teacher and student models, leading to inefficient knowledge transfer. In this study, we propose a novel framework called Deep Reprogramming Distillation (DRD) to combat the general adaptation challenge. Specifically, DRD introduces the novel reprogramming module that on the one side overcomes the domain and task discrepancy between pretraining and downstream scenarios, and on the other side builds the student-friendly efficient distillation from foundation models to lightweight downstream models. Furthermore, to mitigate variability under different training conditions, we design a centered kernel alignment (CKA) distillation method to promote robust knowledge transfer. Empirical results show that DRD surpasses previous PEFT and KD methods across 18 medical downstream tasks under different foundation models, covering various scenarios including 2D/3D classification and 2D/3D segmentation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper proposes Deep Reprogramming Distillation (DRD) for adapting medical foundation models to downstream tasks. It introduces a reprogramming module to bridge domain/task discrepancies between pre-training and downstream scenarios while enabling efficient, student-friendly distillation, and employs centered kernel alignment (CKA) distillation to promote robust knowledge transfer under varying conditions. The central claim is that DRD outperforms prior PEFT and KD methods across 18 medical downstream tasks (2D/3D classification and segmentation) under multiple foundation models.

Significance. If the reported gains hold under rigorous verification, DRD would represent a practical advance in efficient adaptation of large medical foundation models, addressing computational constraints and model mismatch issues that limit deployment in clinical settings. The inclusion of both 2D and 3D tasks, multiple foundation models, and ablations provides a reasonably broad empirical basis for the claims.

major comments (2)
  1. [Experimental Results] Experimental section (results tables): The superiority claims over PEFT and KD baselines on 18 tasks lack reported error bars, number of random seeds/runs, and statistical significance tests (e.g., paired t-tests or Wilcoxon tests). This is load-bearing for the central empirical claim, as medical imaging performance is known to vary with data splits and initialization.
  2. [§3.2] §3.2 (Reprogramming module): The module is presented as overcoming domain and task discrepancy while remaining student-friendly, but the exact parameter count, forward-pass formulation, and how it interacts with the foundation model backbone (especially for 3D inputs) are not derived in sufficient detail to verify that it does not implicitly rely on task-specific tuning that would undermine the 'parameter-efficient' positioning.
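To make the first major comment concrete, the requested check could be as simple as a paired signed-rank test over per-task scores. The sketch below uses SciPy; the 18 per-task values are purely illustrative placeholders, not numbers from the paper.

```python
from scipy.stats import wilcoxon

# Illustrative per-task scores (e.g., Dice or accuracy) on the same 18 tasks;
# these values are made up to show the test, not taken from the paper.
drd      = [0.85, 0.78, 0.91, 0.88, 0.76, 0.83, 0.90, 0.81, 0.87,
            0.79, 0.84, 0.92, 0.80, 0.86, 0.77, 0.89, 0.82, 0.75]
baseline = [0.83, 0.77, 0.90, 0.85, 0.74, 0.82, 0.88, 0.80, 0.85,
            0.78, 0.82, 0.90, 0.79, 0.85, 0.76, 0.87, 0.80, 0.74]

# Two-sided paired Wilcoxon signed-rank test across the 18 tasks.
stat, p = wilcoxon(drd, baseline)
print(f"Wilcoxon W = {stat:.1f}, p = {p:.4f}")
```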
minor comments (3)
  1. [Abstract] Abstract: The claim of 'surpassing previous PEFT and KD methods' would be more informative if it briefly noted the range of improvement magnitudes or the specific foundation models used.
  2. [§3.3] Notation: The CKA distillation loss could be cross-referenced to the standard formulation in the literature (e.g., Kornblith et al.) to clarify any modifications.
  3. [Figures] Figure captions: Several result figures would benefit from explicit axis labels indicating whether metrics are Dice, AUC, or accuracy, and whether higher/lower is better.
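For reference on the second minor comment, the standard linear CKA of Kornblith et al. (2019), for column-centered feature matrices $X \in \mathbb{R}^{n \times p_1}$ and $Y \in \mathbb{R}^{n \times p_2}$ over the same $n$ examples, is:

```latex
\mathrm{CKA}(X, Y) \;=\;
\frac{\lVert Y^{\top} X \rVert_F^{2}}
     {\lVert X^{\top} X \rVert_F \,\lVert Y^{\top} Y \rVert_F}
```

A distillation loss can then be taken as $1 - \mathrm{CKA}$ between student and teacher features; whether the paper modifies this standard formulation is exactly what the cross-reference would settle.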

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive overall assessment of our work on Deep Reprogramming Distillation (DRD). We address each major comment below and will revise the manuscript accordingly to strengthen the presentation and empirical support.

read point-by-point responses
  1. Referee: [Experimental Results] Experimental section (results tables): The superiority claims over PEFT and KD baselines on 18 tasks lack reported error bars, number of random seeds/runs, and statistical significance tests (e.g., paired t-tests or Wilcoxon tests). This is load-bearing for the central empirical claim, as medical imaging performance is known to vary with data splits and initialization.

    Authors: We agree that the absence of error bars, run counts, and statistical tests limits the robustness of the superiority claims, particularly given known variability in medical imaging. In the revised version, we will re-run the key experiments across 3 random seeds, report mean performance with standard deviations in all tables, and include paired Wilcoxon signed-rank tests (with p-values) comparing DRD against the strongest baselines on each task. These additions will be placed in the experimental section and table captions. revision: yes

  2. Referee: [§3.2] §3.2 (Reprogramming module): The module is presented as overcoming domain and task discrepancy while remaining student-friendly, but the exact parameter count, forward-pass formulation, and how it interacts with the foundation model backbone (especially for 3D inputs) are not derived in sufficient detail to verify that it does not implicitly rely on task-specific tuning that would undermine the 'parameter-efficient' positioning.

    Authors: We acknowledge that §3.2 would benefit from greater technical detail to allow verification of the module's efficiency and generality. In the revision, we will expand this section with: (i) the precise parameter count of the reprogramming module (broken down by components), (ii) the complete forward-pass equations showing its integration with the frozen backbone, and (iii) explicit handling for 3D inputs via channel-wise and spatial reprogramming that preserves parameter efficiency without introducing task-specific layers or tuning beyond the module itself. This will confirm that the design remains student-friendly and does not undermine the parameter-efficient claim. revision: yes
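To fix intuition for what "channel-wise and spatial reprogramming" of 3D inputs could look like, here is one plausible, purely illustrative shape. The class name and design are assumptions made for this sketch, not the authors' actual equations; the point is only that such a module can stay tiny relative to the frozen backbone.

```python
import torch
import torch.nn as nn

class Reprogram3D(nn.Module):
    """Illustrative sketch only: a 1x1x1 channel-wise map followed by a
    depthwise spatial map, keeping the parameter count small."""
    def __init__(self, in_ch: int = 1, out_ch: int = 3):
        super().__init__()
        self.channel = nn.Conv3d(in_ch, out_ch, kernel_size=1)      # channel-wise
        self.spatial = nn.Conv3d(out_ch, out_ch, kernel_size=3,
                                 padding=1, groups=out_ch)          # per-channel spatial

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, in_ch, D, H, W) volume -> (B, out_ch, D, H, W) for a frozen backbone
        return self.spatial(self.channel(x))

vol = torch.randn(2, 1, 16, 64, 64)
print(Reprogram3D()(vol).shape)  # torch.Size([2, 3, 16, 64, 64])
```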

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper proposes an empirical framework (DRD) with a reprogramming module and CKA-based distillation, validated across 18 downstream tasks on multiple foundation models. No mathematical derivation chain, equations, or fitted parameters are presented that reduce to self-definition or prior outputs by construction. Claims rest on experimental results rather than self-referential logic or load-bearing self-citations. The architecture and training protocol are described as independent contributions without circular reductions to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

Based solely on the abstract; the central claim rests on unstated assumptions about the effectiveness of the reprogramming module and CKA in bridging gaps, plus standard ML training assumptions. No explicit free parameters or invented entities are detailed beyond the named modules.

axioms (2)
  • domain assumption: The reprogramming module can simultaneously resolve domain/task discrepancy and enable efficient distillation.
    Invoked in the abstract's description of the DRD design without proof or prior validation cited.
  • domain assumption: CKA distillation promotes robust knowledge transfer under training variability.
    Stated as a design choice to mitigate variability, but no derivation or external benchmark is provided.
invented entities (1)
  • Reprogramming module: no independent evidence.
    purpose: overcome domain and task discrepancy while building student-friendly distillation
    New component introduced in the framework; no independent evidence or falsifiable prediction is given in the abstract.

pith-pipeline@v0.9.0 · 5581 in / 1479 out tokens · 39654 ms · 2026-05-08T18:26:26.562255+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

83 extracted references · 33 canonical work pages · 4 internal anchors

  1. [1] W. Lin, Z. Zhao, X. Zhang, C. Wu, Y. Zhang, Y. Wang, and W. Xie, "PMC-CLIP: Contrastive language-image pre-training using biomedical documents," arXiv preprint arXiv:2303.07240, 2023.
  2. [2] X. Mei, Z. Liu, P. M. Robson, B. Marinelli, M. Huang, A. Doshi, A. Jacobi, C. Cao et al., "RadImageNet: An open radiologic deep learning research dataset for effective transfer learning," Radiology: Artificial Intelligence, vol. 4, no. 5, p. e210315, 2022.
  3. [3] D. M. Nguyen, H. Nguyen, N. T. Diep, T. N. Pham et al., "LVM-Med: Learning large-scale self-supervised vision models for medical imaging via second-order graph matching," arXiv preprint arXiv:2306.11925, 2023.
  4. [4] J. Yao, S. Zhang, Y. Yao, F. Wang, J. Ma, J. Zhang, Y. Chu, L. Ji, K. Jia et al., "Edge-cloud polarization and collaboration: A comprehensive survey for AI," IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 7, pp. 6866–6886, 2022.
  5. [5] Y. Zhou, Z. Zhao, S. Du, J. Yao, Y. Zhang, Y. Wang et al., "Exploring training on heterogeneous data with mixture of low-rank adapters," in Forty-first International Conference on Machine Learning, 2024.
  6. [6] K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, "Masked autoencoders are scalable vision learners," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 16000–16009.
  7. [7] Y. Zhou, H. Li, S. Du, J. Yao, Y. Zhang, and Y. Wang, "Low-rank knowledge decomposition for medical foundation models," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 11611–11620.
  8. [8] Y. Xin, S. Luo, H. Zhou, J. Du, X. Liu, Y. Fan, Q. Li, and Y. Du, "Parameter-efficient fine-tuning for pre-trained vision models: A survey," arXiv preprint arXiv:2402.02242, 2024.
  9. [9] S. Chen, C. Ge, Z. Tong, J. Wang, Y. Song, J. Wang, and P. Luo, "AdaptFormer: Adapting vision transformers for scalable visual recognition," Advances in Neural Information Processing Systems, vol. 35, pp. 16664–16678, 2022.
  10. [10] M. Jia, L. Tang, B.-C. Chen, C. Cardie, S. Belongie et al., "Visual prompt tuning," in European Conference on Computer Vision. Springer, 2022, pp. 709–727.
  11. [11] X. L. Li and P. Liang, "Prefix-tuning: Optimizing continuous prompts for generation," arXiv preprint arXiv:2101.00190, 2021.
  12. [12] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "LoRA: Low-rank adaptation of large language models," arXiv preprint arXiv:2106.09685, 2021.
  13. [13] X. He, C. Li, P. Zhang, J. Yang, and X. E. Wang, "Parameter-efficient model adaptation for vision transformers," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 1, 2023, pp. 817–825.
  14. [14] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv preprint arXiv:1503.02531, 2015.
  15. [15] F. Tung and G. Mori, "Similarity-preserving knowledge distillation," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1365–1374.
  16. [16] Y. Tian, D. Krishnan, and P. Isola, "Contrastive representation distillation," arXiv preprint arXiv:1910.10699, 2019.
  17. [17] W. Park, D. Kim, Y. Lu, and M. Cho, "Relational knowledge distillation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 3967–3976.
  18. [18] Z. Yang, Z. Li, M. Shao, D. Shi, Z. Yuan, and C. Yuan, "Masked generative distillation," in European Conference on Computer Vision. Springer, 2022, pp. 53–69.
  19. [19] X. Liu, L. Li, C. Li, and A. Yao, "NORM: Knowledge distillation via n-to-one representation matching," arXiv preprint arXiv:2305.13803, 2023.
  20. [20] H. Li, Y. Zhou, Z. Zhao, S. Du, J. Yao, W. Xie, Y. Zhang, and Y. Wang, "LoRKD: Low-rank knowledge decomposition for medical foundation models," arXiv preprint arXiv:2409.19540, 2024.
  21. [21] Y. Zhou, S. Du, H. Li, J. Yao, Y. Zhang, and Y. Wang, "Reprogramming distillation for medical foundation models," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2024, pp. 533–543.
  22. [22] C. Buciluǎ, R. Caruana, and A. Niculescu-Mizil, "Model compression," in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006, pp. 535–541.
  23. [23] A. Malinin, B. Mlodozeniec, and M. Gales, "Ensemble distribution distillation," arXiv preprint arXiv:1905.00076, 2019.
  24. [24] A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio, "FitNets: Hints for thin deep nets," arXiv preprint arXiv:1412.6550, 2014.
  25. [25] S. Zagoruyko and N. Komodakis, "Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer," arXiv preprint arXiv:1612.03928, 2016.
  26. [26] Z. Hao, J. Guo, K. Han, Y. Tang, H. Hu, Y. Wang, and C. Xu, "One-for-all: Bridge the gap between heterogeneous architectures in knowledge distillation," Advances in Neural Information Processing Systems, vol. 36, 2024.
  27. [27] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly, "Parameter-efficient transfer learning for NLP," in International Conference on Machine Learning. PMLR, 2019, pp. 2790–2799.
  28. [28] J. He, C. Zhou, X. Ma, T. Berg-Kirkpatrick, and G. Neubig, "Towards a unified view of parameter-efficient transfer learning," arXiv preprint arXiv:2110.04366, 2021.
  29. [29] R. K. Mahabadi, S. Ruder, M. Dehghani, and J. Henderson, "Parameter-efficient multi-task fine-tuning for transformers via shared hypernetworks," arXiv preprint arXiv:2106.04489, 2021.
  30. [30] R. Karimi Mahabadi, J. Henderson, and S. Ruder, "Compacter: Efficient low-rank hypercomplex adapter layers," Advances in Neural Information Processing Systems, vol. 34, pp. 1022–1035, 2021.
  31. [31] B. Lester, R. Al-Rfou, and N. Constant, "The power of scale for parameter-efficient prompt tuning," arXiv preprint arXiv:2104.08691, 2021.
  32. [32] A. Razdaibiedina, Y. Mao, R. Hou, M. Khabsa, M. Lewis, J. Ba, and A. Almahairi, "Residual prompt tuning: Improving prompt tuning with residual reparameterization," arXiv preprint arXiv:2305.03937, 2023.
  33. [33] B. Dong, P. Zhou, S. Yan, and W. Zuo, "LPT: Long-tailed prompt tuning for image classification," arXiv preprint arXiv:2210.01033, 2022.
  34. [34] S.-Y. Liu, C.-Y. Wang, H. Yin, P. Molchanov, Y.-C. F. Wang, K.-T. Cheng, and M.-H. Chen, "DoRA: Weight-decomposed low-rank adaptation," arXiv preprint arXiv:2402.09353, 2024.
  35. [35] Z. Qiu, W. Liu, H. Feng, Y. Xue, Y. Feng, Z. Liu, D. Zhang, A. Weller, and B. Schölkopf, "Controlling text-to-image diffusion by orthogonal finetuning," Advances in Neural Information Processing Systems, vol. 36, pp. 79320–79362, 2023.
  36. [36] S.-Y. Yeh, Y.-G. Hsieh, Z. Gao, B. B. Yang, G. Oh, and Y. Gong, "Navigating text-to-image customization: From LyCORIS fine-tuning to model evaluation," in The Twelfth International Conference on Learning Representations, 2023.
  37. [37] E. B. Zaken, S. Ravfogel, and Y. Goldberg, "BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models," arXiv preprint arXiv:2106.10199, 2021.
  38. [38] J. Wu, W. Ji, Y. Liu, H. Fu, M. Xu, Y. Xu, and Y. Jin, "Medical SAM Adapter: Adapting Segment Anything Model for medical image segmentation," arXiv preprint arXiv:2304.12620, 2023.
  39. [39] B. Azad, R. Azad, S. Eskandari, A. Bozorgpour, A. Kazerouni, I. Rekik, and D. Merhof, "Foundational models in medical imaging: A comprehensive survey and future vision," arXiv preprint arXiv:2310.18689, 2023.
  40. [40] S. Zhang and D. Metaxas, "On the challenges and perspectives of foundation models for medical image analysis," Medical Image Analysis, vol. 91, p. 102996, 2024.
  41. [41] K. Jin, X. Huang, J. Zhou, Y. Li, Y. Yan, Y. Sun, Q. Zhang, Y. Wang, and J. Ye, "FIVES: A fundus image dataset for artificial intelligence based vessel segmentation," Scientific Data, vol. 9, no. 1, p. 475, 2022.
  42. [42] J. Glaister, "Skin-cancer-detection," 2013. [Online]. Available: https://uwaterloo.ca/vision-image-processing-lab/research-demos/skin-cancer-detection
  43. [43] H. I. T. Loop, "Teeth segmentation on dental X-ray images," 2023. [Online]. Available: https://www.kaggle.com/dsv/5884500
  44. [44] P.-Y. Chen, "Model reprogramming: Resource-efficient cross-domain machine learning," arXiv preprint arXiv:2202.10629, 2022.
  45. [45] S. Xu, J. Yao, R. Luo, S. Zhang, Z. Lian, M. Tan, B. Han, and Y. Wang, "Towards efficient task-driven model reprogramming with foundation models," arXiv preprint arXiv:2304.02263, 2023.
  46. [46] L. Ge, C. Hu et al., "Discrepancy and uncertainty aware denoising knowledge distillation for zero-shot cross-lingual named entity recognition," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 16, 2024, pp. 18056–18064.
  47. [47] S. Kornblith, M. Norouzi, H. Lee, and G. Hinton, "Similarity of neural network representations revisited," in International Conference on Machine Learning. PMLR, 2019, pp. 3519–3529.
  48. [48] T. Nguyen, M. Raghu, and S. Kornblith, "Do wide and deep networks learn the same things? Uncovering how neural network representations vary with width and depth," arXiv preprint arXiv:2010.15327, 2020.
  49. [49] L. R. Dice, "Measures of the amount of ecologic association between species," Ecology, vol. 26, no. 3, pp. 297–302, 1945.
  50. [50] C. Yang, H. Zhou, Z. An, X. Jiang, Y. Xu, and Q. Zhang, "Cross-image relational knowledge distillation for semantic segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12319–12328.
  51. [51] C. Zhang, D. Han, Y. Qiao, J. U. Kim, S.-H. Bae, S. Lee, and C. S. Hong, "Faster Segment Anything: Towards lightweight SAM for mobile applications," arXiv preprint arXiv:2306.14289, 2023.
  52. [52] Y. Zhu, Z. Shen, Z. Zhao, S. Wang, X. Wang, X. Zhao, D. Shen, and Q. Wang, "MeLo: Low-rank adaptation is better than fine-tuning for medical image diagnosis," arXiv preprint arXiv:2311.08236, 2023.
  53. [53] W. Al-Dhabyani, M. Gomaa, H. Khaled, and A. Fahmy, "Dataset of breast ultrasound images," Data in Brief, vol. 28, p. 104863, 2020.
  54. [54] P. Tschandl, C. Rosendahl, and H. Kittler, "The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions," Scientific Data, vol. 5, no. 1, pp. 1–9, 2018.
  55. [55] Y. Xingyi, H. Xuehai, Z. Jinyu, Z. Yichen et al., "COVID-CT-Dataset: A CT image dataset about COVID-19," arXiv preprint arXiv:2003.13865, 2020.
  56. [56] A. Saleh, R. Sukaik, and S. S. Abu-Naser, "Brain tumor classification using deep learning," in 2020 International Conference on Assistive and Rehabilitation Technologies (iCareTech), 2020, pp. 131–136.
  57. [57] J. P. Cohen, P. Morrison, and L. Dao, "COVID-19 image data collection," arXiv preprint arXiv:2003.11597, 2020.
  58. [58] S. P. Morozov, A. E. Andreychenko, N. Pavlov, A. Vladzymyrskyy, N. Ledikhova, V. Gombolevskiy, I. A. Blokhin, P. Gelezhe, A. Gonchar, and V. Y. Chernina, "MosMedData: Chest CT scans with COVID-19 related findings dataset," arXiv preprint arXiv:2005.06465, 2020.
  59. [59] Y. Feng, "CT volume samples for lung adenocarcinoma classification," 2020.
  60. [60] S. Graham, Q. D. Vu, M. Jahanifar, M. Weigert, U. Schmidt, W. Zhang, J. Zhang, S. Yang, J. Xiang, X. Wang et al., "CoNIC Challenge: Pushing the frontiers of nuclear detection, segmentation, classification and counting," Medical Image Analysis, vol. 92, p. 103047, 2024.
  61. [61] S. Graham, M. Jahanifar, A. Azam, M. Nimir, Y.-W. Tsang, K. Dodd, E. Hero, H. Sahota, A. Tank, K. Benes et al., "Lizard: A large-scale dataset for colonic nuclear instance segmentation and classification," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 684–693.
  62. [62] H. Gong, G. Chen, R. Wang, X. Xie, M. Mao, Y. Yu, F. Chen, and G. Li, "Multi-task learning for thyroid nodule segmentation with thyroid region prior," in 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI). IEEE, 2021, pp. 257–261.
  63. [63] L. C. Garcia-Peraza-Herrera, L. Fidon, C. D'Ettorre, D. Stoyanov, T. Vercauteren, and S. Ourselin, "Image compositing for segmentation of surgical tools without manual annotations," IEEE Transactions on Medical Imaging, vol. 40, no. 5, pp. 1450–1460, 2021.
  64. [64] A. E. Kavur, N. S. Gezer, M. Barış, Y. Şahin, S. Özkan, B. Baydar, U. Yüksel, Ç. Kılıkçıer, Ş. Olut, G. B. Akar et al., "Comparison of semi-automatic and deep learning-based automatic methods for liver segmentation in living liver transplant donors," Diagnostic and Interventional Radiology, vol. 26, no. 1, p. 11, 2019.
  65. [65] A. E. Kavur, N. S. Gezer, M. Barış, S. Aslan, P.-H. Conze, V. Groza, D. D. Pham, S. Chatterjee, P. Ernst, S. Özkan, B. Baydar, D. Lachinov, S. Han, J. Pauli, F. Isensee, M. Perkonigg, R. Sathish, R. Rajan, D. Sheet, G. Dovletov, O. Speck, A. Nürnberger, K. H. Maier-Hein, G. Bozdağı Akar, G. Ünal, O. Dicle, and M. A. Selver, "CHAOS Challenge - combined (CT-MR) healthy abdominal organ segmentation," …
  66. [66] A. E. Kavur, M. A. Selver, O. Dicle, M. Barış, and N. S. Gezer, "CHAOS - Combined (CT-MR) Healthy Abdominal Organ Segmentation Challenge Data," Apr. 2019. [Online]. Available: https://doi.org/10.5281/zenodo.3362844
  67. [67] B. Landman, Z. Xu, J. Igelsias, M. Styner, T. Langerak, and A. Klein, "MICCAI multi-atlas labeling beyond the cranial vault–workshop and challenge," in Proc. MICCAI Multi-Atlas Labeling Beyond Cranial Vault—Workshop Challenge, vol. 5, 2015, p. 12.
  68. [68] M. Antonelli, A. Reinke, S. Bakas, K. Farahani, A. Kopp-Schneider, B. A. Landman, G. Litjens, B. Menze, O. Ronneberger, R. M. Summers et al., "The Medical Segmentation Decathlon," Nature Communications, vol. 13, no. 1, p. 4128, 2022.
  69. [69] S. Ahn, S. X. Hu, A. Damianou, N. D. Lawrence, and Z. Dai, "Variational information distillation for knowledge transfer," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 9163–9171.
  70. [70] N. Passalis, M. Tzelepi, and A. Tefas, "Probabilistic knowledge transfer for lightweight deep representation learning," IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 5, pp. 2030–2039, 2020.
  71. [71] D. Chen, J.-P. Mei, Y. Zhang, C. Wang, Z. Wang, Y. Feng, and C. Chen, "Cross-layer distillation with semantic calibration," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 8, 2021, pp. 7028–7036.
  72. [72] C.-H. Tu, Z. Mai, and W.-L. Chao, "Visual query tuning: Towards effective usage of intermediate representations for parameter and memory efficient transfer learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 7725–7735.
  73. [73] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
  74. [74] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.
  75. [75] N. Ma, X. Zhang, H.-T. Zheng, and J. Sun, "ShuffleNet V2: Practical guidelines for efficient CNN architecture design," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 116–131.
  76. [76] L. Blankemeier, J. P. Cohen, A. Kumar, D. Van Veen, S. J. S. Gardezi, M. Paschali, Z. Chen, J.-B. Delbrouck, E. Reis, C. Truyts et al., "Merlin: A vision language foundation model for 3D computed tomography," arXiv preprint arXiv:2406.06512, 2024.
  77. [77] K. Hara, H. Kataoka, and Y. Satoh, "Learning spatio-temporal features with 3D residual networks for action recognition," in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2017, pp. 3154–3160.
  78. [78] J. Ma, Y. He, F. Li, L. Han, C. You, and B. Wang, "Segment anything in medical images," Nature Communications, vol. 15, no. 1, p. 654, 2024.
  79. [79] J. Liu, H. Yang, H.-Y. Zhou, Y. Xi, L. Yu, C. Li, Y. Liang, G. Shi, Y. Yu, S. Zhang et al., "Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2024, pp. 615–625.
  80. [80] Z. Zhao, Y. Zhang, C. Wu, X. Zhang, Y. Zhang, Y. Wang, and W. Xie, "One model to rule them all: Towards universal segmentation for medical images with text prompts," arXiv preprint arXiv:2312.17183, 2023.

Showing first 80 references.