pith. machine review for the scientific record.

arxiv: 2605.04447 · v1 · submitted 2026-05-06 · 💻 cs.CV

Recognition: 2 Lean theorem links

Deep Reprogramming Distillation for Medical Foundation Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 18:26 UTC · model grok-4.3

classification 💻 cs.CV
keywords medical foundation models · knowledge distillation · reprogramming · parameter-efficient fine-tuning · domain adaptation · medical imaging · CKA distillation · lightweight models

The pith

Deep Reprogramming Distillation adapts medical foundation models to lightweight students by using a reprogramming module to close domain gaps and CKA loss for stable transfer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Medical foundation models excel on large pre-training data but face gaps in domain, task, and compute when moved to specific medical scenarios such as 2D or 3D imaging. The paper introduces Deep Reprogramming Distillation (DRD) to address this by adding a reprogramming module that aligns the foundation model's inputs with downstream needs while creating a pathway for efficient distillation to smaller student models. It further applies centered kernel alignment (CKA) distillation to reduce sensitivity to training variations. If the method works as described, practitioners could deploy high-performing medical models on modest hardware across many different tasks without retraining from scratch or forcing identical architectures between teacher and student.
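Taking the abstract's description at face value, the training setup can be sketched roughly as follows. This is a minimal illustrative PyTorch mock-up, not the paper's code: the toy `teacher`, `student`, `reprogram`, `head`, and the loss weight are all stand-ins chosen here for concreteness.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the actual architectures, which the abstract does not specify.
teacher = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())  # "foundation" features
student = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())  # lightweight features
reprogram = nn.Conv2d(3, 3, 3, padding=1)  # trainable input transform (assumed form)
head = nn.Linear(16, 2)                    # downstream classification head

for p in teacher.parameters():
    p.requires_grad = False                # the foundation model stays frozen

def cka_loss(s, t):
    # 1 - linear CKA between student and teacher batch features; CKA compares
    # Gram structure over the batch, so feature widths (16 vs. 64) may differ.
    s = s - s.mean(0)
    t = t - t.mean(0)
    num = (t.T @ s).norm() ** 2
    den = (s.T @ s).norm() * (t.T @ t).norm()
    return 1 - num / (den + 1e-8)

opt = torch.optim.AdamW(list(student.parameters()) + list(reprogram.parameters())
                        + list(head.parameters()), lr=1e-4)

images, labels = torch.randn(8, 3, 64, 64), torch.randint(0, 2, (8,))
t_feat = teacher(reprogram(images))        # gradients reach `reprogram` only
s_feat = student(images)
loss = F.cross_entropy(head(s_feat), labels) + 0.1 * cka_loss(s_feat, t_feat)
loss.backward()
opt.step()
```

Note the one structural point this sketch does capture from the abstract: the foundation model is frozen, and only the reprogramming module plus the lightweight student (and its head) receive gradients.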

Core claim

DRD introduces a reprogramming module that overcomes the domain and task discrepancy between pre-training and downstream scenarios while enabling student-friendly, efficient distillation from foundation models to lightweight downstream models. It pairs this with centered kernel alignment (CKA) distillation to promote robust knowledge transfer, and empirical results show it surpasses previous PEFT and KD methods across 18 medical downstream tasks covering 2D/3D classification and segmentation under different foundation models.

What carries the argument

The reprogramming module, which transforms inputs or intermediate representations to reconcile pre-training and downstream domains and thereby enables efficient, structure-agnostic distillation.

If this is right

  • DRD can be applied to multiple foundation models without requiring matching architectures or training strategies between teacher and student.
  • The method supports both classification and segmentation in 2D and 3D medical imaging while remaining computationally lighter than full fine-tuning.
  • CKA distillation reduces performance variability when training conditions such as random seeds or data splits change.
  • Lightweight student models produced by DRD achieve higher task performance than those from standard PEFT or KD alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Hospitals with limited GPUs could run customized medical analysis tools derived from public foundation models without full retraining.
  • The reprogramming idea might transfer to other high-stakes domains such as satellite imagery or industrial inspection where large pre-trained models must be specialized quickly.
  • Further work could test whether the same module design reduces the number of labeled examples needed for the downstream medical tasks.

Load-bearing premise

The reprogramming module can reliably close domain and task gaps without discarding critical medical features or introducing distortions that block effective knowledge transfer to the student.

What would settle it

A head-to-head evaluation on the same 18 tasks and foundation models where DRD shows no consistent gains in accuracy or speed over the best prior PEFT-plus-KD baselines.

Figures

Figures reproduced from arXiv: 2605.04447 by Haishuai Wang, Haolin Li, Hui Lin, Jiangchao Yao, Siyuan Du, Yanfeng Wang, Ya Zhang, Yuhang Zhou.

Figure 1. Real-world considerations for the adaptation of medical foundation models: different inconsistencies must be overcome… view at source ↗
Figure 2. The overall framework of Deep Reprogramming Distillation. During training, the foundation model is frozen. The deep… view at source ↗
Figure 3. Comparison with PEFT methods in classification experiments. Subfigures (a)-(d) present the results of four datasets… view at source ↗
Figure 4. Subfigures (a) and (b) compare the effects of varying depths on KD and DRD in classification and segmentation tasks… view at source ↗
Figure 5. Post-training cosine similarity matrices between teacher and student coarse stages. view at source ↗
Figure 6. Comparison of segmentation results. Randomly selected images were input into different models for nuclear and… view at source ↗
Figure 7. Comparison of methods in handling inconsistencies. view at source ↗
Figure 8. Comparison of decision boundaries of different methods. view at source ↗
Figure 9. Convergence curves of DRD under different reprogramming depths N. Left: training total loss; right: validation Dice. view at source ↗
read the original abstract

Medical foundation models pre-trained on large-scale datasets have shown powerful versatile performance. However, when adapting medical foundation models for specific medical scenarios, it remains the inevitable challenge due to the gap induced by the discrepancy between pre-training and downstream tasks, the real-world computation, and speed constraints. Relevant techniques that probably handle this challenge more or less suffer from some intrinsic limitations. For example, knowledge distillation (KD) assumes that teacher and student models share the same task, training strategy, and model structure family, while prevalent parameter-efficient fine-tuning (PEFT) fails to achieve personalized and lightweight deployment. Even the combination of PEFT and KD still struggles to resolve model structures and training strategies inconsistencies between teacher and student models, leading to inefficient knowledge transfer. In this study, we propose a novel framework called Deep Reprogramming Distillation (DRD) to combat the general adaptation challenge. Specifically, DRD introduces the novel reprogramming module that on the one side overcomes the domain and task discrepancy between pretraining and downstream scenarios, and on the other side builds the student-friendly efficient distillation from foundation models to lightweight downstream models. Furthermore, to mitigate variability under different training conditions, we design a centered kernel alignment (CKA) distillation method to promote robust knowledge transfer. Empirical results show that DRD surpasses previous PEFT and KD methods across 18 medical downstream tasks under different foundation models, covering various scenarios including 2D/3D classification and 2D/3D segmentation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper proposes Deep Reprogramming Distillation (DRD) for adapting medical foundation models to downstream tasks. It introduces a reprogramming module to bridge domain/task discrepancies between pre-training and downstream scenarios while enabling efficient, student-friendly distillation, and employs centered kernel alignment (CKA) distillation to promote robust knowledge transfer under varying conditions. The central claim is that DRD outperforms prior PEFT and KD methods across 18 medical downstream tasks (2D/3D classification and segmentation) under multiple foundation models.

Significance. If the reported gains hold under rigorous verification, DRD would represent a practical advance in efficient adaptation of large medical foundation models, addressing computational constraints and model mismatch issues that limit deployment in clinical settings. The inclusion of both 2D and 3D tasks, multiple foundation models, and ablations provides a reasonably broad empirical basis for the claims.

major comments (2)
  1. [Experimental Results] Experimental section (results tables): The superiority claims over PEFT and KD baselines on 18 tasks lack reported error bars, number of random seeds/runs, and statistical significance tests (e.g., paired t-tests or Wilcoxon tests). This is load-bearing for the central empirical claim, as medical imaging performance is known to vary with data splits and initialization.
  2. [§3.2] §3.2 (Reprogramming module): The module is presented as overcoming domain and task discrepancy while remaining student-friendly, but the exact parameter count, forward-pass formulation, and how it interacts with the foundation model backbone (especially for 3D inputs) are not derived in sufficient detail to verify that it does not implicitly rely on task-specific tuning that would undermine the 'parameter-efficient' positioning.
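To make the first major comment concrete, the requested check could be as simple as a paired signed-rank test over per-task scores. The sketch below uses SciPy; the 18 per-task values are purely illustrative placeholders, not numbers from the paper.

```python
from scipy.stats import wilcoxon

# Illustrative per-task scores (e.g., Dice or accuracy) on the same 18 tasks;
# these values are made up to show the test, not taken from the paper.
drd      = [0.85, 0.78, 0.91, 0.88, 0.76, 0.83, 0.90, 0.81, 0.87,
            0.79, 0.84, 0.92, 0.80, 0.86, 0.77, 0.89, 0.82, 0.75]
baseline = [0.83, 0.77, 0.90, 0.85, 0.74, 0.82, 0.88, 0.80, 0.85,
            0.78, 0.82, 0.90, 0.79, 0.85, 0.76, 0.87, 0.80, 0.74]

# Two-sided paired Wilcoxon signed-rank test across the 18 tasks.
stat, p = wilcoxon(drd, baseline)
print(f"Wilcoxon W = {stat:.1f}, p = {p:.4f}")
```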
minor comments (3)
  1. [Abstract] Abstract: The claim of 'surpassing previous PEFT and KD methods' would be more informative if it briefly noted the range of improvement magnitudes or the specific foundation models used.
  2. [§3.3] Notation: The CKA distillation loss could be cross-referenced to the standard formulation in the literature (e.g., Kornblith et al.) to clarify any modifications.
  3. [Figures] Figure captions: Several result figures would benefit from explicit axis labels indicating whether metrics are Dice, AUC, or accuracy, and whether higher/lower is better.
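For reference on the second minor comment, the standard linear CKA of Kornblith et al. (2019), for column-centered feature matrices $X \in \mathbb{R}^{n \times p_1}$ and $Y \in \mathbb{R}^{n \times p_2}$ over the same $n$ examples, is:

```latex
\mathrm{CKA}(X, Y) \;=\;
\frac{\lVert Y^{\top} X \rVert_F^{2}}
     {\lVert X^{\top} X \rVert_F \,\lVert Y^{\top} Y \rVert_F}
```

A distillation loss can then be taken as $1 - \mathrm{CKA}$ between student and teacher features; whether the paper modifies this standard formulation is exactly what the cross-reference would settle.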

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive overall assessment of our work on Deep Reprogramming Distillation (DRD). We address each major comment below and will revise the manuscript accordingly to strengthen the presentation and empirical support.

read point-by-point responses
  1. Referee: [Experimental Results] Experimental section (results tables): The superiority claims over PEFT and KD baselines on 18 tasks lack reported error bars, number of random seeds/runs, and statistical significance tests (e.g., paired t-tests or Wilcoxon tests). This is load-bearing for the central empirical claim, as medical imaging performance is known to vary with data splits and initialization.

    Authors: We agree that the absence of error bars, run counts, and statistical tests limits the robustness of the superiority claims, particularly given known variability in medical imaging. In the revised version, we will re-run the key experiments across 3 random seeds, report mean performance with standard deviations in all tables, and include paired Wilcoxon signed-rank tests (with p-values) comparing DRD against the strongest baselines on each task. These additions will be placed in the experimental section and table captions. revision: yes

  2. Referee: [§3.2] §3.2 (Reprogramming module): The module is presented as overcoming domain and task discrepancy while remaining student-friendly, but the exact parameter count, forward-pass formulation, and how it interacts with the foundation model backbone (especially for 3D inputs) are not derived in sufficient detail to verify that it does not implicitly rely on task-specific tuning that would undermine the 'parameter-efficient' positioning.

    Authors: We acknowledge that §3.2 would benefit from greater technical detail to allow verification of the module's efficiency and generality. In the revision, we will expand this section with: (i) the precise parameter count of the reprogramming module (broken down by components), (ii) the complete forward-pass equations showing its integration with the frozen backbone, and (iii) explicit handling for 3D inputs via channel-wise and spatial reprogramming that preserves parameter efficiency without introducing task-specific layers or tuning beyond the module itself. This will confirm that the design remains student-friendly and does not undermine the parameter-efficient claim. revision: yes
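To fix intuition for what "channel-wise and spatial reprogramming" of 3D inputs could look like, here is one plausible, purely illustrative shape. The class name and design are assumptions made for this sketch, not the authors' actual equations; the point is only that such a module can stay tiny relative to the frozen backbone.

```python
import torch
import torch.nn as nn

class Reprogram3D(nn.Module):
    """Illustrative sketch only: a 1x1x1 channel-wise map followed by a
    depthwise spatial map, keeping the parameter count small."""
    def __init__(self, in_ch: int = 1, out_ch: int = 3):
        super().__init__()
        self.channel = nn.Conv3d(in_ch, out_ch, kernel_size=1)      # channel-wise
        self.spatial = nn.Conv3d(out_ch, out_ch, kernel_size=3,
                                 padding=1, groups=out_ch)          # per-channel spatial

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, in_ch, D, H, W) volume -> (B, out_ch, D, H, W) for a frozen backbone
        return self.spatial(self.channel(x))

vol = torch.randn(2, 1, 16, 64, 64)
print(Reprogram3D()(vol).shape)  # torch.Size([2, 3, 16, 64, 64])
```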

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper proposes an empirical framework (DRD) with a reprogramming module and CKA-based distillation, validated across 18 downstream tasks on multiple foundation models. No mathematical derivation chain, equations, or fitted parameters are presented that reduce to self-definition or prior outputs by construction. Claims rest on experimental results rather than self-referential logic or load-bearing self-citations. The architecture and training protocol are described as independent contributions without circular reductions to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

Based solely on the abstract; the central claim rests on unstated assumptions about the effectiveness of the reprogramming module and CKA in bridging gaps, plus standard ML training assumptions. No explicit free parameters or invented entities are detailed beyond the named modules.

axioms (2)
  • domain assumption: The reprogramming module can simultaneously resolve domain/task discrepancy and enable efficient distillation.
    Invoked in the abstract's description of the DRD design without proof or prior validation cited.
  • domain assumption: CKA distillation promotes robust knowledge transfer under training variability.
    Stated as a design choice to mitigate variability, but no derivation or external benchmark is provided.
invented entities (1)
  • Reprogramming module: no independent evidence.
    purpose: overcome domain and task discrepancy while building student-friendly distillation
    New component introduced in the framework; no independent evidence or falsifiable prediction is given in the abstract.

pith-pipeline@v0.9.0 · 5581 in / 1479 out tokens · 39654 ms · 2026-05-08T18:26:26.562255+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

83 extracted references · 33 canonical work pages · 4 internal anchors

  1. [1] W. Lin, Z. Zhao, X. Zhang, C. Wu, Y. Zhang, Y. Wang, and W. Xie, "PMC-CLIP: Contrastive language-image pre-training using biomedical documents," arXiv preprint arXiv:2303.07240, 2023.
  2. [2] X. Mei, Z. Liu, P. M. Robson, B. Marinelli, M. Huang, A. Doshi, A. Jacobi, C. Cao et al., "RadImageNet: An open radiologic deep learning research dataset for effective transfer learning," Radiology: Artificial Intelligence, vol. 4, no. 5, p. e210315, 2022.
  3. [3] D. M. Nguyen, H. Nguyen, N. T. Diep, T. N. Pham et al., "LVM-Med: Learning large-scale self-supervised vision models for medical imaging via second-order graph matching," arXiv preprint arXiv:2306.11925, 2023.
  4. [4] J. Yao, S. Zhang, Y. Yao, F. Wang, J. Ma, J. Zhang, Y. Chu, L. Ji, K. Jia et al., "Edge-cloud polarization and collaboration: A comprehensive survey for AI," IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 7, pp. 6866–6886, 2022.
  5. [5] Y. Zhou, Z. Zhao, S. Du, J. Yao, Y. Zhang, Y. Wang et al., "Exploring training on heterogeneous data with mixture of low-rank adapters," in Forty-first International Conference on Machine Learning, 2024.
  6. [6] K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, "Masked autoencoders are scalable vision learners," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 16000–16009.
  7. [7] Y. Zhou, H. Li, S. Du, J. Yao, Y. Zhang, and Y. Wang, "Low-rank knowledge decomposition for medical foundation models," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 11611–11620.
  8. [8] Y. Xin, S. Luo, H. Zhou, J. Du, X. Liu, Y. Fan, Q. Li, and Y. Du, "Parameter-efficient fine-tuning for pre-trained vision models: A survey," arXiv preprint arXiv:2402.02242, 2024.
  9. [9] S. Chen, C. Ge, Z. Tong, J. Wang, Y. Song, J. Wang, and P. Luo, "AdaptFormer: Adapting vision transformers for scalable visual recognition," Advances in Neural Information Processing Systems, vol. 35, pp. 16664–16678, 2022.
  10. [10] M. Jia, L. Tang, B.-C. Chen, C. Cardie, S. Belongie et al., "Visual prompt tuning," in European Conference on Computer Vision. Springer, 2022, pp. 709–727.
  11. [11] X. L. Li and P. Liang, "Prefix-tuning: Optimizing continuous prompts for generation," arXiv preprint arXiv:2101.00190, 2021.
  12. [12] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "LoRA: Low-rank adaptation of large language models," arXiv preprint arXiv:2106.09685, 2021.
  13. [13] X. He, C. Li, P. Zhang, J. Yang, and X. E. Wang, "Parameter-efficient model adaptation for vision transformers," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 1, 2023, pp. 817–825.
  14. [14] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv preprint arXiv:1503.02531, 2015.
  15. [15] F. Tung and G. Mori, "Similarity-preserving knowledge distillation," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1365–1374.
  16. [16] Y. Tian, D. Krishnan, and P. Isola, "Contrastive representation distillation," arXiv preprint arXiv:1910.10699, 2019.
  17. [17] W. Park, D. Kim, Y. Lu, and M. Cho, "Relational knowledge distillation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 3967–3976.
  18. [18] Z. Yang, Z. Li, M. Shao, D. Shi, Z. Yuan, and C. Yuan, "Masked generative distillation," in European Conference on Computer Vision. Springer, 2022, pp. 53–69.
  19. [19] X. Liu, L. Li, C. Li, and A. Yao, "NORM: Knowledge distillation via n-to-one representation matching," arXiv preprint arXiv:2305.13803, 2023.
  20. [20] H. Li, Y. Zhou, Z. Zhao, S. Du, J. Yao, W. Xie, Y. Zhang, and Y. Wang, "LoRKD: Low-rank knowledge decomposition for medical foundation models," arXiv preprint arXiv:2409.19540, 2024.
  21. [21] Y. Zhou, S. Du, H. Li, J. Yao, Y. Zhang, and Y. Wang, "Reprogramming distillation for medical foundation models," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2024, pp. 533–543.
  22. [22] C. Buciluǎ, R. Caruana, and A. Niculescu-Mizil, "Model compression," in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006, pp. 535–541.
  23. [23] A. Malinin, B. Mlodozeniec, and M. Gales, "Ensemble distribution distillation," arXiv preprint arXiv:1905.00076, 2019.
  24. [24] A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio, "FitNets: Hints for thin deep nets," arXiv preprint arXiv:1412.6550, 2014.
  25. [25] S. Zagoruyko and N. Komodakis, "Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer," arXiv preprint arXiv:1612.03928, 2016.
  26. [26] Z. Hao, J. Guo, K. Han, Y. Tang, H. Hu, Y. Wang, and C. Xu, "One-for-all: Bridge the gap between heterogeneous architectures in knowledge distillation," Advances in Neural Information Processing Systems, vol. 36, 2024.
  27. [27] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly, "Parameter-efficient transfer learning for NLP," in International Conference on Machine Learning. PMLR, 2019, pp. 2790–2799.
  28. [28] J. He, C. Zhou, X. Ma, T. Berg-Kirkpatrick, and G. Neubig, "Towards a unified view of parameter-efficient transfer learning," arXiv preprint arXiv:2110.04366, 2021.
  29. [29] R. K. Mahabadi, S. Ruder, M. Dehghani, and J. Henderson, "Parameter-efficient multi-task fine-tuning for transformers via shared hypernetworks," arXiv preprint arXiv:2106.04489, 2021.
  30. [30] R. Karimi Mahabadi, J. Henderson, and S. Ruder, "Compacter: Efficient low-rank hypercomplex adapter layers," Advances in Neural Information Processing Systems, vol. 34, pp. 1022–1035, 2021.
  31. [31] B. Lester, R. Al-Rfou, and N. Constant, "The power of scale for parameter-efficient prompt tuning," arXiv preprint arXiv:2104.08691, 2021.
  32. [32] A. Razdaibiedina, Y. Mao, R. Hou, M. Khabsa, M. Lewis, J. Ba, and A. Almahairi, "Residual prompt tuning: Improving prompt tuning with residual reparameterization," arXiv preprint arXiv:2305.03937, 2023.
  33. [33] B. Dong, P. Zhou, S. Yan, and W. Zuo, "LPT: Long-tailed prompt tuning for image classification," arXiv preprint arXiv:2210.01033, 2022.
  34. [34] S.-Y. Liu, C.-Y. Wang, H. Yin, P. Molchanov, Y.-C. F. Wang, K.-T. Cheng, and M.-H. Chen, "DoRA: Weight-decomposed low-rank adaptation," arXiv preprint arXiv:2402.09353, 2024.
  35. [35] Z. Qiu, W. Liu, H. Feng, Y. Xue, Y. Feng, Z. Liu, D. Zhang, A. Weller, and B. Schölkopf, "Controlling text-to-image diffusion by orthogonal finetuning," Advances in Neural Information Processing Systems, vol. 36, pp. 79320–79362, 2023.
  36. [36] S.-Y. Yeh, Y.-G. Hsieh, Z. Gao, B. B. Yang, G. Oh, and Y. Gong, "Navigating text-to-image customization: From LyCORIS fine-tuning to model evaluation," in The Twelfth International Conference on Learning Representations, 2023.
  37. [37] E. B. Zaken, S. Ravfogel, and Y. Goldberg, "BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models," arXiv preprint arXiv:2106.10199, 2021.
  38. [38] J. Wu, W. Ji, Y. Liu, H. Fu, M. Xu, Y. Xu, and Y. Jin, "Medical SAM Adapter: Adapting Segment Anything Model for medical image segmentation," arXiv preprint arXiv:2304.12620, 2023.
  39. [39] B. Azad, R. Azad, S. Eskandari, A. Bozorgpour, A. Kazerouni, I. Rekik, and D. Merhof, "Foundational models in medical imaging: A comprehensive survey and future vision," arXiv preprint arXiv:2310.18689, 2023.
  40. [40] S. Zhang and D. Metaxas, "On the challenges and perspectives of foundation models for medical image analysis," Medical Image Analysis, vol. 91, p. 102996, 2024.
  41. [41] K. Jin, X. Huang, J. Zhou, Y. Li, Y. Yan, Y. Sun, Q. Zhang, Y. Wang, and J. Ye, "FIVES: A fundus image dataset for artificial intelligence based vessel segmentation," Scientific Data, vol. 9, no. 1, p. 475, 2022.
  42. [42] J. Glaister, "Skin-cancer-detection," 2013. [Online]. Available: https://uwaterloo.ca/vision-image-processing-lab/research-demos/skin-cancer-detection
  43. [43] H. I. T. Loop, "Teeth segmentation on dental X-ray images," 2023. [Online]. Available: https://www.kaggle.com/dsv/5884500
  44. [44] P.-Y. Chen, "Model reprogramming: Resource-efficient cross-domain machine learning," arXiv preprint arXiv:2202.10629, 2022.
  45. [45] S. Xu, J. Yao, R. Luo, S. Zhang, Z. Lian, M. Tan, B. Han, and Y. Wang, "Towards efficient task-driven model reprogramming with foundation models," arXiv preprint arXiv:2304.02263, 2023.
  46. [46] L. Ge, C. Hu et al., "Discrepancy and uncertainty aware denoising knowledge distillation for zero-shot cross-lingual named entity recognition," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 16, 2024, pp. 18056–18064.
  47. [47] S. Kornblith, M. Norouzi, H. Lee, and G. Hinton, "Similarity of neural network representations revisited," in International Conference on Machine Learning. PMLR, 2019, pp. 3519–3529.
  48. [48] T. Nguyen, M. Raghu, and S. Kornblith, "Do wide and deep networks learn the same things? Uncovering how neural network representations vary with width and depth," arXiv preprint arXiv:2010.15327, 2020.
  49. [49] L. R. Dice, "Measures of the amount of ecologic association between species," Ecology, vol. 26, no. 3, pp. 297–302, 1945.
  50. [50] C. Yang, H. Zhou, Z. An, X. Jiang, Y. Xu, and Q. Zhang, "Cross-image relational knowledge distillation for semantic segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12319–12328.
  51. [51] C. Zhang, D. Han, Y. Qiao, J. U. Kim, S.-H. Bae, S. Lee, and C. S. Hong, "Faster Segment Anything: Towards lightweight SAM for mobile applications," arXiv preprint arXiv:2306.14289, 2023.
  52. [52] Y. Zhu, Z. Shen, Z. Zhao, S. Wang, X. Wang, X. Zhao, D. Shen, and Q. Wang, "MeLo: Low-rank adaptation is better than fine-tuning for medical image diagnosis," arXiv preprint arXiv:2311.08236, 2023.
  53. [53] W. Al-Dhabyani, M. Gomaa, H. Khaled, and A. Fahmy, "Dataset of breast ultrasound images," Data in Brief, vol. 28, p. 104863, 2020.
  54. [54] P. Tschandl, C. Rosendahl, and H. Kittler, "The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions," Scientific Data, vol. 5, no. 1, pp. 1–9, 2018.
  55. [55] Y. Xingyi, H. Xuehai, Z. Jinyu, Z. Yichen et al., "COVID-CT-Dataset: A CT image dataset about COVID-19," arXiv preprint arXiv:2003.13865, 2020.
  56. [56] A. Saleh, R. Sukaik, and S. S. Abu-Naser, "Brain tumor classification using deep learning," in 2020 International Conference on Assistive and Rehabilitation Technologies (iCareTech), 2020, pp. 131–136.
  57. [57] J. P. Cohen, P. Morrison, and L. Dao, "COVID-19 image data collection," arXiv preprint arXiv:2003.11597, 2020.
  58. [58] S. P. Morozov, A. E. Andreychenko, N. Pavlov, A. Vladzymyrskyy, N. Ledikhova, V. Gombolevskiy, I. A. Blokhin, P. Gelezhe, A. Gonchar, and V. Y. Chernina, "MosMedData: Chest CT scans with COVID-19 related findings dataset," arXiv preprint arXiv:2005.06465, 2020.
  59. [59] Y. Feng, "CT volume samples for lung adenocarcinoma classification," 2020.
  60. [60] S. Graham, Q. D. Vu, M. Jahanifar, M. Weigert, U. Schmidt, W. Zhang, J. Zhang, S. Yang, J. Xiang, X. Wang et al., "CoNIC Challenge: Pushing the frontiers of nuclear detection, segmentation, classification and counting," Medical Image Analysis, vol. 92, p. 103047, 2024.
  61. [61] S. Graham, M. Jahanifar, A. Azam, M. Nimir, Y.-W. Tsang, K. Dodd, E. Hero, H. Sahota, A. Tank, K. Benes et al., "Lizard: A large-scale dataset for colonic nuclear instance segmentation and classification," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 684–693.
  62. [62] H. Gong, G. Chen, R. Wang, X. Xie, M. Mao, Y. Yu, F. Chen, and G. Li, "Multi-task learning for thyroid nodule segmentation with thyroid region prior," in 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI). IEEE, 2021, pp. 257–261.
  63. [63] L. C. Garcia-Peraza-Herrera, L. Fidon, C. D'Ettorre, D. Stoyanov, T. Vercauteren, and S. Ourselin, "Image compositing for segmentation of surgical tools without manual annotations," IEEE Transactions on Medical Imaging, vol. 40, no. 5, pp. 1450–1460, 2021.
  64. [64] A. E. Kavur, N. S. Gezer, M. Barış, Y. Şahin, S. Özkan, B. Baydar, U. Yüksel, Ç. Kılıkçıer, Ş. Olut, G. B. Akar et al., "Comparison of semi-automatic and deep learning-based automatic methods for liver segmentation in living liver transplant donors," Diagnostic and Interventional Radiology, vol. 26, no. 1, p. 11, 2019.
  65. [65] A. E. Kavur, N. S. Gezer, M. Barış, S. Aslan, P.-H. Conze, V. Groza, D. D. Pham, S. Chatterjee, P. Ernst, S. Özkan, B. Baydar, D. Lachinov, S. Han, J. Pauli, F. Isensee, M. Perkonigg, R. Sathish, R. Rajan, D. Sheet, G. Dovletov, O. Speck, A. Nürnberger, K. H. Maier-Hein, G. Bozdağı Akar, G. Ünal, O. Dicle, and M. A. Selver, "CHAOS Challenge - combined (CT-MR) healthy abdominal organ segmentation," …
  66. [66] A. E. Kavur, M. A. Selver, O. Dicle, M. Barış, and N. S. Gezer, "CHAOS - Combined (CT-MR) Healthy Abdominal Organ Segmentation Challenge Data," Apr. 2019. [Online]. Available: https://doi.org/10.5281/zenodo.3362844
  67. [67] B. Landman, Z. Xu, J. Igelsias, M. Styner, T. Langerak, and A. Klein, "MICCAI multi-atlas labeling beyond the cranial vault–workshop and challenge," in Proc. MICCAI Multi-Atlas Labeling Beyond Cranial Vault—Workshop Challenge, vol. 5, 2015, p. 12.
  68. [68] M. Antonelli, A. Reinke, S. Bakas, K. Farahani, A. Kopp-Schneider, B. A. Landman, G. Litjens, B. Menze, O. Ronneberger, R. M. Summers et al., "The Medical Segmentation Decathlon," Nature Communications, vol. 13, no. 1, p. 4128, 2022.
  69. [69] S. Ahn, S. X. Hu, A. Damianou, N. D. Lawrence, and Z. Dai, "Variational information distillation for knowledge transfer," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 9163–9171.
  70. [70] N. Passalis, M. Tzelepi, and A. Tefas, "Probabilistic knowledge transfer for lightweight deep representation learning," IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 5, pp. 2030–2039, 2020.
  71. [71] D. Chen, J.-P. Mei, Y. Zhang, C. Wang, Z. Wang, Y. Feng, and C. Chen, "Cross-layer distillation with semantic calibration," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 8, 2021, pp. 7028–7036.
  72. [72] C.-H. Tu, Z. Mai, and W.-L. Chao, "Visual query tuning: Towards effective usage of intermediate representations for parameter and memory efficient transfer learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 7725–7735.
  73. [73] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
  74. [74] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.
  75. [75] N. Ma, X. Zhang, H.-T. Zheng, and J. Sun, "ShuffleNet V2: Practical guidelines for efficient CNN architecture design," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 116–131.
  76. [76] L. Blankemeier, J. P. Cohen, A. Kumar, D. Van Veen, S. J. S. Gardezi, M. Paschali, Z. Chen, J.-B. Delbrouck, E. Reis, C. Truyts et al., "Merlin: A vision language foundation model for 3D computed tomography," arXiv preprint arXiv:2406.06512, 2024.
  77. [77] K. Hara, H. Kataoka, and Y. Satoh, "Learning spatio-temporal features with 3D residual networks for action recognition," in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2017, pp. 3154–3160.
  78. [78] J. Ma, Y. He, F. Li, L. Han, C. You, and B. Wang, "Segment anything in medical images," Nature Communications, vol. 15, no. 1, p. 654, 2024.
  79. [79] J. Liu, H. Yang, H.-Y. Zhou, Y. Xi, L. Yu, C. Li, Y. Liang, G. Shi, Y. Yu, S. Zhang et al., "Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2024, pp. 615–625.
  80. [80] Z. Zhao, Y. Zhang, C. Wu, X. Zhang, Y. Zhang, Y. Wang, and W. Xie, "One model to rule them all: Towards universal segmentation for medical images with text prompts," arXiv preprint arXiv:2312.17183, 2023.

Showing first 80 references.