pith. machine review for the scientific record.

arxiv: 2604.12668 · v1 · submitted 2026-04-14 · 💻 cs.CV

Recognition: unknown

OFA-Diffusion Compression: Compressing Diffusion Model in One-Shot Manner

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:32 UTC · model grok-4.3

classification 💻 cs.CV
keywords diffusion models · model compression · single training · image generation · subnetwork extraction · channel allocation · training efficiency

The pith

One training session can yield multiple compressed versions of a diffusion model sized for different devices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to compress diffusion probabilistic models into several smaller sizes using a single training run rather than a separate training for each target size. This addresses the practical need to deploy models on devices with differing resource limits without incurring repeated training costs. By narrowing the space of candidate subnetworks to a few discrete sizes, assigning channels to each size according to their importance, and applying a reweighting strategy to balance optimization across subnetworks, the method aims to reach good generation quality efficiently.
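A minimal sketch of what importance-based channel allocation over a handful of discrete target sizes could look like; the L2-norm importance proxy, the uniform per-layer keep ratio, and every name below are illustrative assumptions rather than the paper's actual procedure.

```python
import torch

def channel_importance(weight: torch.Tensor) -> torch.Tensor:
    # Score each output channel by the L2 norm of its weights; a common proxy,
    # not necessarily the importance measure used in the paper.
    return weight.flatten(start_dim=1).norm(p=2, dim=1)

def allocate_channels(layer_weights: dict, keep_ratio: float) -> dict:
    # For one discrete target size, keep the top-scoring output channels per layer.
    # layer_weights maps layer names to weight tensors of shape (out_channels, ...).
    masks = {}
    for name, weight in layer_weights.items():
        scores = channel_importance(weight)
        n_keep = max(1, int(round(keep_ratio * scores.numel())))
        keep_idx = scores.topk(n_keep).indices
        mask = torch.zeros(scores.numel(), dtype=torch.bool)
        mask[keep_idx] = True
        masks[name] = mask
    return masks

# Hypothetical discrete parameter budgets; top-k selection makes the masks nested,
# so smaller subnetworks sit inside larger ones, mirroring gradual allocation.
target_ratios = [0.25, 0.5, 0.75, 1.0]
layer_weights = {"block1.conv": torch.randn(64, 32, 3, 3)}   # toy stand-in for real layers
masks_per_size = {r: allocate_channels(layer_weights, r) for r in target_ratios}
```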

Core claim

By restricting the candidate subnetworks in a single-training compression setup to a small set of discrete parameter sizes, allocating channels gradually by importance for each size, and applying a reweighting strategy during optimization, a single training run can produce multiple compressed diffusion models that perform satisfactorily on image generation tasks.

What carries the argument

The restricted single-training compression framework that builds subnetworks of preset sizes via importance-based channel allocation and reweighting to balance optimization.

Load-bearing premise

Restricting subnetwork candidates to discrete sizes and allocating channels by importance still permits each resulting model to achieve competitive performance.

What would settle it

Training the framework once, extracting subnetworks for several sizes, and then measuring their image generation quality against models compressed separately for those exact sizes; a large gap in metrics like FID would disprove the claim.
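A sketch of that settling experiment, written against hypothetical helpers (train_once_for_all, extract_subnetwork, train_separately, sample_images, compute_fid) that stand in for the corresponding pipeline steps; none of these names come from the paper.

```python
def compare_against_per_size_baselines(train_once_for_all, extract_subnetwork,
                                       train_separately, sample_images, compute_fid,
                                       dataset, reference_stats,
                                       target_ratios=(0.25, 0.5, 0.75)):
    # Train the once-for-all supernet a single time, then measure each extracted
    # subnetwork against an independently compressed model of the same size.
    supernet = train_once_for_all(dataset)
    results = {}
    for ratio in target_ratios:
        subnet = extract_subnetwork(supernet, ratio)     # no extra training
        baseline = train_separately(dataset, ratio)      # one full compression run per size
        fid_ofa = compute_fid(sample_images(subnet), reference_stats)
        fid_sep = compute_fid(sample_images(baseline), reference_stats)
        results[ratio] = {"ofa_fid": fid_ofa, "baseline_fid": fid_sep,
                          "gap": fid_ofa - fid_sep}      # a large positive gap undercuts the claim
    return results
```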

Figures

Figures reproduced from arXiv: 2604.12668 by Haoyang Jiang, Ju Fan, Junxian Cai, Lanqing Hu, Mingyang Yi, Qingbin Liu, Xi Chen, Xiuyu Li, Zekun Wang.

Figure 1: Overview of our OFA-Diffusion Compression framework. On the left, we present the structure of the U-ViT architecture. …
Figure 2: The practical retention rates of parameters …
Figure 3: Comparison between our proposed subnetwork construction approach and other ablations. For simplicity, “Arch” and …
Figure 4: Averaged latency of model on GPU or CPU evalu…
Figure 5: Comparison of FID versus MACs with different …
Figure 6: Overview of our OFA-Diffusion Compression framework. On the left, we present the structure of the U-Net architecture.
Figure 7: The convergence of subnetworks with different …
Figure 8: Visualization of normed importance scores across …
Figure 9: Samples from subnetworks trained by different approaches with the same random seed on CIFAR10.
Figure 10: Samples from subnetworks trained by different approaches with the same random seed on FFHQ.
Figure 11: Samples from subnetworks trained by different approaches with the same random seed on AFHQv2.
Figure 12: Samples from subnetworks trained by different approaches with the same random seed on ImageNet.
Figure 13: Samples from subnetworks trained by different approaches with the same random seed on CIFAR10.
Figure 14: Samples from subnetworks trained by different approaches with the same random seed on CelebA.
Figure 15: Samples from subnetworks trained by different approaches with the same random seed on MS-COCO.
read the original abstract

The Diffusion Probabilistic Model (DPM) achieves remarkable performance in image generation, while its increasing parameter size and computational overhead hinder its deployment in practical applications. To improve this, the existing literature focuses on obtaining a smaller model with a fixed architecture through model compression. However, in practice, DPMs usually need to be deployed on various devices with different resource constraints, which leads to multiple compression processes, incurring significant overhead for repeated training. To obviate this, we propose a once-for-all (OFA) compression framework for DPMs that yields different subnetworks with various computations in a one-shot training manner. The existing OFA framework typically involves massive subnetworks with different parameter sizes, while such a huge candidate space slows the optimization. Thus, we propose to restrict the candidate subnetworks with a certain set of parameter sizes, where each size corresponds to a specific subnetwork. Specifically, to construct each subnetwork with a given size, we gradually allocate the maintained channels by their importance. Furthermore, we propose a reweighting strategy to balance the optimization process of different subnetworks. Experimental results show that our approach can produce compressed DPMs for various sizes with significantly lower training overhead while achieving satisfactory performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes OFA-Diffusion Compression, a once-for-all framework for compressing diffusion probabilistic models (DPMs). It restricts the supernet candidate space to a discrete set of target parameter sizes, constructs each subnetwork by gradually allocating channels according to importance scores, and applies a reweighting strategy on subnetwork losses to balance joint optimization. The central claim is that this one-shot procedure yields multiple compressed DPMs of varying sizes with significantly lower training overhead than repeated per-size compressions while still achieving satisfactory performance on image-generation benchmarks.
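As a concrete picture of the joint optimization the summary describes, here is a minimal sketch of one training step that reweights per-subnetwork losses; the mask-selecting context manager and the choice to visit every size each step are assumptions for illustration, not the authors' implementation.

```python
def ofa_training_step(model, batch, size_masks, size_weights, diffusion_loss, optimizer):
    # size_masks: {target_size: channel masks}; size_weights: reweighting coefficients.
    optimizer.zero_grad()
    logged = {}
    for size, masks in size_masks.items():
        with model.use_channel_masks(masks):        # hypothetical: activate one subnetwork
            loss = diffusion_loss(model, batch)     # denoising loss evaluated on that subnetwork
        (size_weights[size] * loss).backward()      # reweighted gradient contribution accumulates
        logged[size] = float(loss.detach())
    optimizer.step()                                # one shared update covering all subnetworks
    return logged
```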

Significance. If the empirical results hold, the work offers a practical solution to the repeated-training overhead that currently limits deployment of large DPMs across heterogeneous devices. The restriction to discrete sizes and importance-based allocation represent a pragmatic engineering contribution that makes the OFA paradigm more tractable for diffusion models; credit is due for identifying and addressing the scaling issue of massive supernets in this domain.

major comments (3)
  1. [Section 3.2 and Section 4] The central claim that importance-based channel allocation within the restricted discrete-size supernet produces competitive subnetworks rests on an unverified assumption. The manuscript must demonstrate, via direct comparison in the experimental section, that the extracted subnetworks match or approach the performance of independently trained models of identical parameter counts; without this baseline the 'satisfactory performance' assertion cannot be evaluated.
  2. [Abstract and Section 4] The abstract states that experiments show 'significantly lower training overhead' and 'satisfactory performance,' yet supplies no quantitative numbers, wall-clock times, FID scores, baseline comparisons, or error bars. The load-bearing experimental claims therefore cannot be verified from the given text; the full results section must include these metrics and ablations on the reweighting coefficients.
  3. [Section 3.3] The reweighting strategy is presented as balancing the optimization of different subnetworks, but no analysis is given of how the chosen coefficients interact with the time-conditioned diffusion loss. A sensitivity study or derivation showing that the reweighting does not introduce bias into the importance scores would strengthen the method.
minor comments (2)
  1. [Section 3.2] Notation for the importance scores and channel-allocation procedure should be formalized with explicit equations rather than prose descriptions to improve reproducibility; one illustrative form such equations could take is sketched after this list.
  2. [Section 3.1] The manuscript would benefit from a clear statement of the exact discrete parameter sizes chosen and the rationale for their selection.
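The first minor comment asks for the importance-score notation to be made explicit; one standard form such a revision could take (weight-magnitude importance with nested top-k allocation) is sketched below as an illustration, not as the paper's actual definition.

```latex
% Illustrative notation only; all symbols here are assumptions, not taken from the paper.
% s_{l,c}: importance of output channel c in layer l; \rho_k: retention ratio of the
% k-th discrete target size; C_l: number of channels in layer l.
\[
  s_{l,c} = \bigl\lVert W_{l,c,:} \bigr\rVert_2,
  \qquad
  \mathcal{C}_l(k) = \operatorname{TopK}\!\bigl(\{s_{l,c}\}_{c=1}^{C_l},\ \lceil \rho_k C_l \rceil\bigr),
  \qquad
  \mathcal{C}_l(1) \subseteq \mathcal{C}_l(2) \subseteq \cdots
\]
```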

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We appreciate the emphasis on strengthening the empirical validation and analysis. Below we address each major comment point by point, indicating the revisions we will incorporate in the next version of the manuscript.

read point-by-point responses
  1. Referee: [Section 3.2 and Section 4] The central claim that importance-based channel allocation within the restricted discrete-size supernet produces competitive subnetworks rests on an unverified assumption. The manuscript must demonstrate, via direct comparison in the experimental section, that the extracted subnetworks match or approach the performance of independently trained models of identical parameter counts; without this baseline the 'satisfactory performance' assertion cannot be evaluated.

    Authors: We agree that a direct comparison against independently trained models of identical parameter counts would provide a stronger validation of the subnetwork quality. Our current experiments compare against existing DPM compression baselines and demonstrate competitive FID scores at substantially reduced training cost. To address the concern, the revised manuscript will include new experiments that train independent models for each target size and report their performance relative to our extracted subnetworks. We will also note the additional compute required for these baselines as context for the efficiency gains of the one-shot approach. revision: yes

  2. Referee: [Abstract and Section 4] The abstract states that experiments show 'significantly lower training overhead' and 'satisfactory performance,' yet supplies no quantitative numbers, wall-clock times, FID scores, baseline comparisons, or error bars. The load-bearing experimental claims therefore cannot be verified from the given text; the full results section must include these metrics and ablations on the reweighting coefficients.

    Authors: We acknowledge that the abstract and experimental section lack the specific quantitative details needed for verification. In the revised manuscript we will update the abstract to report key metrics including FID scores across model sizes, percentage reductions in training overhead (with wall-clock times), baseline comparisons, and error bars. Section 4 will be expanded with full tables containing these values plus ablations on the reweighting coefficients to make all claims verifiable. revision: yes

  3. Referee: [Section 3.3] The reweighting strategy is presented as balancing the optimization of different subnetworks, but no analysis is given of how the chosen coefficients interact with the time-conditioned diffusion loss. A sensitivity study or derivation showing that the reweighting does not introduce bias into the importance scores would strengthen the method.

    Authors: We will strengthen Section 3.3 by adding a sensitivity study that varies the reweighting coefficients and measures their effect on final performance and importance score rankings. We will also include an empirical comparison of channel importance scores computed with and without reweighting to demonstrate that the chosen coefficients do not materially bias the allocation process. A short derivation relating the reweighting to the time-conditioned loss will be provided to clarify the interaction. revision: yes
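A minimal sketch of the promised sensitivity check, comparing channel-importance rankings across reweighting settings; the Spearman rank correlation is one reasonable stability measure, and compute_importance is a hypothetical stand-in for the authors' scoring routine.

```python
from scipy.stats import spearmanr

def reweighting_sensitivity(compute_importance, coefficient_grid, baseline_coefficients):
    # compute_importance(coeffs) is a hypothetical helper returning a flat array of
    # channel-importance scores obtained under a given set of reweighting coefficients.
    reference = compute_importance(baseline_coefficients)
    stability = {}
    for coeffs in coefficient_grid:
        scores = compute_importance(coeffs)
        rho, _ = spearmanr(reference, scores)   # rank correlation of the two orderings
        stability[tuple(coeffs)] = rho          # values near 1.0 suggest little allocation bias
    return stability
```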

Circularity Check

0 steps flagged

Empirical supernet training procedure with no circular derivation

full rationale

The paper presents an empirical one-shot training framework for diffusion model compression. It restricts the supernet to discrete target sizes, allocates channels by importance ranking, and applies reweighting during joint optimization, then evaluates the extracted subnetworks on standard image-generation benchmarks. No equations, predictions, or uniqueness claims are shown to reduce by construction to fitted parameters or prior self-citations. The central results are performance numbers obtained from training and evaluation, not algebraic identities or renamed inputs. This is a standard empirical method paper whose claims rest on external benchmark outcomes rather than internal definitional loops.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The framework rests on standard assumptions from once-for-all training plus two paper-specific choices whose justification is not supplied in the abstract.

free parameters (2)
  • discrete set of target parameter sizes
    Chosen to match expected device constraints; directly determines which subnetworks are optimized.
  • reweighting coefficients for subnetwork losses
    Introduced to balance training across sizes; values are not derived from first principles.
axioms (1)
  • domain assumption: Channel importance scores computed on the full model can be used to allocate channels to smaller subnetworks while preserving generation quality.
    Invoked when constructing each subnetwork by gradual allocation; no proof or prior citation is given in the abstract.

pith-pipeline@v0.9.0 · 5540 in / 1328 out tokens · 40945 ms · 2026-05-10T15:32:04.387677+00:00 · methodology

discussion (0)

