OFA-Diffusion Compression: Compressing Diffusion Model in One-Shot Manner
Pith reviewed 2026-05-10 15:32 UTC · model grok-4.3
The pith
One training session can yield multiple compressed versions of a diffusion model sized for different devices.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By restricting the candidate subnetworks in a single-training compression setup to a small set of discrete parameter sizes, allocating channels gradually by importance for each size, and applying a reweighting strategy during optimization, a single training run can produce multiple compressed diffusion models that perform satisfactorily on image generation tasks.
What carries the argument
The restricted single-training compression framework that builds subnetworks of preset sizes via importance-based channel allocation and reweighting to balance optimization.
Load-bearing premise
Restricting subnetwork candidates to discrete sizes and allocating channels by importance still permits each resulting model to achieve competitive performance.
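A minimal sketch of what that premise assumes (everything here is hypothetical: importance is scored as the L1 norm of each output channel's weights, and each preset size keeps the top-k channels under one fixed ranking — the paper's exact scoring and allocation rules are not given in this review):

```python
import numpy as np

def channel_importance(weight):
    # Hypothetical importance score: L1 norm of each output channel's weights.
    return np.abs(weight).sum(axis=tuple(range(1, weight.ndim)))

def allocate_channels(weight, target_sizes):
    """For each preset size k, keep the k most important channels of the
    full model (importance-based gradual allocation to discrete sizes)."""
    order = np.argsort(channel_importance(weight))[::-1]  # most important first
    return {k: np.sort(order[:k]) for k in target_sizes}

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32, 3, 3))           # full layer: 64 output channels
subnets = allocate_channels(w, (16, 32, 48))  # discrete preset sizes
```

Because every size reuses a single ranking, the preset subnetworks are nested and extracting one is a constant-time slice rather than a fresh compression run; whether such nested slices stay competitive is exactly the premise at stake.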
What would settle it
Training the framework once, extracting subnetworks for several sizes, and then measuring their image generation quality against models compressed separately for those exact sizes; a large gap in metrics like FID would disprove the claim.
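That test can be phrased as a small harness (all names hypothetical: the FID dictionaries stand in for a real evaluator's output, and the threshold for "a large gap" is a judgment call, not a value from the paper):

```python
def claim_holds(fid_ofa, fid_independent, max_gap=1.0):
    """Compare one-shot subnetworks against per-size baselines.

    fid_ofa, fid_independent: {size: FID score} over the same preset sizes.
    The claim survives if no size shows a large FID gap versus a model
    compressed separately for that exact size (hypothetical threshold).
    """
    assert fid_ofa.keys() == fid_independent.keys()
    gaps = {k: fid_ofa[k] - fid_independent[k] for k in fid_ofa}
    return all(g <= max_gap for g in gaps.values())

# Illustrative numbers only, not results from the paper:
print(claim_holds({16: 5.1, 32: 4.6}, {16: 4.8, 32: 4.4}))  # small gaps -> True
```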
Original abstract
The Diffusion Probabilistic Model (DPM) achieves remarkable performance in image generation, while its increasing parameter size and computational overhead hinder its deployment in practical applications. To improve this, the existing literature focuses on obtaining a smaller model with a fixed architecture through model compression. However, in practice, DPMs usually need to be deployed on various devices with different resource constraints, which leads to multiple compression processes, incurring significant overhead for repeated training. To obviate this, we propose a once-for-all (OFA) compression framework for DPMs that yields different subnetworks with various computations in a one-shot training manner. The existing OFA framework typically involves massive subnetworks with different parameter sizes, while such a huge candidate space slows the optimization. Thus, we propose to restrict the candidate subnetworks with a certain set of parameter sizes, where each size corresponds to a specific subnetwork. Specifically, to construct each subnetwork with a given size, we gradually allocate the maintained channels by their importance. Furthermore, we propose a reweighting strategy to balance the optimization process of different subnetworks. Experimental results show that our approach can produce compressed DPMs for various sizes with significantly lower training overhead while achieving satisfactory performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes OFA-Diffusion Compression, a once-for-all framework for compressing diffusion probabilistic models (DPMs). It restricts the supernet candidate space to a discrete set of target parameter sizes, constructs each subnetwork by gradually allocating channels according to importance scores, and applies a reweighting strategy on subnetwork losses to balance joint optimization. The central claim is that this one-shot procedure yields multiple compressed DPMs of varying sizes with significantly lower training overhead than repeated per-size compressions while still achieving satisfactory performance on image-generation benchmarks.
Significance. If the empirical results hold, the work offers a practical solution to the repeated-training overhead that currently limits deployment of large DPMs across heterogeneous devices. The restriction to discrete sizes and importance-based allocation represent a pragmatic engineering contribution that makes the OFA paradigm more tractable for diffusion models; credit is due for identifying and addressing the scaling issue of massive supernets in this domain.
major comments (3)
- [Section 3.2 and Section 4] The central claim that importance-based channel allocation within the restricted discrete-size supernet produces competitive subnetworks rests on an unverified assumption. The manuscript must demonstrate, via direct comparison in the experimental section, that the extracted subnetworks match or approach the performance of independently trained models of identical parameter counts; without this baseline the 'satisfactory performance' assertion cannot be evaluated.
- [Abstract and Section 4] The abstract states that experiments show 'significantly lower training overhead' and 'satisfactory performance,' yet supplies no quantitative numbers, wall-clock times, FID scores, baseline comparisons, or error bars. The load-bearing experimental claims therefore cannot be verified from the given text; the full results section must include these metrics and ablations on the reweighting coefficients.
- [Section 3.3] The reweighting strategy is presented as balancing the optimization of different subnetworks, but no analysis is given of how the chosen coefficients interact with the time-conditioned diffusion loss. A sensitivity study or derivation showing that the reweighting does not introduce bias into the importance scores would strengthen the method.
minor comments (2)
- [Section 3.2] Notation for the importance scores and channel-allocation procedure should be formalized with explicit equations rather than prose descriptions to improve reproducibility.
- [Section 3.1] The manuscript would benefit from a clear statement of the exact discrete parameter sizes chosen and the rationale for their selection.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We appreciate the emphasis on strengthening the empirical validation and analysis. Below we address each major comment point by point, indicating the revisions we will incorporate in the next version of the manuscript.
Point-by-point responses
-
Referee: [Section 3.2 and Section 4] The central claim that importance-based channel allocation within the restricted discrete-size supernet produces competitive subnetworks rests on an unverified assumption. The manuscript must demonstrate, via direct comparison in the experimental section, that the extracted subnetworks match or approach the performance of independently trained models of identical parameter counts; without this baseline the 'satisfactory performance' assertion cannot be evaluated.
Authors: We agree that a direct comparison against independently trained models of identical parameter counts would provide a stronger validation of the subnetwork quality. Our current experiments compare against existing DPM compression baselines and demonstrate competitive FID scores at substantially reduced training cost. To address the concern, the revised manuscript will include new experiments that train independent models for each target size and report their performance relative to our extracted subnetworks. We will also note the additional compute required for these baselines as context for the efficiency gains of the one-shot approach. revision: yes
-
Referee: [Abstract and Section 4] The abstract states that experiments show 'significantly lower training overhead' and 'satisfactory performance,' yet supplies no quantitative numbers, wall-clock times, FID scores, baseline comparisons, or error bars. The load-bearing experimental claims therefore cannot be verified from the given text; the full results section must include these metrics and ablations on the reweighting coefficients.
Authors: We acknowledge that the abstract and experimental section lack the specific quantitative details needed for verification. In the revised manuscript we will update the abstract to report key metrics including FID scores across model sizes, percentage reductions in training overhead (with wall-clock times), baseline comparisons, and error bars. Section 4 will be expanded with full tables containing these values plus ablations on the reweighting coefficients to make all claims verifiable. revision: yes
-
Referee: [Section 3.3] The reweighting strategy is presented as balancing the optimization of different subnetworks, but no analysis is given of how the chosen coefficients interact with the time-conditioned diffusion loss. A sensitivity study or derivation showing that the reweighting does not introduce bias into the importance scores would strengthen the method.
Authors: We will strengthen Section 3.3 by adding a sensitivity study that varies the reweighting coefficients and measures their effect on final performance and importance score rankings. We will also include an empirical comparison of channel importance scores computed with and without reweighting to demonstrate that the chosen coefficients do not materially bias the allocation process. A short derivation relating the reweighting to the time-conditioned loss will be provided to clarify the interaction. revision: yes
Circularity Check
Empirical supernet training procedure with no circular derivation
Full rationale
The paper presents an empirical one-shot training framework for diffusion model compression. It restricts the supernet to discrete target sizes, allocates channels by importance ranking, and applies reweighting during joint optimization, then evaluates the extracted subnetworks on standard image-generation benchmarks. No equations, predictions, or uniqueness claims are shown to reduce by construction to fitted parameters or prior self-citations. The central results are performance numbers obtained from training and evaluation, not algebraic identities or renamed inputs. This is a standard empirical method paper whose claims rest on external benchmark outcomes rather than internal definitional loops.
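The joint-optimization loop this rationale describes can be sketched on a toy objective (all names hypothetical: a simple quadratic stands in for the time-conditioned diffusion loss, and `lam` for the paper's unspecified reweighting coefficients):

```python
import numpy as np

def train_once_for_all(params, masks, lam, grad_fn, lr=0.1, steps=100):
    """One-shot joint optimization: every step updates the shared weights
    through each preset-size subnetwork, with per-size reweighting."""
    for _ in range(steps):
        total = np.zeros_like(params)
        for size, mask in masks.items():
            g = grad_fn(params * mask)       # gradient through the masked subnetwork
            total += lam[size] * (g * mask)  # reweight; only kept channels update
        params = params - lr * total
    return params

# Toy stand-in for the diffusion loss: L(w) = ||w * mask - target||^2.
target = np.ones(8)
grad_fn = lambda w: 2.0 * (w - target)
masks = {4: np.array([1, 1, 1, 1, 0, 0, 0, 0], float), 8: np.ones(8)}
lam = {4: 0.5, 8: 0.5}  # hypothetical reweighting coefficients
w = train_once_for_all(np.zeros(8), masks, lam, grad_fn)
```

Even in this toy, channels shared by both sizes receive twice the gradient signal of size-8-only channels; the reweighting coefficients are the knob that balances this, which is why the referee asks for a sensitivity analysis.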
Axiom & Free-Parameter Ledger
free parameters (2)
- discrete set of target parameter sizes
- reweighting coefficients for subnetwork losses
axioms (1)
- domain assumption Channel importance scores computed on the full model can be used to allocate channels to smaller subnetworks while preserving generation quality.
Reference graph
Works this paper leans on
-
[1]
Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, et al. 2022. eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers. Preprint arXiv:2211.01324
-
[2]
Fan Bao, Shen Nie, Kaiwen Xue, Yue Cao, Chongxuan Li, Hang Su, and Jun Zhu. 2023. All are Worth Words: A ViT Backbone for Diffusion Models. In IEEE/CVF International Conference on Computer Vision
2023
-
[4]
Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, and Song Han. 2020. Once-for-All: Train One Network and Specialize it for Efficient Deployment. In International Conference on Learning Representations
2020
-
[5]
Yu-Hui Chen, Raman Sarokin, Juhyun Lee, Jiuqiang Tang, Chuo-Ling Chang, Andrei Kulik, and Matthias Grundmann. 2023. Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations. In Conference on Computer Vision and Pattern Recognition
2023
-
[6]
Yunjey Choi, Youngjung Uh, Jaejun Yoo, and Jung-Woo Ha. 2020. StarGAN v2: Diverse Image Synthesis for Multiple Domains. In Conference on Computer Vision and Pattern Recognition
2020
-
[7]
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In Conference on Computer Vision and Pattern Recognition
2009
-
[8]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Conference of the North American Chapter of the Association for Computational Linguistics
2019
-
[9]
Prafulla Dhariwal and Alexander Quinn Nichol. 2021. Diffusion Models Beat GANs on Image Synthesis. In Advances in Neural Information Processing Systems
2021
-
[10]
Angela Fan, Edouard Grave, and Armand Joulin. 2020. Reducing Transformer Depth on Demand with Structured Dropout. In International Conference on Learning Representations
2020
-
[11]
Gongfan Fang, Xinyin Ma, and Xinchao Wang. 2023. Structural Pruning for Diffusion Models. Preprint arXiv:2305.10924
-
[12]
Ligong Han, Yinxiao Li, Han Zhang, Peyman Milanfar, Dimitris Metaxas, and Feng Yang. 2023. Svdiff: Compact parameter space for diffusion fine-tuning. In International Conference on Computer Vision
2023
-
[13]
Song Han, Huizi Mao, and William J. Dally. 2016. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. In International Conference on Learning Representations
2016
-
[14]
Song Han, Jeff Pool, John Tran, and William J. Dally. 2015. Learning both Weights and Connections for Efficient Neural Networks. Preprint arXiv:1506.02626
-
[15]
Babak Hassibi, David G. Stork, and Gregory J. Wolff. 1993. Optimal Brain Surgeon and general network pruning. In Proceedings of International Conference on Neural Networks
1993
-
[16]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In Conference on Computer Vision and Pattern Recognition
2016
-
[17]
Yefei He, Jing Liu, Weijia Wu, Hong Zhou, and Bohan Zhuang. 2023. EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models. Preprint arXiv:2310.03270
-
[18]
Yefei He, Luping Liu, Jing Liu, Weijia Wu, Hong Zhou, and Bohan Zhuang. 2023. PTQD: Accurate Post-Training Quantization for Diffusion Models. Preprint arXiv:2305.10657
-
[19]
Yihui He, Xiangyu Zhang, and Jian Sun. 2017. Channel Pruning for Accelerating Very Deep Neural Networks. In International Conference on Computer Vision
2017
-
[20]
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In Advances in Neural Information Processing Systems
2017
-
[21]
Geoffrey E. Hinton, Oriol Vinyals, and Jeffrey Dean. 2015. Distilling the Knowledge in a Neural Network. Preprint arXiv:1503.02531
-
[22]
Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising Diffusion Probabilistic Models. In Advances in Neural Information Processing Systems
2020
-
[23]
Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, and Qun Liu. 2020. DynaBERT: Dynamic BERT with Adaptive Width and Depth. In Advances in Neural Information Processing Systems
2020
-
[24]
Lu Hou, Quanming Yao, and James T. Kwok. 2017. Loss-aware Binarization of Deep Networks. In International Conference on Learning Representations
2017
-
[25]
Liang Hou, Zehuan Yuan, Lei Huang, Huawei Shen, Xueqi Cheng, and Changhu Wang. 2021. Slimmable Generative Adversarial Networks. In AAAI Conference on Artificial Intelligence
2021
-
[26]
Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. 2022. Elucidating the Design Space of Diffusion-Based Generative Models. In Advances in Neural Information Processing Systems
2022
-
[27]
Tero Karras, Samuli Laine, and Timo Aila. 2019. A Style-Based Generator Architecture for Generative Adversarial Networks. In Conference on Computer Vision and Pattern Recognition
2019
-
[28]
Bo-Kyeong Kim, Hyoung-Kyu Song, Thibault Castells, and Shinkook Choi. 2023. On Architectural Compression of Text-to-Image Diffusion Models. Preprint arXiv:2305.15798
-
[29]
Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations
2015
-
[30]
Alex Krizhevsky. 2009. Learning multiple layers of features from tiny images. Technical Report
2009
-
[31]
Yann LeCun, John S. Denker, and Sara A. Solla. 1989. Optimal Brain Damage. In Advances in Neural Information Processing Systems
1989
-
[32]
Xiuyu Li, Long Lian, Yijiang Liu, Huanrui Yang, Zhen Dong, Daniel Kang, Shanghang Zhang, and Kurt Keutzer. 2023. Q-Diffusion: Quantizing Diffusion Models. Preprint arXiv:2302.04304
-
[33]
Yanyu Li, Huan Wang, Qing Jin, Ju Hu, Pavlo Chemerys, Yun Fu, Yanzhi Wang, Sergey Tulyakov, and Jian Ren. 2023. SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds. Preprint arXiv:2306.00980
-
[34]
Ji Lin, Richard Zhang, Frieder Ganz, Song Han, and Jun-Yan Zhu. 2021. Anycost GANs for Interactive Image Synthesis and Editing. In Conference on Computer Vision and Pattern Recognition
2021
-
[35]
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In European Conference on Computer Vision
2014
-
[36]
Enshu Liu, Xuefei Ning, Zinan Lin, Huazhong Yang, and Yu Wang. 2023. OMS-DPM: Optimizing the Model Schedule for Diffusion Probabilistic Models. In International Conference on Machine Learning
2023
-
[37]
Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. 2015. Deep learning face attributes in the wild. In International Conference on Computer Vision
2015
-
[38]
Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In International Conference on Learning Representations
2019
-
[39]
Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. 2022. DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps. In Advances in Neural Information Processing Systems
2022
-
[40]
Chenlin Meng, Robin Rombach, Ruiqi Gao, Diederik P. Kingma, Stefano Ermon, Jonathan Ho, and Tim Salimans. 2023. On Distillation of Guided Diffusion Models. In Conference on Computer Vision and Pattern Recognition
2023
-
[41]
Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, and Jan Kautz. 2017. Pruning Convolutional Neural Networks for Resource Efficient Inference. In International Conference on Learning Representations
2017
-
[42]
Bernt Oksendal. 2013. Stochastic differential equations: an introduction with applications. Springer Science & Business Media
2013
-
[43]
Zizheng Pan, Jianfei Cai, and Bohan Zhuang. 2023. Stitchable Neural Networks. In Conference on Computer Vision and Pattern Recognition
2023
-
[44]
William Peebles and Saining Xie. 2023. Scalable Diffusion Models with Transformers. In IEEE/CVF International Conference on Computer Vision
2023
-
[45]
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. 2022. Hierarchical Text-Conditional Image Generation with CLIP Latents. Preprint arXiv:2204.06125
-
[46]
Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. 2016. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. In European Conference on Computer Vision
2016
-
[47]
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-resolution image synthesis with latent diffusion models. In Conference on Computer Vision and Pattern Recognition
2022
-
[48]
Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. 2015. FitNets: Hints for Thin Deep Nets. In International Conference on Learning Representations
2015
-
[49]
Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention
2015
-
[50]
Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. 2023. DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Conference on Computer Vision and Pattern Recognition
2023
-
[51]
Tim Salimans and Jonathan Ho. 2022. Progressive Distillation for Fast Sampling of Diffusion Models. In International Conference on Learning Representations
2022
-
[52]
Tim Salimans, Andrej Karpathy, Xi Chen, and Diederik P. Kingma. 2017. PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications. In International Conference on Learning Representations
2017
-
[53]
Yuzhang Shang, Zhihang Yuan, Bin Xie, Bingzhe Wu, and Yan Yan. 2023. Post-Training Quantization on Diffusion Models. In Conference on Computer Vision and Pattern Recognition
2023
-
[54]
Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, and Surya Ganguli. 2015. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. In International Conference on Machine Learning
2015
-
[56]
Jiaming Song, Chenlin Meng, and Stefano Ermon. 2021. Denoising Diffusion Implicit Models. In International Conference on Learning Representations
2021
-
[57]
Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. 2023. Consistency Models. In International Conference on Machine Learning
2023
-
[58]
Yang Song and Stefano Ermon. 2019. Generative Modeling by Estimating Gradients of the Data Distribution. In Advances in Neural Information Processing Systems
2019
-
[59]
Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. 2021. Score-Based Generative Modeling through Stochastic Differential Equations. In International Conference on Learning Representations
2021
-
[60]
Chaofan Tao, Lu Hou, Haoli Bai, Jiansheng Wei, Xin Jiang, Qun Liu, Ping Luo, and Ngai Wong. 2023. Structured Pruning for Efficient Generative Pre-trained Language Models. In Findings of the Association for Computational Linguistics
2023
-
[61]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems
2017
-
[62]
Zekun Wang, Mingyang Yi, Shuchen Xue, Zhenguo Li, Ming Liu, Bing Qin, and Zhi-Ming Ma. 2025. Improved Diffusion-based Generative Model with Better Adversarial Robustness. In International Conference on Learning Representations
2025
-
[63]
Yuxin Wu and Kaiming He. 2018. Group Normalization. In European Conference on Computer Vision
2018
-
[64]
Mengzhou Xia, Zexuan Zhong, and Danqi Chen. 2022. Structured Pruning Learns Compact and Accurate Models. In Annual Meeting of the Association for Computational Linguistics
2022
-
[65]
Shuchen Xue, Mingyang Yi, Weijian Luo, Shifeng Zhang, Jiacheng Sun, Zhenguo Li, and Zhi-Ming Ma. 2023. SA-Solver: Stochastic Adams Solver for Fast Sampling of Diffusion Models. In Advances in Neural Information Processing Systems
2023
-
[66]
Mingyang Yi, Aoxue Li, Yi Xin, and Zhenguo Li. 2024. Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model. In Conference on Neural Information Processing Systems
2024
-
[67]
Mingyang Yi, Jiacheng Sun, and Zhenguo Li. 2023. On the generalization of diffusion model. Preprint
2023
-
[68]
Jiahui Yu and Thomas S. Huang. 2019. Universally Slimmable Networks and Improved Training Techniques. In International Conference on Computer Vision
2019
-
[69]
Jiahui Yu, Linjie Yang, Ning Xu, Jianchao Yang, and Thomas S. Huang. 2019. Slimmable Neural Networks. In International Conference on Learning Representations
2019
-
[70]
Qingru Zhang, Simiao Zuo, Chen Liang, Alexander Bukharin, Pengcheng He, Weizhu Chen, and Tuo Zhao. 2022. PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance. In International Conference on Machine Learning
2022
-
[71]
Wenliang Zhao, Lujia Bai, Yongming Rao, Jie Zhou, and Jiwen Lu. 2023. UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models. Preprint arXiv:2302.04867