pith. machine review for the scientific record.

arxiv: 2604.12668 · v1 · submitted 2026-04-14 · 💻 cs.CV

Recognition: unknown

OFA-Diffusion Compression: Compressing Diffusion Model in One-Shot Manner

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:32 UTC · model grok-4.3

classification 💻 cs.CV
keywords diffusion models · model compression · single training · image generation · subnetwork extraction · channel allocation · training efficiency

The pith

One training session can yield multiple compressed versions of a diffusion model sized for different devices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to compress diffusion probabilistic models into several smaller sizes using a single training run rather than a separate training for each target size. This addresses the practical need to deploy models on devices with differing resource limits without incurring repeated training costs. By narrowing the space of candidate subnetworks to a few discrete sizes, assigning channels to each size according to their importance, and applying a reweighting strategy to balance optimization across subnetworks, the method aims to reach good generation quality efficiently.
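A minimal sketch of what importance-based channel allocation over a handful of discrete target sizes could look like; the L2-norm importance proxy, the uniform per-layer keep ratio, and every name below are illustrative assumptions rather than the paper's actual procedure.

```python
import torch

def channel_importance(weight: torch.Tensor) -> torch.Tensor:
    # Score each output channel by the L2 norm of its weights; a common proxy,
    # not necessarily the importance measure used in the paper.
    return weight.flatten(start_dim=1).norm(p=2, dim=1)

def allocate_channels(layer_weights: dict, keep_ratio: float) -> dict:
    # For one discrete target size, keep the top-scoring output channels per layer.
    # layer_weights maps layer names to weight tensors of shape (out_channels, ...).
    masks = {}
    for name, weight in layer_weights.items():
        scores = channel_importance(weight)
        n_keep = max(1, int(round(keep_ratio * scores.numel())))
        keep_idx = scores.topk(n_keep).indices
        mask = torch.zeros(scores.numel(), dtype=torch.bool)
        mask[keep_idx] = True
        masks[name] = mask
    return masks

# Hypothetical discrete parameter budgets; top-k selection makes the masks nested,
# so smaller subnetworks sit inside larger ones, mirroring gradual allocation.
target_ratios = [0.25, 0.5, 0.75, 1.0]
layer_weights = {"block1.conv": torch.randn(64, 32, 3, 3)}   # toy stand-in for real layers
masks_per_size = {r: allocate_channels(layer_weights, r) for r in target_ratios}
```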

Core claim

By restricting the candidate subnetworks in a single-training compression setup to a small set of discrete parameter sizes, allocating channels gradually by importance for each size, and applying a reweighting strategy during optimization, a single training run can produce multiple compressed diffusion models that perform satisfactorily on image generation tasks.

What carries the argument

The restricted single-training compression framework that builds subnetworks of preset sizes via importance-based channel allocation and reweighting to balance optimization.

Load-bearing premise

Restricting subnetwork candidates to discrete sizes and allocating channels by importance still permits each resulting model to achieve competitive performance.

What would settle it

Training the framework once, extracting subnetworks for several sizes, and then measuring their image generation quality against models compressed separately for those exact sizes; a large gap in metrics like FID would disprove the claim.
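A sketch of that settling experiment, written against hypothetical helpers (train_once_for_all, extract_subnetwork, train_separately, sample_images, compute_fid) that stand in for the corresponding pipeline steps; none of these names come from the paper.

```python
def compare_against_per_size_baselines(train_once_for_all, extract_subnetwork,
                                       train_separately, sample_images, compute_fid,
                                       dataset, reference_stats,
                                       target_ratios=(0.25, 0.5, 0.75)):
    # Train the once-for-all supernet a single time, then measure each extracted
    # subnetwork against an independently compressed model of the same size.
    supernet = train_once_for_all(dataset)
    results = {}
    for ratio in target_ratios:
        subnet = extract_subnetwork(supernet, ratio)     # no extra training
        baseline = train_separately(dataset, ratio)      # one full compression run per size
        fid_ofa = compute_fid(sample_images(subnet), reference_stats)
        fid_sep = compute_fid(sample_images(baseline), reference_stats)
        results[ratio] = {"ofa_fid": fid_ofa, "baseline_fid": fid_sep,
                          "gap": fid_ofa - fid_sep}      # a large positive gap undercuts the claim
    return results
```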

Figures

Figures reproduced from arXiv: 2604.12668 by Haoyang Jiang, Ju Fan, Junxian Cai, Lanqing Hu, Mingyang Yi, Qingbin Liu, Xi Chen, Xiuyu Li, Zekun Wang.

Figure 1: Overview of our OFA-Diffusion Compression framework. On the left, we present the structure of the U-ViT architecture. …
Figure 2: The practical retention rates of parameters …
Figure 3: Comparison between our proposed subnetwork construction approach and other ablations. For simplicity, “Arch” and …
Figure 4: Averaged latency of model on GPU or CPU evalu…
Figure 5: Comparison of FID versus MACs with different …
Figure 6: Overview of our OFA-Diffusion Compression framework. On the left, we present the structure of the U-Net architecture.
Figure 7: The convergence of subnetworks with different …
Figure 8: Visualization of normed importance scores across …
Figure 9: Samples from subnetworks trained by different approaches with the same random seed on CIFAR10.
Figure 10: Samples from subnetworks trained by different approaches with the same random seed on FFHQ.
Figure 11: Samples from subnetworks trained by different approaches with the same random seed on AFHQv2.
Figure 12: Samples from subnetworks trained by different approaches with the same random seed on ImageNet.
Figure 13: Samples from subnetworks trained by different approaches with the same random seed on CIFAR10.
Figure 14: Samples from subnetworks trained by different approaches with the same random seed on CelebA.
Figure 15: Samples from subnetworks trained by different approaches with the same random seed on MS-COCO.
read the original abstract

The Diffusion Probabilistic Model (DPM) achieves remarkable performance in image generation, while its increasing parameter size and computational overhead hinder its deployment in practical applications. To improve this, the existing literature focuses on obtaining a smaller model with a fixed architecture through model compression. However, in practice, DPMs usually need to be deployed on various devices with different resource constraints, which leads to multiple compression processes, incurring significant overhead for repeated training. To obviate this, we propose a once-for-all (OFA) compression framework for DPMs that yields different subnetworks with various computations in a one-shot training manner. The existing OFA framework typically involves massive subnetworks with different parameter sizes, while such a huge candidate space slows the optimization. Thus, we propose to restrict the candidate subnetworks with a certain set of parameter sizes, where each size corresponds to a specific subnetwork. Specifically, to construct each subnetwork with a given size, we gradually allocate the maintained channels by their importance. Furthermore, we propose a reweighting strategy to balance the optimization process of different subnetworks. Experimental results show that our approach can produce compressed DPMs for various sizes with significantly lower training overhead while achieving satisfactory performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes OFA-Diffusion Compression, a once-for-all framework for compressing diffusion probabilistic models (DPMs). It restricts the supernet candidate space to a discrete set of target parameter sizes, constructs each subnetwork by gradually allocating channels according to importance scores, and applies a reweighting strategy on subnetwork losses to balance joint optimization. The central claim is that this one-shot procedure yields multiple compressed DPMs of varying sizes with significantly lower training overhead than repeated per-size compressions while still achieving satisfactory performance on image-generation benchmarks.
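As a concrete picture of the joint optimization the summary describes, here is a minimal sketch of one training step that reweights per-subnetwork losses; the mask-selecting context manager and the choice to visit every size each step are assumptions for illustration, not the authors' implementation.

```python
def ofa_training_step(model, batch, size_masks, size_weights, diffusion_loss, optimizer):
    # size_masks: {target_size: channel masks}; size_weights: reweighting coefficients.
    optimizer.zero_grad()
    logged = {}
    for size, masks in size_masks.items():
        with model.use_channel_masks(masks):        # hypothetical: activate one subnetwork
            loss = diffusion_loss(model, batch)     # denoising loss evaluated on that subnetwork
        (size_weights[size] * loss).backward()      # reweighted gradient contribution accumulates
        logged[size] = float(loss.detach())
    optimizer.step()                                # one shared update covering all subnetworks
    return logged
```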

Significance. If the empirical results hold, the work offers a practical solution to the repeated-training overhead that currently limits deployment of large DPMs across heterogeneous devices. The restriction to discrete sizes and importance-based allocation represent a pragmatic engineering contribution that makes the OFA paradigm more tractable for diffusion models; credit is due for identifying and addressing the scaling issue of massive supernets in this domain.

major comments (3)
  1. [Section 3.2 and Section 4] The central claim that importance-based channel allocation within the restricted discrete-size supernet produces competitive subnetworks rests on an unverified assumption. The manuscript must demonstrate, via direct comparison in the experimental section, that the extracted subnetworks match or approach the performance of independently trained models of identical parameter counts; without this baseline the 'satisfactory performance' assertion cannot be evaluated.
  2. [Abstract and Section 4] The abstract states that experiments show 'significantly lower training overhead' and 'satisfactory performance,' yet supplies no quantitative numbers, wall-clock times, FID scores, baseline comparisons, or error bars. The load-bearing experimental claims therefore cannot be verified from the given text; the full results section must include these metrics and ablations on the reweighting coefficients.
  3. [Section 3.3] The reweighting strategy is presented as balancing the optimization of different subnetworks, but no analysis is given of how the chosen coefficients interact with the time-conditioned diffusion loss. A sensitivity study or derivation showing that the reweighting does not introduce bias into the importance scores would strengthen the method.
minor comments (2)
  1. [Section 3.2] Notation for the importance scores and channel-allocation procedure should be formalized with explicit equations rather than prose descriptions to improve reproducibility; one illustrative form such equations could take is sketched after this list.
  2. [Section 3.1] The manuscript would benefit from a clear statement of the exact discrete parameter sizes chosen and the rationale for their selection.
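The first minor comment asks for the importance-score notation to be made explicit; one standard form such a revision could take (weight-magnitude importance with nested top-k allocation) is sketched below as an illustration, not as the paper's actual definition.

```latex
% Illustrative notation only; all symbols here are assumptions, not taken from the paper.
% s_{l,c}: importance of output channel c in layer l; \rho_k: retention ratio of the
% k-th discrete target size; C_l: number of channels in layer l.
\[
  s_{l,c} = \bigl\lVert W_{l,c,:} \bigr\rVert_2,
  \qquad
  \mathcal{C}_l(k) = \operatorname{TopK}\!\bigl(\{s_{l,c}\}_{c=1}^{C_l},\ \lceil \rho_k C_l \rceil\bigr),
  \qquad
  \mathcal{C}_l(1) \subseteq \mathcal{C}_l(2) \subseteq \cdots
\]
```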

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We appreciate the emphasis on strengthening the empirical validation and analysis. Below we address each major comment point by point, indicating the revisions we will incorporate in the next version of the manuscript.

read point-by-point responses
  1. Referee: [Section 3.2 and Section 4] The central claim that importance-based channel allocation within the restricted discrete-size supernet produces competitive subnetworks rests on an unverified assumption. The manuscript must demonstrate, via direct comparison in the experimental section, that the extracted subnetworks match or approach the performance of independently trained models of identical parameter counts; without this baseline the 'satisfactory performance' assertion cannot be evaluated.

    Authors: We agree that a direct comparison against independently trained models of identical parameter counts would provide a stronger validation of the subnetwork quality. Our current experiments compare against existing DPM compression baselines and demonstrate competitive FID scores at substantially reduced training cost. To address the concern, the revised manuscript will include new experiments that train independent models for each target size and report their performance relative to our extracted subnetworks. We will also note the additional compute required for these baselines as context for the efficiency gains of the one-shot approach. revision: yes

  2. Referee: [Abstract and Section 4] The abstract states that experiments show 'significantly lower training overhead' and 'satisfactory performance,' yet supplies no quantitative numbers, wall-clock times, FID scores, baseline comparisons, or error bars. The load-bearing experimental claims therefore cannot be verified from the given text; the full results section must include these metrics and ablations on the reweighting coefficients.

    Authors: We acknowledge that the abstract and experimental section lack the specific quantitative details needed for verification. In the revised manuscript we will update the abstract to report key metrics including FID scores across model sizes, percentage reductions in training overhead (with wall-clock times), baseline comparisons, and error bars. Section 4 will be expanded with full tables containing these values plus ablations on the reweighting coefficients to make all claims verifiable. revision: yes

  3. Referee: [Section 3.3] The reweighting strategy is presented as balancing the optimization of different subnetworks, but no analysis is given of how the chosen coefficients interact with the time-conditioned diffusion loss. A sensitivity study or derivation showing that the reweighting does not introduce bias into the importance scores would strengthen the method.

    Authors: We will strengthen Section 3.3 by adding a sensitivity study that varies the reweighting coefficients and measures their effect on final performance and importance score rankings. We will also include an empirical comparison of channel importance scores computed with and without reweighting to demonstrate that the chosen coefficients do not materially bias the allocation process. A short derivation relating the reweighting to the time-conditioned loss will be provided to clarify the interaction. revision: yes
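A minimal sketch of the promised sensitivity check, comparing channel-importance rankings across reweighting settings; the Spearman rank correlation is one reasonable stability measure, and compute_importance is a hypothetical stand-in for the authors' scoring routine.

```python
from scipy.stats import spearmanr

def reweighting_sensitivity(compute_importance, coefficient_grid, baseline_coefficients):
    # compute_importance(coeffs) is a hypothetical helper returning a flat array of
    # channel-importance scores obtained under a given set of reweighting coefficients.
    reference = compute_importance(baseline_coefficients)
    stability = {}
    for coeffs in coefficient_grid:
        scores = compute_importance(coeffs)
        rho, _ = spearmanr(reference, scores)   # rank correlation of the two orderings
        stability[tuple(coeffs)] = rho          # values near 1.0 suggest little allocation bias
    return stability
```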

Circularity Check

0 steps flagged

Empirical supernet training procedure with no circular derivation

full rationale

The paper presents an empirical one-shot training framework for diffusion model compression. It restricts the supernet to discrete target sizes, allocates channels by importance ranking, and applies reweighting during joint optimization, then evaluates the extracted subnetworks on standard image-generation benchmarks. No equations, predictions, or uniqueness claims are shown to reduce by construction to fitted parameters or prior self-citations. The central results are performance numbers obtained from training and evaluation, not algebraic identities or renamed inputs. This is a standard empirical method paper whose claims rest on external benchmark outcomes rather than internal definitional loops.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The framework rests on standard assumptions from once-for-all training plus two paper-specific choices whose justification is not supplied in the abstract.

free parameters (2)
  • discrete set of target parameter sizes
    Chosen to match expected device constraints; directly determines which subnetworks are optimized.
  • reweighting coefficients for subnetwork losses
    Introduced to balance training across sizes; values are not derived from first principles.
axioms (1)
  • domain assumption: Channel importance scores computed on the full model can be used to allocate channels to smaller subnetworks while preserving generation quality.
    Invoked when constructing each subnetwork by gradual allocation; no proof or prior citation is given in the abstract.

pith-pipeline@v0.9.0 · 5540 in / 1328 out tokens · 40945 ms · 2026-05-10T15:32:04.387677+00:00 · methodology

discussion (0)

