Elucidating Representation Degradation Problem in Diffusion Model Training
Recognition: 2 theorem links · Lean theorem
Pith reviewed 2026-05-12 04:04 UTC · model grok-4.3
The pith
Representation degradation in diffusion models arises from mismatched recoverability at high noise levels and is corrected by dynamically reallocating optimization effort in a plug-and-play framework.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that training instability in diffusion models stems from mismatched target recoverability, which manifests as Neural Tangent Kernel (NTK) spectral weakening and effective low-rank behavior. Elucidated Representation Diffusion corrects this by dynamically reallocating optimization effort according to each sample's effective recoverability, thereby stabilizing representation learning, accelerating convergence, and improving performance across backbones without external supervision.
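In a standard denoising objective, such a reallocation reduces to a per-noise-level loss weight. A generic form consistent with the excerpts quoted later on this page (the paper's exact parameterization is not visible here, so the remaining notation is assumed):

```latex
% Generic weighted denoising objective; w* and omega follow the excerpts,
% the remaining symbols are assumed notation.
\mathcal{L}(\theta)
  = \mathbb{E}_{\lambda,\, x_0,\, \epsilon}
    \left[ w^{\star}(\lambda)\,
      \bigl\| D_{\theta}(x_{\lambda};\lambda) - x_0 \bigr\|_2^{2} \right],
\qquad
w^{\star}(\lambda) \propto \omega(\lambda),
```

where ω(λ) denotes the effective recoverability at noise level λ and D_θ is the denoiser.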
What carries the argument
Elucidated Representation Diffusion (ERD), a plug-and-play framework that reallocates optimization effort according to effective recoverability at each noise level; a minimal sketch of such a module follows.
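A hedged sketch of what a plug-and-play reallocation module could look like inside a PyTorch training step. The class name and the recoverability proxy (an EMA of per-noise-bin loss, with weights set inversely so that bins with lower residual error receive more effort) are hypothetical stand-ins, not the paper's rule:

```python
import torch

class RecoverabilityReweighter:
    """Hypothetical stand-in for an ERD-style w*(lambda) ∝ omega(lambda) rule."""

    def __init__(self, n_bins: int = 64, ema: float = 0.99):
        self.ema = ema
        self.loss_ema = torch.ones(n_bins)  # running loss estimate per noise bin

    def weights(self, bin_idx: torch.Tensor, per_sample_loss: torch.Tensor) -> torch.Tensor:
        # Track a running estimate of the loss in each sample's noise bin.
        for b, l in zip(bin_idx.tolist(), per_sample_loss.detach().cpu().tolist()):
            self.loss_ema[b] = self.ema * self.loss_ema[b] + (1.0 - self.ema) * l
        # Proxy: bins with lower residual loss are treated as more recoverable
        # and receive proportionally more optimization effort.
        w = 1.0 / (self.loss_ema[bin_idx.cpu()] + 1e-8)
        return (w / w.mean()).to(per_sample_loss.device)  # normalize batch mean to 1

# Usage inside an existing training step (denoiser, noise schedule, and
# sigma_to_bin are assumed to exist in the surrounding pipeline):
#   per_sample = ((denoiser(x_noisy, sigma) - x_clean) ** 2).flatten(1).mean(1)
#   w = reweighter.weights(sigma_to_bin(sigma), per_sample)
#   loss = (w * per_sample).mean()
```

Because the module only rescales the per-sample loss, it can be dropped into an existing pipeline without touching the backbone, which is the sense in which the abstract calls ERD plug-and-play.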
If this is right
- Training reaches stable representations faster because effort concentrates on recoverable signals.
- Generation quality improves across diffusion backbones without added supervision or architectural changes.
- The same reallocation rule can be inserted into existing training pipelines as a drop-in module.
- Convergence acceleration holds when the framework is applied to varied noise schedules and model sizes.
Where Pith is reading between the lines
- Similar recoverability mismatches may appear in other score-based or flow-matching generative models that use noise schedules.
- The approach could reduce the total compute needed for large-scale diffusion pre-training by shortening the unstable early phase.
- Monitoring NTK spectrum or rank during training might serve as a diagnostic for when reallocation is required; a minimal sketch of such a monitor follows this list.
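The paper's actual monitor is not shown on this page; as one plausible implementation of the diagnostic in the last bullet, the spectral-entropy effective rank of a batch of hidden activations is a cheap, standard proxy for spectral weakening and low-rank collapse:

```python
import torch

def effective_rank(feats: torch.Tensor, eps: float = 1e-12) -> float:
    """Effective rank of a (batch, dim) activation matrix: the exponential
    of the entropy of its normalized singular-value spectrum."""
    feats = feats - feats.mean(dim=0, keepdim=True)  # center the batch
    s = torch.linalg.svdvals(feats)                  # singular values
    p = s / (s.sum() + eps)                          # normalized spectrum
    return float(torch.exp(-(p * torch.log(p + eps)).sum()))

# A collapsing representation shows up as a sharply lower effective rank:
healthy = torch.randn(256, 128)
collapsed = torch.randn(256, 1) @ torch.randn(1, 128) + 0.01 * torch.randn(256, 128)
print(effective_rank(healthy), effective_rank(collapsed))  # high vs. near 1
```

A sustained drop in this statistic at high noise levels would be the kind of signal that could trigger reallocation.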
Load-bearing premise
The observed instability is caused by mismatched recoverability between the model and its training targets at different noise levels.
What would settle it
Train identical diffusion backbones with and without the ERD reallocation rule, then compare the rate of structural distortion in outputs and the number of steps needed to reach target FID at high noise levels.
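A toy version of that settling experiment, showing only the protocol shape: two runs from identical initialization, one with uniform loss weighting and one with a reallocation rule (a min-SNR-style weight stands in for ERD, which is not specified on this page), compared on steps to reach a fixed evaluation target. Target FID on a real backbone is replaced by a plain reconstruction-loss threshold:

```python
import torch
import torch.nn as nn

def steps_to_target(use_reweight: bool, max_steps: int = 2000, target: float = 0.35) -> int:
    torch.manual_seed(0)  # identical init and data stream for both arms
    net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    ema = None
    for step in range(max_steps):
        x0 = torch.randn(128, 2)                     # toy "clean" data
        sigma = torch.rand(128, 1)                   # toy noise-level schedule
        x_noisy = x0 + sigma * torch.randn_like(x0)
        pred = net(torch.cat([x_noisy, sigma], dim=1))
        raw = ((pred - x0) ** 2).mean(dim=1)         # per-sample denoising error
        metric = raw.mean().item()                   # unweighted metric, FID stand-in
        ema = metric if ema is None else 0.95 * ema + 0.05 * metric
        if use_reweight:
            w = 1.0 / (sigma.squeeze(1) ** 2 + 1.0)  # hypothetical rule, not ERD's
            loss = (raw * (w / w.mean())).mean()
        else:
            loss = raw.mean()
        opt.zero_grad(); loss.backward(); opt.step()
        if ema < target:
            return step  # steps needed to reach the target
    return max_steps

print("baseline:", steps_to_target(False), "reweighted:", steps_to_target(True))
```

The real test would additionally track structural distortion of samples at high noise levels, per the criterion above, rather than a scalar loss.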
Original abstract
Diffusion models have achieved remarkable success, yet their training remains inefficient due to a severe optimization bottleneck, which we term Representation Degradation. As noise levels increase, the outputs of the trained model exhibit progressive structural distortion, which can destabilize training and impair generation quality. Our analysis suggests that this instability is driven by mismatched target recoverability, which is associated with Neural Tangent Kernel (NTK) spectral weakening and effective low-rank behavior. To address this, we propose Elucidated Representation Diffusion (ERD), a plug-and-play framework that dynamically reallocates optimization effort according to effective recoverability. By stabilizing representation learning without external supervision, ERD accelerates convergence and achieves strong empirical performance across diffusion backbones.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript identifies a Representation Degradation problem in diffusion model training, where model outputs exhibit progressive structural distortion as noise levels increase, destabilizing training and impairing generation quality. The authors suggest this stems from mismatched target recoverability, which they associate with Neural Tangent Kernel (NTK) spectral weakening and effective low-rank behavior. They introduce Elucidated Representation Diffusion (ERD), a plug-and-play framework that dynamically reallocates optimization effort according to effective recoverability to stabilize representation learning without external supervision, accelerate convergence, and deliver strong empirical performance across diffusion backbones.
Significance. If the causal link between mismatched recoverability, NTK weakening, and degradation is rigorously demonstrated, and ERD is shown via controlled experiments to specifically counteract this mechanism rather than provide generic stabilization, the work could meaningfully advance efficient training of diffusion models central to generative AI. The plug-and-play design is a practical strength, but the current lack of supporting analysis limits the assessed impact.
Major comments (2)
- [Abstract] The claim that instability is 'driven by' mismatched target recoverability 'associated with' NTK spectral weakening and low-rank behavior is presented without any derivations, spectral analysis, equations, or ablation studies isolating this from confounders such as gradient variance growth, noise scheduling, or batch statistics. This association is load-bearing for the ERD construction, yet remains correlational based on the provided text.
- [Abstract] The assertions of accelerated convergence and 'strong empirical performance across diffusion backbones' are made without quantitative results, baselines, error bars, tables, figures, or experimental details, preventing assessment of effect sizes, statistical significance, or reproducibility.
Minor comments (1)
- The abstract relies on suggestive phrasing ('suggests', 'associated with') that should be replaced with precise statements once the full analysis is presented.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, clarifying the relationship between the abstract and the supporting analysis in the full paper. We will revise the abstract to better link claims to the detailed evidence provided in the body of the work.
Point-by-point responses
- Referee: [Abstract] The claim that instability is 'driven by' mismatched target recoverability 'associated with' NTK spectral weakening and low-rank behavior is presented without any derivations, spectral analysis, equations, or ablation studies isolating this from confounders such as gradient variance growth, noise scheduling, or batch statistics. This association is load-bearing for the ERD construction, yet remains correlational based on the provided text.
  Authors: We appreciate this observation regarding the abstract. The abstract is a concise summary; the full manuscript contains the requested derivations, NTK spectral analysis, equations, and ablation studies that isolate the recoverability mismatch from the listed confounders (detailed in Sections 3.2–3.4 and 4.1–4.2). These sections demonstrate the association through both theoretical analysis and controlled experiments. To address the concern that the abstract does not sufficiently indicate this support, we will revise the abstract to include brief references to the relevant sections and to characterize the association more precisely, as supported by our analysis rather than purely correlational. (revision: yes)
- Referee: [Abstract] The assertions of accelerated convergence and 'strong empirical performance across diffusion backbones' are made without quantitative results, baselines, error bars, tables, figures, or experimental details, preventing assessment of effect sizes, statistical significance, or reproducibility.
  Authors: Thank you for noting this. The abstract summarizes the empirical outcomes, while the full manuscript reports the quantitative results, including baselines, error bars, tables, figures, effect sizes, and full experimental details with reproducibility information across multiple diffusion backbones (presented in Section 5, with additional controls in the appendix). We agree the abstract could better convey the strength of these results. We will revise the abstract to incorporate key quantitative highlights (e.g., convergence speedups and performance metrics) or explicit pointers to Section 5. (revision: yes)
Circularity Check
No circularity in derivation chain
Full rationale
The abstract and visible text present the core claims as empirical observations ('analysis suggests', 'associated with') rather than a mathematical derivation. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear. The ERD framework is introduced as a plug-and-play reallocation method grounded in the observed degradation pattern, without reducing the central premise to its own inputs by construction. The paper is therefore self-contained against external benchmarks, with no identifiable circular steps.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (match: unclear)
  Matched passage: "Our analysis suggests that this instability is driven by mismatched target recoverability, which is associated with Neural Tangent Kernel (NTK) spectral weakening and effective low-rank behavior. ... ERD sets w⋆_y(λ) ∝ ω_y(λ)"
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability (match: unclear)
  Matched passage: "Theorem 3.8 (Spectral Local ELBO Bound) ... modes with 2M(λ)κ_λ^j > γ contract rapidly"
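Rendered as display math, the two quoted fragments read as follows; the notation is reconstructed from the excerpts alone and may not match the paper's full definitions:

```latex
% Reconstructed from the excerpts above; symbols assumed, not verified.
w^{\star}_{y}(\lambda) \;\propto\; \omega_{y}(\lambda)
\quad \text{(ERD's reallocation weight tracks effective recoverability)}

2\, M(\lambda)\, \kappa_{\lambda}^{j} \;>\; \gamma
\quad \text{(Theorem 3.8: modes satisfying this bound contract rapidly)}
```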