Class-frequency Guided Noise Schedule for Diffusion Models
Pith reviewed 2026-06-29 04:32 UTC · model grok-4.3
The pith
Diffusion models generate better rare-class samples when the noise schedule scales with class frequency.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Low-frequency classes suffer from less accurate score estimates because they occupy larger low-density regions, and high-frequency classes dominate the score space, pulling most sampling trajectories toward common classes. The CFRG noise schedule corrects this by assigning larger noise scales to low-frequency classes during diffusion, using class-frequency statistics directly to adjust the multi-scale schedule.
What carries the argument
Class-frequency Guided (CFRG) noise schedule that uses class frequency to set larger noise scales for low-frequency classes.
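The page does not state the CFRG formula, so the following is only a minimal Python sketch under an assumed power-law mapping. Each class c gets a noise-scale multiplier (n_max / n_c)^gamma, so the most frequent class keeps the base schedule and rarer classes get larger noise scales. The function names and the `gamma` knob are hypothetical, not the paper's.

```python
from typing import Dict


def cfrg_style_multipliers(class_counts: Dict[str, int], gamma: float = 0.5) -> Dict[str, float]:
    """Hypothetical map from class counts to noise-scale multipliers.

    The most frequent class gets 1.0; a class with n_c samples gets
    (n_max / n_c) ** gamma >= 1, so rarer classes receive larger noise scales.
    `gamma` is an assumed hyperparameter, not one taken from the paper.
    """
    if not class_counts or any(n <= 0 for n in class_counts.values()):
        raise ValueError("class_counts must be non-empty with positive counts")
    n_max = max(class_counts.values())
    return {c: (n_max / n) ** gamma for c, n in class_counts.items()}


def per_class_sigma_max(base_sigma_max: float, class_counts: Dict[str, int],
                        gamma: float = 0.5) -> Dict[str, float]:
    """Stretch the largest noise level of a base schedule separately for each class."""
    multipliers = cfrg_style_multipliers(class_counts, gamma)
    return {c: base_sigma_max * m for c, m in multipliers.items()}


if __name__ == "__main__":
    counts = {"dog": 500, "cat": 50, "okapi": 5}  # illustrative counts only
    print(per_class_sigma_max(50.0, counts))  # dog keeps 50.0; okapi gets 10x, i.e. 500.0
```

With balanced counts every multiplier equals 1, so this particular mapping collapses to the base schedule; `gamma = 0` has the same effect for any counts.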
If this is right
- Low-frequency classes produce higher-quality and more diverse samples without degrading high-frequency output.
- Downstream tasks such as classification and text-to-image generation improve on the same imbalanced training sets.
- The method applies directly to existing diffusion pipelines on CIFAR-100-LT and ImageNet-LT.
- Frequency statistics become an explicit design variable in noise schedule construction.
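CFRG consumes class-frequency statistics. On CIFAR-100-LT these are commonly produced by subsampling CIFAR-100 with an exponential profile, so that class i keeps n_max · (1/ρ)^(i/(C−1)) images for imbalance ratio ρ. The sketch below assumes the widely used setting of 500 head-class images and ρ = 100; the page does not say which ratio the paper uses, and the helper names are hypothetical.

```python
from typing import List


def long_tailed_counts(n_max: int = 500, num_classes: int = 100,
                       imbalance_ratio: float = 100.0) -> List[int]:
    """Per-class training counts with an exponential (long-tailed) profile.

    Class 0 is the head with n_max images; the last class keeps n_max / imbalance_ratio.
    """
    return [
        int(n_max * (1.0 / imbalance_ratio) ** (i / (num_classes - 1)))
        for i in range(num_classes)
    ]


def class_frequencies(counts: List[int]) -> List[float]:
    """Normalize counts to empirical class frequencies that sum to 1."""
    total = sum(counts)
    return [n / total for n in counts]


if __name__ == "__main__":
    counts = long_tailed_counts()
    freqs = class_frequencies(counts)
    print(counts[0], counts[-1])  # 500 5
```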
Where Pith is reading between the lines
- The same frequency-to-noise mapping could be tested in non-image domains such as audio or molecular generation.
- Continuous density estimation might replace discrete class counts to generalize CFRG beyond labeled datasets.
- Downstream fairness metrics could be tracked to check whether the schedule reduces representational bias toward common categories.
Load-bearing premise
The main cause of poor low-frequency generation is inaccurate score estimation in low-density regions, and simply assigning those classes larger noise scales will fix it without creating new imbalances or harming high-frequency performance.
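A toy one-dimensional illustration of this premise, not taken from the paper: treat a frequent class and a rare class as Gaussian components with assumed weights 0.9 and 0.1. Adding N(0, σ²) noise keeps the density a mixture with variances s² + σ², so larger σ raises the density in the gap between the classes, where scores are otherwise poorly supervised. At the midpoint the score still points toward the frequent class, because its posterior weight dominates.

```python
import math
from typing import Sequence


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def perturbed_density(x: float, weights: Sequence[float], means: Sequence[float],
                      stds: Sequence[float], sigma: float) -> float:
    """Density of a 1D Gaussian mixture after adding N(0, sigma^2) noise.

    Convolving component N(mu_k, s_k^2) with N(0, sigma^2) gives N(mu_k, s_k^2 + sigma^2).
    """
    return sum(w * gaussian_pdf(x, m, s * s + sigma * sigma)
               for w, m, s in zip(weights, means, stds))


def perturbed_score(x: float, weights: Sequence[float], means: Sequence[float],
                    stds: Sequence[float], sigma: float) -> float:
    """d/dx log p_sigma(x): each component pulls toward its mean, weighted by its posterior."""
    variances = [s * s + sigma * sigma for s in stds]
    comps = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(comps)
    return sum(c / total * (m - x) / v for c, m, v in zip(comps, means, variances))


if __name__ == "__main__":
    # Assumed toy setup: frequent class at -4 (weight 0.9), rare class at +4 (weight 0.1).
    mix = ([0.9, 0.1], [-4.0, 4.0], [0.5, 0.5])
    for sigma in (0.0, 1.0, 2.0, 3.0):
        print(sigma, perturbed_density(0.0, *mix, sigma), perturbed_score(0.0, *mix, sigma))
```

The toy only shows the mechanism the premise relies on; it says nothing about whether per-class noise scaling fixes the imbalance in practice.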
What would settle it
A controlled run on a perfectly balanced dataset where CFRG either lowers overall FID or fails to improve metrics for the originally low-frequency classes when the same schedule is applied.
Original abstract
In this paper, we are the first to examine the correlations between class frequency and the multi-scale noise schedule within diffusion models. For score-based generative models, low-density regions often lead to inaccurately estimated scores, thereby compromising the generation quality. Although the multi-scale noise schedule can alleviate this issue during the diffusion process, low-frequency classes still face the challenge of large low-density regions, resulting in more inaccurate estimated scores than high-frequency classes. Furthermore, high-frequency classes tend to dominate the score space, causing a convergence of most data points towards generating samples from these classes. Consequently, samples generated within low-frequency classes exhibit suboptimal quality and limited diversity. To address this challenge, we propose the Class-frequency Guided (CFRG) noise schedule, leveraging the insight that low-frequency classes should be endowed with larger-scale noises. To illustrate the effectiveness of our method, we conduct experiments on various tasks, including image generation, image classification, and text-to-image generation, using imbalanced datasets, i.e., CIFAR-100-LT and ImageNet-LT. By employing the CFRG noise schedule, we achieve substantial improvements over baselines, manifesting the crucial role of frequency statistics in noise schedule design.
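In the noise-conditional score network setting of Song and Ermon (2019), a multi-scale noise schedule is a geometric sequence of noise levels from σ_max down to σ_min. A minimal sketch follows; the defaults in the usage lines are illustrative. The rare-class call is a hypothetical reading of "larger-scale noises" that enlarges only σ_max, not necessarily the paper's construction.

```python
from typing import List


def geometric_sigmas(sigma_max: float, sigma_min: float, num_scales: int) -> List[float]:
    """Noise levels sigma_max = s_1 > s_2 > ... > s_L = sigma_min with a constant ratio."""
    if num_scales < 2 or not 0.0 < sigma_min < sigma_max:
        raise ValueError("need num_scales >= 2 and 0 < sigma_min < sigma_max")
    ratio = (sigma_min / sigma_max) ** (1.0 / (num_scales - 1))
    return [sigma_max * ratio ** i for i in range(num_scales)]


if __name__ == "__main__":
    base = geometric_sigmas(1.0, 0.01, 10)
    # Hypothetical rare-class variant: only sigma_max is enlarged (3x is arbitrary).
    rare = geometric_sigmas(3.0, 0.01, 10)
    print([round(s, 4) for s in base])
    print([round(s, 4) for s in rare])
```

Because the smallest level is shared, the rare-class schedule is at least as noisy as the base one at every index, with the largest gap at the coarsest scales.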
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims to be the first to examine correlations between class frequency and multi-scale noise schedules in diffusion models. It argues that low-frequency classes suffer from larger low-density regions causing inaccurate score estimates, while high-frequency classes dominate the score space; the proposed Class-frequency Guided (CFRG) noise schedule addresses this by endowing low-frequency classes with larger-scale noises. Experiments on imbalanced datasets (CIFAR-100-LT, ImageNet-LT) are said to yield substantial improvements over baselines for image generation, classification, and text-to-image tasks.
Significance. If the central claims hold after supplying a derivation of the CFRG schedule and quantitative validation, the work would identify class frequency as a previously underutilized factor in noise schedule design, providing a frequency-aware alternative to standard reweighting techniques for long-tailed generative modeling.
major comments (3)
- [Abstract] The proposal that 'low-frequency classes should be endowed with larger-scale noises' is presented as the core insight motivating CFRG, yet no equation, algorithm, or derivation is supplied showing how observed class-frequency correlations are mapped to a specific noise-scale adjustment (or why this adjustment is orthogonal to other diffusion hyperparameters).
- [Abstract] The claim of 'substantial improvements over baselines' on CIFAR-100-LT and ImageNet-LT is stated without any reference to quantitative metrics, tables of results, ablation studies, or error bars, preventing assessment of whether the gains are load-bearing or robust.
- [Abstract] The assumption that larger-scale noise for low-frequency classes corrects inaccurate score estimation 'without introducing new imbalances or degrading high-frequency performance' is asserted but not supported by any argument, counter-example analysis, or test that the adjustment cannot create new low-density artifacts in the high-frequency regime.
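The last major comment above asks for a test that larger noise for rare classes does not degrade frequent ones. One common way to report this on long-tailed benchmarks is to group classes into many-shot (more than 100 training images), medium-shot (20 to 100) and few-shot (fewer than 20) splits and compare a per-class metric within each split. The thresholds follow the ImageNet-LT convention and are an assumption here, as are the helper names; for FID, a positive change means the method got worse.

```python
from statistics import mean
from typing import Dict, List


def split_of(count: int) -> str:
    """Assumed ImageNet-LT-style split by training-set size."""
    if count > 100:
        return "many"
    if count >= 20:
        return "medium"
    return "few"


def grouped_metric(per_class: Dict[str, float], train_counts: Dict[str, int]) -> Dict[str, float]:
    """Average a per-class metric (e.g. per-class FID) within each frequency split."""
    groups: Dict[str, List[float]] = {}
    for cls, value in per_class.items():
        groups.setdefault(split_of(train_counts[cls]), []).append(value)
    return {split: mean(values) for split, values in groups.items()}


def change_by_split(baseline: Dict[str, float], method: Dict[str, float],
                    train_counts: Dict[str, int]) -> Dict[str, float]:
    """Method minus baseline per split; for FID (lower is better) positive means worse."""
    b = grouped_metric(baseline, train_counts)
    m = grouped_metric(method, train_counts)
    return {split: m[split] - b[split] for split in b}
```

A "no degradation" claim would then correspond to a many-shot change that stays within seed-to-seed noise.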
minor comments (1)
- The abstract refers to 'various tasks' and 'imbalanced datasets' but does not name the specific diffusion architectures, baseline schedules, or evaluation metrics employed.
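The comments above ask which metrics and error bars back the claimed gains. For image generation the usual headline metric is FID, the Fréchet distance between Gaussians fitted to Inception features of real and generated samples. The sketch below assumes the feature means and covariances are already computed (feature extraction is omitted). It also reports mean and sample standard deviation across seeds, one common way to attach error bars.

```python
from typing import Sequence, Tuple

import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2).

    d^2 = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2}).
    Tr((sigma1 sigma2)^{1/2}) is computed as the sum of square roots of the
    eigenvalues of sigma1^{1/2} sigma2 sigma1^{1/2}, which is symmetric PSD.
    """
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    s1, s2 = np.asarray(sigma1, dtype=float), np.asarray(sigma2, dtype=float)
    # Symmetric square root of sigma1 from its eigendecomposition.
    w, v = np.linalg.eigh(s1)
    s1_half = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T
    m = s1_half @ s2 @ s1_half
    tr_sqrt = np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(m), 0.0, None)))
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1) + np.trace(s2) - 2.0 * tr_sqrt)


def mean_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (ddof=1), e.g. over three random seeds."""
    arr = np.asarray(values, dtype=float)
    return float(arr.mean()), float(arr.std(ddof=1))
```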
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below with point-by-point responses and indicate where revisions will be made to the manuscript.
- Referee: [Abstract] The proposal that 'low-frequency classes should be endowed with larger-scale noises' is presented as the core insight motivating CFRG, yet no equation, algorithm, or derivation is supplied showing how observed class-frequency correlations are mapped to a specific noise-scale adjustment (or why this adjustment is orthogonal to other diffusion hyperparameters).
Authors: Section 3 of the manuscript derives the CFRG schedule by mapping class-frequency statistics to per-class noise-scale multipliers via a frequency-dependent weighting function applied to the standard noise schedule. Orthogonality follows because CFRG modifies only the diffusion timestep sampling per class and does not interact with loss reweighting or other hyperparameters. We will revise the abstract to include a concise reference to this derivation and the orthogonality argument. Revision: yes.
- Referee: [Abstract] The claim of 'substantial improvements over baselines' on CIFAR-100-LT and ImageNet-LT is stated without any reference to quantitative metrics, tables of results, ablation studies, or error bars, preventing assessment of whether the gains are load-bearing or robust.
Authors: We agree the abstract should be more specific. We will update it to report key metrics (FID, precision/recall on CIFAR-100-LT and ImageNet-LT) with references to Tables 1–3 and the ablation studies in Section 4, which report means and standard deviations over three random seeds. Revision: yes.
- Referee: [Abstract] The assumption that larger-scale noise for low-frequency classes corrects inaccurate score estimation 'without introducing new imbalances or degrading high-frequency performance' is asserted but not supported by any argument, counter-example analysis, or test that the adjustment cannot create new low-density artifacts in the high-frequency regime.
Authors: Section 4 provides per-class FID and diversity metrics showing gains for low-frequency classes with no degradation for high-frequency classes, plus qualitative samples confirming the absence of new artifacts. We will add a brief supporting sentence in the revised abstract that references this empirical validation and the score-estimation analysis from Section 2. Revision: partial.
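The first author response above says CFRG changes only per-class timestep sampling through a frequency-dependent weighting, but the weighting itself is not given here. As a hypothetical illustration only, using a power-law skew of my choosing rather than the paper's function: for a class with skew k ≥ 1, draw u ~ U(0, 1) and use t = ⌊T·u^(1/k)⌋. Since P(u^(1/k) ≤ x) = x^k, k = 1 recovers uniform sampling while larger k favours noisier steps. The `skew` table would be derived from class frequencies; its values, like the function name, are assumptions.

```python
import random
from typing import Dict, List


def sample_timesteps(labels: List[int], skew: Dict[int, float], num_steps: int,
                     rng: random.Random) -> List[int]:
    """Draw one timestep in [0, num_steps) per example, skewed per class.

    With u ~ U(0, 1) and skew k >= 1, u ** (1 / k) has CDF x ** k: k = 1 is the
    usual uniform sampling, larger k shifts mass toward noisier (larger) steps.
    Classes missing from `skew` fall back to k = 1.
    """
    steps = []
    for y in labels:
        k = skew.get(y, 1.0)
        t = int(num_steps * rng.random() ** (1.0 / k))
        # Guard: u ** (1 / k) can round up to 1.0 for u just below 1.
        steps.append(min(t, num_steps - 1))
    return steps


if __name__ == "__main__":
    rng = random.Random(0)
    skew = {0: 1.0, 1: 3.0}  # hypothetical: class 0 frequent, class 1 rare
    head = sample_timesteps([0] * 10000, skew, 1000, rng)
    tail = sample_timesteps([1] * 10000, skew, 1000, rng)
    print(sum(head) / len(head), sum(tail) / len(tail))  # roughly 500 vs 750
```

Only the timestep distribution changes here; the loss and its weighting are untouched, which is the kind of separation the orthogonality argument appeals to.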
Circularity Check
No significant circularity; CFRG is an empirical heuristic motivated by observed correlations
The paper first reports empirical correlations between class frequency and score estimation accuracy in low-density regions, then introduces the CFRG noise schedule as a new heuristic that assigns larger-scale noise to low-frequency classes. No equations, fitted parameters, or predictions are shown to reduce by construction to the paper's own inputs or definitions. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central contribution is presented as an observation-driven design choice whose validity is assessed via experiments on external imbalanced datasets rather than internal self-consistency loops.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: score-based generative models rely on multi-scale noise schedules during diffusion.
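For reference, this assumption points to the multi-scale denoising score matching objective of Song and Ermon (2019). A per-level loss is averaged over noise levels σ_1 > σ_2 > ... > σ_L with weight λ(σ) = σ². The last line is a hypothetical class-aware variant, not the paper's formula: a class-dependent distribution π_y over noise indices replaces the uniform average 1/L and would place more mass on large σ_i for rare classes y.

```latex
\ell(\theta;\sigma) = \tfrac{1}{2}\,\mathbb{E}_{x \sim p_{\mathrm{data}}}\,
  \mathbb{E}_{\tilde{x} \sim \mathcal{N}(x,\sigma^2 I)}
  \Big[\big\| s_\theta(\tilde{x},\sigma) + \tfrac{\tilde{x}-x}{\sigma^2} \big\|_2^2\Big]

\mathcal{L}(\theta) = \frac{1}{L}\sum_{i=1}^{L} \lambda(\sigma_i)\,\ell(\theta;\sigma_i),
  \qquad \lambda(\sigma) = \sigma^2

\mathcal{L}_{\mathrm{class}}(\theta) = \mathbb{E}_{(x,y)}\Big[\sum_{i=1}^{L} \pi_y(i)\,\lambda(\sigma_i)\,
  \tfrac{1}{2}\,\mathbb{E}_{\tilde{x} \sim \mathcal{N}(x,\sigma_i^2 I)}
  \big\| s_\theta(\tilde{x},\sigma_i,y) + \tfrac{\tilde{x}-x}{\sigma_i^2} \big\|_2^2\Big]
```

Setting π_y(i) = 1/L for every class recovers the standard objective.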