pith. machine review for the scientific record.

arxiv: 2605.08577 · v1 · submitted 2026-05-09 · 💻 cs.CV · cs.LG

Recognition: 2 Lean theorem links

Improving Generative Adversarial Networks with Self-Distillation

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:08 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords generative adversarial networks · self-distillation · exponential moving average · perceptual loss · image generation · training stability · FID metric

The pith

Using the EMA generator as a teacher via perceptual loss improves GAN image quality and training stability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Self-Distilled GAN (SD-GAN), which feeds the exponential moving average of the generator weights back into training as a teacher model. It supplies a perceptual loss to the actively updated generator, turning a component normally reserved for final use into an active source of guidance. If this works, the method should produce higher-quality images, reduce the cycling that destabilizes standard GAN optimization, and add a training signal that does not simply duplicate the adversarial objective. The authors also show it helps when fine-tuning already-trained models. A reader would care because it offers a lightweight way to make better use of an existing practice in GAN pipelines.
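
To make the mechanism concrete, here is a minimal sketch of one generator update under this scheme, in PyTorch-style Python. It is built only from the objective quoted later on this page (L_G = L_adv + α · L_SD applied to features of the student and EMA-teacher outputs); the non-saturating adversarial loss, the module names (`gen`, `ema_gen`, `disc`, `feat_extractor`), and the constants `alpha` and `beta` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def ema_update(ema_gen, gen, beta=0.999):
    """Standard EMA of the generator weights; most GAN pipelines already maintain this."""
    with torch.no_grad():
        for p_ema, p in zip(ema_gen.parameters(), gen.parameters()):
            p_ema.mul_(beta).add_(p, alpha=1.0 - beta)

def generator_step(gen, ema_gen, disc, feat_extractor, opt_g, z, alpha=1.0):
    """One SD-GAN-style generator update: adversarial loss plus a perceptual
    distillation loss toward the EMA teacher's output (illustrative sketch).
    ema_gen is typically a frozen deep copy of gen, updated only via ema_update."""
    fake = gen(z)
    with torch.no_grad():
        teacher_out = ema_gen(z)  # the EMA generator acts as the teacher

    # Conventional non-saturating adversarial loss (one common choice,
    # not necessarily the variant used in the paper).
    adv = F.softplus(-disc(fake)).mean()

    # Perceptual self-distillation loss: match features of the student and
    # teacher images under a fixed feature extractor T.
    l_sd = F.mse_loss(feat_extractor(fake), feat_extractor(teacher_out))

    loss = adv + alpha * l_sd  # L_G = L_adv + alpha * L_SD
    opt_g.zero_grad(set_to_none=True)
    loss.backward()
    opt_g.step()

    ema_update(ema_gen, gen)   # the teacher keeps trailing the student
    return loss.item()
```

Beyond what a standard EMA pipeline already does, the extra per-step cost is one teacher forward pass plus the feature extraction, which is consistent with the lightweight framing above.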

Core claim

The central claim is that SD-GAN, by employing the EMA generator as a teacher that supplies perceptual loss to the active generator, improves final image quality on metrics such as FID and random-FID, stabilizes the optimization trajectory, dampens parasitic cycling, and supplies learning guidance that is not trivially correlated with the conventional adversarial loss. This is shown through a proof of local asymptotic stability in the Dirac-GAN setting and through empirical tests on established architectures and datasets. The approach also works when fine-tuning pretrained GAN models.

What carries the argument

Self-Distillation, in which the EMA generator acts as a teacher supplying perceptual loss to guide the actively trained generator (the student).

If this is right

  • Higher image quality on FID and random-FID metrics across tested architectures and datasets.
  • More stable optimization trajectories that reduce cycling behavior.
  • An extra learning signal that is not redundant with the adversarial loss.
  • Improved performance when fine-tuning already trained GAN models.
  • Local asymptotic stability in the Dirac-GAN toy setting (a toy simulation sketch follows this list).
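
As background for the last bullet, the following is a minimal numerical sketch of the Dirac-GAN toy problem with an added pull of the generator parameter toward its own EMA, standing in for the distillation term. The logistic loss, step sizes, and constants are illustrative choices meant only to reproduce the qualitative picture (circling without the extra term, decay with it), not the paper's exact formulation or proof.

```python
import numpy as np

def dirac_gan(alpha=0.0, beta=0.99, lr=0.05, steps=5000):
    """Simultaneous gradient steps for the Dirac-GAN toy problem.

    Generator: a Dirac distribution at theta; data: a Dirac at 0;
    discriminator: D(x) = psi * x with the logistic GAN loss. alpha > 0 adds a
    pull of theta toward its own EMA, standing in for the distillation term."""
    theta, psi = 1.0, 1.0                 # start away from the equilibrium (0, 0)
    theta_ema = theta
    trajectory = []
    for _ in range(steps):
        grad_scale = 1.0 - 1.0 / (1.0 + np.exp(-psi * theta))   # f'(psi * theta)
        d_theta = -psi * grad_scale - alpha * (theta - theta_ema)
        d_psi = theta * grad_scale
        theta, psi = theta + lr * d_theta, psi + lr * d_psi     # simultaneous update
        theta_ema = beta * theta_ema + (1.0 - beta) * theta     # teacher trails slowly
        trajectory.append((theta, psi))
    return np.array(trajectory)

if __name__ == "__main__":
    plain = dirac_gan(alpha=0.0)    # no pull: the iterates keep circling the optimum
    damped = dirac_gan(alpha=1.0)   # with the pull: the oscillation decays
    print("final distance from optimum, alpha=0:", np.linalg.norm(plain[-1]))
    print("final distance from optimum, alpha=1:", np.linalg.norm(damped[-1]))
```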

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could be tested for compatibility with other common GAN stabilizers such as gradient penalties to see if they compound or substitute for one another.
  • Because the EMA model is already computed in most pipelines, the added cost is low, so the approach might scale readily to larger models without new infrastructure.
  • The independence of the perceptual loss could be measured directly by correlation analysis during training to quantify how much new information it contributes; a diagnostic sketch follows this list.
  • The same teacher-student idea might be tried in other generative settings that maintain an averaged model, such as diffusion models.
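
One way to act on the third bullet is to log, during training, the cosine similarity between the generator gradients produced by the two objectives. A minimal PyTorch-style sketch, assuming generic scalar losses `loss_adv` and `loss_percep` computed on the same batch rather than the paper's exact terms:

```python
import torch
import torch.nn.functional as F

def grad_cosine(gen, loss_adv, loss_percep):
    """Cosine similarity between the generator-parameter gradients of the
    adversarial loss and the perceptual distillation loss (diagnostic only)."""
    params = [p for p in gen.parameters() if p.requires_grad]
    g_a = torch.autograd.grad(loss_adv, params, retain_graph=True, allow_unused=True)
    g_b = torch.autograd.grad(loss_percep, params, retain_graph=True, allow_unused=True)
    # Missing gradients are treated as zeros so both flattened vectors align.
    flat_a = torch.cat([(g if g is not None else torch.zeros_like(p)).reshape(-1)
                        for g, p in zip(g_a, params)])
    flat_b = torch.cat([(g if g is not None else torch.zeros_like(p)).reshape(-1)
                        for g, p in zip(g_b, params)])
    return F.cosine_similarity(flat_a, flat_b, dim=0).item()
```

Values hovering near 1 over training would suggest the distillation term largely duplicates the adversarial signal; values near 0 would support the non-redundancy claim.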

Load-bearing premise

The perceptual loss from the EMA generator supplies guidance that is both beneficial and sufficiently independent of the standard adversarial loss, and the local stability seen in the Dirac-GAN toy setting extends to practical high-dimensional training.

What would settle it

Train SD-GAN and a baseline GAN on CIFAR-10 or ImageNet using the same architecture and hyperparameters, then compare final FID scores and the presence of cycling oscillations in the loss curves; no improvement or worse stability would falsify the central claim.
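
A rough harness for that comparison could look like the sketch below, assuming generated and real images have been dumped to folders and using the third-party clean-fid package for the FID computation. The oscillation score is an ad hoc proxy for cycling (direction-change rate of the smoothed loss curve), not a metric from the paper.

```python
import numpy as np
from cleanfid import fid  # third-party package: pip install clean-fid

def oscillation_score(loss_curve, window=50):
    """Crude cycling proxy: how often the smoothed loss changes direction."""
    x = np.asarray(loss_curve, dtype=float)
    smooth = np.convolve(x, np.ones(window) / window, mode="valid")
    signs = np.sign(np.diff(smooth))
    flips = np.sum(signs[1:] * signs[:-1] < 0)
    return flips / max(len(signs) - 1, 1)

def compare_runs(baseline_dir, sdgan_dir, real_dir, baseline_losses, sdgan_losses):
    """Final FID and loss-curve oscillation for two otherwise identical runs."""
    return {
        "baseline_fid": fid.compute_fid(baseline_dir, real_dir),
        "sdgan_fid": fid.compute_fid(sdgan_dir, real_dir),
        "baseline_oscillation": oscillation_score(baseline_losses),
        "sdgan_oscillation": oscillation_score(sdgan_losses),
    }
```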

Figures

Figures reproduced from arXiv: 2605.08577 by Antoni Nowinowski, Krzysztof Krawiec.

  • Figure 1: Overview of the SD-GAN framework. The training is guided by both the conventional GAN …
  • Figure 2: Parameter trajectories of the active generator …
  • Figure 3: FID scores (log scale) during training on the FFHQ and LSUN Church datasets across …
  • Figure 4: FID scores during fine-tuning on the FFHQ at …
  • Figure 5: Learning curves for SD-GAN and its ablated variants.
  • Figure 6: Qualitative interplay between the objectives.
  • Figure 7: Evolution of generated images for a fixed latent vector …
  • Figure 8: Qualitative interplay between the objectives (detailed version of Fig. …)
  • Figure 9: Evolution of generated images for a fixed latent vector …
  • Figure 9 (continued): Evolution of generated images for a fixed latent vector …
  • Figure 10: Qualitative results after fine-tuning on FFHQ at …
Original abstract

In modern GANs, maintaining an Exponential Moving Average (EMA) of the generator's weights is a standard practice, as such an averaged model consistently outperforms the actively trained generator. However, the EMA generator is used for final deployment only and does not influence the training process. To address this missed opportunity, we introduce Self-Distilled GAN (SD-GAN) that employs the EMA generator as a teacher to guide the active generator (student) via perceptual loss. We prove the local asymptotic stability of SD-GAN in the Dirac-GAN setting and show that it dampens the parasitic cycling behavior that plagues the conventional GANs. Empirical evaluations across established architectures and datasets demonstrate that SD-GAN improves the final image quality on several metrics (FID and random-FID in particular), stabilizes the optimization trajectory and provides additional learning guidance that is not trivially correlated with the conventional adversarial loss. It also proves effective for fine-tuning pretrained GAN models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces Self-Distilled GAN (SD-GAN), which repurposes the standard EMA generator as a teacher to provide perceptual loss guidance to the actively trained generator (student). It proves local asymptotic stability and damping of cycling in the Dirac-GAN toy setting, and reports empirical improvements in FID, random-FID, and other metrics across established GAN architectures and datasets, plus benefits when fine-tuning pretrained models. The perceptual loss is presented as supplying non-redundant guidance independent of the adversarial objective.

Significance. If the claims hold, the contribution is meaningful because it activates an existing EMA component to improve training dynamics without introducing new networks or substantial overhead. The toy-model stability result and the reported metric gains on standard benchmarks constitute concrete strengths. The work could influence practical GAN training pipelines if the mechanism generalizes beyond the toy case.

major comments (3)
  1. [Stability proof (Dirac-GAN analysis)] The local asymptotic stability and damping of parasitic cycling are established only in the Dirac-GAN setting. No perturbation analysis, Lyapunov extension, or high-dimensional control is supplied to indicate how the same damping survives the non-convex landscapes and mode-interaction oscillations of StyleGAN/BigGAN-scale training. This directly bears on the central claim that SD-GAN stabilizes practical optimization trajectories.
  2. [Empirical evaluation and ablations] The claim that the EMA-derived perceptual loss supplies guidance that is not trivially correlated with the adversarial loss rests on empirical results, yet the experimental section lacks ablations that isolate the perceptual term from the EMA averaging effect itself (e.g., a baseline that retains EMA but omits the distillation loss). Without such controls, attribution of the observed FID improvements specifically to self-distillation remains incomplete.
  3. [Perceptual loss formulation and independence argument] The manuscript asserts that the perceptual loss is sufficiently independent of the standard adversarial objective, but does not report gradient correlation statistics or cosine-similarity measurements between the two loss gradients in the high-dimensional regime. Such quantification would be required to substantiate the non-redundancy claim that underpins the method's added value.
minor comments (2)
  1. [Figures and experimental protocol] The optimization-trajectory figures would be strengthened by reporting statistics over multiple random seeds or including shaded variance bands to make the claimed stabilization visually and quantitatively clearer.
  2. [Notation and definitions] Notation for the EMA decay rate, perceptual-loss weight, and teacher/student generators should be introduced once and used consistently; occasional redefinition of symbols reduces readability.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. Below we provide point-by-point responses to the major comments and indicate the revisions we will incorporate.

read point-by-point responses
  1. Referee: [Stability proof (Dirac-GAN analysis)] The local asymptotic stability and damping of parasitic cycling are established only in the Dirac-GAN setting. No perturbation analysis, Lyapunov extension, or high-dimensional control is supplied to indicate how the same damping survives the non-convex landscapes and mode-interaction oscillations of StyleGAN/BigGAN-scale training. This directly bears on the central claim that SD-GAN stabilizes practical optimization trajectories.

    Authors: We agree that the local asymptotic stability result is derived exclusively in the Dirac-GAN toy setting. This choice follows the standard practice in the GAN dynamics literature, where simplified models are used to obtain analytical insight into phenomena such as cycling before empirical validation on realistic models. A full Lyapunov or perturbation analysis for high-dimensional non-convex GAN objectives remains an open theoretical challenge. In the manuscript we complement the toy-model analysis with empirical measurements of training stability (reduced FID variance and smoother trajectories) across multiple architectures and datasets. In the revision we will expand the discussion section to explicitly acknowledge the scope of the theoretical result and outline why extending the proof to practical regimes is non-trivial. revision: partial

  2. Referee: [Empirical evaluation and ablations] The claim that the EMA-derived perceptual loss supplies guidance that is not trivially correlated with the adversarial loss rests on empirical results, yet the experimental section lacks ablations that isolate the perceptual term from the EMA averaging effect itself (e.g., a baseline that retains EMA but omits the distillation loss). Without such controls, attribution of the observed FID improvements specifically to self-distillation remains incomplete.

    Authors: The referee is correct that the current experiments do not contain a control that keeps the EMA generator but removes the perceptual distillation loss. Such an ablation would more cleanly separate the contribution of the distillation term from the mere presence of EMA. We will add this baseline in the revised version: we will train models using standard EMA without the self-distillation objective and report FID, random-FID, and trajectory statistics alongside the SD-GAN results to strengthen the attribution (a minimal configuration sketch follows these responses). revision: yes

  3. Referee: [Perceptual loss formulation and independence argument] The manuscript asserts that the perceptual loss is sufficiently independent of the standard adversarial objective, but does not report gradient correlation statistics or cosine-similarity measurements between the two loss gradients in the high-dimensional regime. Such quantification would be required to substantiate the non-redundancy claim that underpins the method's added value.

    Authors: We acknowledge that the manuscript states the perceptual loss supplies non-redundant guidance yet does not include quantitative measurements of gradient alignment. In the revision we will compute and report the average cosine similarity between the gradients of the adversarial loss and the perceptual loss at multiple training checkpoints on at least one large-scale dataset and architecture. These statistics will be added to the experimental section to directly support the independence claim. revision: yes
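
Concretely, the control promised in response 2 is a second run that keeps the EMA copy but zeroes the distillation weight. A minimal configuration sketch, reusing the illustrative `alpha` weight from the generator-step example earlier on this page (the names are that sketch's, not the paper's):

```python
# EMA-only control: the EMA generator is still maintained (and still used for
# evaluation), but the distillation signal is switched off by zeroing its weight.
ABLATIONS = {
    "sd_gan":   {"alpha": 1.0},   # full method: adversarial + distillation loss
    "ema_only": {"alpha": 0.0},   # EMA kept, self-distillation term removed
}
# Each configuration is a separate training run with identical seeds, data,
# architecture, and hyperparameters apart from alpha; FID / random-FID and
# trajectory statistics are then compared between the two runs.
```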

Circularity Check

0 steps flagged

No circularity detected in derivation chain

full rationale

The SD-GAN perceptual loss is defined directly from the EMA generator weights as an independent teacher signal, separate from the adversarial objective. The local asymptotic stability result is derived and stated only for the simplified Dirac-GAN toy model, without any reduction to or dependence on the high-dimensional empirical claims. No load-bearing self-citations, fitted parameters renamed as predictions, or ansatzes smuggled via prior work appear in the provided derivation steps. The central claims rest on independent definitions and separate toy-model analysis, making the overall chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the standard GAN training framework plus the assumption that perceptual loss from the EMA model supplies non-redundant guidance; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: Local asymptotic stability holds in the Dirac-GAN setting and generalizes to realistic GANs
    Invoked to support the claim that the method dampens cycling behavior.

pith-pipeline@v0.9.0 · 5455 in / 1213 out tokens · 58229 ms · 2026-05-12T01:08:25.059060+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/ArrowOfTime.lean · arrow_from_z · tag: unclear

    Relation between the paper passage and the cited Recognition theorem.

    Paper passage: "We prove the local asymptotic stability of SD-GAN in the Dirac-GAN setting and show that it dampens the parasitic cycling behavior... Jacobian matrix J... Routh-Hurwitz criterion... α > 0 strictly guarantees... local asymptotic convergence"

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear

    Relation between the paper passage and the cited Recognition theorem.

    Paper passage: "The final objective function for the generator is a weighted sum of the standard adversarial loss L_adv and L_SD: L_G = L_adv + α · L_SD(T(G(z)), T(G_EMA(z)))"
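
Set in display math, the objective quoted in that passage reads (symbols exactly as quoted; the definitions of T and L_SD are the paper's):

```latex
\[
\mathcal{L}_G \;=\; \mathcal{L}_{\mathrm{adv}}
\;+\; \alpha \cdot \mathcal{L}_{\mathrm{SD}}\bigl(T(G(z)),\, T(G_{\mathrm{EMA}}(z))\bigr)
\]
```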

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 2 internal anchors

  1. [1]

    Generative adversarial nets

    Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In: Proceedings of the 28th International Conference on Neural Information Processing Systems. Vol. 2. Cambridge, MA, USA: MIT Press, 2014, pp. 2672–2680

  2. [2]

    Which Training Methods for GANs do actually Converge?

    Lars M. Mescheder, Andreas Geiger, and Sebastian Nowozin. Which Training Methods for GANs do actually Converge? In:Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Ed. by Jennifer G. Dy and Andreas Krause. Proceedings of Machine Learning Research. PMLR, 2018, pp. 3478–3487

  3. [3]

    Averaging Weights Leads to Wider Optima and Better Generalization

    Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry P. Vetrov, and Andrew Gordon Wilson. Averaging Weights Leads to Wider Optima and Better Generalization. In:Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, UAI 2018, Monterey, California, USA, August 6-10,

  4. [4]

    by Amir Globerson and Ricardo Silva

    Ed. by Amir Globerson and Ricardo Silva. AUAI Press, 2018, pp. 876–885

  5. [5]

    Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

    Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In:Proceedings of the 31st International Confer- ence on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2017, pp. 1195–1204.ISBN: 978-1-5108-6096-4

  6. [6]

    Bootstrap your own latent: a new approach to self-supervised learning

    Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, and Michal Valko. Bootstrap your own latent a new approach to self-supervised learning. In: Proceedings of the 34th Interna...

  7. [7]

    Emerging Properties in Self-Supervised Vision Transformers

    Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging Properties in Self-Supervised Vision Transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2021, pp. 9650–9660

  8. [8]

    Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V . V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jégou, Julien Mairal, Patrick L...

  9. [9]

    Oriane Siméoni, Huy V . V o, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seung Eun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Juli...

  10. [10]

    The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

    Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In:Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018, pp. 586–595

  11. [11]

    Improved techniques for training GANs

    Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In:Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2016, pp. 2234–2242.ISBN: 978-1- 5108-3881-9

  12. [12]

    Progressive Growing of GANs for Improved Quality, Stability, and Variation

    Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive Growing of GANs for Improved Quality, Stability, and Variation. In:6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018

  13. [13]

    The Unusual Effectiveness of Averaging in GAN Training

    Yasin Yazici, Chuan-Sheng Foo, Stefan Winkler, Kim-Hui Yap, Georgios Piliouras, and Vijay Chan- drasekhar. The Unusual Effectiveness of Averaging in GAN Training. In:7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019

  14. [14]

    Exponential Moving Average of Weights in Deep Learning: Dynamics and Benefits

    Daniel Morales-Brotons, Thijs V ogels, and Hadrien Hendrikx. Exponential Moving Average of Weights in Deep Learning: Dynamics and Benefits. In:Trans. Mach. Learn. Res.2024 (2024). 10

  15. [15]

    Large Scale GAN Training for High Fidelity Natural Image Synthesis

    Andrew Brock, Jeff Donahue, and Karen Simonyan. Large Scale GAN Training for High Fidelity Natural Image Synthesis. In:7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019

  16. [16]

    A Style-Based Generator Architecture for Generative Adversarial Networks

    Tero Karras, Samuli Laine, and Timo Aila. A Style-Based Generator Architecture for Generative Adversarial Networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019, pp. 4401–4410

  17. [17]

    Analyzing and Improving the Image Quality of StyleGAN

    Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and Improving the Image Quality of StyleGAN. In:Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020, pp. 8110–8119

  18. [18]

    Training generative adversarial networks with limited data

    Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Training generative adversarial networks with limited data. In:Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2020, pp. 12104– 12114.ISBN: 978-1-71382-954-6

  19. [19]

    Alias-free generative adversarial networks

    Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Alias-free generative adversarial networks. In:Proceedings of the 35th International Conference on Neural Information Processing Systems. NIPS ’21. Red Hook, NY, USA: Curran Associates Inc., 2021, pp. 852–863.ISBN: 978-1-71384-539-3

  20. [20]

    Projected GANs converge faster

    Axel Sauer, Kashyap Chitta, Jens Müller, and Andreas Geiger. Projected GANs converge faster. In: Proceedings of the 35th International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2021, pp. 17480–17492.ISBN: 978-1-71384-539-3

  21. [21]

    StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets

    Axel Sauer, Katja Schwarz, and Andreas Geiger. StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets. In:ACM SIGGRAPH 2022 Conference Proceedings. SIGGRAPH ’22. New York, NY, USA: Association for Computing Machinery, 2022, pp. 1–10.ISBN: 978-1-4503-9337-9.DOI: 10 . 1145 / 3528233.3530738

  22. [22]

    StyleGAN-T: unlocking the power of GANs for fast large-scale text-to-image synthesis

    Axel Sauer, Tero Karras, Samuli Laine, Andreas Geiger, and Timo Aila. StyleGAN-T: unlocking the power of GANs for fast large-scale text-to-image synthesis. In:Proceedings of the 40th International Conference on Machine Learning. V ol. 202. ICML’23. Honolulu, Hawaii, USA: JMLR.org, 2023, pp. 30105–30118

  23. [23]

    SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer

    Yuhta Takida, Masaaki Imaizumi, Takashi Shibuya, Chieh-Hsin Lai, Toshimitsu Uesaka, Naoki Murata, and Yuki Mitsufuji. SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer. In:The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024

  24. [24]

    Efficient Image Generation with Variadic Attention Heads

    Steven Walton, Ali Hassani, Xingqian Xu, Zhangyang Wang, and Humphrey Shi. Efficient Image Generation with Variadic Attention Heads. In:Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. 2025, pp. 3264–3275

  25. [25]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In:Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS ’20. Red Hook, NY, USA: Curran Associates Inc., 2020, pp. 6840–6851.ISBN: 978-1-71382-954-6

  26. [26]

    Score-Based Generative Modeling through Stochastic Differential Equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-Based Generative Modeling through Stochastic Differential Equations. In:9th International Con- ference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021

  27. [27]

    Diffusion models beat GANs on image synthesis

    Prafulla Dhariwal and Alex Nichol. Diffusion models beat GANs on image synthesis. In:Proceedings of the 35th International Conference on Neural Information Processing Systems. NIPS ’21. Red Hook, NY, USA: Curran Associates Inc., 2021, pp. 8780–8794.ISBN: 978-1-71384-539-3

  28. [28]

    Consistency Regularization for Generative Adversarial Networks

    Han Zhang, Zizhao Zhang, Augustus Odena, and Honglak Lee. Consistency Regularization for Generative Adversarial Networks. In:8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020

  29. [29]

    Improved transformer for high-resolution GANs

    Long Zhao, Zizhao Zhang, Ting Chen, Dimitris N. Metaxas, and Han Zhang. Improved transformer for high-resolution GANs. In:Proceedings of the 35th International Conference on Neural Information Processing Systems. NIPS ’21. Red Hook, NY, USA: Curran Associates Inc., 2021, pp. 18367–18380. ISBN: 978-1-7138-4539-3

  30. [30]

    Differentiable augmentation for data-efficient GAN training

    Shengyu Zhao, Zhijian Liu, Ji Lin, Jun-Yan Zhu, and Song Han. Differentiable augmentation for data- efficient GAN training. In:Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS ’20. Red Hook, NY, USA: Curran Associates Inc., 2020, pp. 7559–7570.ISBN: 978-1-7138-2954-6. 11

  31. [31]

    Regularizing Generative Adversarial Networks Under Limited Data

    Hung-Yu Tseng, Lu Jiang, Ce Liu, Ming-Hsuan Yang, and Weilong Yang. Regularizing Generative Adversarial Networks Under Limited Data. In:IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021. Computer Vision Foundation / IEEE, 2021, pp. 7921– 7931.DOI:10.1109/CVPR46437.2021.00783

  32. [32]

    Revisiting discriminator in GAN compression: a generator-discriminator cooperative compression scheme

    Shaojie Li, Jie Wu, Xuefeng Xiao, Fei Chao, Xudong Mao, and Rongrong Ji. Revisiting discriminator in GAN compression: a generator-discriminator cooperative compression scheme. In:Proceedings of the 35th International Conference on Neural Information Processing Systems. NIPS ’21. Red Hook, NY, USA: Curran Associates Inc., 2021, pp. 28560–28572.ISBN: 978-1-...

  33. [33]

    Online Multi-Granularity Distillation for GAN Compression

    Yuxi Ren, Jie Wu, Xuefeng Xiao, and Jianchao Yang. Online Multi-Granularity Distillation for GAN Compression. In:Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2021, pp. 6793–6803

  34. [34]

    Adversarial Diffusion Distillation

    Axel Sauer, Dominik Lorenz, Andreas Blattmann, and Robin Rombach. Adversarial Diffusion Distillation. In:Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LXXXVI. Berlin, Heidelberg: Springer-Verlag, 2024, pp. 87–103.ISBN: 978-3-031-73015-3.DOI:10.1007/978-3-031-73016-0_6

  35. [35]

    Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

    Axel Sauer, Frederic Boesel, Tim Dockhorn, Andreas Blattmann, Patrick Esser, and Robin Rombach. Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation. In:SIGGRAPH Asia 2024 Conference Papers, SA 2024, Tokyo, Japan, December 3-6, 2024. Ed. by Takeo Igarashi, Ariel Shamir, and Hao (Richard) Zhang. ACM, 2024, 106:1–106:11.DOI:10...

  36. [36]

    Distilling Diffusion Models Into Conditional GANs

    Minguk Kang, Richard Zhang, Connelly Barnes, Sylvain Paris, Suha Kwak, Jaesik Park, Eli Shechtman, Jun-Yan Zhu, and Taesung Park. Distilling Diffusion Models Into Conditional GANs. In:Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part XXVIII. Berlin, Heidelberg: Springer-Verlag, 2024, pp. 4...

  37. [37]

    Semi-TSGAN: Semi-Supervised Learning for Highlight Removal Based on Teacher-Student Generative Adversarial Network

    Yuanfeng Zheng, Yuchen Yan, and Hao Jiang. Semi-TSGAN: Semi-Supervised Learning for Highlight Removal Based on Teacher-Student Generative Adversarial Network. In:Sensors24.10 (Jan. 2024), p. 3090.ISSN: 1424-8220.DOI:10.3390/s24103090

  38. [38]

    Consistency models

    Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In:Proceedings of the 40th International Conference on Machine Learning. V ol. 202. ICML’23. Honolulu, Hawaii, USA: JMLR.org, 2023, pp. 32211–32252

  39. [39]

    Self-Distilled StyleGAN: Towards Generation from Internet Photos

    Ron Mokady, Omer Tov, Michal Yarom, Oran Lang, Inbar Mosseri, Tali Dekel, Daniel Cohen-Or, and Michal Irani. Self-Distilled StyleGAN: Towards Generation from Internet Photos. In:ACM SIGGRAPH 2022 Conference Proceedings. SIGGRAPH ’22. New York, NY, USA: Association for Computing Machinery, 2022, pp. 1–9.ISBN: 978-1-4503-9337-9.DOI:10.1145/3528233.3530708

  40. [40]

    Nonlinear Systems

    Hassan K. Khalil.Nonlinear Systems. 2nd. Upper Saddle River, NJ: Prentice-Hall, 1996.ISBN: 978-0-13- 228024-2

  41. [41]

    The numerics of GANs

    Lars Mescheder, Sebastian Nowozin, and Andreas Geiger. The numerics of GANs. In:Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17. Red Hook, NY, USA: Curran Associates Inc., 2017, pp. 1823–1833.ISBN: 978-1-5108-6096-4

  42. [42]

    Gradient descent GAN optimization is locally stable

    Vaishnavh Nagarajan and J. Zico Kolter. Gradient descent GAN optimization is locally stable. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17. Red Hook, NY, USA: Curran Associates Inc., 2017, pp. 5591–5600.ISBN: 978-1-5108-6096-4

  43. [43]

    LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

    Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao.LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop. June 4, 2016.DOI:10.48550/arXiv.1506.03365. arXiv:1506.03365[cs]

  44. [44]

    GANs trained by a two time-scale update rule converge to a local nash equilibrium

    Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local nash equilibrium. In:Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17. Red Hook, NY, USA: Curran Associates Inc., 2017, pp. 6629–6640.ISBN: 978-1-5108-6096-4

  45. [45]

    Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models

    George Stein, Jesse C. Cresswell, Rasa Hosseinzadeh, Yi Sui, Brendan Leigh Ross, Valentin Villecroze, Zhaoyan Liu, Anthony L. Caterini, J. Eric T. Taylor, and Gabriel Loaiza-Ganem. Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models. In:Proceedings of the 37th International Conference on Neural Information ...

  46. [46]

    Assessing generative models via precision and recall

    Mehdi S. M. Sajjadi, Olivier Bachem, Mario Lucic, Olivier Bousquet, and Sylvain Gelly. Assessing generative models via precision and recall. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS'18. Red Hook, NY, USA: Curran Associates Inc., 2018, pp. 5234–5243

  47. [47]

    Deep Learning Face Attributes in the Wild

    Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep Learning Face Attributes in the Wild. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). ICCV ’15. USA: IEEE Computer Society, 2015, pp. 3730–3738.ISBN: 978-1-4673-8391-2.DOI: 10.1109/ICCV.2015. 425

  48. [48]

    Multiscale structural similarity for image quality assessment

    Z. Wang, E.P. Simoncelli, and A.C. Bovik. Multiscale structural similarity for image quality assessment. In: The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003. Vol. 2. 2003, pp. 1398–1402. DOI: 10.1109/ACSSC.2003.1292216

  49. [49]

    Wasserstein generative adversarial networks

    Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In: Proceedings of the 34th International Conference on Machine Learning - Volume 70. ICML’17. Sydney, NSW, Australia: JMLR.org, 2017, pp. 214–223

  50. [50]

    Discriminator Rejection Sampling

    Samaneh Azadi, Catherine Olsson, Trevor Darrell, Ian J. Goodfellow, and Augustus Odena. Discriminator Rejection Sampling. In:7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019

  51. [51]

    Train longer, generalize better: closing the generalization gap in large batch training of neural networks

    Elad Hoffer, Itay Hubara, and Daniel Soudry. Train longer, generalize better: closing the generalization gap in large batch training of neural networks. In:Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17. Red Hook, NY, USA: Curran Associates Inc., 2017, pp. 1729–1739.ISBN: 978-1-5108-6096-4. 13 A Software...