pith. machine review for the scientific record.

arxiv: 2605.08577 · v1 · submitted 2026-05-09 · 💻 cs.CV · cs.LG

Recognition: 2 Lean theorem links

Improving Generative Adversarial Networks with Self-Distillation

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:08 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords generative adversarial networks · self-distillation · exponential moving average · perceptual loss · image generation · training stability · FID metric

The pith

Using the EMA generator as a teacher via perceptual loss improves GAN image quality and training stability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Self-Distilled GAN (SD-GAN), which feeds the exponential moving average of the generator weights back into training as a teacher model. It supplies a perceptual loss to the actively updated generator, turning a component normally reserved for final use into an active source of guidance. If this works, the method should produce higher-quality images, reduce the cycling that destabilizes standard GAN optimization, and add a training signal that does not simply duplicate the adversarial objective. The authors also show it helps when fine-tuning already-trained models. A reader would care because it offers a lightweight way to make better use of an existing practice in GAN pipelines.
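
To make the mechanism concrete, here is a minimal sketch of one generator update under this scheme, in PyTorch-style Python. It is built only from the objective quoted later on this page (L_G = L_adv + α · L_SD applied to features of the student and EMA-teacher outputs); the non-saturating adversarial loss, the module names (`gen`, `ema_gen`, `disc`, `feat_extractor`), and the constants `alpha` and `beta` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def ema_update(ema_gen, gen, beta=0.999):
    """Standard EMA of the generator weights; most GAN pipelines already maintain this."""
    with torch.no_grad():
        for p_ema, p in zip(ema_gen.parameters(), gen.parameters()):
            p_ema.mul_(beta).add_(p, alpha=1.0 - beta)

def generator_step(gen, ema_gen, disc, feat_extractor, opt_g, z, alpha=1.0):
    """One SD-GAN-style generator update: adversarial loss plus a perceptual
    distillation loss toward the EMA teacher's output (illustrative sketch).
    ema_gen is typically a frozen deep copy of gen, updated only via ema_update."""
    fake = gen(z)
    with torch.no_grad():
        teacher_out = ema_gen(z)  # the EMA generator acts as the teacher

    # Conventional non-saturating adversarial loss (one common choice,
    # not necessarily the variant used in the paper).
    adv = F.softplus(-disc(fake)).mean()

    # Perceptual self-distillation loss: match features of the student and
    # teacher images under a fixed feature extractor T.
    l_sd = F.mse_loss(feat_extractor(fake), feat_extractor(teacher_out))

    loss = adv + alpha * l_sd  # L_G = L_adv + alpha * L_SD
    opt_g.zero_grad(set_to_none=True)
    loss.backward()
    opt_g.step()

    ema_update(ema_gen, gen)   # the teacher keeps trailing the student
    return loss.item()
```

Beyond what a standard EMA pipeline already does, the extra per-step cost is one teacher forward pass plus the feature extraction, which is consistent with the lightweight framing above.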

Core claim

The central claim is that SD-GAN, by employing the EMA generator as a teacher that supplies perceptual loss to the active generator, improves final image quality on metrics such as FID and random-FID, stabilizes the optimization trajectory, dampens parasitic cycling, and supplies learning guidance that is not trivially correlated with the conventional adversarial loss. This is shown through a proof of local asymptotic stability in the Dirac-GAN setting and through empirical tests on established architectures and datasets. The approach also works when fine-tuning pretrained GAN models.

What carries the argument

Self-Distillation, in which the EMA generator acts as a teacher supplying perceptual loss to guide the actively trained generator (the student).

If this is right

  • Higher image quality on FID and random-FID metrics across tested architectures and datasets.
  • More stable optimization trajectories that reduce cycling behavior.
  • An extra learning signal that is not redundant with the adversarial loss.
  • Improved performance when fine-tuning already trained GAN models.
  • Local asymptotic stability in the Dirac-GAN toy setting (a toy simulation sketch follows this list).
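
As background for the last bullet, the following is a minimal numerical sketch of the Dirac-GAN toy problem with an added pull of the generator parameter toward its own EMA, standing in for the distillation term. The logistic loss, step sizes, and constants are illustrative choices meant only to reproduce the qualitative picture (circling without the extra term, decay with it), not the paper's exact formulation or proof.

```python
import numpy as np

def dirac_gan(alpha=0.0, beta=0.99, lr=0.05, steps=5000):
    """Simultaneous gradient steps for the Dirac-GAN toy problem.

    Generator: a Dirac distribution at theta; data: a Dirac at 0;
    discriminator: D(x) = psi * x with the logistic GAN loss. alpha > 0 adds a
    pull of theta toward its own EMA, standing in for the distillation term."""
    theta, psi = 1.0, 1.0                 # start away from the equilibrium (0, 0)
    theta_ema = theta
    trajectory = []
    for _ in range(steps):
        grad_scale = 1.0 - 1.0 / (1.0 + np.exp(-psi * theta))   # f'(psi * theta)
        d_theta = -psi * grad_scale - alpha * (theta - theta_ema)
        d_psi = theta * grad_scale
        theta, psi = theta + lr * d_theta, psi + lr * d_psi     # simultaneous update
        theta_ema = beta * theta_ema + (1.0 - beta) * theta     # teacher trails slowly
        trajectory.append((theta, psi))
    return np.array(trajectory)

if __name__ == "__main__":
    plain = dirac_gan(alpha=0.0)    # no pull: the iterates keep circling the optimum
    damped = dirac_gan(alpha=1.0)   # with the pull: the oscillation decays
    print("final distance from optimum, alpha=0:", np.linalg.norm(plain[-1]))
    print("final distance from optimum, alpha=1:", np.linalg.norm(damped[-1]))
```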

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could be tested for compatibility with other common GAN stabilizers such as gradient penalties to see if they compound or substitute for one another.
  • Because the EMA model is already computed in most pipelines, the added cost is low, so the approach might scale readily to larger models without new infrastructure.
  • The independence of the perceptual loss could be measured directly by correlation analysis during training to quantify how much new information it contributes; a diagnostic sketch follows this list.
  • The same teacher-student idea might be tried in other generative settings that maintain an averaged model, such as diffusion models.
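
One way to act on the third bullet is to log, during training, the cosine similarity between the generator gradients produced by the two objectives. A minimal PyTorch-style sketch, assuming generic scalar losses `loss_adv` and `loss_percep` computed on the same batch rather than the paper's exact terms:

```python
import torch
import torch.nn.functional as F

def grad_cosine(gen, loss_adv, loss_percep):
    """Cosine similarity between the generator-parameter gradients of the
    adversarial loss and the perceptual distillation loss (diagnostic only)."""
    params = [p for p in gen.parameters() if p.requires_grad]
    g_a = torch.autograd.grad(loss_adv, params, retain_graph=True, allow_unused=True)
    g_b = torch.autograd.grad(loss_percep, params, retain_graph=True, allow_unused=True)
    # Missing gradients are treated as zeros so both flattened vectors align.
    flat_a = torch.cat([(g if g is not None else torch.zeros_like(p)).reshape(-1)
                        for g, p in zip(g_a, params)])
    flat_b = torch.cat([(g if g is not None else torch.zeros_like(p)).reshape(-1)
                        for g, p in zip(g_b, params)])
    return F.cosine_similarity(flat_a, flat_b, dim=0).item()
```

Values hovering near 1 over training would suggest the distillation term largely duplicates the adversarial signal; values near 0 would support the non-redundancy claim.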

Load-bearing premise

The perceptual loss from the EMA generator supplies guidance that is both beneficial and sufficiently independent of the standard adversarial loss, and the local stability seen in the Dirac-GAN toy setting extends to practical high-dimensional training.

What would settle it

Train SD-GAN and a baseline GAN on CIFAR-10 or ImageNet using the same architecture and hyperparameters, then compare final FID scores and the presence of cycling oscillations in the loss curves; no improvement or worse stability would falsify the central claim.
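
A rough harness for that comparison could look like the sketch below, assuming generated and real images have been dumped to folders and using the third-party clean-fid package for the FID computation. The oscillation score is an ad hoc proxy for cycling (direction-change rate of the smoothed loss curve), not a metric from the paper.

```python
import numpy as np
from cleanfid import fid  # third-party package: pip install clean-fid

def oscillation_score(loss_curve, window=50):
    """Crude cycling proxy: how often the smoothed loss changes direction."""
    x = np.asarray(loss_curve, dtype=float)
    smooth = np.convolve(x, np.ones(window) / window, mode="valid")
    signs = np.sign(np.diff(smooth))
    flips = np.sum(signs[1:] * signs[:-1] < 0)
    return flips / max(len(signs) - 1, 1)

def compare_runs(baseline_dir, sdgan_dir, real_dir, baseline_losses, sdgan_losses):
    """Final FID and loss-curve oscillation for two otherwise identical runs."""
    return {
        "baseline_fid": fid.compute_fid(baseline_dir, real_dir),
        "sdgan_fid": fid.compute_fid(sdgan_dir, real_dir),
        "baseline_oscillation": oscillation_score(baseline_losses),
        "sdgan_oscillation": oscillation_score(sdgan_losses),
    }
```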

Figures

Figures reproduced from arXiv: 2605.08577 by Antoni Nowinowski, Krzysztof Krawiec.

  • Figure 1: Overview of the SD-GAN framework. The training is guided by both the conventional GAN …
  • Figure 2: Parameter trajectories of the active generator …
  • Figure 3: FID scores (log scale) during training on the FFHQ and LSUN Church datasets across …
  • Figure 4: FID scores during fine-tuning on the FFHQ at …
  • Figure 5: Learning curves for SD-GAN and its ablated variants.
  • Figure 6: Qualitative interplay between the objectives.
  • Figure 7: Evolution of generated images for a fixed latent vector …
  • Figure 8: Qualitative interplay between the objectives (detailed version of Fig. …)
  • Figure 9: Evolution of generated images for a fixed latent vector …
  • Figure 9 (continued): Evolution of generated images for a fixed latent vector …
  • Figure 10: Qualitative results after fine-tuning on FFHQ at …
Original abstract

In modern GANs, maintaining an Exponential Moving Average (EMA) of the generator's weights is a standard practice, as such an averaged model consistently outperforms the actively trained generator. However, the EMA generator is used for final deployment only and does not influence the training process. To address this missed opportunity, we introduce Self-Distilled GAN (SD-GAN) that employs the EMA generator as a teacher to guide the active generator (student) via perceptual loss. We prove the local asymptotic stability of SD-GAN in the Dirac-GAN setting and show that it dampens the parasitic cycling behavior that plagues the conventional GANs. Empirical evaluations across established architectures and datasets demonstrate that SD-GAN improves the final image quality on several metrics (FID and random-FID in particular), stabilizes the optimization trajectory and provides additional learning guidance that is not trivially correlated with the conventional adversarial loss. It also proves effective for fine-tuning pretrained GAN models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces Self-Distilled GAN (SD-GAN), which repurposes the standard EMA generator as a teacher to provide perceptual loss guidance to the actively trained generator (student). It proves local asymptotic stability and damping of cycling in the Dirac-GAN toy setting, and reports empirical improvements in FID, random-FID, and other metrics across established GAN architectures and datasets, plus benefits when fine-tuning pretrained models. The perceptual loss is presented as supplying non-redundant guidance independent of the adversarial objective.

Significance. If the claims hold, the contribution is meaningful because it activates an existing EMA component to improve training dynamics without introducing new networks or substantial overhead. The toy-model stability result and the reported metric gains on standard benchmarks constitute concrete strengths. The work could influence practical GAN training pipelines if the mechanism generalizes beyond the toy case.

major comments (3)
  1. [Stability proof (Dirac-GAN analysis)] The local asymptotic stability and damping of parasitic cycling are established only in the Dirac-GAN setting. No perturbation analysis, Lyapunov extension, or high-dimensional control is supplied to indicate how the same damping survives the non-convex landscapes and mode-interaction oscillations of StyleGAN/BigGAN-scale training. This directly bears on the central claim that SD-GAN stabilizes practical optimization trajectories.
  2. [Empirical evaluation and ablations] The claim that the EMA-derived perceptual loss supplies guidance that is not trivially correlated with the adversarial loss rests on empirical results, yet the experimental section lacks ablations that isolate the perceptual term from the EMA averaging effect itself (e.g., a baseline that retains EMA but omits the distillation loss). Without such controls, attribution of the observed FID improvements specifically to self-distillation remains incomplete.
  3. [Perceptual loss formulation and independence argument] The manuscript asserts that the perceptual loss is sufficiently independent of the standard adversarial objective, but does not report gradient correlation statistics or cosine-similarity measurements between the two loss gradients in the high-dimensional regime. Such quantification would be required to substantiate the non-redundancy claim that underpins the method's added value.
minor comments (2)
  1. [Figures and experimental protocol] The optimization-trajectory figures would be strengthened by reporting statistics over multiple random seeds or including shaded variance bands to make the claimed stabilization visually and quantitatively clearer.
  2. [Notation and definitions] Notation for the EMA decay rate, perceptual-loss weight, and teacher/student generators should be introduced once and used consistently; occasional redefinition of symbols reduces readability.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. Below we provide point-by-point responses to the major comments and indicate the revisions we will incorporate.

read point-by-point responses
  1. Referee: [Stability proof (Dirac-GAN analysis)] The local asymptotic stability and damping of parasitic cycling are established only in the Dirac-GAN setting. No perturbation analysis, Lyapunov extension, or high-dimensional control is supplied to indicate how the same damping survives the non-convex landscapes and mode-interaction oscillations of StyleGAN/BigGAN-scale training. This directly bears on the central claim that SD-GAN stabilizes practical optimization trajectories.

    Authors: We agree that the local asymptotic stability result is derived exclusively in the Dirac-GAN toy setting. This choice follows the standard practice in the GAN dynamics literature, where simplified models are used to obtain analytical insight into phenomena such as cycling before empirical validation on realistic models. A full Lyapunov or perturbation analysis for high-dimensional non-convex GAN objectives remains an open theoretical challenge. In the manuscript we complement the toy-model analysis with empirical measurements of training stability (reduced FID variance and smoother trajectories) across multiple architectures and datasets. In the revision we will expand the discussion section to explicitly acknowledge the scope of the theoretical result and outline why extending the proof to practical regimes is non-trivial. revision: partial

  2. Referee: [Empirical evaluation and ablations] The claim that the EMA-derived perceptual loss supplies guidance that is not trivially correlated with the adversarial loss rests on empirical results, yet the experimental section lacks ablations that isolate the perceptual term from the EMA averaging effect itself (e.g., a baseline that retains EMA but omits the distillation loss). Without such controls, attribution of the observed FID improvements specifically to self-distillation remains incomplete.

    Authors: The referee is correct that the current experiments do not contain a control that keeps the EMA generator but removes the perceptual distillation loss. Such an ablation would more cleanly separate the contribution of the distillation term from the mere presence of EMA. We will add this baseline in the revised version: we will train models using standard EMA without the self-distillation objective and report FID, random-FID, and trajectory statistics alongside the SD-GAN results to strengthen the attribution (a minimal configuration sketch follows these responses). revision: yes

  3. Referee: [Perceptual loss formulation and independence argument] The manuscript asserts that the perceptual loss is sufficiently independent of the standard adversarial objective, but does not report gradient correlation statistics or cosine-similarity measurements between the two loss gradients in the high-dimensional regime. Such quantification would be required to substantiate the non-redundancy claim that underpins the method's added value.

    Authors: We acknowledge that the manuscript states the perceptual loss supplies non-redundant guidance yet does not include quantitative measurements of gradient alignment. In the revision we will compute and report the average cosine similarity between the gradients of the adversarial loss and the perceptual loss at multiple training checkpoints on at least one large-scale dataset and architecture. These statistics will be added to the experimental section to directly support the independence claim. revision: yes
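
Concretely, the control promised in response 2 is a second run that keeps the EMA copy but zeroes the distillation weight. A minimal configuration sketch, reusing the illustrative `alpha` weight from the generator-step example earlier on this page (the names are that sketch's, not the paper's):

```python
# EMA-only control: the EMA generator is still maintained (and still used for
# evaluation), but the distillation signal is switched off by zeroing its weight.
ABLATIONS = {
    "sd_gan":   {"alpha": 1.0},   # full method: adversarial + distillation loss
    "ema_only": {"alpha": 0.0},   # EMA kept, self-distillation term removed
}
# Each configuration is a separate training run with identical seeds, data,
# architecture, and hyperparameters apart from alpha; FID / random-FID and
# trajectory statistics are then compared between the two runs.
```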

Circularity Check

0 steps flagged

No circularity detected in derivation chain

full rationale

The SD-GAN perceptual loss is defined directly from the EMA generator weights as an independent teacher signal, separate from the adversarial objective. The local asymptotic stability result is derived and stated only for the simplified Dirac-GAN toy model, without any reduction to or dependence on the high-dimensional empirical claims. No load-bearing self-citations, fitted parameters renamed as predictions, or ansatzes smuggled via prior work appear in the provided derivation steps. The central claims rest on independent definitions and separate toy-model analysis, making the overall chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the standard GAN training framework plus the assumption that perceptual loss from the EMA model supplies non-redundant guidance; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: Local asymptotic stability holds in the Dirac-GAN setting and generalizes to realistic GANs
    Invoked to support the claim that the method dampens cycling behavior.

pith-pipeline@v0.9.0 · 5455 in / 1213 out tokens · 58229 ms · 2026-05-12T01:08:25.059060+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/ArrowOfTime.lean · arrow_from_z · tag: unclear

    Relation between the paper passage and the cited Recognition theorem.

    Paper passage: "We prove the local asymptotic stability of SD-GAN in the Dirac-GAN setting and show that it dampens the parasitic cycling behavior... Jacobian matrix J... Routh-Hurwitz criterion... α > 0 strictly guarantees... local asymptotic convergence"

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear

    Relation between the paper passage and the cited Recognition theorem.

    Paper passage: "The final objective function for the generator is a weighted sum of the standard adversarial loss L_adv and L_SD: L_G = L_adv + α · L_SD(T(G(z)), T(G_EMA(z)))"
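
Set in display math, the objective quoted in that passage reads (symbols exactly as quoted; the definitions of T and L_SD are the paper's):

```latex
\[
\mathcal{L}_G \;=\; \mathcal{L}_{\mathrm{adv}}
\;+\; \alpha \cdot \mathcal{L}_{\mathrm{SD}}\bigl(T(G(z)),\, T(G_{\mathrm{EMA}}(z))\bigr)
\]
```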

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 2 internal anchors

  1. [1]

    Generative adversarial nets

    Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In: Proceedings of the 28th International Conference on Neural Information Processing Systems. Vol. 2. Cambridge, MA, USA: MIT Press, 2014, pp. 2672–2680

  2. [2]

    Which Training Methods for GANs do actually Converge?

    Lars M. Mescheder, Andreas Geiger, and Sebastian Nowozin. Which Training Methods for GANs do actually Converge? In:Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Ed. by Jennifer G. Dy and Andreas Krause. Proceedings of Machine Learning Research. PMLR, 2018, pp. 3478–3487

  3. [3]

    Averaging Weights Leads to Wider Optima and Better Generalization

    Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry P. Vetrov, and Andrew Gordon Wilson. Averaging Weights Leads to Wider Optima and Better Generalization. In:Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, UAI 2018, Monterey, California, USA, August 6-10,

  4. [4]

    by Amir Globerson and Ricardo Silva

    Ed. by Amir Globerson and Ricardo Silva. AUAI Press, 2018, pp. 876–885

  5. [5]

    Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

    Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In:Proceedings of the 31st International Confer- ence on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2017, pp. 1195–1204.ISBN: 978-1-5108-6096-4

  6. [6]

    Bootstrap your own latent: a new approach to self-supervised learning

    Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, and Michal Valko. Bootstrap your own latent a new approach to self-supervised learning. In: Proceedings of the 34th Interna...

  7. [7]

    Emerging Properties in Self-Supervised Vision Transformers

    Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging Properties in Self-Supervised Vision Transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2021, pp. 9650–9660

  8. [8]

    Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V . V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jégou, Julien Mairal, Patrick L...

  9. [9]

    Oriane Siméoni, Huy V . V o, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seung Eun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Juli...

  10. [10]

    The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

    Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In:Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018, pp. 586–595

  11. [11]

    Improved techniques for training GANs

    Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In:Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2016, pp. 2234–2242.ISBN: 978-1- 5108-3881-9

  12. [12]

    Progressive Growing of GANs for Improved Quality, Stability, and Variation

    Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive Growing of GANs for Improved Quality, Stability, and Variation. In:6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018

  13. [13]

    The Unusual Effectiveness of Averaging in GAN Training

    Yasin Yazici, Chuan-Sheng Foo, Stefan Winkler, Kim-Hui Yap, Georgios Piliouras, and Vijay Chan- drasekhar. The Unusual Effectiveness of Averaging in GAN Training. In:7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019

  14. [14]

    Exponential Moving Average of Weights in Deep Learning: Dynamics and Benefits

    Daniel Morales-Brotons, Thijs V ogels, and Hadrien Hendrikx. Exponential Moving Average of Weights in Deep Learning: Dynamics and Benefits. In:Trans. Mach. Learn. Res.2024 (2024). 10

  15. [15]

    Large Scale GAN Training for High Fidelity Natural Image Synthesis

    Andrew Brock, Jeff Donahue, and Karen Simonyan. Large Scale GAN Training for High Fidelity Natural Image Synthesis. In:7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019

  16. [16]

    A Style-Based Generator Architecture for Generative Adversarial Networks

    Tero Karras, Samuli Laine, and Timo Aila. A Style-Based Generator Architecture for Generative Adversarial Networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019, pp. 4401–4410

  17. [17]

    Analyzing and Improving the Image Quality of StyleGAN

    Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and Improving the Image Quality of StyleGAN. In:Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020, pp. 8110–8119

  18. [18]

    Training generative adversarial networks with limited data

    Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Training generative adversarial networks with limited data. In:Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2020, pp. 12104– 12114.ISBN: 978-1-71382-954-6

  19. [19]

    Alias-free generative adversarial networks

    Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Alias-free generative adversarial networks. In:Proceedings of the 35th International Conference on Neural Information Processing Systems. NIPS ’21. Red Hook, NY, USA: Curran Associates Inc., 2021, pp. 852–863.ISBN: 978-1-71384-539-3

  20. [20]

    Projected GANs converge faster

    Axel Sauer, Kashyap Chitta, Jens Müller, and Andreas Geiger. Projected GANs converge faster. In: Proceedings of the 35th International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2021, pp. 17480–17492.ISBN: 978-1-71384-539-3

  21. [21]

    StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets

    Axel Sauer, Katja Schwarz, and Andreas Geiger. StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets. In:ACM SIGGRAPH 2022 Conference Proceedings. SIGGRAPH ’22. New York, NY, USA: Association for Computing Machinery, 2022, pp. 1–10.ISBN: 978-1-4503-9337-9.DOI: 10 . 1145 / 3528233.3530738

  22. [22]

    StyleGAN-T: unlocking the power of GANs for fast large-scale text-to-image synthesis

    Axel Sauer, Tero Karras, Samuli Laine, Andreas Geiger, and Timo Aila. StyleGAN-T: unlocking the power of GANs for fast large-scale text-to-image synthesis. In:Proceedings of the 40th International Conference on Machine Learning. V ol. 202. ICML’23. Honolulu, Hawaii, USA: JMLR.org, 2023, pp. 30105–30118

  23. [23]

    SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer

    Yuhta Takida, Masaaki Imaizumi, Takashi Shibuya, Chieh-Hsin Lai, Toshimitsu Uesaka, Naoki Murata, and Yuki Mitsufuji. SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer. In:The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024

  24. [24]

    Efficient Image Generation with Variadic Attention Heads

    Steven Walton, Ali Hassani, Xingqian Xu, Zhangyang Wang, and Humphrey Shi. Efficient Image Generation with Variadic Attention Heads. In:Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. 2025, pp. 3264–3275

  25. [25]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In:Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS ’20. Red Hook, NY, USA: Curran Associates Inc., 2020, pp. 6840–6851.ISBN: 978-1-71382-954-6

  26. [26]

    Score-Based Generative Modeling through Stochastic Differential Equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-Based Generative Modeling through Stochastic Differential Equations. In:9th International Con- ference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021

  27. [27]

    Diffusion models beat GANs on image synthesis

    Prafulla Dhariwal and Alex Nichol. Diffusion models beat GANs on image synthesis. In:Proceedings of the 35th International Conference on Neural Information Processing Systems. NIPS ’21. Red Hook, NY, USA: Curran Associates Inc., 2021, pp. 8780–8794.ISBN: 978-1-71384-539-3

  28. [28]

    Consistency Regularization for Generative Adversarial Networks

    Han Zhang, Zizhao Zhang, Augustus Odena, and Honglak Lee. Consistency Regularization for Generative Adversarial Networks. In:8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020

  29. [29]

    Improved transformer for high-resolution GANs

    Long Zhao, Zizhao Zhang, Ting Chen, Dimitris N. Metaxas, and Han Zhang. Improved transformer for high-resolution GANs. In:Proceedings of the 35th International Conference on Neural Information Processing Systems. NIPS ’21. Red Hook, NY, USA: Curran Associates Inc., 2021, pp. 18367–18380. ISBN: 978-1-7138-4539-3

  30. [30]

    Differentiable augmentation for data-efficient GAN training

    Shengyu Zhao, Zhijian Liu, Ji Lin, Jun-Yan Zhu, and Song Han. Differentiable augmentation for data- efficient GAN training. In:Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS ’20. Red Hook, NY, USA: Curran Associates Inc., 2020, pp. 7559–7570.ISBN: 978-1-7138-2954-6. 11

  31. [31]

    Regularizing Generative Adversarial Networks Under Limited Data

    Hung-Yu Tseng, Lu Jiang, Ce Liu, Ming-Hsuan Yang, and Weilong Yang. Regularizing Generative Adversarial Networks Under Limited Data. In:IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021. Computer Vision Foundation / IEEE, 2021, pp. 7921– 7931.DOI:10.1109/CVPR46437.2021.00783

  32. [32]

    Revisiting discriminator in GAN compression: a generator-discriminator cooperative compression scheme

    Shaojie Li, Jie Wu, Xuefeng Xiao, Fei Chao, Xudong Mao, and Rongrong Ji. Revisiting discriminator in GAN compression: a generator-discriminator cooperative compression scheme. In:Proceedings of the 35th International Conference on Neural Information Processing Systems. NIPS ’21. Red Hook, NY, USA: Curran Associates Inc., 2021, pp. 28560–28572.ISBN: 978-1-...

  33. [33]

    Online Multi-Granularity Distillation for GAN Compression

    Yuxi Ren, Jie Wu, Xuefeng Xiao, and Jianchao Yang. Online Multi-Granularity Distillation for GAN Compression. In:Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2021, pp. 6793–6803

  34. [34]

    Adversarial Diffusion Distillation

    Axel Sauer, Dominik Lorenz, Andreas Blattmann, and Robin Rombach. Adversarial Diffusion Distillation. In:Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LXXXVI. Berlin, Heidelberg: Springer-Verlag, 2024, pp. 87–103.ISBN: 978-3-031-73015-3.DOI:10.1007/978-3-031-73016-0_6

  35. [35]

    Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

    Axel Sauer, Frederic Boesel, Tim Dockhorn, Andreas Blattmann, Patrick Esser, and Robin Rombach. Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation. In:SIGGRAPH Asia 2024 Conference Papers, SA 2024, Tokyo, Japan, December 3-6, 2024. Ed. by Takeo Igarashi, Ariel Shamir, and Hao (Richard) Zhang. ACM, 2024, 106:1–106:11.DOI:10...

  36. [36]

    Distilling Diffusion Models Into Conditional GANs

    Minguk Kang, Richard Zhang, Connelly Barnes, Sylvain Paris, Suha Kwak, Jaesik Park, Eli Shechtman, Jun-Yan Zhu, and Taesung Park. Distilling Diffusion Models Into Conditional GANs. In:Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part XXVIII. Berlin, Heidelberg: Springer-Verlag, 2024, pp. 4...

  37. [37]

    Semi-TSGAN: Semi-Supervised Learning for Highlight Removal Based on Teacher-Student Generative Adversarial Network

    Yuanfeng Zheng, Yuchen Yan, and Hao Jiang. Semi-TSGAN: Semi-Supervised Learning for Highlight Removal Based on Teacher-Student Generative Adversarial Network. In:Sensors24.10 (Jan. 2024), p. 3090.ISSN: 1424-8220.DOI:10.3390/s24103090

  38. [38]

    Consistency models

    Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In:Proceedings of the 40th International Conference on Machine Learning. V ol. 202. ICML’23. Honolulu, Hawaii, USA: JMLR.org, 2023, pp. 32211–32252

  39. [39]

    Self-Distilled StyleGAN: Towards Generation from Internet Photos

    Ron Mokady, Omer Tov, Michal Yarom, Oran Lang, Inbar Mosseri, Tali Dekel, Daniel Cohen-Or, and Michal Irani. Self-Distilled StyleGAN: Towards Generation from Internet Photos. In:ACM SIGGRAPH 2022 Conference Proceedings. SIGGRAPH ’22. New York, NY, USA: Association for Computing Machinery, 2022, pp. 1–9.ISBN: 978-1-4503-9337-9.DOI:10.1145/3528233.3530708

  40. [40]

    Nonlinear Systems

    Hassan K. Khalil.Nonlinear Systems. 2nd. Upper Saddle River, NJ: Prentice-Hall, 1996.ISBN: 978-0-13- 228024-2

  41. [41]

    The numerics of GANs

    Lars Mescheder, Sebastian Nowozin, and Andreas Geiger. The numerics of GANs. In:Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17. Red Hook, NY, USA: Curran Associates Inc., 2017, pp. 1823–1833.ISBN: 978-1-5108-6096-4

  42. [42]

    Gradient descent GAN optimization is locally stable

    Vaishnavh Nagarajan and J. Zico Kolter. Gradient descent GAN optimization is locally stable. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17. Red Hook, NY, USA: Curran Associates Inc., 2017, pp. 5591–5600.ISBN: 978-1-5108-6096-4

  43. [43]

    LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

    Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao.LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop. June 4, 2016.DOI:10.48550/arXiv.1506.03365. arXiv:1506.03365[cs]

  44. [44]

    GANs trained by a two time-scale update rule converge to a local nash equilibrium

    Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local nash equilibrium. In:Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17. Red Hook, NY, USA: Curran Associates Inc., 2017, pp. 6629–6640.ISBN: 978-1-5108-6096-4

  45. [45]

    Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models

    George Stein, Jesse C. Cresswell, Rasa Hosseinzadeh, Yi Sui, Brendan Leigh Ross, Valentin Villecroze, Zhaoyan Liu, Anthony L. Caterini, J. Eric T. Taylor, and Gabriel Loaiza-Ganem. Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models. In:Proceedings of the 37th International Conference on Neural Information ...

  46. [46]

    Assessing generative models via precision and recall

    Mehdi S. M. Sajjadi, Olivier Bachem, Mario Lucic, Olivier Bousquet, and Sylvain Gelly. Assessing generative models via precision and recall. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS'18. Red Hook, NY, USA: Curran Associates Inc., 2018, pp. 5234–5243

  47. [47]

    Deep Learning Face Attributes in the Wild

    Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep Learning Face Attributes in the Wild. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). ICCV ’15. USA: IEEE Computer Society, 2015, pp. 3730–3738.ISBN: 978-1-4673-8391-2.DOI: 10.1109/ICCV.2015. 425

  48. [48]

    Multiscale structural similarity for image quality assessment

    Z. Wang, E.P. Simoncelli, and A.C. Bovik. Multiscale structural similarity for image quality assessment. In: The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003. Vol. 2. 2003, pp. 1398–1402. DOI: 10.1109/ACSSC.2003.1292216

  49. [49]

    Wasserstein generative adversarial networks

    Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In: Proceedings of the 34th International Conference on Machine Learning - Volume 70. ICML’17. Sydney, NSW, Australia: JMLR.org, 2017, pp. 214–223

  50. [50]

    Discriminator Rejection Sampling

    Samaneh Azadi, Catherine Olsson, Trevor Darrell, Ian J. Goodfellow, and Augustus Odena. Discriminator Rejection Sampling. In:7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019

  51. [51]

    Train longer, generalize better: closing the generalization gap in large batch training of neural networks

    Elad Hoffer, Itay Hubara, and Daniel Soudry. Train longer, generalize better: closing the generalization gap in large batch training of neural networks. In:Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17. Red Hook, NY, USA: Curran Associates Inc., 2017, pp. 1729–1739.ISBN: 978-1-5108-6096-4. 13 A Software...