GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
Generative Adversarial Networks (GANs) excel at creating realistic images with complex models for which maximum likelihood is infeasible. However, the convergence of GAN training has still not been proved. We propose a two time-scale update rule (TTUR) for training GANs with stochastic gradient descent on arbitrary GAN loss functions. TTUR has an individual learning rate for both the discriminator and the generator. Using the theory of stochastic approximation, we prove that the TTUR converges under mild assumptions to a stationary local Nash equilibrium. The convergence carries over to the popular Adam optimization, for which we prove that it follows the dynamics of a heavy ball with friction and thus prefers flat minima in the objective landscape. For the evaluation of the performance of GANs at image generation, we introduce the "Fréchet Inception Distance" (FID) which captures the similarity of generated images to real ones better than the Inception Score. In experiments, TTUR improves learning for DCGANs and Improved Wasserstein GANs (WGAN-GP) outperforming conventional GAN training on CelebA, CIFAR-10, SVHN, LSUN Bedrooms, and the One Billion Word Benchmark.
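The abstract names the Fréchet Inception Distance without giving a formula. As a rough illustration only, the sketch below computes the squared Fréchet distance between two Gaussians, ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^(1/2)), which is the quantity FID is generally understood to evaluate on Inception feature statistics. The function names `frechet_distance` and `fid_from_features` are hypothetical, and the feature arrays are assumed to be precomputed embeddings of shape (n_samples, n_features); no Inception network is loaded here, and this is not the authors' reference implementation.

```python
import numpy as np


def frechet_distance(mu1, cov1, mu2, cov2):
    """Squared Frechet distance between N(mu1, cov1) and N(mu2, cov2)."""
    diff = mu1 - mu2
    # Tr((cov1 cov2)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov1 @ cov2, which are real and non-negative when
    # both inputs are positive semi-definite.
    eigvals = np.linalg.eigvals(cov1 @ cov2)
    # Clip tiny negative values caused by floating-point error.
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_sqrt)


def fid_from_features(feats_real, feats_gen):
    """Fit a Gaussian to each feature set, then compare the two Gaussians.

    Assumption: each argument is an array of shape (n_samples, n_features)
    holding precomputed embeddings (hypothetical inputs for illustration).
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    return frechet_distance(mu_r, cov_r, mu_g, cov_g)
```

With identical covariances the distance reduces to the squared distance between the means, so lower values indicate closer feature statistics.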
Forward citations
Cited by 40 Pith papers
-
Denoising Diffusion Implicit Models
DDIMs construct non-Markovian diffusion processes that share DDPM training objectives but allow much faster reverse sampling, demonstrated empirically at 10-50x wall-clock speedup.
-
AGAN: Towards Automated Design of Generative Adversarial Networks
AGAN is the first neural architecture search method for GANs that discovers architectures outperforming state-of-the-art on CIFAR-10 unsupervised image generation and competitive on supervised tasks.
-
iTRIALSPACE: Programmable Virtual Lesion Trials for Controlled Evaluation of Lung CT Models
iTRIALSPACE generates realistic virtual lesion trials on lung CTs that isolate performance drivers and show strong transfer of model rankings to real clinical data (ρ=0.93).
-
Dream-Cubed: Controllable Generative Modeling in Minecraft by Training on Billions of Cubes
Dream-Cubed releases a billion-scale voxel dataset and 3D diffusion models that generate controllable Minecraft worlds by operating directly on blocks.
-
ExpertEdit: Learning Skill-Aware Motion Editing from Expert Videos
ExpertEdit edits novice motions to expert skill levels by learning a motion prior from unpaired videos and infilling masked skill-critical spans.
-
Efficient Unlearning through Maximizing Relearning Convergence Delay
The Influence Eliminating Unlearning framework maximizes relearning convergence delay via weight decay and noise injection to remove the influence of a forgetting set while preserving accuracy on retained data.
-
DiV-INR: Extreme Low-Bitrate Diffusion Video Compression with INR Conditioning
DiV-INR integrates implicit neural representations as conditioning signals for diffusion models to achieve better perceptual quality than HEVC, VVC, and prior neural codecs at extremely low bitrates under 0.05 bpp.
-
Setting-Matched and Semantics-Scaled Benchmarking of One-Step Generative Models Against Multistep Diffusion and Flow Models
Matched benchmarking reveals FID misleads in few-step regimes under CFG, prompting CLIP-scaled and PickScore-scaled FID and IS variants for better semantic evaluation of one-step image generators.
-
MAGIC: Few-Shot Mask-Guided Anomaly Inpainting with Prompt Perturbation, Spatially Adaptive Guidance, and Context Awareness
MAGIC is a few-shot mask-guided anomaly inpainting framework using Gaussian prompt perturbation, spatially adaptive guidance, and context-aware mask alignment to produce high-fidelity, diverse anomalies that outperfor...
-
Imagen Video: High Definition Video Generation with Diffusion Models
Imagen Video generates high-definition text-conditional videos via a cascade of base and super-resolution diffusion models, achieving high fidelity and controllability.
-
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Imagen achieves state-of-the-art photorealistic text-to-image generation by scaling a text-only pretrained T5 language model within a diffusion framework, reaching FID 7.27 on COCO without training on it.
-
Broken Memories: Detecting and Mitigating Memorization in Diffusion Models with Degraded Generations
Memorization in diffusion models is detected via latent update norm instability and mitigated on-the-fly, yielding AUC over 0.999 and zero memorization rate on Stable Diffusion 1.4.
-
FlowErase-RL: Rethinking Concept Erasure as Reward Optimization in Flow Matching Models
FlowErase-RL is the first GRPO-based reward optimization framework for concept erasure in flow matching models, using a dynamic dual-path reward mechanism to suppress target concepts while preserving generative quality.
-
LiFT: Lifted Inter-slice Feature Trajectories for 3D Image Generation from 2D Generators
LiFT factorizes 3D medical volume synthesis into per-slice 2D generation and inter-slice trajectory learning, using a tri-planar drifting loss for unconditional coherence and a z-context mixer for paired translation tasks.
-
Ensemble Distributionally Robust Bayesian Optimisation
A tractable ensemble distributionally robust Bayesian optimization method achieves improved sublinear regret bounds under context uncertainty.
-
CASCADE: Context-Aware Relaxation for Speculative Image Decoding
CASCADE formalizes semantic interchangeability and convergence in target model representations to enable context-aware acceptance relaxation in tree-based speculative decoding, delivering up to 3.6x speedup on text-to...
-
Stability of the Monge Map in Semi-Dual Optimal Transport
Semi-dual optimal transport has a degenerate saddle-point structure equivalent to constrained optimization, with necessary and sufficient conditions derived for Monge map convergence independent of dual potential optimality.
-
Defining Robust Ultrasound Quality Metrics via an Ultrasound Foundation Model
Proposes TinyUSFM-uLPIPS and TinyUSFM-NRQ metrics that show better alignment with segmentation task performance and expert preference than PSNR or VGG-LPIPS in ultrasound imaging.
-
Defining Robust Ultrasound Quality Metrics via an Ultrasound Foundation Model
TinyUSFM-uLPIPS and TinyUSFM-NRQ provide task-linked, cross-organ, and clinically predictive quality assessment for ultrasound images that outperforms conventional metrics in calibration with segmentation performance ...
-
Evaluating AI-Generated Images of Cultural Artifacts with Community-Informed Rubrics
Community members from the UK blind community, Kerala, and Tamil Nadu helped define what counts as culturally appropriate depictions of artifacts, and the authors tested whether those definitions can be turned into re...
-
Diffusion Models Memorize in Training -- and Generalize in Inference
Diffusion models overfit denoising loss at intermediate noise but generalize in inference as model error smooths the flow field and sampling paths avoid memorized noisy training data.
-
InfiniteDiffusion: Bridging Learned Fidelity and Procedural Utility for Open-World Terrain Generation
InfiniteDiffusion adapts diffusion models to produce infinite, seed-consistent, high-fidelity terrain with procedural-noise-like access and 9x speed over prior methods.
-
F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions
F1 integrates next-scale visual foresight prediction into a Mixture-of-Transformer VLA architecture to reformulate action generation as foresight-guided inverse dynamics, achieving higher success rates on 136 tasks.
-
A Generalist Model for Diverse Text-Guided Medical Image Synthesis
MediSyn is a generalist latent diffusion model that synthesizes text-guided medical images across multiple specialties and modalities from public data and improves downstream classifiers in low-data settings.
-
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
MetaGPT embeds human SOPs into LLM prompts to create role-specialized agent teams that produce more coherent solutions on collaborative software engineering tasks than prior chat-based multi-agent systems.
-
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
SDXL improves upon prior Stable Diffusion versions through a larger UNet backbone, dual text encoders, novel conditioning, and a refinement model, producing higher-fidelity images competitive with black-box state-of-t...
-
Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed
Denoising Student distills the multi-step denoising process of score-based and diffusion models into a single forward pass, matching GAN sampling speed while producing comparable sample quality on CIFAR-10, CelebA, an...
-
Demystifying MMD GANs
MMD GANs have unbiased critic gradients but biased generator gradients from sample-based learning, and the Kernel Inception Distance provides a practical new measure for GAN convergence and dynamic learning rate adaptation.
-
Broken Memories: Detecting and Mitigating Memorization in Diffusion Models with Degraded Generations
Proposes stability regions based on latent update norms to detect and mitigate memorization in diffusion models, reporting AUC over 0.999 and zero memorization rate after mitigation on Stable Diffusion 1.4.
-
CaloArt: Large-Patch x-Prediction Diffusion Transformers for High-Granularity Calorimeter Shower Generation
CaloArt achieves top FPD, high-level, and classifier metrics on CaloChallenge datasets 2 and 3 while keeping single-GPU generation at 9-11 ms per shower by combining large-patch tokenization, x-prediction, and conditi...
-
Stability of the Monge Map in Semi-Dual Optimal Transport
Semi-dual optimal transport has a degenerate saddle-point structure whose solution is a constrained optimization problem, giving necessary and sufficient conditions for Monge map convergence independent of dual optimality.
-
Stability of the Monge Map in Semi-Dual Optimal Transport
Semi-dual OT formulation has degenerate saddle-point structure; necessary and sufficient conditions for Monge map convergence are derived without requiring dual potential optimality.
-
LoRaQ: Optimized Low Rank Approximation for 4-bit Quantization
LoRaQ enables fully sub-16-bit quantized diffusion models by optimizing low-rank error compensation in a data-free way, outperforming prior methods at equal memory cost on Pixart-Σ and SANA while supporting mixed low-...
-
UniMesh: Unifying 3D Mesh Understanding and Generation
UniMesh unifies 3D mesh generation and understanding in one model via a Mesh Head interface, Chain of Mesh iterative editing, and an Actor-Evaluator self-reflection loop.
-
Protecting and Preserving Protest Dynamics for Responsible Analysis
A responsible computing framework substitutes real protest imagery with labeled synthetic reproductions from conditional image synthesis to enable privacy-aware analysis of collective action patterns.
-
Evaluating AI-Generated Images of Cultural Artifacts with Community-Informed Rubrics
Case studies with blind UK residents and people from Kerala and Tamil Nadu demonstrate that community input at the systematization stage produces culturally grounded definitions of appropriateness for text-to-image mo...
-
CG-MLLM: Captioning and Generating 3D content via Multi-modal Large Language Models
CG-MLLM is a multimodal LLM using a Mixture-of-Transformer architecture with separate TokenAR and BlockAR components integrated with a pre-trained vision-language backbone and 3D VAE to enable 3D captioning and high-f...
-
A Utility-Preserving GAN for Face Obscuration
UP-GAN uses a GAN to obscure faces while preserving utility attributes like age, gender, pose, and expression better than blurring or pixelation.
-
Generative Texture Diversification of 3D Pedestrians for Robust Autonomous Driving Perception
Generative texture synthesis from StyleGAN2 diversifies 3D pedestrian assets from a single base model, improving robustness in 2D object detection while exposing 3D perception models' sensitivity to geometric domain gaps.
-
Discrete Meanflow Training Curriculum
A DMF curriculum initialized from pretrained flow models achieves one-step FID 3.36 on CIFAR-10 after only 2000 epochs by exploiting a discretized consistency property in the Meanflow objective.