Progressive Growing of GANs for Improved Quality, Stability, and Variation
32 Pith papers cite this work.
abstract
We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CelebA images at 1024^2. We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher-quality version of the CelebA dataset.
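The key mechanic the abstract describes — adding new higher-resolution layers that are faded in smoothly — can be illustrated with a minimal sketch (not the authors' code): while a new resolution block is introduced, its output is linearly blended with the upsampled output of the previous, lower-resolution stage, with the blend weight alpha ramping from 0 to 1 over training. The function names and nearest-neighbor upsampling below are illustrative assumptions, using plain NumPy arrays in place of real network outputs.

```python
import numpy as np

def upsample_2x(x):
    # Nearest-neighbor 2x upsampling: repeat each pixel along both spatial axes.
    return x.repeat(2, axis=-2).repeat(2, axis=-1)

def faded_output(old_rgb, new_rgb, alpha):
    """Blend the upsampled old-resolution output with the new layer's output.

    alpha ramps linearly from 0 to 1 while the new layer is faded in,
    so the network transitions smoothly to the higher resolution instead
    of being shocked by an untrained layer.
    """
    return alpha * new_rgb + (1.0 - alpha) * upsample_2x(old_rgb)

# Toy example: 4x4 old-stage output, 8x8 new-stage output, halfway through fade-in.
old = np.ones((1, 4, 4))
new = np.zeros((1, 8, 8))
blended = faded_output(old, new, alpha=0.5)
print(blended.shape, blended.mean())  # (1, 8, 8) 0.5
```

The same blending is applied symmetrically on the discriminator side in the paper, so both networks grow in lockstep.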
citing papers explorer
-
AI safety via debate
AI agents trained through competitive debate can allow polynomial-time human judges to oversee PSPACE-level questions, with MNIST experiments boosting sparse classifier accuracy from 59% to 89% using only 6 pixels.
-
ImageAttributionBench: How Far Are We from Generalizable Attribution?
ImageAttributionBench is a benchmark dataset demonstrating that state-of-the-art image attribution methods lack robustness to image degradation and fail to generalize to semantically disjoint domains.
-
What Cohort INRs Encode and Where to Freeze Them
Optimal INR freeze depth matches highest weight stable rank layer; SAEs reveal SIREN atoms are localized while FFMLP atoms trace cohort contours with causal impact on PSNR.
-
LEGO: LoRA-Enabled Generator-Oriented Framework for Synthetic Image Detection
LEGO uses multiple generator-specific LoRA modules modulated by an MLP and fused with attention to detect synthetic images, achieving better performance than prior methods while using under 10% of the training data.
-
ID-Eraser: Proactive Defense Against Face Swapping via Identity Perturbation
A feature-space method that erases usable identity information from face images via learnable perturbations and a Face Revive Generator, rendering them ineffective for deepfake swapping while preserving visual quality.
-
ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis
ReImagine decouples human appearance from temporal consistency via pretrained image backbones, SMPL-X motion guidance, and training-free video diffusion refinement to generate high-quality controllable videos.
-
MetaCloak-JPEG: JPEG-Robust Adversarial Perturbation for Preventing Unauthorized DreamBooth-Based Deepfake Generation
MetaCloak-JPEG uses a DiffJPEG layer with straight-through estimator inside a JPEG-aware EOT and curriculum meta-learning loop to produce l-inf bounded perturbations that retain 91.3% effectiveness after real JPEG compression.
-
ExpertEdit: Learning Skill-Aware Motion Editing from Expert Videos
ExpertEdit edits novice motions to expert skill levels by learning a motion prior from unpaired videos and infilling masked skill-critical spans.
-
Diffusion Posterior Sampling for General Noisy Inverse Problems
Diffusion models solve noisy (non)linear inverse problems via approximated posterior sampling that blends diffusion steps with manifold gradients without strict consistency projection.
-
High-Resolution Image Synthesis with Latent Diffusion Models
Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders with cross-attention conditioning, while substantially cutting computational cost compared to pixel-space diffusion models.
-
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
SDEdit performs guided image synthesis and editing by adding noise to inputs and refining them via denoising with a diffusion model's SDE prior, outperforming GAN methods in human studies without task-specific training.
-
The Diffusion Encoder
A diffusion model serves as the encoder in an autoencoder when trained alternately with the decoder to resolve opposing update directions while retaining the standard diffusion training objective.
-
Deep Probabilistic Unfolding for Quantized Compressive Sensing
A probabilistic unfolding network with stable likelihood projection and dual-domain Mamba achieves state-of-the-art reconstruction in quantized compressive sensing.
-
DiffATS: Diffusion in Aligned Tensor Space
DiffATS trains diffusion models directly on aligned Tucker tensor primitives that are proven to be homeomorphisms, delivering efficient unconditional and conditional generation across images, videos, and PDE data with high compression.
-
Decoupling Semantics and Fingerprints: A Universal Representation for AI-Generated Image Detection
ODP-Net structurally disentangles universal forgery traces from generator fingerprints and semantics via orthogonal decomposition and purification, delivering state-of-the-art generalization to unseen AI image generators such as Stable Diffusion 3.
-
Conditional Diffusion Under Linear Constraints: Langevin Mixing and Information-Theoretic Guarantees
Error in approximating the tangent conditional score by the unconditional score in diffusion models is bounded by dimension-free conditional mutual information, with a projected-Langevin method outperforming baselines in inpainting and super-resolution.
-
A Few-Step Generative Model on Cumulative Flow Maps
Cumulative flow maps unify few-step generative modeling for diffusion and flow models via cumulative transport and parameterization with minimal changes to time embeddings and objectives.
-
Which Face and Whose Identity? Solving the Dual Challenge of Deepfake Proactive Forensics in Multi-Face Scenarios
DAWF embeds identity watermarks via a parallel multi-face architecture and uses selective loss to answer which face was forged and whose identity was used.
-
Selective Depthwise Separable Convolution for Lightweight Joint Source-Channel Coding in Wireless Image Transmission
A selective replacement of convolutional layers by depthwise separable convolutions in JSCC systems cuts parameters substantially while keeping reconstruction performance nearly intact for wireless image transmission.
-
Combating Pattern and Content Bias: Adversarial Feature Learning for Generalized AI-Generated Image Detection
MAFL uses adversarial training to suppress pattern and content biases, guiding models to learn shared generative features for better cross-model generalization in detecting AI images.
-
SyncBreaker: Stage-Aware Multimodal Adversarial Attacks on Audio-Driven Talking Head Generation
SyncBreaker jointly attacks image and audio streams with Multi-Interval Sampling and Cross-Attention Fooling to degrade speech-driven talking head generation more than single-modality baselines.
-
VideoGPT: Video Generation using VQ-VAE and Transformers
VideoGPT generates competitive natural videos by learning discrete latents with VQ-VAE and modeling them autoregressively with a transformer.
-
Evidence-based Decision Modeling for Synthetic Face Detection with Uncertainty-driven Active Learning
EMSFD uses Dirichlet-based evidence modeling to capture prediction uncertainty in synthetic face detection and applies uncertainty-driven active learning to achieve 15% higher accuracy than prior methods.
-
Exploring and Exploiting Stability in Latent Flow Matching
Latent Flow Matching models exhibit inherent stability to data reduction and model shrinkage due to the flow matching objective, enabling reduced-dataset training and two-stage inference with over 2x speedup while preserving output quality.
-
Structured Diffusion Bridges: Inductive Bias for Denoising Diffusion Bridges
A structured diffusion bridge method achieves near fully-paired modality translation quality using alignment constraints even in unpaired or semi-paired regimes.
-
Mesh Based Simulations with Spatial and Temporal awareness
A unified training framework for mesh-based ML surrogates in CFD improves accuracy and long-horizon stability by enforcing spatial derivative consistency via multi-node prediction, using temporal cross-attention correction, and adding 3D rotary positional embeddings.
-
IdentiFace: Multi-Modal Iterative Diffusion Framework for Identifiable Suspect Face Generation in Crime Investigations
IdentiFace is a multi-modal iterative diffusion framework that generates identifiable suspect faces with improved identity retrieval for law enforcement applications.
-
HiMix: Hierarchical Artifact-aware Mixup for Generalized Synthetic Image Detection
HiMix combines mixup augmentation to create transitional real-fake samples with hierarchical global-local artifact feature fusion to achieve better generalization in detecting AI-generated images from unseen generators.
-
The Amazing Stability of Flow Matching
Flow matching generative models preserve sample quality, diversity, and latent representations despite pruning 50% of the CelebA-HQ dataset or altering architecture and training configurations.
-
DiffMagicFace: Identity Consistent Facial Editing of Real Videos
DiffMagicFace uses concurrent fine-tuned text and image diffusion models plus a rendered multi-view dataset to achieve identity-consistent text-conditioned editing of real facial videos.
-
Adaptive Forensic Feature Refinement via Intrinsic Importance Perception
I2P adaptively selects the most discriminative layers from visual foundation models for synthetic image detection and constrains task updates to low-sensitivity parameter subspaces to improve specificity without harming generalization.
-
Evolution of Video Generative Foundations
This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.