A Style-Based Generator Architecture for Generative Adversarial Networks
Baseline reference. 50% of citing Pith papers use this work as a benchmark or comparison.
Abstract
We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis. The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation. To quantify interpolation quality and disentanglement, we propose two new, automated methods that are applicable to any generator architecture. Finally, we introduce a new, highly varied and high-quality dataset of human faces.
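In StyleGAN, the "borrowing from style transfer literature" is adaptive instance normalization (AdaIN). Each feature map is normalized to zero mean and unit variance. It is then scaled and shifted by per-channel style values that a learned affine transform computes from the intermediate latent. Below is a minimal pure-Python sketch of that operation for one sample. The function name `adain` is hypothetical. The mapping network, the affine layer, and the per-pixel noise inputs are assumed and not shown.

```python
import math
from typing import List, Sequence


def adain(features: Sequence[Sequence[float]],
          scale: Sequence[float],
          bias: Sequence[float],
          eps: float = 1e-8) -> List[List[float]]:
    """Adaptive instance normalization for a single sample.

    features: one list of activations per channel (spatial positions flattened).
    scale, bias: per-channel style values (y_s, y_b), assumed to be produced
    from the intermediate latent by a learned affine layer (not shown).
    """
    if not (len(features) == len(scale) == len(bias)):
        raise ValueError("need one (scale, bias) pair per channel")
    out = []
    for channel, y_s, y_b in zip(features, scale, bias):
        n = len(channel)
        mean = sum(channel) / n
        var = sum((v - mean) ** 2 for v in channel) / n
        std = math.sqrt(var + eps)
        # Remove the channel's own statistics, then impose the style's scale and bias.
        out.append([y_s * (v - mean) / std + y_b for v in channel])
    return out
```

The input statistics are normalized away, so each AdaIN call overrides the style applied by the previous layer. This is what lets each style control the synthesis at one scale only.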
Citing papers
-
Denoising Diffusion Implicit Models
DDIMs construct non-Markovian diffusion processes that share DDPM training objectives but allow much faster reverse sampling, demonstrated empirically at 10-50x wall-clock speedup.
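A DDIM sampler reuses a noise-prediction network trained with the ordinary DDPM objective. It takes deterministic steps (the η = 0 case) over a subsequence of the training timesteps, and skipping steps is where the speedup comes from. Below is a minimal pure-Python sketch on flat lists. `eps_model` is a hypothetical stand-in for a trained predictor, and the function names are illustrative only.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def ddim_step(x_t: Sequence[float], eps: Sequence[float],
              alpha_bar_t: float, alpha_bar_prev: float) -> Vector:
    """One deterministic (eta = 0) DDIM update from alpha_bar_t to alpha_bar_prev."""
    # Predict the clean sample: x0_hat = (x_t - sqrt(1 - abar_t) * eps) / sqrt(abar_t).
    x0_hat = [(x - math.sqrt(1.0 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
              for x, e in zip(x_t, eps)]
    # Move to the earlier noise level along the same predicted noise direction.
    return [math.sqrt(alpha_bar_prev) * x0 + math.sqrt(1.0 - alpha_bar_prev) * e
            for x0, e in zip(x0_hat, eps)]


def ddim_sample(x_T: Sequence[float],
                eps_model: Callable[[Vector, int], Sequence[float]],
                alpha_bars: Sequence[float],
                timesteps: Sequence[int]) -> Vector:
    """Run DDIM over a decreasing subsequence of training timesteps.

    alpha_bars[t] is the cumulative product of the DDPM noise schedule at step t.
    Visiting only a subsequence shortens sampling relative to the full DDPM chain.
    """
    x = list(x_T)
    for i, t in enumerate(timesteps):
        # After the last listed timestep, jump to alpha_bar = 1, i.e. return x0_hat.
        prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x = ddim_step(x, eps_model(x, t), alpha_bars[t], prev)
    return x
```

As a sanity check, suppose an oracle `eps_model` returns the exact noise used to build `x_T`. Then ten steps over a 1000-step schedule recover the clean sample up to rounding, because the deterministic trajectory keeps one noise direction throughout.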
-
What Time Is It? How Data Geometry Makes Time Conditioning Optional for Flow Matching
Data geometry makes time identifiable from noisy interpolants at rate O(1/sqrt(d-k)), rendering the time-blindness gap asymptotically negligible relative to coupling variance.
-
Seeking the Unfamiliar but Memorable: Conceptual Creativity as Meta-Learning
Creativity is defined as meta-learning where a frozen diffusion creator optimizes candidates for rapid improvement by an adapting appraiser such as an autoencoder or CLIP adapter.
-
Deep Learning for CMB Foreground Removal and Beam Deconvolution: A U-Net GAN Approach
A U-Net GAN reconstructs CMB T and E maps from Planck-like simulations with foregrounds and systematics, achieving under 1% error outside the Galactic region and, for the first time, correcting for non-circular beams and asymmetric scans.
-
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
A 3.5-billion-parameter diffusion model with classifier-free guidance generates images preferred over DALL-E by human raters and can be fine-tuned for text-guided inpainting.
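Classifier-free guidance evaluates one diffusion model twice per step, once with the text condition and once without. It then extrapolates from the unconditional noise prediction toward the conditional one: ε̂ = ε(x_t | ∅) + s·(ε(x_t | c) − ε(x_t | ∅)). Below is a minimal sketch on flat lists. The function name is hypothetical, and both predictions are assumed to come from the same network.

```python
from typing import List, Sequence


def classifier_free_guidance(eps_uncond: Sequence[float],
                             eps_cond: Sequence[float],
                             scale: float) -> List[float]:
    """Combine noise predictions: eps_uncond + scale * (eps_cond - eps_uncond)."""
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same shape")
    # scale = 1 returns the conditional prediction; scale > 1 pushes further from the unconditional one.
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

A guidance scale above 1 trades sample diversity for closer agreement with the caption.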
-
Diffusion Models Beat GANs on Image Synthesis
Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.
-
Diagnosing and Improving Diffusion Models by Estimating the Optimal Loss Value
Derives the closed-form optimal loss for unified diffusion models, provides variance-controlled estimators of it, and shows improved diagnosis, training schedules, and power-law scaling once the optimal value is subtracted.
-
Generative Modeling by Estimating Gradients of the Data Distribution
Score-based generative modeling via multi-noise-level score matching and annealed Langevin dynamics produces samples on par with GANs and sets a new inception score record on CIFAR-10.
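Annealed Langevin dynamics runs Langevin updates at a decreasing sequence of noise levels σ_1 > σ_2 > ... > σ_L. At level i, each update is x ← x + (α_i/2)·s(x, σ_i) + √α_i·z, with step size α_i = ε·σ_i²/σ_L² and z standard normal. Below is a minimal scalar sketch. `score` is a hypothetical stand-in for a trained noise-conditional score network.

```python
import math
import random
from typing import Callable, List


def annealed_langevin(score: Callable[[float, float], float],
                      sigmas: List[float], x0: float, eps: float,
                      steps_per_level: int, rng: random.Random) -> float:
    """Annealed Langevin dynamics for one scalar sample.

    score(x, sigma) approximates the gradient of log p_sigma(x), the data
    density perturbed with Gaussian noise of scale sigma.
    sigmas must be in decreasing order.
    """
    x = x0
    sigma_last = sigmas[-1]
    for sigma in sigmas:
        # Larger noise levels get larger steps: alpha_i = eps * sigma_i^2 / sigma_L^2.
        alpha = eps * sigma ** 2 / sigma_last ** 2
        for _ in range(steps_per_level):
            z = rng.gauss(0.0, 1.0)
            x = x + 0.5 * alpha * score(x, sigma) + math.sqrt(alpha) * z
    return x
```

For a toy check, take data drawn from N(μ, 1). Its σ-perturbed density is N(μ, 1 + σ²), so the exact score is −(x − μ)/(1 + σ²). Passing that in as `score` yields samples whose mean approaches μ and whose variance is close to 1.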
-
Hiding Faces in Plain Sight: Disrupting AI Face Synthesis with Adversarial Perturbations
Adversarial perturbations disrupt DNN-based face detectors under white-box, gray-box, and black-box settings to sabotage training data for AI face synthesis.
-
Adversarial Learning for Improved Onsets and Frames Music Transcription
Adversarial training on time-frequency representations yields consistent gains in frame-level and note-level accuracy over the Onsets and Frames baseline for automatic music transcription.
-
Multiple-Identity Image Attacks Against Face-based Identity Verification
The paper shows that multiple-identity image attacks succeed because of the modest angular separation between non-matching (~90°) and matching (40-60°) face representations. Image morphing and representation inversion both realize effective attacks that transfer across comparators.
-
What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion
Prior-Aligned AutoEncoders shape latent manifolds with spatial coherence, local continuity, and global semantics to improve latent diffusion, achieving SOTA gFID 1.03 on ImageNet 256x256 with up to 13x faster convergence.
-
LatRef-Diff: Latent and Reference-Guided Diffusion for Facial Attribute Editing and Style Manipulation
LatRef-Diff replaces semantic directions in diffusion models with latent and reference-guided style codes, uses a hierarchical style modulation module, and applies forward-backward consistency training to achieve state-of-the-art facial attribute editing and style manipulation on CelebA-HQ.
-
Deepfake Detection Generalization with Diffusion Noise
ANL uses diffusion noise prediction and attention to regularize deepfake detectors for better generalization to unseen synthesis methods without added inference cost.
-
From Prompts to Context: An Ontology-Driven Framework for Human-Generative AI Collaboration
Presents the CCAI ontology and SPARQL retrieval method to convert ephemeral Human-Generative AI prompt interactions into explicit, machine-readable collaboration traces, illustrated in a competency-profile software case study.
-
FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution
FRAMER improves real-world super-resolution by decomposing features into low- and high-frequency bands via FFT, applying intra- and inter-contrastive losses with adaptive modulators, and using the final layer as teacher for intermediate layers during diffusion denoising.
-
NS-Net: Decoupling CLIP Semantic Information through NULL-Space for Generalizable AI-Generated Image Detection
NS-Net uses null-space projection on CLIP features plus contrastive learning and patch selection to improve generalization of AI-generated image detectors across 40 unseen generative models.
-
CCNETS: A Modular Causal Learning Framework for Pattern Recognition in Imbalanced Datasets
CCNETS is a new modular causal framework using three cooperative modules and a Zoint mechanism to align synthetic data generation with classifier needs on imbalanced pattern recognition tasks.
-
Correlation via synthesis: end-to-end nodule image generation and radiogenomic map learning based on generative adversarial network
A conditional GAN fuses gene expression profiles with background images at multiple scales to generate synthetic nodule images and learn radiogenomic correlations end-to-end on NSCLC data.
-
Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
Visual generation models are evolving from passive renderers to interactive agentic world modelers, but current systems lack spatial reasoning, temporal consistency, and causal understanding, with evaluations overemphasizing perceptual quality.
-
AttDiff-GAN: A Hybrid Diffusion-GAN Framework for Facial Attribute Editing
AttDiff-GAN decouples attribute manipulation via feature-level adversarial learning and guides diffusion generation with the edited features, plus PriorMapper and RefineExtractor modules, to achieve more accurate edits and better non-target preservation on CelebA-HQ.
-
Why we need an AI-resilient society
Applies forensic psychology profiling to characterize AI risks via nine features and proposes cognitive sovereignty, measurable control, and partial autonomy as a framework for an AI-resilient society.
-
A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence
A conditional Wasserstein GAN generates plausible future Soil Wetness Index (SWI) drought trajectories for French insurance risk management under climate change.
-
SAGE-GAN: Towards Realistic and Robust Segmentation of Spatially Ordered Nanoparticles via Attention-Guided GANs
SAGE-GAN integrates a self-attention U-Net into a CycleGAN framework to generate realistic synthetic electron microscopy image-mask pairs that augment training data for nanoparticle segmentation without human labeling.
-
Generative Modeling of Bach-Style Symbolic Music: A Comparative Study of Autoregressive, Latent-Variable, and Adversarial Approaches
An autoregressive LSTM with attention yields the most coherent Bach-style samples. Vector quantization improves VAE structure over standard recurrent VAEs, while GANs struggle with training stability and style generalization.