Progressive Growing of GANs for Improved Quality, Stability, and Variation
Baseline reference. 50% of citing Pith papers use this work as a benchmark or comparison.
abstract
We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CelebA images at 1024^2. We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher-quality version of the CelebA dataset.
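The progressive schedule described in the abstract can be sketched roughly as follows. This is an illustrative sketch only: the 4x4 starting resolution, the linear blend between the previous stage and a newly added layer, and the helper names `resolution_schedule`, `alpha_at`, and `fade_in` are assumptions made for this example and are not stated in the abstract.

```python
def resolution_schedule(start=4, final=1024):
    """List the resolutions visited when doubling from `start` up to `final`.

    The default start of 4 is an assumption of this sketch; the abstract
    only says training begins at "a low resolution".
    """
    resolutions = []
    res = start
    while res <= final:
        resolutions.append(res)
        res *= 2
    return resolutions


def alpha_at(step, fade_steps):
    """Blend weight for a newly added layer: ramps linearly from 0 to 1."""
    return min(1.0, step / fade_steps)


def fade_in(alpha, upsampled_old, new_layer_out):
    """Blend the upsampled previous-stage output with the new layer's output.

    Both inputs are flat lists of pixel values at the same (new) resolution.
    """
    return [(1 - alpha) * old + alpha * new
            for old, new in zip(upsampled_old, new_layer_out)]


if __name__ == "__main__":
    print(resolution_schedule())
    print(fade_in(alpha_at(50, 100), [0.0, 2.0], [2.0, 4.0]))
```

Blending a new layer in gradually, rather than switching it on at once, is one plausible way to keep the already-trained lower-resolution layers from being disrupted when the networks grow.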
citing papers explorer
-
Generative Language Modeling for Automated Theorem Proving
GPT-f, a transformer-based prover for Metamath, generated new short proofs that were accepted into the main library—the first such contribution from a deep-learning system.
-
AI safety via debate
AI agents trained through competitive debate can allow polynomial-time human judges to oversee PSPACE-level questions, with MNIST experiments boosting sparse classifier accuracy from 59% to 89% using only 6 pixels.
-
ImageAuditor: Membership Inference Attack against Image-based Retrieval-Augmented Generation
ImageAuditor is the first MIA for IRAG that achieves over 80% AUROC with four queries by using reward-guided policy optimization for cross-modal retrieval and task-specific prompting for signal extraction.
-
FiSeR: Fine-Grained Source Representations for Cross-Domain AI Image Detection
FiSeR uses coarse contrastive separation of natural vs synthetic images plus fine contrastive grouping by generator identity to improve cross-domain AUROC by +10.22 over DIRE baseline on multiple test sets.
-
Spectral Guidance for Flexible and Efficient Control of Diffusion Models
Spectral Guidance learns singular functions via self-supervised objective to project guidance signals onto diffusion sampling trajectories, enabling stable control without retraining or backpropagation and improving CIFAR-10 accuracy by 37 points with 4x faster sampling.
-
Training-Free Generative Sampling via Moment-Matched Score Smoothing
MM-SOLD is a training-free particle sampler whose large-particle limit converges to a moment-matched Gibbs distribution obtained by exponentially tilting a score-smoothed target.
-
ImageAttributionBench: How Far Are We from Generalizable Attribution?
ImageAttributionBench is a benchmark dataset demonstrating that state-of-the-art image attribution methods lack robustness to image degradation and fail to generalize to semantically disjoint domains.
-
What Cohort INRs Encode and Where to Freeze Them
Optimal INR freeze depth matches highest weight stable rank layer; SAEs reveal SIREN atoms are localized while FFMLP atoms trace cohort contours with causal impact on PSNR.
-
LEGO: LoRA-Enabled Generator-Oriented Framework for Synthetic Image Detection
LEGO uses multiple generator-specific LoRA modules modulated by an MLP and fused with attention to detect synthetic images, achieving better performance than prior methods while using under 10% of the training data.
-
Whether, Which, and Whose: Solving the Triple Challenge of Deepfake Proactive Forensics in Multi-Face Scenarios
DAWF introduces isolated identity attribution spaces and selective regional supervision to unify detection, localization, and source tracing for multi-face deepfakes.
-
ID-Eraser: Proactive Defense Against Face Swapping via Identity Perturbation
A feature-space method that erases usable identity information from face images via learnable perturbations and a Face Revive Generator, rendering them ineffective for deepfake swapping while preserving visual quality.
-
ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis
ReImagine decouples human appearance from temporal consistency via pretrained image backbones, SMPL-X motion guidance, and training-free video diffusion refinement to generate high-quality controllable videos.
-
MetaCloak-JPEG: JPEG-Robust Adversarial Perturbation for Preventing Unauthorized DreamBooth-Based Deepfake Generation
MetaCloak-JPEG uses a DiffJPEG layer with straight-through estimator inside a JPEG-aware EOT and curriculum meta-learning loop to produce l-inf bounded perturbations that retain 91.3% effectiveness after real JPEG compression.
-
ExpertEdit: Learning Skill-Aware Motion Editing from Expert Videos
ExpertEdit edits novice motions to expert skill levels by learning a motion prior from unpaired videos and infilling masked skill-critical spans.
-
LPNSR: Optimal Noise-Guided Diffusion Image Super-Resolution Via Learnable Noise Prediction
LPNSR derives optimal intermediate noise for diffusion SR via MLE and implements it with an LR-guided noise predictor, reaching SOTA perceptual quality in 4 steps without text priors.
-
From Baselines to Transport Geodesics: Axiomatic Attribution via Optimal Generative Flows
Transport-geodesic attribution via optimal generative flows selects principled paths for feature attributions by minimizing kinetic action.
-
Factored Classifier-Free Guidance
Factored Classifier-Free Guidance enables per-attribute control in classifier-free guidance for diffusion models to produce more sound counterfactuals.
-
Toward Generalizable Forgery Detection and Reasoning
FakeReasoning is an MLLM-based framework for unified forgery detection and reasoning on AI-generated images, supported by the new MMFR-Dataset of 120K images and 378K annotations across 10 generators.
-
Diffusion Posterior Sampling for General Noisy Inverse Problems
Diffusion models solve noisy (non)linear inverse problems via approximated posterior sampling that blends diffusion steps with manifold gradients without strict consistency projection.
-
High-Resolution Image Synthesis with Latent Diffusion Models
Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders with cross-attention conditioning, while cutting computational and
-
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
SDEdit performs guided image synthesis and editing by adding noise to inputs and refining them via denoising with a diffusion model's SDE prior, outperforming GAN methods in human studies without task-specific training.
-
Guided Image Generation with Conditional Invertible Neural Networks
Proposes cINN architecture for conditional image generation that by construction yields diverse sharp samples, demonstrated on MNIST digit generation and image colorization with latent space manipulation.
-
Envisage: Diffusion-Based Rhinoplasty Goal Visualization with Mask-Decomposed Evaluation
Envisage applies FLUX.1 inpainting to rhinoplasty goal visualization and shows via SurgicalScore that mask-decomposed metrics outperform full-face identity scores for hard-composited localized edits.
-
Cross-scale Aligned Supervision for Training GANs
CAT achieves FID-50K of 1.56 on ImageNet-256 with one-step inference after 60 epochs by aligning intermediate GAN outputs to the final sample.
-
Controlla: Learning Controllability via Graph-Constrained Latent Geometry
Controlla learns identity and attribute factors from multimodal inputs and aligns them with graph priors using graph-constrained optimal transport to enforce consistent attribute trajectories while preserving reference identity.
-
Reduce the Artifacts Bias for More Generalizable AI-Generated Image Detection
SEF introduces GAN upsampling for diverse artifacts and expert fusion to reduce domain interference, yielding stronger generalization on 13 benchmarks for AI-generated image detection.
-
The Diffusion Encoder
A diffusion model serves as the encoder in an autoencoder when trained alternately with the decoder to resolve opposing update directions while retaining the standard diffusion training objective.
-
Deep Probabilistic Unfolding for Quantized Compressive Sensing
A probabilistic unfolding network with stable likelihood projection and dual-domain Mamba achieves state-of-the-art reconstruction in quantized compressive sensing.
-
DiffATS: Diffusion in Aligned Tensor Space
DiffATS trains diffusion models directly on aligned Tucker tensor primitives that are proven to be homeomorphisms, delivering efficient unconditional and conditional generation across images, videos, and PDE data with high compression.
-
Decoupling Semantics and Fingerprints: A Universal Representation for AI-Generated Image Detection
ODP-Net uses instance-aware orthogonal decomposition, perturbation-based purification, and manifold alignment to separate universal forgery traces, generator fingerprints, and semantics, achieving SOTA on unseen architectures like Stable Diffusion 3.
-
Conditional Diffusion Under Linear Constraints: Langevin Mixing and Information-Theoretic Guarantees
Error in approximating the tangent conditional score by the unconditional score in diffusion models is bounded by dimension-free conditional mutual information, with a projected-Langevin method outperforming baselines in inpainting and super-resolution.
-
A Few-Step Generative Model on Cumulative Flow Maps
Cumulative flow maps unify few-step generative modeling for diffusion and flow models via cumulative transport and parameterization with minimal changes to time embeddings and objectives.
-
Selective Depthwise Separable Convolution for Lightweight Joint Source-Channel Coding in Wireless Image Transmission
A selective replacement of convolutional layers by depthwise separable convolutions in JSCC systems cuts parameters substantially while keeping reconstruction performance nearly intact for wireless image transmission.
-
Combating Pattern and Content Bias: Adversarial Feature Learning for Generalized AI-Generated Image Detection
MAFL uses adversarial training to suppress pattern and content biases, guiding models to learn shared generative features for better cross-model generalization in detecting AI images.
-
SyncBreaker: Stage-Aware Multimodal Adversarial Attacks on Audio-Driven Talking Head Generation
SyncBreaker jointly attacks image and audio streams with Multi-Interval Sampling and Cross-Attention Fooling to degrade speech-driven talking head generation more than single-modality baselines.
-
Mitigating Membership Inference in Intermediate Representations with Differentially Private Training
LM-DP-SGD estimates layer-specific MIA risks from shadow models and reweights gradients to give stronger protection to vulnerable layers, improving the privacy-utility trade-off over uniform DP-SGD.
-
Implicit Neural Representation-Based Continuous Single Image Super-Resolution: An Empirical Benchmark
Systematic benchmark reveals recent complex INR methods for continuous image super-resolution offer only marginal gains, with performance tied to training setups, auxiliary losses improving textures, and scaling laws holding.
-
Open Set Face Forgery Detection via Dual-Level Evidence Collection
DLED reformulates open-set face forgery detection as an uncertainty estimation task and uses dual-level spatial-frequency evidence collection to identify novel fake categories, claiming 20% average gains over baselines.
-
Training-Free Reward-Guided Image Editing via Trajectory Optimal Control
A trajectory optimal control framework for reward-guided image editing in diffusion models that balances reward maximization with source fidelity better than prior inversion-based baselines.
-
DiffClean: Diffusion-based Makeup Removal for Accurate Age Estimation
DiffClean applies text-guided diffusion to erase makeup from faces, boosting age estimation and verification accuracy over makeup-affected images.
-
Navigating the Challenges of AI-Generated Image Detection in the Wild: What Truly Matters?
The ITW-SM dataset and targeted optimization of detector design choices yield a 26.87% average AUC improvement for state-of-the-art AI-generated image detectors under real-world social media conditions.
-
NullFace: Training-Free Localized Face Anonymization
NullFace performs training-free localized face anonymization by inverting images to noise and denoising with modified identity embeddings from a pre-trained diffusion model.
-
VideoGPT: Video Generation using VQ-VAE and Transformers
VideoGPT generates competitive natural videos by learning discrete latents with VQ-VAE and modeling them autoregressively with a transformer.
-
Rethinking FID Through the Geometry of the Reference Dataset
FID improves with better samples only on concentrated reference datasets but can worsen on dispersed ones, as shown by density and effective rank in a controlled study across six datasets.
-
Multi-Objective Learning for Diffusion Models: A Statistical Theory under Semi-Supervised Learning
A semi-supervised MOL framework for diffusion models with generalization bounds depending only on specialist model complexity, extended to diffusion policies for sequential decisions.
-
HDRFace: Rethinking Face Restoration with High-Dimensional Representation
HDRFace injects high-dimensional facial features from low-quality and intermediate images into diffusion models via SDFM fusion, reporting gains on SD V2.1-base and Qwen-Image.
-
Evidence-based Decision Modeling for Synthetic Face Detection with Uncertainty-driven Active Learning
EMSFD uses Dirichlet-based evidence modeling to capture prediction uncertainty in synthetic face detection and applies uncertainty-driven active learning to achieve 15% higher accuracy than prior methods.
-
Exploring and Exploiting Stability in Latent Flow Matching
LFM models exhibit stability to data reduction and capacity shrinkage that is tied to the flow matching objective, enabling reduced-data training and coarse-to-fine inference with over 2x speedup.
-
Structured Diffusion Bridges: Inductive Bias for Denoising Diffusion Bridges
A structured diffusion bridge method achieves near fully-paired modality translation quality using alignment constraints even in unpaired or semi-paired regimes.
-
Mesh Based Simulations with Spatial and Temporal awareness
A unified training framework for mesh-based ML surrogates in CFD improves accuracy and long-horizon stability by enforcing spatial derivative consistency via multi-node prediction, using temporal cross-attention correction, and adding 3D rotary positional embeddings.