Demystifying MMD GANs

· 2018 · stat.ML · arXiv 1801.01401

Mixed citation behavior. Most common role is method (50%).

49 Pith papers citing it

abstract

We investigate the training and performance of generative adversarial networks using the Maximum Mean Discrepancy (MMD) as critic, termed MMD GANs. As our main theoretical contribution, we clarify the situation with bias in GAN loss functions raised by recent work: we show that gradient estimators used in the optimization process for both MMD GANs and Wasserstein GANs are unbiased, but learning a discriminator based on samples leads to biased gradients for the generator parameters. We also discuss the issue of kernel choice for the MMD critic, and characterize the kernel corresponding to the energy distance used for the Cramer GAN critic. Being an integral probability metric, the MMD benefits from training strategies recently developed for Wasserstein GANs. In experiments, the MMD GAN is able to employ a smaller critic network than the Wasserstein GAN, resulting in a simpler and faster-training algorithm with matching performance. We also propose an improved measure of GAN convergence, the Kernel Inception Distance, and show how to use it to dynamically adapt learning rates during GAN training.
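
The abstract names two quantities without spelling them out: the MMD used as critic and the Kernel Inception Distance (KID). The following is a minimal sketch of an unbiased squared-MMD estimate with a cubic polynomial kernel, which is the form the paper describes for KID. The function names are illustrative, and the short random vectors are an assumption standing in for Inception features of real and generated images; this is not the authors' implementation.

```python
import random


def poly_kernel(x, y, degree=3, coef0=1.0):
    """Polynomial kernel k(x, y) = (x.y / d + coef0) ** degree, with d = len(x)."""
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (dot / d + coef0) ** degree


def mmd2_unbiased(xs, ys, kernel=poly_kernel):
    """Unbiased estimate of the squared MMD between two samples.

    xs and ys are lists of equal-length feature vectors (hypothetical
    stand-ins for Inception representations).
    """
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples per set")
    # Within-set sums skip the i == j terms; dropping them is what makes
    # the estimator unbiased.
    k_xx = sum(kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j)
    k_yy = sum(kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j)
    k_xy = sum(kernel(x, y) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8  # assumed toy feature size; real Inception features are much larger
    real = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(100)]
    same = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(100)]
    shifted = [[rng.gauss(2.0, 1.0) for _ in range(dim)] for _ in range(100)]
    print("same distribution:   ", mmd2_unbiased(real, same))
    print("shifted distribution:", mmd2_unbiased(real, shifted))
```

Because the estimator is unbiased, its value on two samples from the same distribution fluctuates around zero and can be slightly negative, while a clear mismatch between the distributions pushes it upward.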

citation-role summary

method 3 · background 2 · dataset 1

citation-polarity summary

use method 3 · background 2 · use dataset 1

representative citing papers

Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI

cs.CV · 2021-09-16 · accept · novelty 8.0

HM3D offers 1000 building-scale 3D environments that are larger and higher-fidelity than existing datasets, enabling better-performing embodied AI agents for tasks like PointGoal navigation.

Learning a Maximum Entropy Model for Visual Textures using Diffusion

cs.CV · 2026-06-15 · unverdicted · novelty 7.0

A diffusion-trained maximum entropy model uses 512 learned statistics to synthesize visual textures at quality matching or exceeding prior models that rely on ~177k statistics.

A Unifying Framework for Concept-Based Representational Similarity

cs.LG · 2026-06-08 · unverdicted · novelty 7.0

A unifying framework decomposes concept alignment into instance-wise and distributional translation and concept consistency, introduces the InterVenchA benchmark, and shows that joint optimization via CoSAE recovers strong alignment even with 0.1% paired data.

TrioPose: Native Triple-Stream Diffusion Transformers for Pose-Guided Text-to-Image Generation

cs.CV · 2026-06-05 · unverdicted · novelty 7.0

TrioPose proposes a Triple-Stream Pose-Aware DiT with relational bias masks and spatial loss weighting to achieve SOTA pose-guided text-to-image results on multi-person benchmarks like Human-Art.

Text-to-Image Models Need Less from Text Encoders Than You Think

cs.CV · 2026-06-02 · unverdicted · novelty 7.0

A bag-of-position-tagged-words embedding guides text-to-image diffusion models as effectively as full contextual text embeddings from standard encoders.

Aligning Few-Step Generative Models by Amortizing Sample-based Variational Inference

cs.LG · 2026-05-26 · unverdicted · novelty 7.0

FAV aligns few-step generative models by amortizing SVGD updates from reward-tilted sampling into generator parameters via fixed-point regression, requiring only sample access, and shows outperformance on robotics tasks plus scaling on image generators.

Distributed Image Compression with Multimodal Side Information at Extremely Low Bitrates

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

MDIC uses a text-conditioned diffusion decoder and a supervised feature-mask generator on visual side information to achieve SOTA perceptual quality in distributed image compression at extremely low bitrates.

SeamCam: Quantifying Seamless Camouflage via Multi-Cue Visual Detectability

cs.CV · 2026-05-15 · conditional · novelty 7.0

SeamCam quantifies camouflage by computing one minus the highest IoU recoverable from category-conditioned detection proposals against a ground-truth mask, achieving 78.82% agreement with human judgments.

DirectTryOn: One-Step Virtual Try-On via Straightened Conditional Transport

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

DirectTryOn achieves state-of-the-art one-step virtual try-on performance by applying pure conditional transport, garment preservation loss, and self-consistency loss to straighten trajectories in pretrained generative models.

STRIDE: Training-Free Diversity Guidance via PCA-Directed Feature Perturbation in Single-Step Diffusion Models

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

STRIDE boosts diversity in one-step diffusion models by injecting PCA-aligned pink noise into transformer features while preserving text alignment and quality.

Faithful Extreme Image Rescaling with Learnable Reversible Transformation and Semantic Priors

cs.CV · 2026-05-01 · unverdicted · novelty 7.0

FaithEIR combines learnable reversible latent transformations, an adaptive high-frequency detail prior, and semantic conditioning to outperform prior methods in fidelity and perceptual quality for extreme image rescaling.

OccDirector: Language-Guided Behavior and Interaction Generation in 4D Occupancy Space

cs.CV · 2026-04-24 · unverdicted · novelty 7.0

OccDirector uses a VLM-guided Spatio-Temporal MMDiT model with history anchoring to generate physically plausible 4D occupancy from language scripts, supported by the new OccInteract-85k dataset.

FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

FIT is a large-scale dataset of 1.13M try-on triplets with exact size data plus a synthetic generation pipeline that enables training of virtual try-on models capable of depicting realistic garment fit including ill-fit cases.

Flow-Based Conformal Predictive Distributions

stat.ML · 2026-02-07 · unverdicted · novelty 7.0

Differentiable nonconformity scores induce flows that sample conformal prediction set boundaries, and mixing flows across levels produces conformal predictive distributions whose quantiles match the sets.

MAGIC: Few-Shot Mask-Guided Anomaly Inpainting with Prompt Perturbation, Spatially Adaptive Guidance, and Context Awareness

cs.CV · 2025-07-03 · unverdicted · novelty 7.0

MAGIC is a few-shot mask-guided anomaly inpainting framework using Gaussian prompt perturbation, spatially adaptive guidance, and context-aware mask alignment to produce high-fidelity, diverse anomalies that outperform prior methods on downstream detection tasks.

Diffusion Posterior Sampling for General Noisy Inverse Problems

stat.ML · 2022-09-29 · unverdicted · novelty 7.0

Diffusion models solve noisy (non)linear inverse problems via approximated posterior sampling that blends diffusion steps with manifold gradients without strict consistency projection.

SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

cs.CV · 2021-08-02 · conditional · novelty 7.0

SDEdit performs guided image synthesis and editing by adding noise to inputs and refining them via denoising with a diffusion model's SDE prior, outperforming GAN methods in human studies without task-specific training.

WarpI2I: Image Warping for Image-to-Image Translation

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

A saliency-guided warp-unwarp method reallocates spatial representation to preserve fine structures in latent diffusion models for image-to-image translation.

SatSplatDiff: Geometry-preserving generative refinement for high-fidelity satellite Gaussian Splatting

cs.CV · 2026-06-25 · unverdicted · novelty 6.0

SatSplatDiff combines depth supervision and shadow-guided generative refinement with 2DGS to reduce geometric MAE by up to 18% and improve visual fidelity by 28-45% on satellite datasets while enabling 5x resolution enhancement.

HiFiVe: High-Fidelity Vehicle Generation Leveraging Auto-Regressive 2D Generative Priors

cs.CV · 2026-06-24 · unverdicted · novelty 6.0

HiFiVe is a training-free framework using an auto-regressive texture refinement pipeline with depth-based warping, multi-view fusion, and symmetry to enhance both texture and geometry fidelity in vehicle generation from 2D priors.

C3VD-DEFCOL: A Deformable Colonoscopy Dataset with Time-Resolved 3D Ground Truth and Realistic Appearance

cs.CV · 2026-06-05 · unverdicted · novelty 6.0

C3VD-DEFCOL supplies 110 videos from 11 colon meshes with paired realistic RGB appearance and dense time-resolved 3D ground truth under three levels of parameterized peristaltic deformation for benchmarking deformable reconstruction.

Exploiting Semantic and Pixel Representations for Ultra-Low Bitrate Image Compression

cs.CV · 2026-06-01 · unverdicted · novelty 6.0

SPRDiff is a diffusion model for ultra-low bitrate image compression that fuses features from distortion-oriented, semantic-oriented, and VAE encoders plus a dual-feature reconstruction module to outperform prior methods on rate-distortion-perception trade-offs.

Unlearning in Diffusion Models: A Unified Framework with KL Divergence and Likelihood Constraints

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

A constrained optimization framework for diffusion model unlearning via KL and likelihood constraints, with duality results and reported better retention-unlearning tradeoffs than weight-based baselines.

TOPOS: High-Fidelity and Efficient Industry-Grade 3D Head Generation

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

TOPOS creates high-fidelity 3D heads with fixed industry topology from single images via a specialized VAE with Perceiver Resampler and a rectified flow transformer.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Conjuring Semantic Similarity

cs.AI · 2024-10-21 · unverdicted · none · ref 7 · internal anchor

Semantic similarity between texts is measured by the Jeffreys divergence between the image distributions induced by conditioning a diffusion model on each text, computed via Monte-Carlo sampling of the reverse-time SDEs.

Text-Driven 3D Indoor Scene Synthesis in Non-Manhattan Environments

cs.AI · 2026-07-02 · unverdicted · none · ref 18 · internal anchor

SPG-Layout combines statistical object priors with hierarchical large-object-first placement to produce physically plausible text-driven 3D scenes in non-Manhattan rooms and outperforms baselines on a new 500-scene benchmark.
