
Advances in Neural Information Processing Systems

Diffusion Models Beat GANs on Image Synthesis

28 Pith papers cite this work. Polarity classification is still being indexed.




citation-role summary

background: 2 · dataset: 1

citation-polarity summary

background: 2 · use dataset: 1

representative citing papers

Functionalization via Structure Completion and Motion Rectification

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.

Designing streetscapes from street-view imagery using diffusion models

cs.CV · 2026-05-17 · conditional · novelty 7.0

A multimodal diffusion model generates controllable alternative streetscapes from street-view imagery using visual metrics and text, shown on Chicago and Orlando data with gains in semantic consistency.

Generating HDR Video from SDR Video

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

A multi-exposure video model predicts bracketed linear SDR sequences from a single nonlinear SDR input, and a merging model combines them into HDR video that preserves shadow and highlight detail.

Stylized Text-to-Motion Generation via Hypernetwork-Driven Low-Rank Adaptation

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

A hypernetwork maps style motion embeddings to LoRA updates that stylize text-driven motion diffusion models with improved generalization to unseen styles via contrastive structuring of the style space.

The tractability landscape of diffusion alignment: regularization, rewards, and computational primitives

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

The choice of closeness measure in diffusion reward alignment determines the required computational primitives and the tractable reward classes: linear exponential tilts suffice for KL regularization with convex rewards, and proximal oracles suffice for Wasserstein regularization with concave or low-dimensional Lipschitz rewards.

Relative Score Policy Optimization for Diffusion Language Models

cs.CL · 2026-05-11 · unverdicted · novelty 7.0

RSPO interprets reward advantages as targets for relative log-ratios in dLLMs, calibrating noisy estimates to stabilize RLVR training and achieve strong gains on planning tasks with competitive math reasoning performance.

Guidance Is Not a Hyperparameter: Learning Dynamic Control in Diffusion Language Models

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

Adaptive guidance trajectories learned via PPO outperform fixed-scale CFG in balancing controllability and quality on three controlled NLP generation tasks with discrete diffusion models.

MaMi-HOI: Harmonizing Global Kinematics and Local Geometry for Human-Object Interaction Generation

cs.RO · 2026-05-07 · unverdicted · novelty 7.0

MaMi-HOI counters geometric forgetting in diffusion models via a Geometry-Aware Proximity Adapter for precise contacts and a Kinematic Harmony Adapter for natural whole-body postures in human-object interactions.

PODiff: Latent Diffusion in Proper Orthogonal Decomposition Space for Scientific Super-Resolution

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

PODiff performs conditional diffusion in a fixed, variance-ordered POD latent space to enable efficient probabilistic super-resolution of high-dimensional scientific fields with lower memory and better-calibrated uncertainty than pixel-space or dropout baselines.

Watch Your Step: Information Injection in Diffusion Models via Shadow Timestep Embedding

cs.LG · 2026-05-01 · unverdicted · novelty 7.0

Timestep embeddings in diffusion models function as a separable side channel that can carry dedicated information for adversarial injection or detection.

Long-Text-to-Image Generation via Compositional Prompt Decomposition

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

PRISM lets pre-trained text-to-image models handle long prompts by breaking them into compositional parts, predicting noise separately, and merging outputs via energy-based conjunction, matching fine-tuned models while generalizing better to prompts over 500 tokens.

Simple Approximation and Derivative Free Inference-Time Scaling for Diffusion Models via Sequential Monte Carlo on Path Measures

stat.ML · 2026-05-18 · unverdicted · novelty 6.0

URGE performs unbiased inference-time scaling for diffusion models by attaching multiplicative path weights from Girsanov estimation and resampling trajectories, with a proven equivalence to prior particle-wise SMC schemes.

Plan First, Diffuse Later: Extrinsic Graph Guidance for Long-Horizon Diffusion Planning

cs.RO · 2026-05-16 · unverdicted · novelty 6.0

XDiffuser combines extrinsic graph planning with diffusion models to guide denoising and improve performance on long-horizon robotic tasks including multi-agent coordination and TSP-style problems.

Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

Ada-Diffuser is a causal diffusion model that jointly learns observed interaction structure and underlying latent dynamics from minimal observations for adaptive planning and policy learning.

Diagnosing and Correcting Concept Omission in Multimodal Diffusion Transformers

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

Text embeddings in MM-DiTs encode a detectable omission signal for missing concepts; amplifying it via OSI reduces concept omission in text-to-image outputs on FLUX.1-Dev and SD3.5-Medium.

Discrete Flow Matching for Offline-to-Online Reinforcement Learning

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

DRIFT enables stable offline-to-online fine-tuning of CTMC policies in discrete RL via advantage-weighted discrete flow matching, path-space regularization, and candidate-set approximation.

Couple to Control: Joint Initial Noise Design in Diffusion Models

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Coupled initial noises in diffusion models, with designed dependence but unchanged marginal Gaussians, improve generated image diversity on Stable Diffusion variants while preserving quality and alignment.

Post-hoc Selective Classification for Reliable Synthetic Image Detection

cs.CV · 2026-05-09 · unverdicted · novelty 6.0

ReSIDe generalizes logit-based confidence scores to intermediate layers of synthetic image detectors and uses preference optimization to aggregate them, cutting area under the risk-coverage curve by up to 69.55% under covariate shifts.

Stylistic Attribute Control in Latent Diffusion Models

cs.CV · 2026-05-04 · unverdicted · novelty 6.0

A technique for parametric stylistic control in latent diffusion models learns disentangled directions from synthetic datasets and applies them via guidance composition while preserving semantics.

Visual Implicit Autoregressive Modeling

cs.CV · 2026-05-02 · unverdicted · novelty 6.0

VIAR embeds implicit equilibrium layers in visual autoregressive models, achieving an ImageNet FID of 2.16 with 38.4% of VAR's parameters and controllable inference compute.

DAG-STL: A Hierarchical Framework for Zero-Shot Trajectory Planning under Signal Temporal Logic Specifications

cs.RO · 2026-04-20 · unverdicted · novelty 6.0

DAG-STL splits long-horizon STL planning into three stages (decomposition, timed waypoint allocation, and diffusion-based trajectory generation) to enable zero-shot planning under unknown dynamics.

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

cs.CV · 2023-09-30 · accept · novelty 6.0

PixArt-α matches commercial text-to-image quality with a diffusion transformer trained in 675 A100 GPU days through decomposed training stages, cross-attention text injection, and vision-language model dense captions.

Beyond Instance-Level Self-Supervision in 3D Multi-Modal Medical Imaging

cs.CV · 2026-05-14 · unverdicted · novelty 5.0

A self-supervised approach uses consistent spatial relationships of anatomical structures across patients to improve 3D multi-modal medical image representations, yielding modest gains on segmentation and classification tasks.

Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

Unsupervised behavioral mode discovery combined with mutual information rewards enables RL fine-tuning of multimodal generative policies that achieves higher success rates without losing action diversity.

citing papers explorer

Showing 13 of 13 citing papers after filters.

Functionalization via Structure Completion and Motion Rectification · cs.CV · 2026-05-18 · unverdicted · none · ref 162
Designing streetscapes from street-view imagery using diffusion models · cs.CV · 2026-05-17 · conditional · none · ref 16
Generating HDR Video from SDR Video · cs.CV · 2026-05-14 · unverdicted · none · ref 151
Stylized Text-to-Motion Generation via Hypernetwork-Driven Low-Rank Adaptation · cs.CV · 2026-05-13 · unverdicted · none · ref 24
Long-Text-to-Image Generation via Compositional Prompt Decomposition · cs.CV · 2026-04-20 · unverdicted · none · ref 24
Diagnosing and Correcting Concept Omission in Multimodal Diffusion Transformers · cs.CV · 2026-05-14 · unverdicted · none · ref 14
Post-hoc Selective Classification for Reliable Synthetic Image Detection · cs.CV · 2026-05-09 · unverdicted · none · ref 6
Stylistic Attribute Control in Latent Diffusion Models · cs.CV · 2026-05-04 · unverdicted · none · ref 69
Visual Implicit Autoregressive Modeling · cs.CV · 2026-05-02 · unverdicted · none · ref 35
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis · cs.CV · 2023-09-30 · accept · none · ref 152
Beyond Instance-Level Self-Supervision in 3D Multi-Modal Medical Imaging · cs.CV · 2026-05-14 · unverdicted · none · ref 99
Mutual Enhancement Between Global Tokens and Patch Tokens: From Theory to Practice · cs.CV · 2026-05-11 · unverdicted · none · ref 66
TaTok is a theoretically grounded adaptive tokenization method that uses global tokens and cumulative conditional entropy filtering to reduce redundancy while improving reconstruction quality over fixed-rate patch tokenization.
Unifying Deep Stochastic Processes for Image Enhancement · cs.CV · 2026-05-02 · unverdicted · none · ref 27
Stochastic image enhancement methods are shown to be variants of a shared SDE that differ in drift, diffusion, terminal distributions, and boundary conditions; controlled experiments reveal no single dominant family, and a new modular library is released.
