Title resolution pending

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, Oliver Wang

35 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 2 baseline 1

citation-polarity summary

background 2 baseline 1

representative citing papers

Towards Realistic 3D Emission Materials: Dataset, Baseline, and Evaluation for Emission Texture Generation

cs.CV · 2026-04-13 · unverdicted · novelty 8.0

The work creates the first dataset and baseline for generating emission textures on 3D objects to reproduce glowing materials from input images.

A Cross-Modal Prompt Injection Attack against Large Vision-Language Models with Image-Only Perturbation

cs.CR · 2026-05-15 · unverdicted · novelty 7.0

CrossMPI steers both visual and textual interpretations in LVLMs through image-only perturbations by optimizing in hidden-state space at selected middle layers with distance-based budget allocation.

What Concepts Lie Within? Detecting and Suppressing Risky Content in Diffusion Transformers

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

A method using attention head vectors detects and suppresses risky content generation in Diffusion Transformers at inference time.

Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes

cs.CV · 2026-05-06 · unverdicted · novelty 7.0

Ground4D resolves temporal conflicts in feedforward 4D Gaussian reconstruction for off-road scenes via voxel-grounded temporal aggregation with intra-voxel softmax and surface normal regularization, outperforming prior methods on ORAD-3D and RELLIS-3D while generalizing zero-shot.

Direct Discrepancy Replay: Distribution-Discrepancy Condensation and Manifold-Consistent Replay for Continual Face Forgery Detection

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

A replay method for continual face forgery detection condenses real-fake distribution discrepancies into compact maps and synthesizes compatible samples from current real faces to reduce forgetting under tight memory budgets without storing historical images.

IAD-Unify: A Region-Grounded Unified Model for Industrial Anomaly Segmentation, Understanding, and Generation

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

IAD-Unify unifies industrial anomaly segmentation, region-grounded language understanding, and mask-guided generation in one framework using DINOv2 token injection into Qwen3.5, supported by the new Anomaly-56K dataset of 59,916 images.

Beyond Monologue: Interactive Talking-Listening Avatar Generation with Conversational Audio Context-Aware Kernels

cs.AI · 2026-04-11 · unverdicted · novelty 7.0

Multi-head Gaussian kernels inject temporal scale discrepancy as inductive bias to enable full-duplex talking-listening avatar generation, supported by a new decoupled VoxHear dataset and claimed SOTA naturalness.

DiV-INR: Extreme Low-Bitrate Diffusion Video Compression with INR Conditioning

eess.IV · 2026-04-09 · unverdicted · novelty 7.0

DiV-INR integrates implicit neural representations as conditioning signals for diffusion models to achieve better perceptual quality than HEVC, VVC, and prior neural codecs at extremely low bitrates under 0.05 bpp.

Not All Frames Deserve Full Computation: Accelerating Autoregressive Video Generation via Selective Computation and Predictive Extrapolation

cs.CV · 2026-04-03 · conditional · novelty 7.0

SCOPE accelerates autoregressive video diffusion up to 4.73x by using a tri-modal cache-predict-recompute scheduler with Taylor extrapolation and selective active-frame computation while preserving output quality.

When Surfaces Lie: Exploiting Wrinkle-Induced Attention Shift to Attack Vision-Language Models

cs.CV · 2026-03-29 · unverdicted · novelty 7.0

A wrinkle-field perturbation method creates photorealistic non-rigid image changes that degrade state-of-the-art VLMs on image captioning and VQA more effectively than prior baselines.

PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks

cs.CV · 2026-02-06 · unverdicted · novelty 7.0

PlanViz is a new benchmark with three sub-tasks and a PlanScore metric to evaluate planning-oriented image generation and editing by unified multimodal models for computer-use tasks.

Uncertainty-Aware 4D Gaussian Splatting for Monocular Occluded Human Rendering

cs.CV · 2026-02-06 · unverdicted · novelty 7.0

U-4DGS reformulates occluded dynamic human rendering as MAP estimation under heteroscedastic noise, using a Probabilistic Deformation Network and uncertainty-modulated joint rasterization plus confidence-aware regularizations to deliver SOTA fidelity and robustness on ZJU-MoCap and OcMotion.

SR-Ground: Image Quality Grounding for Super-Resolved Content

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

The paper releases SR-Ground, a crowdsourced dataset for pixel-level segmentation of six artifact types in super-resolved images, and shows its use for training grounded IQA models and artifact-reducing fine-tuning.

SandSim: Curve-Guided Gaussian Splatting for Reconstructing Sand Painting Processes

cs.GR · 2026-04-30 · unverdicted · novelty 6.0

SandSim reconstructs temporally coherent sand painting processes from single images using curve-guided Gaussian splatting, subtractive compositing for accumulation, and semantic-guided stroke planning.

EAD-Net: Emotion-Aware Talking Head Generation with Spatial Refinement and Temporal Coherence

cs.CV · 2026-04-25 · unverdicted · novelty 6.0

EAD-Net uses a diffusion model with new spatio-temporal attention, graph-based temporal reasoning, and LLM-derived semantic descriptions to generate emotionally expressive talking head videos with improved lip-sync and coherence over prior methods.

Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

Task-aware localization via attention cues and feature centroids from source/target streams in IIE models improves non-edit consistency while preserving instruction following.

LBFTI: Layer-Based Facial Template Inversion for Identity-Preserving Fine-Grained Face Reconstruction

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

LBFTI decomposes faces into three layers with dedicated generators and a three-stage training process to invert templates into fine-grained, identity-preserving images, claiming 25.3% better TAR than prior methods.

Cross-Modal Generation: From Commodity WiFi to High-Fidelity mmWave and RFID Sensing

cs.LG · 2026-04-17 · unverdicted · novelty 6.0

RF-CMG synthesizes high-quality mmWave and RFID signals from WiFi using a diffusion model with Modality-Guided Embedding for high-frequency details and Low-Frequency Modality Consistency to preserve physical structure.

DVFace: Spatio-Temporal Dual-Prior Diffusion for Video Face Restoration

cs.CV · 2026-04-16 · unverdicted · novelty 6.0

DVFace uses a spatio-temporal dual-codebook and asymmetric fusion in a one-step diffusion model to deliver better video face restoration quality, temporal consistency, and identity preservation than recent methods.

ArtifactWorld: Scaling 3D Gaussian Splatting Artifact Restoration via Video Generation Models

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

ArtifactWorld restores artifacts in 3D Gaussian Splatting by training a video diffusion backbone on 107.5K paired clips with an isomorphic predictor for artifact heatmaps and an Artifact-Aware Triplet Fusion mechanism to achieve better sparse-view novel view synthesis.

VersaVogue: Visual Expert Orchestration and Preference Alignment for Unified Fashion Synthesis

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

VersaVogue unifies garment generation and virtual dressing via trait-routing attention with mixture-of-experts and an automated multi-perspective preference optimization pipeline that uses DPO without human labels.

Improving Random Testing via LLM-powered UI Tarpit Escaping for Mobile Apps

cs.SE · 2026-04-08 · conditional · novelty 6.0

LLM-powered monitoring of UI similarity allows random testing tools to escape tarpits, yielding 45-55% higher coverage and more unique bugs across 12 apps.

RHVI-FDD: A Hierarchical Decoupling Framework for Low-Light Image Enhancement

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

RHVI-FDD hierarchically decouples luminance-chrominance and then frequency components in low-light images to correct color, suppress noise, and preserve details better than prior methods.

Rethinking Exposure Correction for Spatially Non-uniform Degradation

cs.CV · 2026-04-05 · unverdicted · novelty 6.0

The paper introduces spatially adaptive modulation with a signal encoder and an uncertainty-inspired loss for correcting non-uniform exposure degradations in images.

citing papers explorer

Showing 35 of 35 citing papers.

Towards Realistic 3D Emission Materials: Dataset, Baseline, and Evaluation for Emission Texture Generation cs.CV · 2026-04-13 · unverdicted · none · ref 32
A Cross-Modal Prompt Injection Attack against Large Vision-Language Models with Image-Only Perturbation cs.CR · 2026-05-15 · unverdicted · none · ref 78
What Concepts Lie Within? Detecting and Suppressing Risky Content in Diffusion Transformers cs.CV · 2026-05-11 · unverdicted · none · ref 50
Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes cs.CV · 2026-05-06 · unverdicted · none · ref 60
Direct Discrepancy Replay: Distribution-Discrepancy Condensation and Manifold-Consistent Replay for Continual Face Forgery Detection cs.CV · 2026-04-14 · unverdicted · none · ref 47
IAD-Unify: A Region-Grounded Unified Model for Industrial Anomaly Segmentation, Understanding, and Generation cs.CV · 2026-04-14 · unverdicted · none · ref 40
Beyond Monologue: Interactive Talking-Listening Avatar Generation with Conversational Audio Context-Aware Kernels cs.AI · 2026-04-11 · unverdicted · none · ref 46
DiV-INR: Extreme Low-Bitrate Diffusion Video Compression with INR Conditioning eess.IV · 2026-04-09 · unverdicted · none · ref 44
Not All Frames Deserve Full Computation: Accelerating Autoregressive Video Generation via Selective Computation and Predictive Extrapolation cs.CV · 2026-04-03 · conditional · none · ref 60
When Surfaces Lie: Exploiting Wrinkle-Induced Attention Shift to Attack Vision-Language Models cs.CV · 2026-03-29 · unverdicted · none · ref 39
PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks cs.CV · 2026-02-06 · unverdicted · none · ref 55
Uncertainty-Aware 4D Gaussian Splatting for Monocular Occluded Human Rendering cs.CV · 2026-02-06 · unverdicted · none · ref 52
SR-Ground: Image Quality Grounding for Super-Resolved Content cs.CV · 2026-05-20 · unverdicted · none · ref 45
SandSim: Curve-Guided Gaussian Splatting for Reconstructing Sand Painting Processes cs.GR · 2026-04-30 · unverdicted · none · ref 59
EAD-Net: Emotion-Aware Talking Head Generation with Spatial Refinement and Temporal Coherence cs.CV · 2026-04-25 · unverdicted · none · ref 53
Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing cs.CV · 2026-04-22 · unverdicted · none · ref 42
LBFTI: Layer-Based Facial Template Inversion for Identity-Preserving Fine-Grained Face Reconstruction cs.CV · 2026-04-20 · unverdicted · none · ref 41
Cross-Modal Generation: From Commodity WiFi to High-Fidelity mmWave and RFID Sensing cs.LG · 2026-04-17 · unverdicted · none · ref 61
DVFace: Spatio-Temporal Dual-Prior Diffusion for Video Face Restoration cs.CV · 2026-04-16 · unverdicted · none · ref 53
ArtifactWorld: Scaling 3D Gaussian Splatting Artifact Restoration via Video Generation Models cs.CV · 2026-04-14 · unverdicted · none · ref 45
VersaVogue: Visual Expert Orchestration and Preference Alignment for Unified Fashion Synthesis cs.CV · 2026-04-08 · unverdicted · none · ref 52
Improving Random Testing via LLM-powered UI Tarpit Escaping for Mobile Apps cs.SE · 2026-04-08 · conditional · none · ref 83
RHVI-FDD: A Hierarchical Decoupling Framework for Low-Light Image Enhancement cs.CV · 2026-04-07 · unverdicted · none · ref 76
Rethinking Exposure Correction for Spatially Non-uniform Degradation cs.CV · 2026-04-05 · unverdicted · none · ref 51
TIQA: Human-Aligned Perceptual Text Quality Assessment in Generated Images cs.CV · 2026-03-07 · unverdicted · none · ref 67
TIQA introduces datasets and a model that predict human perceptual quality of rendered text in AI images, achieving PLCC 0.942 on crops and improving selected image text quality by 0.36 MOS.
SmokeSVD: Smoke Reconstruction from A Single View via Progressive Novel View Synthesis and Refinement with Diffusion Models cs.GR · 2025-07-16 · unverdicted · none · ref 11
SmokeSVD reconstructs dynamic smoke from a single video via diffusion-based side-view synthesis, progressive multi-view refinement, and Navier-Stokes-guided density-velocity estimation.
Towards a General-Purpose Zero-Shot Synthetic Low-Light Image and Video Pipeline cs.CV · 2025-04-16 · unverdicted · none · ref 51
A self-supervised Degradation Estimation Network estimates parameters for physics-informed noise distributions to generate realistic synthetic low-light data, showing gains on noise replication, enhancement, and detection tasks.
Generator-Refiner-Examiner: A Tri-Module Data Augmentation Framework for 3D Human Avatar Learning from Monocular Videos cs.CV · 2026-05-22 · unverdicted · none · ref 84
TrioMan is a tri-module data augmentation framework using a Generator for pose/camera perturbations, a Refiner with one-step diffusion, and an Examiner with dual-branch attention to improve 3D avatar learning from monocular videos, claiming better results than prior methods on two benchmarks.
DealMaTe: Multi-Dimensional Material Transfer via Diffusion Transformer cs.GR · 2026-05-15 · unverdicted · none · ref 69
DealMaTe proposes a simplified diffusion framework for material transfer that injects multi-dimensional 3D conditions via Multi-Dim 3D Shader LoRA and Shader Causal Mutual Attention with KV caching.
SAMIC: A Lightweight Semantic-Aware Mamba for Efficient Perceptual Image Compression cs.CV · 2026-05-06 · unverdicted · none · ref 46
SAMIC introduces semantic-aware Mamba blocks and SVD-based redundancy reduction to achieve efficient perceptual image compression with improved rate-distortion-perception tradeoffs.
Do Protective Perturbations Really Protect Portrait Privacy under Real-world Image Transformations? cs.CV · 2026-04-26 · conditional · none · ref 46
Pixel-level protective perturbations for portrait privacy are ineffective against common image transformations, and a low-cost purification framework can strip them out.
Identity-Decoupled Anonymization for Visual Evidence in Multi-modal Retrieval-Augmented Generation cs.CV · 2026-04-26 · unverdicted · none · ref 40
The paper proposes a three-part generative anonymization pipeline using disentangled variational encoding, manifold-aware identity replacement, and distilled latent diffusion to protect face identities in MRAG while preserving non-identity attributes.
Discrete Preference Learning for Personalized Multimodal Generation cs.IR · 2026-04-22 · unverdicted · none · ref 64
DPPMG learns discrete modal-specific preferences via a dedicated GNN from multimodal user data, quantizes them into tokens, and feeds them into generators with a consistency reward to produce personalized text and images.
Ride the Wave: Precision-Allocated Sparse Attention for Smooth Video Generation cs.CV · 2026-04-14 · unverdicted · none · ref 31
PASA uses curvature-aware dynamic budgeting, grouped approximations, and stochastic attention routing to accelerate video diffusion transformers while eliminating temporal flickering from sparse patterns.
Eulerian Motion Guidance: Robust Image Animation via Bidirectional Geometric Consistency cs.CV · 2026-05-07 · unreviewed · ref 39 · 3 links