Mixed citations

Title resolution pending

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer · 2022

Mixed citation behavior. Most common role is background (64%).

35 Pith papers citing it

Title metadata for this work has not finished resolving. The hub is built from the citation graph, and the title resolver will retry the DOI and OpenAlex lookups on its next pass.

citation-role summary

background 7 · baseline 1 · dataset 1 · method 1 · other 1

citation-polarity summary

background 7 · baseline 1 · unclear 1 · use dataset 1 · use method 1

representative citing papers

ORBIS: Output-Guided Token Reduction with Distribution-Aware Matching for Video Diffusion Acceleration

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

ORBIS uses output-guided token reduction and DATM to achieve 2x higher token reduction than AsymRnR, with up to 4.5x speedup and 79.3% energy savings versus an A100 GPU for video DiT models.

ShadeBench: A Benchmark Dataset for Building Shade Simulation in Sustainable Society

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

ShadeBench is a multimodal benchmark dataset for urban shade understanding that includes temporally varying shade maps, satellite imagery, building representations, and text to support shade generation, segmentation, and 3D reconstruction tasks.

A Cross-Modal Prompt Injection Attack against Large Vision-Language Models with Image-Only Perturbation

cs.CR · 2026-05-15 · unverdicted · novelty 7.0

CrossMPI steers both visual and textual interpretations in LVLMs through image-only perturbations by optimizing in hidden-state space at selected middle layers with distance-based budget allocation.

What Concepts Lie Within? Detecting and Suppressing Risky Content in Diffusion Transformers

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

A method using attention head vectors detects and suppresses risky content generation in Diffusion Transformers at inference time.

LEGO: LoRA-Enabled Generator-Oriented Framework for Synthetic Image Detection

cs.CV · 2026-05-06 · unverdicted · novelty 7.0

LEGO uses multiple generator-specific LoRA modules modulated by an MLP and fused with attention to detect synthetic images, achieving better performance than prior methods while using under 10% of the training data.

ResetEdit: Precise Text-guided Editing of Generated Image via Resettable Starting Latent

cs.CV · 2026-04-28 · unverdicted · novelty 7.0

ResetEdit embeds a recoverable discrepancy signal during image generation in diffusion models to reconstruct an approximate original latent for high-fidelity text-guided editing.

Hallo-Live: Real-Time Streaming Joint Audio-Video Avatar Generation with Asynchronous Dual-Stream and Human-Centric Preference Distillation

cs.CV · 2026-04-26 · unverdicted · novelty 7.0

Hallo-Live achieves 20.38 FPS real-time text-to-audio-video avatar generation with 0.94s latency using asynchronous dual-stream diffusion and HP-DMD preference distillation, matching teacher model quality at 16x higher throughput.

FlowAnchor: Stabilizing the Editing Signal for Inversion-Free Video Editing

cs.CV · 2026-04-24 · unverdicted · novelty 7.0

FlowAnchor stabilizes editing signals in flow-based inversion-free video editing via spatial-aware attention refinement and adaptive magnitude modulation for improved faithfulness and temporal coherence.

UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

UniEditBench unifies image and video editing evaluation with a nine-plus-eight operation taxonomy and cost-effective 4B/8B distilled MLLM evaluators that align with human judgments.

IAD-Unify: A Region-Grounded Unified Model for Industrial Anomaly Segmentation, Understanding, and Generation

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

IAD-Unify unifies industrial anomaly segmentation, region-grounded language understanding, and mask-guided generation in one framework using DINOv2 token injection into Qwen3.5, supported by the new Anomaly-56K dataset of 59,916 images.

MAST: Mask-Guided Attention Mass Allocation for Training-Free Multi-Style Transfer

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

MAST is a mask-guided attention allocation method that enables artifact-free multi-style transfer in diffusion models by anchoring layout, distributing attention mass, scaling sharpness, and injecting details.

Not All Frames Deserve Full Computation: Accelerating Autoregressive Video Generation via Selective Computation and Predictive Extrapolation

cs.CV · 2026-04-03 · conditional · novelty 7.0

SCOPE accelerates autoregressive video diffusion up to 4.73x by using a tri-modal cache-predict-recompute scheduler with Taylor extrapolation and selective active-frame computation while preserving output quality.

SHARP: Spectrum-aware Highly-dynamic Adaptation for Resolution Promotion in Remote Sensing Synthesis

cs.CV · 2026-03-23 · conditional · novelty 7.0

SHARP applies a spectrum-aware dynamic RoPE scaling schedule that promotes resolution more strongly in early denoising stages and relaxes it later, outperforming static baselines on quality metrics for remote sensing images.

Substantial, Decomposable, and Invisible: Visual Context Misalignment in Instructional Videos for Physical Tasks

cs.HC · 2026-05-16 · conditional · novelty 6.0

Fully aligned instructional videos for physical tasks yield 11.1% better completion quality and 15.5% faster times, with four decomposable visual attributes whose isolated misalignments degrade performance without users noticing.

ClickRemoval: An Interactive Open-Source Tool for Object Removal in Diffusion Models

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

ClickRemoval delivers click-driven object removal and background restoration in diffusion models through self-attention modulation without additional training or inputs.

When Should Teachers Control AI Generation for Mathematics Visuals?

cs.HC · 2026-05-11 · conditional · novelty 6.0

Post-generation control in AI-assisted math visual creation yields higher teacher ratings for predictability and correctness than pre- or mid-generation control, with qualitative trade-offs in agency and effort.

SpatialFusion: Endowing Unified Image Generation with Intrinsic 3D Geometric Awareness

cs.CV · 2026-04-29 · unverdicted · novelty 6.0

SpatialFusion internalizes 3D geometric awareness into unified image generation models by pairing an MLLM with a spatial transformer that produces depth maps to constrain diffusion generation.

Latent Denoising Improves Visual Alignment in Large Multimodal Models

cs.CV · 2026-04-23 · unverdicted · novelty 6.0

A latent denoising objective with saliency-aware corruption and contrastive distillation improves visual alignment and corruption robustness in large multimodal models.

Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

Task-aware localization via attention cues and feature centroids from source/target streams in IIE models improves non-edit consistency while preserving instruction following.

Bridging the Micro–Macro Gap: Frequency-Aware Semantic Alignment for Image Manipulation Localization

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

FASA bridges low-level forensic frequency signals and high-level semantic consistency to achieve state-of-the-art localization of both conventional and diffusion-generated image manipulations.

Precise Shield: Explaining and Aligning VLLM Safety via Neuron-Level Guidance

cs.CV · 2026-04-10 · unverdicted · novelty 6.0

Precise Shield identifies safety neurons in VLLMs via activation contrasts and aligns only them with gradient masking, boosting safety, preserving generalization, and enabling zero-shot cross-lingual and cross-modal transfer.

VersaVogue: Visual Expert Orchestration and Preference Alignment for Unified Fashion Synthesis

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

VersaVogue unifies garment generation and virtual dressing via trait-routing attention with mixture-of-experts and an automated multi-perspective preference optimization pipeline that uses DPO without human labels.

CAGE: Bridging the Accuracy-Aesthetics Gap in Educational Diagrams via Code-Anchored Generative Enhancement

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

CAGE uses LLM-generated code for label-correct diagrams followed by ControlNet-conditioned diffusion refinement to produce both accurate and visually engaging educational graphics, backed by the new EduDiagram-2K dataset.

InsTraj: Instructing Diffusion Models with Travel Intentions to Generate Real-world Trajectories

cs.AI · 2026-04-05 · unverdicted · novelty 6.0

InsTraj generates realistic, instruction-faithful GPS trajectories by using an LLM to parse natural-language travel intent and a multimodal diffusion transformer to produce the paths.

citing papers explorer

Showing 3 of 3 citing papers after filters.

Substantial, Decomposable, and Invisible: Visual Context Misalignment in Instructional Videos for Physical Tasks

cs.HC · 2026-05-16 · conditional · none · ref 31

Fully aligned instructional videos for physical tasks yield 11.1% better completion quality and 15.5% faster times, with four decomposable visual attributes whose isolated misalignments degrade performance without users noticing.

When Should Teachers Control AI Generation for Mathematics Visuals?

cs.HC · 2026-05-11 · conditional · none · ref 66

Post-generation control in AI-assisted math visual creation yields higher teacher ratings for predictability and correctness than pre- or mid-generation control, with qualitative trade-offs in agency and effort.

The Algorithmic Gaze of Image Quality Assessment: An Audit and Trace Ethnography of the LAION-Aesthetics Predictor

cs.HC · 2026-01-14 · conditional · none · ref 89

LAION-Aesthetics Predictor reinforces Western and male biases by preferentially selecting images associated with women and realistic Western/Japanese art while excluding men, LGBTQ+ references, and other styles.
