Semantic image inversion and editing using rectified stochastic differential equations

Litu Rout, Yujia Chen, Nataniel Ruiz, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu · 2024 · arXiv 2410.10792

15 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background: 2 · other: 1

citation-polarity summary

background: 2 · unclear: 1

representative citing papers

UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

UniEditBench unifies image and video editing evaluation with a nine-plus-eight operation taxonomy and cost-effective 4B/8B distilled MLLM evaluators that align with human judgments.

Beyond Prompts: Unconditional 3D Inversion for Out-of-Distribution Shapes

cs.CV · 2026-04-16 · unverdicted · novelty 7.0

Text-to-3D models lose prompt sensitivity for out-of-distribution shapes due to sink traps but retain geometric diversity via unconditional priors, enabling a decoupled inversion method for robust editing.

Efficient Zero-Shot Inpainting with Decoupled Diffusion Guidance

cs.CV · 2025-12-20 · conditional · novelty 7.0

A new decoupled diffusion guidance method enables efficient zero-shot inpainting by avoiding backpropagation through the denoiser while maintaining observation consistency and quality.

Exploring Cross-Modal Flows for Few-Shot Learning

cs.CV · 2025-10-16 · unverdicted · novelty 7.0

FMA introduces flow matching for multi-step cross-modal feature alignment in few-shot learning, using fixed coupling, noise augmentation, and early-stopping to outperform one-step PEFT methods.

UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models

cs.CV · 2025-04-17 · unverdicted · novelty 7.0

UniEdit-Flow presents tuning-free Uni-Inv and Uni-Edit methods for inversion and editing in flow models that achieve accurate reconstruction and robust region-preserving edits across generative models.

StreamGVE: Training-Free Video Editing via Few-Step Streaming Video Generation

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

StreamGVE enables high-quality training-free video editing by converting the task to noise-to-data streaming generation with dual-branch fast sampling, self-attention bridges, cross-attention grounding, source-oriented guidance, and visual prompting.

VAGS: Velocity Adaptive Guidance Scale for Image Editing and Generation

cs.CV · 2026-05-15 · accept · novelty 6.0

VAGS adapts the CFG scale at each ODE step using velocity alignment signals to raise structural fidelity in editing and sample quality in generation over fixed-scale baselines.

StyleTextGen: Style-Conditioned Multilingual Scene Text Generation

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

StyleTextGen proposes a dual-branch style encoder, text style consistency loss, and mask-guided inference to achieve superior style consistency and cross-lingual performance in multilingual scene text generation on a new bilingual benchmark.

Revealing the Gap in Human and VLM Scene Perception through Counterfactual Semantic Saliency

cs.CV · 2026-05-13 · conditional · novelty 6.0

VLMs exhibit size, center, and saliency biases in scene understanding, relying less on people than humans do, with size bias as a key driver of divergence.

LimeCross: Context-Conditioned Layered Image Editing with Structural Consistency

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

LimeCross enables text-guided editing of individual layers in composite images by conditioning on cross-layer context via bi-stream attention while preserving layer integrity and introducing the LayerEditBench benchmark.

FluSplat: Sparse-View 3D Editing without Test-Time Optimization

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

FluSplat trains a model with geometric alignment constraints on multi-view edits to produce consistent 3D scene edits from sparse views in a single forward pass without test-time optimization.

RL-RIG: A Generative Spatial Reasoner via Intrinsic Reflection

cs.CV · 2026-02-23 · unverdicted · novelty 6.0

RL-RIG uses a generate-reflect-edit loop with reinforcement learning to improve spatial accuracy in image generation, reporting up to 11% gains over prior open-source models on scene-graph metrics.

KD-CVG: A Knowledge-Driven Approach for Creative Video Generation

cs.CV · 2026-04-23 · unverdicted · novelty 5.0

KD-CVG uses an Advertising Creative Knowledge Base plus Semantic-Aware Retrieval and Multimodal Knowledge Reference modules to improve semantic alignment and motion realism in text-to-video generation for advertising.

DirectEdit: Step-Level Accurate Inversion for Flow-Based Image Editing

cs.CV · 2026-05-04

UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models

cs.CV · 2026-04-19 · 2 refs

citing papers explorer


UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs
cs.CV · 2026-04-17 · unverdicted · none · ref 43
UniEditBench unifies image and video editing evaluation with a nine-plus-eight operation taxonomy and cost-effective 4B/8B distilled MLLM evaluators that align with human judgments.

Beyond Prompts: Unconditional 3D Inversion for Out-of-Distribution Shapes
cs.CV · 2026-04-16 · unverdicted · none · ref 30
Text-to-3D models lose prompt sensitivity for out-of-distribution shapes due to sink traps but retain geometric diversity via unconditional priors, enabling a decoupled inversion method for robust editing.

Efficient Zero-Shot Inpainting with Decoupled Diffusion Guidance
cs.CV · 2025-12-20 · conditional · none · ref 11
A new decoupled diffusion guidance method enables efficient zero-shot inpainting by avoiding backpropagation through the denoiser while maintaining observation consistency and quality.

Exploring Cross-Modal Flows for Few-Shot Learning
cs.CV · 2025-10-16 · unverdicted · none · ref 20
FMA introduces flow matching for multi-step cross-modal feature alignment in few-shot learning, using fixed coupling, noise augmentation, and early-stopping to outperform one-step PEFT methods.

UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models
cs.CV · 2025-04-17 · unverdicted · none · ref 52
UniEdit-Flow presents tuning-free Uni-Inv and Uni-Edit methods for inversion and editing in flow models that achieve accurate reconstruction and robust region-preserving edits across generative models.

StreamGVE: Training-Free Video Editing via Few-Step Streaming Video Generation
cs.CV · 2026-05-20 · unverdicted · none · ref 61
StreamGVE enables high-quality training-free video editing by converting the task to noise-to-data streaming generation with dual-branch fast sampling, self-attention bridges, cross-attention grounding, source-oriented guidance, and visual prompting.

VAGS: Velocity Adaptive Guidance Scale for Image Editing and Generation
cs.CV · 2026-05-15 · accept · none · ref 15
VAGS adapts the CFG scale at each ODE step using velocity alignment signals to raise structural fidelity in editing and sample quality in generation over fixed-scale baselines.

StyleTextGen: Style-Conditioned Multilingual Scene Text Generation
cs.CV · 2026-05-14 · unverdicted · none · ref 37
StyleTextGen proposes a dual-branch style encoder, text style consistency loss, and mask-guided inference to achieve superior style consistency and cross-lingual performance in multilingual scene text generation on a new bilingual benchmark.

Revealing the Gap in Human and VLM Scene Perception through Counterfactual Semantic Saliency
cs.CV · 2026-05-13 · conditional · none · ref 48
VLMs exhibit size, center, and saliency biases in scene understanding, relying less on people than humans do, with size bias as a key driver of divergence.

LimeCross: Context-Conditioned Layered Image Editing with Structural Consistency
cs.CV · 2026-05-11 · unverdicted · none · ref 42
LimeCross enables text-guided editing of individual layers in composite images by conditioning on cross-layer context via bi-stream attention while preserving layer integrity and introducing the LayerEditBench benchmark.

FluSplat: Sparse-View 3D Editing without Test-Time Optimization
cs.CV · 2026-04-21 · unverdicted · none · ref 38
FluSplat trains a model with geometric alignment constraints on multi-view edits to produce consistent 3D scene edits from sparse views in a single forward pass without test-time optimization.

RL-RIG: A Generative Spatial Reasoner via Intrinsic Reflection
cs.CV · 2026-02-23 · unverdicted · none · ref 34
RL-RIG uses a generate-reflect-edit loop with reinforcement learning to improve spatial accuracy in image generation, reporting up to 11% gains over prior open-source models on scene-graph metrics.

KD-CVG: A Knowledge-Driven Approach for Creative Video Generation
cs.CV · 2026-04-23 · unverdicted · none · ref 22
KD-CVG uses an Advertising Creative Knowledge Base plus Semantic-Aware Retrieval and Multimodal Knowledge Reference modules to improve semantic alignment and motion realism in text-to-video generation for advertising.

DirectEdit: Step-Level Accurate Inversion for Flow-Based Image Editing
cs.CV · 2026-05-04 · unreviewed · ref 15

UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models
cs.CV · 2026-04-19 · unreviewed · ref 59 · 2 links
