UniEditBench unifies image and video editing evaluation with a nine-plus-eight operation taxonomy and cost-effective 4B/8B distilled MLLM evaluators that align with human judgments.
hub
Semantic im- age inversion and editing using rectified stochastic differen- tial equations
15 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
fields
cs.CV 15representative citing papers
Text-to-3D models lose prompt sensitivity for out-of-distribution shapes due to sink traps but retain geometric diversity via unconditional priors, enabling a decoupled inversion method for robust editing.
A new decoupled diffusion guidance method enables efficient zero-shot inpainting by avoiding backpropagation through the denoiser while maintaining observation consistency and quality.
FMA introduces flow matching for multi-step cross-modal feature alignment in few-shot learning, using fixed coupling, noise augmentation, and early-stopping to outperform one-step PEFT methods.
UniEdit-Flow presents tuning-free Uni-Inv and Uni-Edit methods for inversion and editing in flow models that achieve accurate reconstruction and robust region-preserving edits across generative models.
StreamGVE enables high-quality training-free video editing by converting the task to noise-to-data streaming generation with dual-branch fast sampling, self-attention bridges, cross-attention grounding, source-oriented guidance, and visual prompting.
VAGS adapts the CFG scale at each ODE step using velocity alignment signals to raise structural fidelity in editing and sample quality in generation over fixed-scale baselines.
StyleTextGen proposes a dual-branch style encoder, text style consistency loss, and mask-guided inference to achieve superior style consistency and cross-lingual performance in multilingual scene text generation on a new bilingual benchmark.
VLMs exhibit size, center, and saliency biases in scene understanding, relying less on people than humans do, with size bias as a key driver of divergence.
LimeCross enables text-guided editing of individual layers in composite images by conditioning on cross-layer context via bi-stream attention while preserving layer integrity and introducing the LayerEditBench benchmark.
FluSplat trains a model with geometric alignment constraints on multi-view edits to produce consistent 3D scene edits from sparse views in a single forward pass without test-time optimization.
RL-RIG uses a generate-reflect-edit loop with reinforcement learning to improve spatial accuracy in image generation, reporting up to 11% gains over prior open-source models on scene-graph metrics.
KD-CVG uses an Advertising Creative Knowledge Base plus Semantic-Aware Retrieval and Multimodal Knowledge Reference modules to improve semantic alignment and motion realism in text-to-video generation for advertising.
citing papers explorer
-
UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs
UniEditBench unifies image and video editing evaluation with a nine-plus-eight operation taxonomy and cost-effective 4B/8B distilled MLLM evaluators that align with human judgments.
-
Beyond Prompts: Unconditional 3D Inversion for Out-of-Distribution Shapes
Text-to-3D models lose prompt sensitivity for out-of-distribution shapes due to sink traps but retain geometric diversity via unconditional priors, enabling a decoupled inversion method for robust editing.
-
Efficient Zero-Shot Inpainting with Decoupled Diffusion Guidance
A new decoupled diffusion guidance method enables efficient zero-shot inpainting by avoiding backpropagation through the denoiser while maintaining observation consistency and quality.
-
Exploring Cross-Modal Flows for Few-Shot Learning
FMA introduces flow matching for multi-step cross-modal feature alignment in few-shot learning, using fixed coupling, noise augmentation, and early-stopping to outperform one-step PEFT methods.
-
UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models
UniEdit-Flow presents tuning-free Uni-Inv and Uni-Edit methods for inversion and editing in flow models that achieve accurate reconstruction and robust region-preserving edits across generative models.
-
StreamGVE: Training-Free Video Editing via Few-Step Streaming Video Generation
StreamGVE enables high-quality training-free video editing by converting the task to noise-to-data streaming generation with dual-branch fast sampling, self-attention bridges, cross-attention grounding, source-oriented guidance, and visual prompting.
-
VAGS: Velocity Adaptive Guidance Scale for Image Editing and Generation
VAGS adapts the CFG scale at each ODE step using velocity alignment signals to raise structural fidelity in editing and sample quality in generation over fixed-scale baselines.
-
StyleTextGen: Style-Conditioned Multilingual Scene Text Generation
StyleTextGen proposes a dual-branch style encoder, text style consistency loss, and mask-guided inference to achieve superior style consistency and cross-lingual performance in multilingual scene text generation on a new bilingual benchmark.
-
Revealing the Gap in Human and VLM Scene Perception through Counterfactual Semantic Saliency
VLMs exhibit size, center, and saliency biases in scene understanding, relying less on people than humans do, with size bias as a key driver of divergence.
-
LimeCross: Context-Conditioned Layered Image Editing with Structural Consistency
LimeCross enables text-guided editing of individual layers in composite images by conditioning on cross-layer context via bi-stream attention while preserving layer integrity and introducing the LayerEditBench benchmark.
-
FluSplat: Sparse-View 3D Editing without Test-Time Optimization
FluSplat trains a model with geometric alignment constraints on multi-view edits to produce consistent 3D scene edits from sparse views in a single forward pass without test-time optimization.
-
RL-RIG: A Generative Spatial Reasoner via Intrinsic Reflection
RL-RIG uses a generate-reflect-edit loop with reinforcement learning to improve spatial accuracy in image generation, reporting up to 11% gains over prior open-source models on scene-graph metrics.
-
KD-CVG: A Knowledge-Driven Approach for Creative Video Generation
KD-CVG uses an Advertising Creative Knowledge Base plus Semantic-Aware Retrieval and Multimodal Knowledge Reference modules to improve semantic alignment and motion realism in text-to-video generation for advertising.
- DirectEdit: Step-Level Accurate Inversion for Flow-Based Image Editing
- UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models