UniWorld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback

Mixed citation behavior. Most common role is baseline (42% of classified citations).

Cited by 26 Pith papers.
abstract

Instruction-based image editing has achieved remarkable progress; however, models solely trained via supervised fine-tuning often overfit to annotated patterns, hindering their ability to explore and generalize beyond training distributions. To this end, we introduce Edit-R1, a novel post-training framework for instruction-based image editing based on policy optimization. Specifically, we utilize Diffusion Negative-aware Finetuning (DiffusionNFT), a likelihood-free policy optimization method consistent with the flow matching forward process, thereby enabling the use of higher-order samplers and more efficient training. Another key challenge here is the absence of a universal reward model, resulting from the diverse nature of editing instructions and tasks. To bridge this gap, we employ a Multimodal Large Language Model (MLLM) as a unified, training-free reward model, leveraging its output logits to provide fine-grained feedback. Furthermore, we carefully design a low-variance group filtering mechanism to reduce MLLM scoring noise and stabilize optimization. UniWorld-V2, trained with this framework, achieves state-of-the-art results on the ImgEdit and GEdit-Bench benchmarks, scoring 4.49 and 7.83, respectively. Crucially, our framework is model-agnostic, delivering substantial performance gains when applied to diverse base models like Qwen-Image-Edit and FLUX-Kontext, demonstrating its wide applicability. Code and models are publicly available to support further research.
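The abstract describes two reward-side mechanisms without giving formulas: reading the MLLM's output logits as a fine-grained score, and filtering low-variance groups to reduce scoring noise. The Python sketch below is a hypothetical illustration, not the authors' code. It assumes the MLLM answers with a single digit score token from "0" up to some maximum K, uses the probability-weighted score rescaled to [0, 1] as the reward, and drops any group (several edits sampled for one instruction) whose reward standard deviation falls under a threshold before normalizing the rest. The names `expected_score`, `filter_and_normalize`, and `min_std` are illustrative assumptions.

```python
import math
from statistics import mean, pstdev


def expected_score(score_logits: dict[str, float]) -> float:
    """Turn MLLM logits over digit score tokens into a reward in [0, 1].

    Assumption (hypothetical): the MLLM is asked to reply with one digit
    from "0" up to some maximum K >= 1, and we read the logits it assigns
    to each of those digit tokens at the answer position.
    """
    # Numerically stable softmax over the candidate score tokens only.
    peak = max(score_logits.values())
    weights = {tok: math.exp(logit - peak) for tok, logit in score_logits.items()}
    total = sum(weights.values())
    top = max(int(tok) for tok in score_logits)
    # Probability-weighted (expected) score, rescaled by the maximum score K.
    return sum(int(tok) * w for tok, w in weights.items()) / (total * top)


def filter_and_normalize(
    groups: dict[str, list[float]], min_std: float = 0.05
) -> dict[str, list[float]]:
    """Drop low-variance groups, then group-normalize the remaining rewards.

    Each key identifies one editing instruction; each value holds the rewards
    of several edits sampled for it. `min_std` is a hypothetical threshold.
    """
    kept: dict[str, list[float]] = {}
    for group_id, rewards in groups.items():
        mu, sigma = mean(rewards), pstdev(rewards)
        if sigma == 0.0 or sigma < min_std:
            # Near-identical scores: normalizing would mostly amplify scorer noise.
            continue
        kept[group_id] = [(r - mu) / sigma for r in rewards]
    return kept


if __name__ == "__main__":
    # Hypothetical logits for score tokens "0".."5" from one MLLM judgment.
    logits = {"0": -2.0, "1": -1.0, "2": 0.0, "3": 1.0, "4": 2.0, "5": 1.5}
    print(f"reward = {expected_score(logits):.3f}")

    groups = {
        "instr_a": [0.91, 0.90, 0.92, 0.91],  # low variance: filtered out
        "instr_b": [0.20, 0.55, 0.80, 0.45],  # kept and normalized
    }
    print(filter_and_normalize(groups))
```

Dropping a low-variance group outright, rather than down-weighting it, is the simplest variant; the abstract does not state which form the authors use or how the threshold is chosen.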

citation-role summary

background: 6 · baseline: 5 · other: 1

years

2026: 22 · 2025: 4

representative citing papers

Inline Critic Steers Image Editing

cs.CV · 2026-05-12 · conditional · novelty 7.0

Inline Critic uses a learnable token to critique and steer a frozen image-editing model's intermediate layers during generation, delivering state-of-the-art results on GEdit-Bench, RISEBench, and KRIS-Bench.

RewardHarness: Self-Evolving Agentic Post-Training

cs.AI · 2026-05-09 · unverdicted · novelty 7.0

RewardHarness self-evolves a tool-and-skill library from 100 preference examples, reaches 47.4% accuracy on image-edit evaluation (beating GPT-5), and yields stronger RL-tuned models.

TextWand: A Unified Framework for Scene Text Editing

cs.CV · 2026-06-04 · unverdicted · novelty 6.0

TextWand unifies scene text removal, generation and replacement via rendering/erasure decomposition, ORPE for layout fidelity, RAS for clean erasure, and the new TextWand-Bench dataset, claiming superior accuracy and quality over prior models.

GenClaw: Code-Driven Agentic Image Generation

cs.CV · 2026-05-28 · unverdicted · novelty 6.0

GenClaw introduces a three-stage code-driven workflow for agentic image generation that inserts programmatic sketches between linguistic reasoning and pixel synthesis.

SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

SpatialEdit provides a benchmark, a large synthetic dataset, and a baseline model for precise object and camera spatial manipulations in images, with the model outperforming prior methods on spatial editing.

Semantic Generative Tuning for Unified Multimodal Models

cs.CV · 2026-05-18 · unverdicted · novelty 5.0 · 2 refs

Semantic Generative Tuning applies segmentation-based generative proxies during post-training to align and improve both understanding and generation in unified multimodal models.
