PhyEditBench is a new benchmark for physics-aware image editing with real and synthetic instances plus a training-free PhyWorld baseline that uses test-time scaling to outperform SOTA models.
Editthinker: Unlocking iterative reasoning for any image editor
13 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 13roles
background 1polarities
background 1representative citing papers
InterleaveThinker is the first multi-agent pipeline enabling interleaved generation in any image generator through planner-critic agents, SFT on custom datasets, and GRPO RL with accuracy and step-wise rewards.
Uni-Edit introduces a data synthesis pipeline turning VQA data into reasoning-intensive editing instructions, enabling single-task tuning that boosts all three capabilities in models like BAGEL and Janus-Pro.
Presents Entity-Rubrics and AbstractEdit benchmark to measure image editing models on abstract intent, finding standard models struggle to balance edit intent with image preservation.
EditRefiner uses a perception-reasoning-action-evaluation agent loop and the EditFHF-15K human feedback dataset to refine text-guided image edits more accurately than prior methods.
Gen-Searcher is the first trained search-augmented image generation agent using SFT followed by GRPO reinforcement learning with dual text-image rewards, delivering 15-16 point gains on knowledge-intensive benchmarks.
SPIRAL is a closed-loop think-act-reflect framework using PlanAgent, VideoGenerator, and CriticAgent plus GRPO self-evolution to improve long-horizon action-conditioned video generation, with new dataset and benchmark showing gains over open-loop baselines.
GMO-E²DIT is an agentic editing framework that decouples VLM-based planning from mask-conditioned rendering and uses reflection to execute multi-operation e-commerce image edits with error recovery.
AesFormer decouples aesthetic planning from image editing via AesThinker and AesEditor to enable structural reconstruction in photos for better aesthetics.
Latent Action Control learns unobserved action trajectories via variational alignment and GRPO to inject reasoning into flow-based image generation, yielding gains on compositional benchmarks.
AdaTooler-V trains MLLMs to adaptively use vision tools via AT-GRPO reinforcement learning and new datasets, reaching 89.8% on V* and outperforming GPT-4o.
Introduces ProductConsistency dataset, benchmark, and Cyclic Consistency reward to fine-tune image editing models, achieving a 5x reduction in character error rate for product identity preservation.
citing papers explorer
-
PhyEditBench: A Real-World Multi-Stage Benchmark for Physics-Aware Image Editing
PhyEditBench is a new benchmark for physics-aware image editing with real and synthetic instances plus a training-free PhyWorld baseline that uses test-time scaling to outperform SOTA models.
-
InterleaveThinker: Reinforcing Agentic Interleaved Generation
InterleaveThinker is the first multi-agent pipeline enabling interleaved generation in any image generator through planner-critic agents, SFT on custom datasets, and GRPO RL with accuracy and step-wise rewards.
-
Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning
Uni-Edit introduces a data synthesis pipeline turning VQA data into reasoning-intensive editing instructions, enabling single-task tuning that boosts all three capabilities in models like BAGEL and Janus-Pro.
-
Editor's Choice: Evaluating Abstract Intent in Image Editing through Atomic Entity Analysis
Presents Entity-Rubrics and AbstractEdit benchmark to measure image editing models on abstract intent, finding standard models struggle to balance edit intent with image preservation.
-
EditRefiner: A Human-Aligned Agentic Framework for Image Editing Refinement
EditRefiner uses a perception-reasoning-action-evaluation agent loop and the EditFHF-15K human feedback dataset to refine text-guided image edits more accurately than prior methods.
-
Gen-Searcher: Reinforcing Agentic Search for Image Generation
Gen-Searcher is the first trained search-augmented image generation agent using SFT followed by GRPO reinforcement learning with dual text-image rewards, delivering 15-16 point gains on knowledge-intensive benchmarks.
-
SPIRAL: Self-Evolving Action-Conditioned Video Generation via Reflective Planning Agents
SPIRAL is a closed-loop think-act-reflect framework using PlanAgent, VideoGenerator, and CriticAgent plus GRPO self-evolution to improve long-horizon action-conditioned video generation, with new dataset and benchmark showing gains over open-loop baselines.
-
GMO-E$^2$DIT: Grounded Multi-Operation Editing for E-Commerce Images
GMO-E²DIT is an agentic editing framework that decouples VLM-based planning from mask-conditioned rendering and uses reflection to execute multi-operation e-commerce image edits with error recovery.
-
AesFormer: Transform Everyday Photos into Beautiful Memories
AesFormer decouples aesthetic planning from image editing via AesThinker and AesEditor to enable structural reconstruction in photos for better aesthetics.
-
Latent Action Control for Reasoning-Guided Unified Image Generation
Latent Action Control learns unobserved action trajectories via variational alignment and GRPO to inject reasoning into flow-based image generation, yielding gains on compositional benchmarks.
-
AdaTooler-V: Adaptive Tool-Use for Images and Videos
AdaTooler-V trains MLLMs to adaptively use vision tools via AT-GRPO reinforcement learning and new datasets, reaching 89.8% on V* and outperforming GPT-4o.
-
ProductConsistency: Improving Product Identity Preservation in Instruction-Based Image Editing via SFT and RL
Introduces ProductConsistency dataset, benchmark, and Cyclic Consistency reward to fine-tune image editing models, achieving a 5x reduction in character error rate for product identity preservation.
- Making Image Editing Easier via Adaptive Task Reformulation with Agentic Executions