PhyEditBench is a new benchmark for physics-aware image editing with real and synthetic instances plus a training-free PhyWorld baseline that uses test-time scaling to outperform SOTA models.
Editthinker: Unlocking iterative reasoning for any image editor
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 10roles
background 1polarities
background 1representative citing papers
Uni-Edit introduces a data synthesis pipeline turning VQA data into reasoning-intensive editing instructions, enabling single-task tuning that boosts all three capabilities in models like BAGEL and Janus-Pro.
Presents Entity-Rubrics and AbstractEdit benchmark to measure image editing models on abstract intent, finding standard models struggle to balance edit intent with image preservation.
EditRefiner uses a perception-reasoning-action-evaluation agent loop and the EditFHF-15K human feedback dataset to refine text-guided image edits more accurately than prior methods.
Gen-Searcher is the first trained search-augmented image generation agent using SFT followed by GRPO reinforcement learning with dual text-image rewards, delivering 15-16 point gains on knowledge-intensive benchmarks.
SPIRAL is a closed-loop think-act-reflect framework using PlanAgent, VideoGenerator, and CriticAgent plus GRPO self-evolution to improve long-horizon action-conditioned video generation, with new dataset and benchmark showing gains over open-loop baselines.
AesFormer decouples aesthetic planning from image editing via AesThinker and AesEditor to enable structural reconstruction in photos for better aesthetics.
Latent Action Control learns unobserved action trajectories via variational alignment and GRPO to inject reasoning into flow-based image generation, yielding gains on compositional benchmarks.
AdaTooler-V trains MLLMs to adaptively use vision tools via AT-GRPO reinforcement learning and new datasets, reaching 89.8% on V* and outperforming GPT-4o.
citing papers explorer
-
PhyEditBench: A Real-World Multi-Stage Benchmark for Physics-Aware Image Editing
PhyEditBench is a new benchmark for physics-aware image editing with real and synthetic instances plus a training-free PhyWorld baseline that uses test-time scaling to outperform SOTA models.
-
Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning
Uni-Edit introduces a data synthesis pipeline turning VQA data into reasoning-intensive editing instructions, enabling single-task tuning that boosts all three capabilities in models like BAGEL and Janus-Pro.
-
Editor's Choice: Evaluating Abstract Intent in Image Editing through Atomic Entity Analysis
Presents Entity-Rubrics and AbstractEdit benchmark to measure image editing models on abstract intent, finding standard models struggle to balance edit intent with image preservation.
-
EditRefiner: A Human-Aligned Agentic Framework for Image Editing Refinement
EditRefiner uses a perception-reasoning-action-evaluation agent loop and the EditFHF-15K human feedback dataset to refine text-guided image edits more accurately than prior methods.
-
Gen-Searcher: Reinforcing Agentic Search for Image Generation
Gen-Searcher is the first trained search-augmented image generation agent using SFT followed by GRPO reinforcement learning with dual text-image rewards, delivering 15-16 point gains on knowledge-intensive benchmarks.
-
SPIRAL: Self-Evolving Action-Conditioned Video Generation via Reflective Planning Agents
SPIRAL is a closed-loop think-act-reflect framework using PlanAgent, VideoGenerator, and CriticAgent plus GRPO self-evolution to improve long-horizon action-conditioned video generation, with new dataset and benchmark showing gains over open-loop baselines.
-
AesFormer: Transform Everyday Photos into Beautiful Memories
AesFormer decouples aesthetic planning from image editing via AesThinker and AesEditor to enable structural reconstruction in photos for better aesthetics.
-
Latent Action Control for Reasoning-Guided Unified Image Generation
Latent Action Control learns unobserved action trajectories via variational alignment and GRPO to inject reasoning into flow-based image generation, yielding gains on compositional benchmarks.
-
AdaTooler-V: Adaptive Tool-Use for Images and Videos
AdaTooler-V trains MLLMs to adaptively use vision tools via AT-GRPO reinforcement learning and new datasets, reaching 89.8% on V* and outperforming GPT-4o.
- Making Image Editing Easier via Adaptive Task Reformulation with Agentic Executions