
arxiv: 2606.26872 · v1 · pith:VBWV4FD2new · submitted 2026-06-25 · 💻 cs.CV

SpatialFlow-GRPO: Where Spatial Credit Drives Image Editing

Pith reviewed 2026-06-26 05:35 UTC · model grok-4.3

classification 💻 cs.CV
keywords image editing · reinforcement learning · spatial rewards · region-aware feedback · flow matching · GRPO · credit assignment

The pith

SpatialFlow-GRPO replaces whole-image rewards with region-aligned signals to improve fine-grained image editing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that standard Flow-GRPO methods fail at precise edits because they apply one reward to the entire image and cannot tell which regions drove the score. SpatialFlow-GRPO fixes this by training a region-aware reward model called SFReward, turning its local scores into per-region advantage signals, and forcing those signals to line up with the exact latent positions being updated. When this alignment is enforced during policy optimization, editing quality rises on OmniGen2 and FLUX.2-klein-4B across GEdit-Bench, ImgEdit-Bench, and the new MultiEditBench. The authors also release SFReward-14K, a dataset of region-annotated editing pairs, to support the reward model.

Core claim

SpatialFlow-GRPO converts region-aware rewards into semantic-region-level optimization signals and aligns region advantages with the corresponding latent positions during policy updates, which removes the spatial uniformity assumption of prior whole-image reward methods and produces higher-quality edits on OmniGen2 and FLUX.2-klein-4B.

What carries the argument

The conversion of region-aware rewards from SFReward into semantic-region-level optimization signals that are explicitly aligned with latent positions during the GRPO policy update.
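
A minimal sketch of why a whole-image reward can lose the signal that per-region rewards keep. It uses the standard GRPO group normalization (reward minus group mean, divided by group standard deviation). The scores, the group size, and the choice to average region scores into a whole-image score are hypothetical and are not taken from the paper.

```python
import numpy as np

def group_advantages(rewards, eps=1e-8):
    """GRPO-style normalization across the sample group (axis 0)."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean(axis=0)) / (rewards.std(axis=0) + eps)

# Hypothetical scores: 4 sampled edits (rows) x 2 semantic regions (columns).
region_rewards = np.array([
    [0.9, 0.1],   # region 0 edited well, region 1 edited poorly
    [0.1, 0.9],   # the opposite outcome
    [0.5, 0.5],
    [0.5, 0.5],
])

# Whole-image reward (assumed here to be the mean over regions): every sample
# scores 0.5, so the group-relative advantage is zero for all of them.
whole_image = group_advantages(region_rewards.mean(axis=1))

# Per-region normalization keeps the sign of each region's outcome.
per_region = group_advantages(region_rewards)

print(whole_image)
print(per_region)
```

In this toy group the scalar advantage carries no learning signal at all, while the per-region advantages of the first two samples have opposite signs in the two regions. How the paper combines region and global scores into its hybrid advantages is not specified in the text reviewed here.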

If this is right

  • Editing quality improves on GEdit-Bench, ImgEdit-Bench, and MultiEditBench relative to Flow-GRPO.
  • The method supports multi-region editing evaluation via the introduced MultiEditBench.
  • A region-annotated dataset SFReward-14K can be used to train further region-aware reward models.
  • The same spatial-alignment step can be applied to other Flow-GRPO variants without changing the base model.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same region-to-latent alignment idea could be tested on video or 3D generation where temporal or depth credit assignment is the analogous problem.
  • If the alignment step is removed, performance should collapse back to the level of standard Flow-GRPO; that controlled ablation would isolate the contribution of spatial credit.
  • The approach may generalize to any generative model that uses flow-matching or diffusion latents, provided a region-aware reward model can be trained.

Load-bearing premise

Region-aware rewards produced by SFReward can be turned into optimization signals that correctly match the spatial locations of the latents being updated.

What would settle it

Run the same training loop on OmniGen2 or FLUX.2-klein-4B but replace the spatial-alignment step with uniform whole-image advantages; if GEdit-Bench and MultiEditBench scores show no improvement over Flow-GRPO, the central claim is false.
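
The two arms of that test can be sketched as a single switch on how per-position advantages are built. This is an illustration only: the function name, the non-overlapping region masks, and the fallback to a global advantage outside every region are assumptions, not the paper's implementation.

```python
import numpy as np

def advantage_map(region_adv, region_masks, global_adv, spatial=True):
    """Per-latent-position advantages for one sampled edit.

    spatial=True  -> each region's advantage is placed on its own mask and
                     uncovered positions fall back to the global advantage.
    spatial=False -> control arm: every position gets the global advantage.
    Masks are assumed not to overlap (hypothetical simplification).
    """
    masks = np.asarray(region_masks, dtype=np.float64)      # (R, H, W) in [0, 1]
    uniform = np.full(masks.shape[1:], float(global_adv))   # (H, W)
    if not spatial:
        return uniform
    coverage = np.clip(masks.sum(axis=0), 0.0, 1.0)
    regional = np.tensordot(np.asarray(region_adv, dtype=np.float64), masks, axes=1)
    return coverage * regional + (1.0 - coverage) * uniform

# Hypothetical 2x3 latent grid: region 0 covers column 0, region 1 covers column 2.
masks = np.zeros((2, 2, 3))
masks[0, :, 0] = 1.0
masks[1, :, 2] = 1.0

spatial_arm = advantage_map([1.0, -1.0], masks, global_adv=0.2, spatial=True)
control_arm = advantage_map([1.0, -1.0], masks, global_adv=0.2, spatial=False)
print(spatial_arm)
print(control_arm)
```

Keeping everything else in the training loop fixed and flipping only `spatial` is what would attribute any benchmark difference to spatial credit assignment rather than to the reward model itself.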

Figures

Figures reproduced from arXiv: 2606.26872 by Bin Wen, Fan Yang, Han Li, Hongyang Wei, Shuo Yang, Tingting Gao, Wei Chen, Xingyu Lu, Yancheng Long, Yankai Yang.

Figure 1: Motivation. In multi-region editing, different regions may have different quality outcomes. Flow-GRPO collapses them into one scalar reward, while SpatialFlow-GRPO attaches reward feedback to semantic regions and enables spatially localized credit assignment.
Figure 2: Overview of SpatialFlow-GRPO. The policy samples a group of edited images for each instruction, and SFReward returns region boxes, semantic labels, and scores. Region and global scores are converted into hybrid advantages, mapped back to latent regions, and optimized through region-consistent ratios and power-weighted aggregation.
Figure 3: Structured output format of SFReward.
Figure 4: Composition of MultiEditBench.
Figure 5: Qualitative examples of SpatialFlow-GRPO on OmniGen2. Compared with Flow-GRPO, SpatialFlow-GRPO better preserves source identity and applies multiple requested local edits.
Figure 6: SFReward-14K construction pipeline. Multi-source edit triplets are annotated with expert region boxes and labels, scored by a multimodal teacher, and filtered through automatic validation and human audit to produce data for training a region-aware reward model.
Figure 7: Additional OmniGen2 qualitative comparisons (1/3). Each row compares the base model, Flow-GRPO, and SpatialFlow-GRPO under multi-target editing instructions.
Figure 8: Additional OmniGen2 qualitative comparisons (2/3).
Figure 9: Additional OmniGen2 qualitative comparisons (3/3).
Original abstract

Recent online reinforcement learning has substantially improved image editing quality. However, existing Flow-GRPO-style methods usually rely on a single whole-image reward, which makes fine-grained editing optimization difficult. We observe that a key obstacle in image editing is this spatial uniformity assumption: a whole-image reward cannot distinguish how different spatial regions contribute to image quality. To address this issue, we propose SpatialFlow-GRPO, a training framework that introduces spatially fine-grained reward feedback. The framework converts region-aware rewards into semantic-region-level optimization signals and aligns region advantages with the corresponding latent positions during policy updates. We also train a region-aware reward model, SFReward, construct SFReward-14K with region-annotated editing samples, and introduce MultiEditBench to evaluate multi-region editing ability. On OmniGen2 and FLUX.2-klein-4B, SpatialFlow-GRPO outperforms Flow-GRPO on GEdit-Bench, ImgEdit-Bench, and MultiEditBench. The results show that SpatialFlow-GRPO converts local feedback into spatially aligned update signals and improves editing quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes SpatialFlow-GRPO, an extension of Flow-GRPO for image editing that incorporates spatially fine-grained reward feedback via a new region-aware reward model SFReward (trained on the introduced SFReward-14K dataset). It converts region-aware rewards into semantic-region-level optimization signals, aligns region advantages with corresponding latent positions during policy updates, and introduces MultiEditBench to evaluate multi-region editing. Experiments claim that SpatialFlow-GRPO outperforms Flow-GRPO on GEdit-Bench, ImgEdit-Bench, and MultiEditBench using OmniGen2 and FLUX.2-klein-4B.

Significance. If the spatial alignment mechanism is robust and the mapping from pixel-space regions to latent positions is accurate, the approach could meaningfully advance fine-grained credit assignment in RL for generative image editing by overcoming the spatial uniformity of whole-image rewards. The new dataset and benchmark for multi-region editing are constructive additions to the field.

major comments (1)
  1. [Abstract / Method] Abstract and Method description of the alignment step: the central claim that the framework 'converts region-aware rewards into semantic-region-level optimization signals and aligns region advantages with the corresponding latent positions' rests on an untested correspondence assumption. Flow-model latents are downsampled and potentially entangled; without an explicit mapping procedure (coordinate scaling, attention, etc.), ablation on spatial distortions, or visualization confirming that advantages are not misaligned or globally averaged, the 'spatially aligned update signals' may not differ from standard Flow-GRPO in a load-bearing way.
minor comments (1)
  1. [Abstract] Abstract: multiple new terms (SFReward, SFReward-14K, MultiEditBench) are introduced without a brief definitional clause or forward reference, reducing immediate clarity for readers.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive review. We address the single major comment below.

Point-by-point responses
  1. Referee: [Abstract / Method] Abstract and Method description of the alignment step: the central claim that the framework 'converts region-aware rewards into semantic-region-level optimization signals and aligns region advantages with the corresponding latent positions' rests on an untested correspondence assumption. Flow-model latents are downsampled and potentially entangled; without an explicit mapping procedure (coordinate scaling, attention, etc.), ablation on spatial distortions, or visualization confirming that advantages are not misaligned or globally averaged, the 'spatially aligned update signals' may not differ from standard Flow-GRPO in a load-bearing way.

    Authors: We agree that the manuscript's current description of the alignment step is insufficiently detailed and that the correspondence between pixel-space regions and latent positions requires explicit justification and validation. The referee's concern is valid: without a documented mapping, ablation, or visualization, it is difficult to confirm that the proposed signals are spatially localized rather than effectively global. In the revised manuscript we will (1) add a precise description of the mapping procedure (coordinate scaling by the model's downsampling factor with bilinear interpolation for non-grid alignment, followed by token-level masking in the GRPO objective), (2) include an ablation that intentionally perturbs the spatial mapping to quantify the performance drop, and (3) add visualizations of per-region advantage maps overlaid on both pixel and latent grids. These additions will be placed in Section 3 and the supplementary material.

    Revision: yes
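
One way the coordinate-scaling step in the rebuttal could look is sketched below. The rebuttal does not specify the actual procedure, so this is a stand-in: it divides box coordinates by a downsampling factor and gives each latent cell a weight equal to the fraction of the cell the box covers. That area weighting is used in place of the bilinear interpolation the authors mention, and the factor of 16 and the 4x4 grid are hypothetical.

```python
import numpy as np

def box_to_latent_mask(box, latent_hw, factor):
    """Fractional coverage of each latent cell by a pixel-space box.

    box       -- (x0, y0, x1, y1) in pixels
    latent_hw -- (height, width) of the latent grid
    factor    -- pixels per latent cell along each axis (model dependent)
    """
    x0, y0, x1, y1 = (c / factor for c in box)
    h, w = latent_hw
    ys = np.arange(h)
    xs = np.arange(w)
    # Overlap length between the scaled box and each unit cell [i, i + 1].
    cover_y = np.clip(np.minimum(ys + 1, y1) - np.maximum(ys, y0), 0.0, 1.0)
    cover_x = np.clip(np.minimum(xs + 1, x1) - np.maximum(xs, x0), 0.0, 1.0)
    return np.outer(cover_y, cover_x)

# A 24x16 pixel box on a 64x64 image with a hypothetical 16x downsampling.
mask = box_to_latent_mask((16, 8, 40, 24), latent_hw=(4, 4), factor=16)
print(mask)

# Token-level masking: scale one region's advantage by its coverage.
region_advantage = 1.5   # hypothetical value
weighted = mask * region_advantage
```

The mask sums to the box area measured in latent cells (here 384 / 256 = 1.5), which gives a quick sanity check that a box that is not aligned to the grid is neither dropped nor double counted.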

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

Full rationale

The abstract and described method introduce SpatialFlow-GRPO as an extension of Flow-GRPO that adds an independent spatial alignment step, a new region-aware reward model (SFReward), a new dataset (SFReward-14K), and a new benchmark (MultiEditBench). No equations or steps are shown that reduce the central claim (conversion of region rewards to aligned latent advantages) to a self-definition, a fitted parameter renamed as prediction, or a load-bearing self-citation chain. The mapping assumption is stated explicitly as a modeling choice rather than derived from prior inputs by construction. This is the normal case of an incremental method with external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 4 invented entities

Review limited to abstract; several new components are introduced whose internal construction details and assumptions are not provided.

axioms (1)
  • Domain assumption: A whole-image reward cannot distinguish how different spatial regions contribute to image quality.
    Presented as the key obstacle motivating the work.
invented entities (4)
  • SpatialFlow-GRPO (no independent evidence)
    purpose: Training framework that converts region-aware rewards into semantic-region-level optimization signals.
    Core proposed method.
  • SFReward (no independent evidence)
    purpose: Region-aware reward model trained on annotated editing samples.
    New component for providing fine-grained rewards.
  • SFReward-14K (no independent evidence)
    purpose: Dataset of region-annotated editing samples.
    Constructed to train the reward model.
  • MultiEditBench (no independent evidence)
    purpose: Benchmark for evaluating multi-region editing ability.
    Introduced to measure the new capability.


