pith. machine review for the scientific record.

arxiv: 2505.20275 · v1 · submitted 2025-05-26 · 💻 cs.CV

Recognition: 2 theorem links

ImgEdit: A Unified Image Editing Dataset and Benchmark

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 18:13 UTC · model grok-4.3

classification 💻 cs.CV
keywords image editing · dataset · benchmark · vision-language model · generative models · multi-turn editing · instruction following · inpainting

The pith

ImgEdit supplies 1.2 million curated edit pairs that let a vision-language-model editor outperform prior open-source systems on instruction-based image changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors create ImgEdit, a dataset of 1.2 million single-turn and multi-turn image edit pairs produced by a pipeline that chains a vision-language model with detection, segmentation, inpainting, and post-processing steps. They use the dataset to train ImgEdit-E1, which processes a reference image plus text prompt and shows stronger results than other open-source editing models across several tasks. They also release ImgEdit-Bench, which tests models on instruction adherence, editing quality, and detail preservation through basic, hard single-turn, and multi-turn suites. A sympathetic reader would care because the work directly tackles the data bottleneck that has kept open models behind closed ones, offering both training material and a standardized way to measure progress.

Core claim

ImgEdit is a large-scale image-editing dataset of 1.2 million carefully curated pairs that contain both novel complex single-turn edits and challenging multi-turn tasks; a multi-stage pipeline using a vision-language model, detection, segmentation, inpainting, and strict post-processing ensures quality and diversity; models trained on ImgEdit, specifically the VLM-based ImgEdit-E1, outperform existing open-source editors on multiple tasks; and ImgEdit-Bench provides evaluation across instruction adherence, editing quality, and detail preservation for open-source, proprietary, and the new model.

What carries the argument

The multi-stage curation pipeline that integrates a vision-language model, detection model, segmentation model, task-specific inpainting, and post-processing to generate high-quality edit pairs from reference images and prompts.
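
For concreteness, a minimal sketch of how such a curation chain could be composed is shown below. Every stage callable (generate_instruction, detect, segment, inpaint, passes_quality_filter) is a hypothetical placeholder standing in for the models the paper names; the paper states only that a VLM, a detector, a segmenter, task-specific inpainting, and strict post-processing are chained, not this interface.

```python
# Sketch of a multi-stage edit-pair curation chain (illustrative only).
# The stage callables are hypothetical stand-ins, not the authors' code.
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class EditPair:
    source: Any       # reference image
    edited: Any       # inpainted result
    instruction: str  # natural-language edit prompt
    mask: Any         # region the edit is confined to

def build_edit_pair(
    image: Any,
    generate_instruction: Callable[[Any], str],              # VLM stage
    detect: Callable[[Any, str], Optional[Any]],             # detection stage
    segment: Callable[[Any, Any], Any],                      # segmentation stage
    inpaint: Callable[[Any, Any, str], Any],                 # task-specific inpainting
    passes_quality_filter: Callable[[Any, Any, str], bool],  # strict post-processing
) -> Optional[EditPair]:
    """Compose the stages; return None whenever any stage rejects the sample."""
    instruction = generate_instruction(image)      # e.g. "remove the red umbrella"
    box = detect(image, instruction)
    if box is None:                                # target not found: discard
        return None
    mask = segment(image, box)                     # pixel-level edit region
    edited = inpaint(image, mask, instruction)
    if not passes_quality_filter(image, edited, instruction):
        return None                                # quality gate rejects the pair
    return EditPair(source=image, edited=edited, instruction=instruction, mask=mask)
```

Multi-turn samples would presumably chain calls like this, feeding each accepted edited image back in as the next turn's input.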

If this is right

  • Open-source image editing can advance on both single-turn and multi-turn instructions once high-quality paired data is available.
  • Standardized benchmarks like ImgEdit-Bench expose concrete gaps in current models' ability to preserve details while following edits.
  • The same curation approach could scale to produce even larger training sets without manual labeling.
  • Proprietary models may lose their edge if open models continue to train on comparably clean and diverse pairs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The pipeline's reliance on existing detection and segmentation tools suggests that further gains in those sub-models would automatically improve future editing datasets.
  • Multi-turn evaluation suites could become the default test for interactive creative tools, shifting research focus from one-shot generation to iterative refinement.
  • If the dataset is adopted widely, community fine-tunes of ImgEdit-E1 may appear that specialize in domains such as product photography or medical imagery.

Load-bearing premise

The pairs produced by the automated multi-stage pipeline are sufficiently free of curation artifacts and diverse enough that models trained on them generalize to real editing requests rather than learning pipeline-specific patterns.

What would settle it

Run ImgEdit-E1 and competing open-source models on a fresh set of user-provided prompts and images never seen during curation, then measure whether human raters still judge ImgEdit-E1 edits as superior in instruction match and visual quality.
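
A minimal sketch of how such a head-to-head study could be scored is below, assuming blinded pairwise human ratings (one vote per held-out prompt for whichever edit better matches the instruction). The aggregation is illustrative, not a protocol the paper specifies.

```python
# Sketch: aggregate blinded pairwise preferences between ImgEdit-E1 and a baseline.
# Each held-out prompt contributes one vote: +1 if raters prefer ImgEdit-E1,
# -1 if they prefer the baseline, 0 for a tie. Illustrative only.
import math

def summarize_pairwise(votes: list[int]) -> dict:
    wins = sum(v > 0 for v in votes)
    losses = sum(v < 0 for v in votes)
    ties = sum(v == 0 for v in votes)
    decided = wins + losses
    win_rate = wins / decided if decided else float("nan")
    if decided:
        # Two-sided sign test via a normal approximation to Binomial(decided, 0.5).
        z = (wins - decided / 2) / math.sqrt(decided / 4)
        p_value = math.erfc(abs(z) / math.sqrt(2))
    else:
        p_value = float("nan")
    return {"wins": wins, "losses": losses, "ties": ties,
            "win_rate": win_rate, "p_value": p_value}

# Hypothetical tally: 70 prompts prefer ImgEdit-E1, 40 the baseline, 10 ties.
print(summarize_pairwise([1] * 70 + [-1] * 40 + [0] * 10))
```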

Original abstract

Recent advancements in generative models have enabled high-fidelity text-to-image generation. However, open-source image-editing models still lag behind their proprietary counterparts, primarily due to limited high-quality data and insufficient benchmarks. To overcome these limitations, we introduce ImgEdit, a large-scale, high-quality image-editing dataset comprising 1.2 million carefully curated edit pairs, which contain both novel and complex single-turn edits, as well as challenging multi-turn tasks. To ensure the data quality, we employ a multi-stage pipeline that integrates a cutting-edge vision-language model, a detection model, a segmentation model, alongside task-specific in-painting procedures and strict post-processing. ImgEdit surpasses existing datasets in both task novelty and data quality. Using ImgEdit, we train ImgEdit-E1, an editing model using Vision Language Model to process the reference image and editing prompt, which outperforms existing open-source models on multiple tasks, highlighting the value of ImgEdit and model design. For comprehensive evaluation, we introduce ImgEdit-Bench, a benchmark designed to evaluate image editing performance in terms of instruction adherence, editing quality, and detail preservation. It includes a basic testsuite, a challenging single-turn suite, and a dedicated multi-turn suite. We evaluate both open-source and proprietary models, as well as ImgEdit-E1, providing deep analysis and actionable insights into the current behavior of image-editing models. The source data are publicly available on https://github.com/PKU-YuanGroup/ImgEdit.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces ImgEdit, a dataset of 1.2 million image-editing pairs curated via a multi-stage pipeline (VLM prompt generation, detection, segmentation, inpainting, and post-processing) that claims higher novelty and quality than prior datasets. It trains ImgEdit-E1, a VLM-based editing model, and reports that this model outperforms existing open-source models on the authors' new ImgEdit-Bench, which comprises basic, challenging single-turn, and multi-turn suites measuring instruction adherence, editing quality, and detail preservation.

Significance. If the pipeline produces genuinely high-quality, diverse, and generalizable edit pairs without systematic curation artifacts, the work would be significant: it supplies a large public resource that could narrow the gap between open-source and proprietary editing models, while the benchmark offers a standardized evaluation framework. The public release of the data and code is a clear strength that supports reproducibility.

major comments (3)
  1. [Section 3] Section 3 (dataset construction): the multi-stage pipeline is presented without any quantitative validation of output quality (human ratings, inter-annotator agreement, error analysis, or ablation on individual stages such as VLM prompt generation or inpainting). This directly underpins the central claim that ImgEdit surpasses existing datasets in quality and novelty.
  2. [Section 4] Section 4 (model training and results): the reported outperformance of ImgEdit-E1 is given without details on how benchmark scores were computed, without ablations isolating the contribution of the new dataset versus model architecture, and without checks for pipeline-induced biases that could make superiority non-generalizable.
  3. [Section 5] Section 5 (ImgEdit-Bench): the benchmark description lacks explicit definitions or formulas for the three evaluation axes (instruction adherence, editing quality, detail preservation) and provides no analysis of metric reliability or potential annotation artifacts in the test suites.
minor comments (2)
  1. [Abstract / Section 3] The abstract and Section 3 refer to 'strict post-processing' without enumerating the exact filtering criteria or thresholds, which would aid reproducibility.
  2. [Tables/Figures] Table or figure captions comparing ImgEdit to prior datasets could more explicitly list the exact metrics used for the 'novelty' and 'quality' claims.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments on our paper. We address each of the major comments point by point below, providing clarifications and committing to revisions where necessary to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Section 3] Section 3 (dataset construction): the multi-stage pipeline is presented without any quantitative validation of output quality (human ratings, inter-annotator agreement, error analysis, or ablation on individual stages such as VLM prompt generation or inpainting). This directly underpins the central claim that ImgEdit surpasses existing datasets in quality and novelty.

    Authors: We appreciate the referee pointing out the need for quantitative validation to support our claims about the dataset's quality and novelty. Although the pipeline is designed with multiple quality-control stages, we agree that empirical validation is essential. In the revised manuscript, we will add human evaluation results on a subset of the data, inter-annotator agreement metrics, detailed error analysis, and ablations studying the impact of individual components like the VLM prompt generation and inpainting steps. This will provide concrete evidence that ImgEdit offers higher quality and novelty compared to existing datasets. revision: yes

  2. Referee: [Section 4] Section 4 (model training and results): the reported outperformance of ImgEdit-E1 is given without details on how benchmark scores were computed, without ablations isolating the contribution of the new dataset versus model architecture, and without checks for pipeline-induced biases that could make superiority non-generalizable.

    Authors: We acknowledge that additional details and analyses are required to fully substantiate the outperformance claims. In the revision, we will provide explicit details on the computation of the benchmark scores, include ablation studies that isolate the contributions of the ImgEdit dataset versus the VLM-based architecture, and conduct an analysis of potential biases arising from the curation pipeline, along with discussions on how these might affect the generalizability of the results. revision: yes

  3. Referee: [Section 5] Section 5 (ImgEdit-Bench): the benchmark description lacks explicit definitions or formulas for the three evaluation axes (instruction adherence, editing quality, detail preservation) and provides no analysis of metric reliability or potential annotation artifacts in the test suites.

    Authors: We agree that clear definitions, formulas, and reliability analysis are important for the benchmark's utility. We will revise Section 5 to include explicit definitions and mathematical formulas for the three evaluation axes: instruction adherence, editing quality, and detail preservation. Furthermore, we will add an analysis of the metrics' reliability and discuss potential annotation artifacts or biases in the basic, challenging single-turn, and multi-turn test suites. revision: yes
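
For illustration (an editorial sketch, not the authors' or the benchmark's definition), one way the detail-preservation axis could be formalized is to score structural similarity only on pixels outside the edit mask, so a faithful edit is not penalized for changing the requested region:

```python
# Sketch: detail preservation as SSIM restricted to pixels outside the edit mask.
# One possible formalization, not the metric defined by ImgEdit-Bench.
import numpy as np
from skimage.metrics import structural_similarity

def detail_preservation(source: np.ndarray, edited: np.ndarray,
                        edit_mask: np.ndarray) -> float:
    """source, edited: HxWx3 uint8 images; edit_mask: HxW bool, True inside the edit."""
    # Grayscale keeps the SSIM call simple and version-agnostic.
    src = source.mean(axis=-1) / 255.0
    out = edited.mean(axis=-1) / 255.0
    _, ssim_map = structural_similarity(src, out, data_range=1.0, full=True)
    keep = ~edit_mask                      # pixels that should remain untouched
    return float(ssim_map[keep].mean()) if keep.any() else float("nan")
```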

Circularity Check

0 steps flagged

No circularity: empirical dataset and benchmark construction

Full rationale

The paper presents an empirical contribution: curation of 1.2M edit pairs via a multi-stage pipeline (VLM + detection + segmentation + inpainting + post-processing), training of ImgEdit-E1 on those pairs, and introduction of ImgEdit-Bench for evaluation. No equations, fitted parameters, or derivations are claimed. The central claims (dataset quality, model outperformance) are externally falsifiable via human ratings, ablations, or comparisons on held-out data and do not reduce to self-definition or self-citation chains. Self-citations, if present, are not load-bearing for any mathematical result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard computer-vision assumptions about the reliability of off-the-shelf detection, segmentation, and inpainting models plus the ability of a VLM to generate useful edit instructions; no new entities or free parameters are introduced in the abstract.

axioms (1)
  • domain assumption: Off-the-shelf vision-language, detection, and segmentation models produce sufficiently accurate outputs for curation without introducing systematic biases that degrade downstream editing performance.
    Invoked in the description of the multi-stage pipeline.

pith-pipeline@v0.9.0 · 5585 in / 1307 out tokens · 41325 ms · 2026-05-12T18:13:13.948179+00:00 · methodology

discussion (0)


Forward citations

Cited by 32 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling

    cs.CV 2026-05 unverdicted novelty 7.0

    Edit-Compass and EditReward-Compass are new unified benchmarks for fine-grained image editing evaluation and realistic reward modeling in reinforcement learning optimization.

  2. UniCustom: Unified Visual Conditioning for Multi-Reference Image Generation

    cs.CV 2026-05 unverdicted novelty 7.0

    UniCustom fuses ViT and VAE features before VLM encoding and uses two-stage training plus slot-wise regularization to improve subject consistency in multi-reference diffusion-based image generation.

  3. RewardHarness: Self-Evolving Agentic Post-Training

    cs.AI 2026-05 unverdicted novelty 7.0

    RewardHarness self-evolves a tool-and-skill library from 100 preference examples to reach 47.4% accuracy on image-edit evaluation, beating GPT-5, and yields stronger RL-tuned models.

  4. SpecEdit: Training-Free Acceleration for Diffusion based Image Editing via Semantic Locking

    cs.CV 2026-05 unverdicted novelty 7.0

    SpecEdit accelerates diffusion-based image editing up to 10x by using a low-resolution draft to identify edit-relevant tokens via semantic discrepancies for selective high-resolution denoising.

  5. UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs

    cs.CV 2026-04 unverdicted novelty 7.0

    UniEditBench unifies image and video editing evaluation with a nine-plus-eight operation taxonomy and cost-effective 4B/8B distilled MLLM evaluators that align with human judgments.

  6. HiddenObjects: Scalable Diffusion-Distilled Spatial Priors for Object Placement

    cs.CV 2026-04 unverdicted novelty 7.0

    A diffusion-based pipeline creates a 27M-annotation dataset of object placements that outperforms human annotations and baselines on image editing tasks, then distills it into a fast model.

  7. AIM-Bench: Benchmarking and Improving Affective Image Manipulation via Fine-Grained Hierarchical Control

    cs.CV 2026-04 unverdicted novelty 7.0

    AIM-Bench is the first dedicated benchmark for editing images to evoke specific emotions with fine-grained control, paired with AIM-40k dataset that delivers a 9.15% performance gain by correcting training data imbalances.

  8. RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details

    cs.CV 2026-04 unverdicted novelty 7.0

    RefineAnything is a multimodal diffusion model using Focus-and-Refine crop-and-resize with blended paste-back to achieve high-fidelity local image refinement and near-perfect background preservation.

  9. CAMEO: A Conditional and Quality-Aware Multi-Agent Image Editing Orchestrator

    cs.CV 2026-04 unverdicted novelty 7.0

    CAMEO uses coordinated agents for planning, prompting, generation, and quality feedback to achieve higher structural reliability in conditional image editing than single-step models.

  10. UniCustom: Unified Visual Conditioning for Multi-Reference Image Generation

    cs.CV 2026-05 unverdicted novelty 6.0

    A unified visual conditioning approach fuses semantic and appearance features before VLM processing, with two-stage training and slot-wise regularization, to improve consistency in multi-reference image generation.

  11. GeoR-Bench: Evaluating Geoscience Visual Reasoning

    cs.CV 2026-05 unverdicted novelty 6.0

    GeoR-Bench shows top multimodal models reach only 42.7% strict accuracy on geoscience visual reasoning tasks while open-source models reach 10.3%, with outputs often visually plausible yet scientifically inaccurate.

  12. HiDream-O1-Image: A Natively Unified Image Generative Foundation Model with Pixel-level Unified Transformer

    cs.CV 2026-05 unverdicted novelty 6.0

    A pixel-space Diffusion Transformer with Unified Transformer architecture unifies image generation, editing, and personalization in an end-to-end model that maps all inputs to a shared token space and scales from 8B t...

  13. DiffCap-Bench: A Comprehensive, Challenging, Robust Benchmark for Image Difference Captioning

    cs.CV 2026-05 unverdicted novelty 6.0

    DiffCap-Bench supplies a diverse IDC benchmark with ten categories and LLM judging grounded in human difference lists to evaluate MLLMs more robustly than prior lexical metrics.

  14. PhysEdit: Physically-Consistent Region-Aware Image Editing via Adaptive Spatio-Temporal Reasoning

    cs.CV 2026-05 unverdicted novelty 6.0

    PhysEdit introduces adaptive reasoning depth and spatial masking to make image editing faster and more instruction-aligned without retraining the base model.

  15. SpatialFusion: Endowing Unified Image Generation with Intrinsic 3D Geometric Awareness

    cs.CV 2026-04 unverdicted novelty 6.0

    SpatialFusion internalizes 3D geometric awareness into unified image generation models by pairing an MLLM with a spatial transformer that produces depth maps to constrain diffusion generation.

  16. Meta-CoT: Enhancing Granularity and Generalization in Image Editing

    cs.CV 2026-04 unverdicted novelty 6.0

    Meta-CoT uses two-level decomposition of editing operations into meta-tasks and a CoT consistency reward to improve granularity and generalization, reporting 15.8% gains across 21 tasks.

  17. Image Generators are Generalist Vision Learners

    cs.CV 2026-04 unverdicted novelty 6.0

    Image generation pretraining produces generalist vision models that reframe perception tasks as image synthesis and reach SOTA results on segmentation, depth estimation, and other 2D/3D tasks.

  18. Image Generators are Generalist Vision Learners

    cs.CV 2026-04 conditional novelty 6.0

    Image generation pretraining builds generalist vision models that reach SOTA on 2D and 3D perception tasks by reframing them as RGB image outputs.

  19. Making Image Editing Easier via Adaptive Task Reformulation with Agentic Executions

    cs.CV 2026-04 unverdicted novelty 6.0

    An MLLM agent reformulates image editing tasks into executable operation sequences to improve reliability on challenging cases across existing generative backbones.

  20. VibeFlow: Versatile Video Chroma-Lux Editing through Self-Supervised Learning

    cs.CV 2026-04 unverdicted novelty 6.0

    VibeFlow performs versatile video chroma-lux editing in zero-shot fashion by self-supervised disentanglement of structure and color-illumination cues inside pre-trained video models, plus residual velocity fields and ...

  21. InsEdit: Towards Instruction-based Visual Editing via Data-Efficient Video Diffusion Models Adaptation

    cs.CV 2026-04 unverdicted novelty 6.0

    InsEdit adapts a video diffusion backbone for text-instruction video editing via Mutual Context Attention, achieving SOTA open-source results with O(100K) data while also supporting image editing.

  22. SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing

    cs.CV 2026-04 unverdicted novelty 6.0

    SpatialEdit provides a benchmark, large synthetic dataset, and baseline model for precise object and camera spatial manipulations in images, with the model beating priors on spatial editing.

  23. SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

    cs.CV 2026-05 unverdicted novelty 5.0

    SenseNova-U1 presents native unified multimodal models that match top understanding VLMs while delivering strong performance in image generation, infographics, and interleaved tasks via the NEO-unify architecture.

  24. DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents

    cs.AI 2026-05 unverdicted novelty 5.0

    DataEvolver introduces a reusable framework with generation-time self-correction and validation-time self-expansion loops that improves visual datasets, shown to outperform baselines on an object-rotation task.

  25. Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

    cs.CV 2026-04 unverdicted novelty 5.0

    Visual generation models are evolving from passive renderers to interactive agentic world modelers, but current systems lack spatial reasoning, temporal consistency, and causal understanding, with evaluations overemph...

  26. Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation

    cs.CV 2026-04 unverdicted novelty 5.0

    Tuna-2 shows pixel embeddings can replace vision encoders in unified multimodal models, achieving competitive or superior results on understanding and generation benchmarks.

  27. FineEdit: Fine-Grained Image Edit with Bounding Box Guidance

    cs.CV 2026-04 unverdicted novelty 5.0

    FineEdit adds multi-level bounding box injection to diffusion image editing, releases a 1.2M-pair dataset with box annotations, and shows better instruction following and background consistency than prior open models ...

  28. Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

    cs.CV 2025-11 unverdicted novelty 5.0

    Z-Image is an efficient 6B-parameter foundation model for image generation that rivals larger commercial systems in photorealism and bilingual text rendering through a new single-stream diffusion transformer and strea...

  29. UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

    cs.CV 2025-06 unverdicted novelty 5.0

    UniWorld-V1 shows that semantic features from large multimodal models enable unified visual understanding and generation, achieving strong results on perception and manipulation tasks with only 2.7 million training samples.

  30. Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

    cs.GR 2026-05 unverdicted novelty 4.0

    JoyAI-Image unifies visual understanding, generation, and editing in one model and claims stronger spatial intelligence through bidirectional perception-generation loops.

  31. Mamoda2.5: Enhancing Unified Multimodal Model with DiT-MoE

    cs.CV 2026-05 unverdicted novelty 4.0

    Mamoda2.5 is a 25B-parameter DiT-MoE unified AR-Diffusion model that reaches top video generation and editing benchmarks with 4-step inference up to 95.9x faster than baselines.

  32. TorchUMM: A Unified Multimodal Model Codebase for Evaluation, Analysis, and Post-training

    cs.AI 2026-04 unverdicted novelty 4.0

    TorchUMM is the first unified codebase and benchmark suite for standardized evaluation of diverse unified multimodal models on understanding, generation, and editing tasks.
