pith. machine review for the scientific record.

arxiv: 2604.07900 · v1 · submitted 2026-04-09 · 💻 cs.CV · cs.AI


AnomalyAgent: Agentic Industrial Anomaly Synthesis via Tool-Augmented Reinforcement Learning


Pith reviewed 2026-05-10 18:29 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords anomaly synthesis · industrial defect detection · reinforcement learning · agentic workflows · synthetic data generation · tool-augmented agents · self-reflection training

The pith

An AI agent with specialized tools and reinforcement learning generates more realistic industrial anomalies by learning from real-image trajectories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper frames industrial anomaly synthesis as an agentic task where a model iteratively refines outputs instead of generating in one pass. It extracts structured trajectories from real anomaly images and trains the agent first by imitation, then by reinforcement learning that rewards anomaly quality and placement, prompt self-improvement, and faithful adherence to effective sequences. Five tools let the agent create prompts, generate images, evaluate quality, retrieve knowledge, and produce masks in a closed loop. This produces anomalies that improve downstream detectors more than prior zero-shot approaches. A reader would care because reliable synthetic data directly addresses the scarcity of real defect examples in manufacturing inspection systems.

Core claim

AnomalyAgent treats anomaly synthesis as a multi-step reasoning process: an agent equipped with prompt generation, image generation, quality evaluation, knowledge retrieval, and mask generation tools is trained on structured trajectories from real anomalies via supervised fine-tuning followed by reinforcement learning driven by task, reflection, and behavioral rewards, yielding anomalies with greater semantic realism.
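The multi-step cycle named in this claim can be sketched as a simple control loop. Everything below is a hypothetical stand-in, not the authors' implementation: the paper's PG/IG/QE/KR/MG tools wrap real generative and evaluation models, while here they return toy values, and the acceptance threshold and turn budget are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    steps: list = field(default_factory=list)

def prompt_generation(item, anomaly_type, feedback=None, knowledge=None):
    # PG: build a local-editing prompt, folding in QE feedback and KR hints
    base = f"change only a small region of the {item} to introduce a {anomaly_type}"
    if knowledge:
        base += f"; appearance hints: {knowledge}"
    if feedback:
        base += f"; fix: {feedback}"
    return base

def image_generation(prompt):
    # IG: stand-in for a diffusion/editing model
    return {"prompt": prompt}

def quality_evaluation(image):
    # QE: stand-in scorer on the paper's 0-5 quality/location scales
    return {"quality": 3, "location": 4, "feedback": "increase local contrast"}

def knowledge_retrieval(anomaly_type):
    # KR: stand-in for retrieving defect-appearance knowledge
    return f"typical appearance of {anomaly_type}"

def mask_generation(image):
    # MG: stand-in for segmenting the edited region
    return "binary mask of edited region"

def synthesize(item, anomaly_type, max_turns=3, accept=4):
    traj, feedback, knowledge = Trajectory(), None, None
    for _ in range(max_turns):
        prompt = prompt_generation(item, anomaly_type, feedback, knowledge)
        image = image_generation(prompt)
        scores = quality_evaluation(image)
        traj.steps.append((prompt, scores))
        if min(scores["quality"], scores["location"]) >= accept:
            break  # anomaly accepted; close the loop
        knowledge = knowledge_retrieval(anomaly_type)  # consult KR before retrying
        feedback = scores["feedback"]                  # QE feedback drives PG refinement
    return image, mask_generation(image), traj
```

The point of the sketch is the wiring, not the tools: QE output feeds back into PG, and KR is only consulted after a failed attempt, matching the closed-loop refinement the claim describes.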

What carries the argument

A tool-augmented agent trained in two stages on trajectories from real anomalies, using a three-part reward mechanism to supervise quality and location, encourage prompt refinement, and enforce trajectory adherence.
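The three-part reward can be made concrete with a minimal sketch. The weights, normalizations, and matching rule below are illustrative assumptions; the paper specifies only the three components (task, reflection, behavioral), not these formulas.

```python
def task_reward(quality_score, location_score, max_score=5.0):
    # task reward: supervises quality and placement of the generated anomaly
    return (quality_score + location_score) / (2 * max_score)

def reflection_reward(score_before, score_after, max_score=5.0):
    # reflection reward: positive only when a refined prompt improves the QE score
    return max(0.0, (score_after - score_before) / max_score)

def behavioral_reward(actions, reference_actions):
    # behavioral reward: fraction of tool calls matching the reference trajectory
    matches = sum(a == r for a, r in zip(actions, reference_actions))
    return matches / max(len(reference_actions), 1)

def total_reward(q, loc, q_prev, actions, reference, w=(1.0, 0.5, 0.5)):
    # assumed linear combination; the paper does not publish its weighting
    return (w[0] * task_reward(q, loc)
            + w[1] * reflection_reward(q_prev, q)
            + w[2] * behavioral_reward(actions, reference))
```

For example, a rollout that scores 4/5 on quality, 5/5 on location, improved from a prior quality of 2, and followed the reference tool sequence exactly would receive 0.9 + 0.2 + 0.5 = 1.6 under these assumed weights.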

If this is right

  • Detectors trained on the generated anomalies reach 57.0 percent classification accuracy with a standard ResNet34 and near-perfect image-level detection with a simple UNet.
  • The closed-loop process with self-reflection produces anomalies that surpass all prior zero-shot synthesis techniques on standard benchmarks.
  • Iterative refinement via quality evaluation and knowledge retrieval reduces the semantic gaps common in one-step generative approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The trajectory-plus-reward structure could let similar agents generate training data for other data-scarce visual tasks without hand-crafted rules.
  • Rewarding prompt improvement may allow the agent to discover effective descriptions for rare or novel defect types autonomously.
  • Extending the same training loop to video sequences or multi-view images could address anomaly detection in dynamic industrial settings.

Load-bearing premise

The three-part reward mechanism applied to trajectories from real anomalies will consistently produce synthetic anomalies that are both semantically realistic and free of systematic biases or artifacts that could harm downstream detectors.

What would settle it

A controlled test in which detectors trained solely on the agent's outputs show equal or lower classification and localization performance on real industrial images than detectors trained on outputs from single-step synthesis methods.
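The controlled test can be stated as a small harness: hold the detector architecture and training recipe fixed, vary only the synthetic data source, and compare on held-out real images. `train_detector` and `average_precision` are hypothetical stand-ins for a real training and evaluation pipeline.

```python
def compare_synthesis_methods(real_test_set, synthetic_sets,
                              train_detector, average_precision):
    """Train one detector per synthetic set; score each on the same real test set.

    The comparison is only fair if architecture, hyperparameters, and the
    amount of synthetic data are identical across methods.
    """
    results = {}
    for name, synthetic in synthetic_sets.items():
        detector = train_detector(synthetic)
        results[name] = average_precision(detector, real_test_set)
    # the method whose detector scores highest AP on real data wins
    return max(results, key=results.get), results
```

If the agentic method's detector does not beat the single-step baselines under this protocol, the superiority claim fails; if it does, the claim survives this particular test.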

Figures

Figures reproduced from arXiv: 2604.07900 by Haoyu Sun, Jiaming Su, Linfeng Zhang, Ruikang Zhang, Tengchao Yang, Zhengan Yan.

Figure 1. Motivation of AnomalyAgent. Few-shot methods …
Figure 2. Overview of AnomalyAgent. Given a normal image, the agent iteratively invokes tools (PG, IG, QE, KR, MG) through a …
Figure 3. Pipeline of trajectory construction and taxonomy. Given an anomaly image, we generate multi-turn trajectories …
Figure 4. Training dynamics of AnomalyAgent. Left: SFT loss …
Figure 5. Visualization of anomaly synthesis results on MVTec-AD. AnomalyAgent achieves higher semantic consistency and …
Figure 6. Comparison of efficiency and cost-effectiveness. Bubble size indicates the Inception Score (IS). Our AnomalyAgent …
Figure 7. Case Study 1. Satisfactory anomaly images are obtained with a single IG call.
Figure 8. Case Study 2. After an initial low-quality generation, the prompt is refined based on QE feedback to produce satisfactory …
Figure 9. Case Study 3. After an initial low-quality generation, the prompt is refined using both KG retrieval and QE feedback, …
Figure 10. Generated anomaly images for various defect types in MVTec-AD. Each image is displayed along with its corresponding …
Original abstract

Industrial anomaly generation is a crucial method for alleviating the data scarcity problem in anomaly detection tasks. Most existing anomaly synthesis methods rely on single-step generation mechanisms, lacking complex reasoning and iterative optimization capabilities, making it difficult to generate anomaly samples with high semantic realism. We propose AnomalyAgent, an anomaly synthesis agent with self-reflection, knowledge retrieval, and iterative refinement capabilities, aiming to generate realistic and diverse anomalies. Specifically, AnomalyAgent is equipped with five tools: Prompt Generation (PG), Image Generation (IG), Quality Evaluation (QE), Knowledge Retrieval (KR), and Mask Generation (MG), enabling closed-loop optimization. To improve decision-making and self-reflection, we construct structured trajectories from real anomaly images and design a two-stage training framework: supervised fine-tuning followed by reinforcement learning. This process is driven by a three-part reward mechanism: (1) task rewards to supervise the quality and location rationality of generated anomalies; (2) reflection rewards to train the model's ability to improve anomaly synthesis prompt; (3) behavioral rewards to ensure adherence to the trajectory. On the MVTec-AD dataset, AnomalyAgent achieves IS/IC-L of 2.10/0.33 for anomaly generation, 57.0% classification accuracy using ResNet34, and 99.3%/74.2% AP at the image/pixel level using a simple UNet, surpassing all zero-shot SOTA methods. The code and data will be made publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes AnomalyAgent, a tool-augmented reinforcement learning agent for industrial anomaly synthesis. It equips the agent with five tools (Prompt Generation, Image Generation, Quality Evaluation, Knowledge Retrieval, Mask Generation) and constructs structured trajectories from real anomaly images to enable a two-stage supervised fine-tuning followed by RL training. A three-part reward (task rewards for quality/location, reflection rewards for prompt improvement, behavioral rewards for trajectory adherence) drives iterative refinement. On MVTec-AD, it reports anomaly generation scores of IS/IC-L 2.10/0.33, 57.0% ResNet34 classification accuracy, and 99.3%/74.2% image/pixel AP with a UNet, claiming to surpass zero-shot SOTA methods. Code and data will be released.

Significance. If the performance gains hold under fair comparisons, the agentic framework with closed-loop tool use and self-reflection could meaningfully advance anomaly synthesis beyond single-step generative models, providing a path to more semantically realistic and diverse samples for data-scarce industrial detection tasks. The two-stage training and multi-component reward design represent a structured approach to incorporating reasoning into synthesis.

major comments (2)
  1. [Abstract] Abstract: The central claim that AnomalyAgent surpasses all zero-shot SOTA methods rests on the reported MVTec-AD metrics (IS/IC-L 2.10/0.33, 57.0% ResNet34 accuracy, 99.3%/74.2% AP). However, the method explicitly constructs structured trajectories from real anomaly images for SFT+RL training, while zero-shot baselines generate without access to real anomalous samples. Without an ablation removing the real-trajectory component or an explicit statement that baselines were given equivalent data, the superiority cannot be attributed to the agentic tools or RL loop rather than privileged real-data access.
  2. [Abstract] Abstract and methods description: The quantitative results for downstream ResNet34 and UNet evaluations lack details on experimental protocols, including how generated samples are integrated (e.g., number of synthetic anomalies per class, training hyperparameters, baseline implementations, and controls for overfitting to the constructed trajectories). These omissions make it impossible to assess whether the reported 57.0% accuracy and 99.3%/74.2% AP are robust or reproducible.
minor comments (1)
  1. [Abstract] The abstract introduces IS/IC-L without a definition or reference to the section where these metrics are formally defined; add a brief parenthetical or citation for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve clarity on comparisons and experimental reproducibility.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that AnomalyAgent surpasses all zero-shot SOTA methods rests on the reported MVTec-AD metrics (IS/IC-L 2.10/0.33, 57.0% ResNet34 classification accuracy, and 99.3%/74.2% AP). However, the method explicitly constructs structured trajectories from real anomaly images for SFT+RL training, while zero-shot baselines generate without access to real anomalous samples. Without an ablation removing the real-trajectory component or an explicit statement that baselines were given equivalent data, the superiority cannot be attributed to the agentic tools or RL loop rather than privileged real-data access.

    Authors: We appreciate this observation on the comparison validity. The term 'zero-shot' in our paper refers to single-step generative methods without iterative tool use or self-reflection. Our training does use real-anomaly trajectories to learn agent behavior, which is a standard supervised setup for such agents but creates a data-access asymmetry. We will revise the abstract to explicitly distinguish this and add an ablation training the agent without real trajectories (using only normal images or random prompts) to isolate the agentic contribution. We will also document data access for all baselines. revision: yes

  2. Referee: [Abstract] Abstract and methods description: The quantitative results for downstream ResNet34 and UNet evaluations lack details on experimental protocols, including how generated samples are integrated (e.g., number of synthetic anomalies per class, training hyperparameters, baseline implementations, and controls for overfitting to the constructed trajectories). These omissions make it impossible to assess whether the reported 57.0% accuracy and 99.3%/74.2% AP are robust or reproducible.

    Authors: We agree that more protocol details are required for reproducibility. The revised manuscript will add a dedicated experimental subsection specifying: the exact number of synthetic anomalies per class and mixing ratios with normal samples, all ResNet34 and UNet hyperparameters, baseline implementation details and sources, and controls such as trajectory shuffling or evaluation on held-out anomaly types to address potential overfitting. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The paper constructs structured trajectories from real anomaly images for two-stage SFT+RL training and defines independent task/reflection/behavioral rewards to optimize generation quality and adherence. Reported metrics (IS/IC-L 2.10/0.33, 57% ResNet34 accuracy, 99.3/74.2% AP on UNet) are separate downstream evaluations on the MVTec-AD benchmark using standard detectors; these quantities are not equivalent to the reward functions or trajectory inputs by construction. No self-definitional reductions, fitted inputs renamed as predictions, load-bearing self-citations, or ansatz smuggling appear in the abstract or described method. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on standard assumptions from reinforcement learning and generative modeling rather than new postulates; no free parameters or invented entities are explicitly introduced beyond the agent architecture itself, and the single axiom below is a domain assumption rather than a new claim.

axioms (1)
  • domain assumption Reinforcement learning with the described three-part reward can optimize tool-use policies for generating semantically realistic anomalies
    Invoked in the two-stage training framework description where SFT is followed by RL driven by task, reflection, and behavioral rewards.

pith-pipeline@v0.9.0 · 5580 in / 1306 out tokens · 60999 ms · 2026-05-10T18:29:00.068621+00:00 · methodology


    **Quality Acceptability (Score 0-5)**: Evaluate whether the anomaly appears realistic in texture, scale, contrast, and integration with surrounding material, without obvious artifacts or signs of artificial overlay. **Scoring Guide**: - **5**: Perfect, indistinguishable from real samples. - **3-4**: Minor flaws but generally plausible. - **1-2**: Signific...