AnomalyAgent: Agentic Industrial Anomaly Synthesis via Tool-Augmented Reinforcement Learning
Pith reviewed 2026-05-10 18:29 UTC · model grok-4.3
The pith
An AI agent with specialized tools and reinforcement learning generates more realistic industrial anomalies by learning from real-image trajectories.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AnomalyAgent treats anomaly synthesis as a multi-step reasoning process. An agent equipped with prompt generation, image generation, quality evaluation, knowledge retrieval, and mask generation tools is trained on structured trajectories derived from real anomalies, first with supervised fine-tuning and then with reinforcement learning driven by task, reflection, and behavioral rewards, yielding anomalies with greater semantic realism.
What carries the argument
A tool-augmented agent trained in two stages on trajectories from real anomalies, using a three-part reward mechanism to supervise quality and location, encourage prompt refinement, and enforce trajectory adherence.
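The review does not spell out how the three reward terms are combined. As a hedged sketch, a per-step composite reward might look like the following; the weights, score ranges, and field names are illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass

@dataclass
class StepOutcome:
    quality_score: float       # QE tool: anomaly realism, assumed on a 0-5 scale
    location_score: float      # QE tool: placement plausibility, assumed 0-5
    prompt_improved: bool      # did this step's refined prompt beat the previous one?
    followed_trajectory: bool  # did the tool call match the reference trajectory?

def composite_reward(o: StepOutcome,
                     w_task: float = 1.0,
                     w_reflect: float = 0.5,
                     w_behave: float = 0.5) -> float:
    """Weighted sum of the task, reflection, and behavioral reward terms."""
    task = (o.quality_score + o.location_score) / 10.0   # normalize to [0, 1]
    reflect = 1.0 if o.prompt_improved else 0.0          # reward prompt refinement
    behave = 1.0 if o.followed_trajectory else -1.0      # penalize trajectory deviation
    return w_task * task + w_reflect * reflect + w_behave * behave
```

Under these assumptions, a fully successful step (both QE scores at 5, an improved prompt, trajectory adherence) scores 2.0, and deviating from the reference trajectory swings the total down by a full weighted unit.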
If this is right
- Detectors trained on the generated anomalies reach 57 percent classification accuracy with a standard ResNet and near-perfect image-level detection with a simple UNet.
- The closed-loop process with self-reflection produces anomalies that surpass all prior zero-shot synthesis techniques on standard benchmarks.
- Iterative refinement via quality evaluation and knowledge retrieval reduces the semantic gaps common in one-step generative approaches.
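Taken together, these points describe a closed generate-evaluate-refine loop. A minimal sketch follows, with stand-in functions for the paper's five tools (PG, IG, QE, KR, MG); the retry budget and acceptance threshold are assumptions.

```python
def synthesize_anomaly(item, anomaly_type, tools, max_rounds=3, accept_at=4.0):
    """Closed-loop synthesis sketch: generate, evaluate, refine, then localize."""
    prompt = tools.prompt_gen(item, anomaly_type)            # PG
    best = None
    for _ in range(max_rounds):
        image = tools.image_gen(item.image, prompt)          # IG
        score = tools.quality_eval(image)                    # QE, assumed 0-5
        if best is None or score > best[1]:
            best = (image, score)                            # keep the best attempt
        if score >= accept_at:
            break                                            # good enough: stop early
        hint = tools.knowledge_retrieve(item, anomaly_type)  # KR consulted on failure
        prompt = tools.refine_prompt(prompt, score, hint)    # self-reflection step
    image, _ = best
    mask = tools.mask_gen(image)                             # MG: localization label
    return image, mask
```

The key difference from single-step synthesis is that a low QE score triggers retrieval and prompt refinement rather than ending the episode.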
Where Pith is reading between the lines
- The trajectory-plus-reward structure could let similar agents generate training data for other data-scarce visual tasks without hand-crafted rules.
- Rewarding prompt improvement may allow the agent to discover effective descriptions for rare or novel defect types autonomously.
- Extending the same training loop to video sequences or multi-view images could address anomaly detection in dynamic industrial settings.
Load-bearing premise
The three-part reward mechanism applied to trajectories from real anomalies will consistently produce synthetic anomalies that are both semantically realistic and free of systematic biases or artifacts that could harm downstream detectors.
What would settle it
A controlled comparison on real industrial images between detectors trained solely on the agent's outputs and detectors trained on outputs from single-step synthesis methods. Equal or lower classification and localization performance for the agent-trained detectors would refute the claim.
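The settling test reduces to training identical detectors on each method's synthetic output and scoring them on the same real test set. The harness below is hypothetical; `train_detector` and `evaluate` stand in for whatever detector and metric (e.g. image- or pixel-level AP) the comparison uses.

```python
def settling_test(synthesis_methods, real_test_set, train_detector, evaluate):
    """Train one detector per synthesis method, evaluate all on real images."""
    results = {}
    for name, synthetic_data in synthesis_methods.items():
        # Same architecture, hyperparameters, and seed for every method,
        # so only the training data differs.
        detector = train_detector(synthetic_data)
        results[name] = evaluate(detector, real_test_set)
    return results
```

The claim would be refuted if the agent's entry in `results` ties or trails any single-step baseline.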
Original abstract
Industrial anomaly generation is a crucial method for alleviating the data scarcity problem in anomaly detection tasks. Most existing anomaly synthesis methods rely on single-step generation mechanisms, lacking complex reasoning and iterative optimization capabilities, making it difficult to generate anomaly samples with high semantic realism. We propose AnomalyAgent, an anomaly synthesis agent with self-reflection, knowledge retrieval, and iterative refinement capabilities, aiming to generate realistic and diverse anomalies. Specifically, AnomalyAgent is equipped with five tools: Prompt Generation (PG), Image Generation (IG), Quality Evaluation (QE), Knowledge Retrieval (KR), and Mask Generation (MG), enabling closed-loop optimization. To improve decision-making and self-reflection, we construct structured trajectories from real anomaly images and design a two-stage training framework: supervised fine-tuning followed by reinforcement learning. This process is driven by a three-part reward mechanism: (1) task rewards to supervise the quality and location rationality of generated anomalies; (2) reflection rewards to train the model's ability to improve anomaly synthesis prompt; (3) behavioral rewards to ensure adherence to the trajectory. On the MVTec-AD dataset, AnomalyAgent achieves IS/IC-L of 2.10/0.33 for anomaly generation, 57.0% classification accuracy using ResNet34, and 99.3%/74.2% AP at the image/pixel level using a simple UNet, surpassing all zero-shot SOTA methods. The code and data will be made publicly available.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes AnomalyAgent, a tool-augmented reinforcement learning agent for industrial anomaly synthesis. It equips the agent with five tools (Prompt Generation, Image Generation, Quality Evaluation, Knowledge Retrieval, Mask Generation) and constructs structured trajectories from real anomaly images to enable a two-stage supervised fine-tuning followed by RL training. A three-part reward (task rewards for quality/location, reflection rewards for prompt improvement, behavioral rewards for trajectory adherence) drives iterative refinement. On MVTec-AD, it reports anomaly generation scores of IS/IC-L 2.10/0.33, 57.0% ResNet34 classification accuracy, and 99.3%/74.2% image/pixel AP with a UNet, claiming to surpass zero-shot SOTA methods. Code and data will be released.
Significance. If the performance gains hold under fair comparisons, the agentic framework with closed-loop tool use and self-reflection could meaningfully advance anomaly synthesis beyond single-step generative models, providing a path to more semantically realistic and diverse samples for data-scarce industrial detection tasks. The two-stage training and multi-component reward design represent a structured approach to incorporating reasoning into synthesis.
major comments (2)
- [Abstract] Abstract: The central claim that AnomalyAgent surpasses all zero-shot SOTA methods rests on the reported MVTec-AD metrics (IS/IC-L 2.10/0.33, 57.0% ResNet34 accuracy, 99.3%/74.2% AP). However, the method explicitly constructs structured trajectories from real anomaly images for SFT+RL training, while zero-shot baselines generate without access to real anomalous samples. Without an ablation removing the real-trajectory component or an explicit statement that baselines were given equivalent data, the superiority cannot be attributed to the agentic tools or RL loop rather than privileged real-data access.
- [Abstract] Abstract and methods description: The quantitative results for downstream ResNet34 and UNet evaluations lack details on experimental protocols, including how generated samples are integrated (e.g., number of synthetic anomalies per class, training hyperparameters, baseline implementations, and controls for overfitting to the constructed trajectories). These omissions make it impossible to assess whether the reported 57.0% accuracy and 99.3%/74.2% AP are robust or reproducible.
minor comments (1)
- [Abstract] The abstract introduces IS/IC-L without a definition or reference to the section where these metrics are formally defined; add a brief parenthetical or citation for clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve clarity on comparisons and experimental reproducibility.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central claim that AnomalyAgent surpasses all zero-shot SOTA methods rests on the reported MVTec-AD metrics (IS/IC-L 2.10/0.33, 57.0% ResNet34 classification accuracy, and 99.3%/74.2% AP). However, the method explicitly constructs structured trajectories from real anomaly images for SFT+RL training, while zero-shot baselines generate without access to real anomalous samples. Without an ablation removing the real-trajectory component or an explicit statement that baselines were given equivalent data, the superiority cannot be attributed to the agentic tools or RL loop rather than privileged real-data access.
Authors: We appreciate this observation on the comparison validity. The term 'zero-shot' in our paper refers to single-step generative methods without iterative tool use or self-reflection. Our training does use real-anomaly trajectories to learn agent behavior, which is a standard supervised setup for such agents but creates a data-access asymmetry. We will revise the abstract to explicitly distinguish this and add an ablation training the agent without real trajectories (using only normal images or random prompts) to isolate the agentic contribution. We will also document data access for all baselines. revision: yes
-
Referee: [Abstract] Abstract and methods description: The quantitative results for downstream ResNet34 and UNet evaluations lack details on experimental protocols, including how generated samples are integrated (e.g., number of synthetic anomalies per class, training hyperparameters, baseline implementations, and controls for overfitting to the constructed trajectories). These omissions make it impossible to assess whether the reported 57.0% accuracy and 99.3%/74.2% AP are robust or reproducible.
Authors: We agree that more protocol details are required for reproducibility. The revised manuscript will add a dedicated experimental subsection specifying: the exact number of synthetic anomalies per class and mixing ratios with normal samples, all ResNet34 and UNet hyperparameters, baseline implementation details and sources, and controls such as trajectory shuffling or evaluation on held-out anomaly types to address potential overfitting. revision: yes
Circularity Check
No significant circularity in derivation chain
full rationale
The paper constructs structured trajectories from real anomaly images for two-stage SFT+RL training and defines independent task/reflection/behavioral rewards to optimize generation quality and adherence. Reported metrics (IS/IC-L 2.10/0.33, 57% ResNet34 accuracy, 99.3/74.2% AP on UNet) are separate downstream evaluations on the MVTec-AD benchmark using standard detectors; these quantities are not equivalent to the reward functions or trajectory inputs by construction. No self-definitional reductions, fitted inputs renamed as predictions, load-bearing self-citations, or ansatz smuggling appear in the abstract or described method. The derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: reinforcement learning with the described three-part reward can optimize tool-use policies for generating semantically realistic anomalies.
Reference graph
Works this paper leans on
-
[1]
Paul Bergmann, Kilian Batzner, Michael Fauser, David Sattlegger, and Carsten Steger. 2021. The MVTec Anomaly Detection Dataset: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection. Int. J. Comput. Vis. 129, 4 (2021), 1038–1059. doi:10.1007/S11263-020-01400-4
-
[2]
Zhewei Dai, Shilei Zeng, Haotian Liu, Xurui Li, Feng Xue, and Yu Zhou. 2024. SeaS: Few-shot Industrial Anomaly Image Generation with Separation and Sharing Fine-tuning. CoRR abs/2410.14987 (2024). arXiv:2410.14987 doi:10.48550/ARXIV.2410.14987
-
[4]
DeepSeek-AI. 2025. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. CoRR abs/2501.12948 (2025). arXiv:2501.12948 doi:10.48550/ARXIV.2501.12948
-
[5]
Yuxuan Duan, Yan Hong, Li Niu, and Liqing Zhang. 2023. Few-Shot Defect Image Generation via Defect-Aware Feature Manipulation. In Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposium on Educational Advances in Artificial Intellige...
-
[6]
Bin-Bin Gao. 2024. MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning. In Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10-15, 2024, Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub ...
-
[7]
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. 2014. Generative Adversarial Networks. CoRR abs/1406.2661 (2014). arXiv:1406.2661 http://arxiv.org/abs/1406.2661
-
[8]
Guan Gui, Bin-Bin Gao, Jun Liu, Chengjie Wang, and Yunsheng Wu. 2024. Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation. In Computer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part LXXXIII (Lecture Notes in Computer Science, Vol. 15141), Ales Leonardis, Elisa Ricci, ...
-
[9]
Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising Diffusion Probabilistic Models. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Ed...
-
[10]
Jack Hong, Chenxiao Zhao, ChengLin Zhu, Weiheng Lu, Guohai Xu, and Xing Yu. 2025. DeepEyesV2: Toward Agentic Multimodal Model. CoRR abs/2511.05271 (2025). arXiv:2511.05271 doi:10.48550/ARXIV.2511.05271
-
[11]
Teng Hu, Jiangning Zhang, Ran Yi, Yuzhen Du, Xu Chen, Liang Liu, Yabiao Wang, and Chengjie Wang. 2024. AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model. In Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence, IAAI 2024, Fourteenth Symposiu...
-
[12]
Wenxuan Huang, Bohan Jia, Zijie Zhai, Shaosheng Cao, Zheyu Ye, Fei Zhao, Zhe Xu, Yao Hu, and Shaohui Lin. 2025. Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models. CoRR abs/2503.06749 (2025). arXiv:2503.06749 doi:10.48550/ARXIV.2503.06749
-
[13]
Ying Jin, Jinlong Peng, Qingdong He, Teng Hu, Jiafu Wu, Hao Chen, Haoxuan Wang, Wenbing Zhu, Mingmin Chi, Jun Liu, and Yabiao Wang. 2025. Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025, Nashville, TN, USA, June 11-15, 2025. Computer Vision Foundation / ...
-
[14]
Chun-Liang Li, Kihyuk Sohn, Jinsung Yoon, and Tomas Pfister. 2021. CutPaste: Self-Supervised Learning for Anomaly Detection and Localization. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021. Computer Vision Foundation / IEEE, 9664–9674. doi:10.1109/CVPR46437.2021.00954
-
[16]
Dongyun Lin, Yanpeng Cao, Wenbin Zhu, and Yiqun Li. 2021. Few-Shot Defect Segmentation Leveraging Abundant Defect-Free Training Samples Through Normal Background Regularization And Crop-And-Paste Operation. In 2021 IEEE International Conference on Multimedia and Expo, ICME 2021, Shenzhen, China, July 5-9, 2021. IEEE, 1–6. doi:10.1109/ICME51207.2021.9428468
-
[17]
OpenAI. 2024. OpenAI o1 System Card. CoRR abs/2412.16720 (2024). arXiv:2412.16720 doi:10.48550/ARXIV.2412.16720
-
[18]
Liangming Pan, Michael Saxon, Wenda Xu, Deepak Nathani, Xinyi Wang, and William Yang Wang. 2023. Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies. CoRR abs/2308.03188 (2023). arXiv:2308.03188 doi:10.48550/ARXIV.2308.03188
-
[19]
Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Lauren Hong, Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, and Maosong Sun. 2024. ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net. https://openreview.net/forum?id=dHng2O0Jjr
-
[21]
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-Resolution Image Synthesis with Latent Diffusion Models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022. IEEE, 10674–10685. doi:10.1109/CVPR52688.2022.01042
-
[22]
Hannah M. Schlüter, Jeremy Tan, Benjamin Hou, and Bernhard Kainz. 2022. Natural Synthetic Anomalies for Self-supervised Anomaly Detection and Localization. In Computer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXXI (Lecture Notes in Computer Science, Vol. 13691), Shai Avidan, Gabriel J. Brost...
-
[23]
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal Policy Optimization Algorithms. CoRR abs/1707.06347 (2017). arXiv:1707.06347 http://arxiv.org/abs/1707.06347
-
[25]
Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. 2024. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. CoRR abs/2402.03300 (2024). arXiv:2402.03300 doi:10.48550/ARXIV.2402.03300
-
[26]
Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. Reflexion: language agents with verbal reinforcement learning. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10-16, 2023, Alice Oh, Tristan Nauman...
-
[27]
Yulim So and Seokho Kang. 2025. AnoStyler: Text-Driven Localized Anomaly Generation via Lightweight Style Transfer. CoRR abs/2511.06687 (2025). arXiv:2511.06687 doi:10.48550/ARXIV.2511.06687
-
[28]
Jaewoo Song, Daemin Park, Kanghyun Baek, Sangyub Lee, Jooyoung Choi, Eunji Kim, and Sungroh Yoon. 2025. DefectFill: Realistic Defect Generation with Inpainting Diffusion Model for Visual Inspection. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025, Nashville, TN, USA, June 11-15, 2025. Computer Vision Foundation / IEEE, 18718–18...
-
[29]
Alex Su, Haozhe Wang, Weiming Ren, Fangzhen Lin, and Wenhu Chen. 2025. Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning. CoRR abs/2505.15966 (2025). arXiv:2505.15966 doi:10.48550/ARXIV.2505.15966
-
[30]
Han Sun, Yunkang Cao, Hao Dong, and Olga Fink. 2025. Unseen Visual Anomaly Generation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025, Nashville, TN, USA, June 11-15, 2025. Computer Vision Foundation / IEEE, 25508–25517. doi:10.1109/CVPR52734.2025.02375
-
[31]
Qwen Team. 2025. Qwen3-VL Technical Report. CoRR abs/2511.21631 (2025). arXiv:2511.21631 doi:10.48550/ARXIV.2511.21631
-
[32]
Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2024. Voyager: An Open-Ended Embodied Agent with Large Language Models. Trans. Mach. Learn. Res. 2024 (2024). https://openreview.net/forum?id=ehfRiF0R3a
-
[33]
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA,...
-
[34]
Yifan Wei, Xiaoyan Yu, Yixuan Weng, Tengfei Pan, Angsheng Li, and Li Du. 2025. AutoTIR: Autonomous Tools Integrated Reasoning via Reinforcement Learning. CoRR abs/2507.21836 (2025). arXiv:2507.21836 doi:10.48550/ARXIV.2507.21836
-
[35]
Xichen Xu, Yanshu Wang, Yawen Huang, Jiaqi Liu, Xiaoning Lei, Guoyang Xie, Guannan Jiang, and Zhichao Lu. 2025. A Survey on Industrial Anomalies Synthesis. CoRR abs/2502.16412 (2025). arXiv:2502.16412 doi:10.48550/ARXIV.2502.16412
-
[36]
Zhenghai Xue, Longtao Zheng, Qian Liu, Yingru Li, Xiaosen Zheng, Zejun Ma, and Bo An. 2025. SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning. CoRR abs/2509.02479 (2025). arXiv:2509.02479 doi:10.48550/ARXIV.2509.02479
-
[37]
Sikuan Yan, Xiufeng Yang, Zuchao Huang, Ercong Nie, Zifeng Ding, Zonggen Li, Xiaowen Ma, Hinrich Schütze, Volker Tresp, and Yunpu Ma. 2025. Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning. CoRR abs/2508.19828 (2025). arXiv:2508.19828 doi:10.48550/ARXIV.2508.19828
-
[38]
Shuai Yang, Zhifei Chen, Pengguang Chen, Xi Fang, Yixun Liang, Shu Liu, and Yingcong Chen. 2024. Defect Spectrum: A Granular Look of Large-Scale Defect Datasets with Rich Semantics. In Computer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part VII (Lecture Notes in Computer Science, Vol. 15065), ...
-
[39]
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan Cao. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net. https://openreview.net/forum?id=WE_vluYUL-X
-
[40]
Vitjan Zavrtanik, Matej Kristan, and Danijel Skocaj. 2021. DRÆM - A discriminatively trained reconstruction embedding for surface anomaly detection. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021. IEEE, 8310–8319. doi:10.1109/ICCV48922.2021.00822
-
[41]
Gongjie Zhang, Kaiwen Cui, Tzu-Yi Hung, and Shijian Lu. 2021. Defect-GAN: High-Fidelity Defect Synthesis for Automated Defect Inspection. In IEEE Winter Conference on Applications of Computer Vision, WACV 2021, Waikoloa, HI, USA, January 3-8, 2021. IEEE, 2523–2533. doi:10.1109/WACV48630.2021.00257
-
[42]
Guibin Zhang, Hejia Geng, Xiaohang Yu, Zhenfei Yin, Zaibin Zhang, Zelin Tan, Heng Zhou, Zhong-Zhi Li, Xiangyuan Xue, Yijiang Li, Yifan Zhou, Yang Chen, Chen Zhang, Yutao Fan, Zihu Wang, Songtao Huang, Francisco Piedrahita Velez, Yue Liao, Hongru Wang, Mengyue Yang, Heng Ji, Jun Wang, Shuicheng Yan, Philip Torr, and Lei Bai. 2026. The Landscape of Agentic ...
-
[43]
Hanchen Zhang, Xiao Liu, Bowen Lv, Xueqiao Sun, Bohao Jing, Iat Long Iong, Zhenyu Hou, Zehan Qi, Hanyu Lai, Yifan Xu, Rui Lu, Hongning Wang, Jie Tang, and Yuxiao Dong. 2025. AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework. CoRR abs/2510.04206 (2025). arXiv:2510.04206 doi:10.48550/ARXIV.2510.04206
-
[44]
Ximiao Zhang, Min Xu, and Xiuzhuang Zhou. 2024. RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, WA, USA, June 16-22, 2024. IEEE, 16699–16708. doi:10.1109/CVPR52733.2024.01580
-
[45]
Ying Zhao. 2025. AnomalyHybrid: A Domain-agnostic Generative Framework for General Anomaly Detection. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2025, Nashville, TN, USA, June 11-15, 2025. Computer Vision Foundation / IEEE, 3127–3136. https://openaccess.thecvf.com/content/CVPR2025W/SyntaGen/html/Zhao_ Anomal...
-
[46]
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, and Ion Stoica. 2023. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Syst...
-
[47]
Ziwei Zheng, Michael Yang, Jack Hong, Chenxiao Zhao, Guohai Xu, Le Yang, Chao Shen, and Xing Yu. 2025. DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning. CoRR abs/2505.14362 (2025). arXiv:2505.14362 doi:10.48550/ARXIV.2505.14362
-
[48]
Zijian Zhou, Ao Qu, Zhaoxuan Wu, Sunghwan Kim, Alok Prakash, Daniela Rus, Jinhua Zhao, Bryan Kian Hsiang Low, and Paul Pu Liang. 2025. MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents. CoRR abs/2506.15841 (2025). arXiv:2506.15841 doi:10.48550/ARXIV.2506.15841
-
[49]
Yang Zou, Jongheon Jeong, Latha Pemula, Dongqing Zhang, and Onkar Dabeer. 2022. SPot-the-Difference Self-supervised Pre-training for Anomaly Detection and Segmentation. In Computer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXX (Lecture Notes in Computer Science, Vol. 13690), Shai Avidan, Gabriel J. Brostow, Moustapha Cissé, Giovanni Maria Farinella, and Tal Hassner (Eds.)....
Prompt excerpts
Tool-prompt fragments quoted from the paper:
- **Strategic Localization (Top Priority)**: Before generating, infer the most **physically and semantically plausible location** for the {anomaly_type} on the {item_name}. The anomaly must be placed where it would naturally occur in a real industrial scenario (e.g., scratches on contact surfaces, cracks at stress points).
- **Strict Local Editing Format (Top Priority)**: The prompt MUST start with: **"Using the provided image, change only [the specific localized area] to introduce [the anomaly]. Keep the rest of the image, including background, lighting, and global geometry, completely unchanged."**
- **Hyper-Specific Realism**: Describe the exact **texture interaction**. Define a **limited spatial extent**: the defect should be small, localized, and subtle, not overwhelming the object. Use positive semantic constraints for industrial realism, not artistic flair.
- Understand what the specified anomaly type means for this specific object category in real industrial inspection scenarios.
- Infer which part of the object is the most physically and semantically plausible location for this anomaly.
- Determine how the anomaly should visually appear: shape and structure; texture interaction with the object material; contrast, scale, and severity.
- Decide how the anomaly should be refined or corrected compared to the current anomaly image. Prompt construction rules (VERY IMPORTANT): the prompt MUST follow a local image editing style, such as "Using the provided image, change only ... Keep the rest of the image unchanged."; only describe what should be edited, never global or stylisti...
- **Location Reasonableness (Score 0-5)**: Evaluate whether the anomaly is placed on a physically valid and semantically correct part of the object, aligned with object geometry, and not floating in the background or crossing irrelevant regions.
- **Quality Acceptability (Score 0-5)**: Evaluate whether the anomaly appears realistic in texture, scale, contrast, and integration with surrounding material, without obvious artifacts or signs of artificial overlay. **Scoring Guide**: **5**: perfect, indistinguishable from real samples. **3-4**: minor flaws but generally plausible. **1-2**: signific...
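The two 0-5 rubric scores above plausibly feed both an accept/revise gate and the task reward. A hedged illustration follows; the thresholds and normalization are assumptions, not values from the paper.

```python
def qe_decision(location_score: float, quality_score: float,
                min_each: float = 3.0):
    """Accept only when both rubric scores clear a minimum bar, and emit a
    normalized task reward for the RL stage."""
    accept = location_score >= min_each and quality_score >= min_each
    task_reward = (location_score + quality_score) / 10.0  # scale to [0, 1]
    return accept, task_reward
```

Under these assumptions a plausible location (4) cannot rescue an unconvincing texture (2): the sample is sent back for another refinement round.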