Feature Perturbation Pool-based Fusion Network for Unified Multi-Class Industrial Defect Detection
Pith reviewed 2026-05-10 03:22 UTC · model grok-4.3
The pith
Randomly injecting noise into features lets one model detect all industrial defect types without added cost or separate training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By randomly perturbing feature representations with a pool of noise patterns and aggregating hierarchical features across layers via residual connections and normalization, a single network can jointly model heterogeneous defect categories, overcoming inter-class feature perturbation while preserving localization detail. The paper reports 97.17% image-level AUROC and 96.93% pixel-level AUROC on MVTec-AD, and 91.08% image-level AUROC and 99.08% pixel-level AUROC on VisA.
What carries the argument
The stochastic feature perturbation pool that randomly injects Gaussian noise, F-Noise, and F-Drop into extracted representations, combined with residual multi-layer feature fusion and normalization.
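To make the mechanism concrete, here is a minimal Python sketch of a perturbation pool. Neither this page nor the abstract defines F-Noise or F-Drop, so the versions below are assumptions in the style of cross-consistency training [38]: F-Noise as multiplicative uniform noise and F-Drop as masking of the most active spatial positions. The function names, the `[channel][position]` list layout, the noise scales, and the application probability `p` are illustrative choices, not values from the paper.

```python
import random

def gaussian_noise(feat, rng, sigma=0.1):
    # Additive zero-mean Gaussian noise on every feature value.
    return [[v + rng.gauss(0.0, sigma) for v in ch] for ch in feat]

def f_noise(feat, rng, scale=0.3):
    # ASSUMED F-Noise: z + z * n, with n drawn uniformly from (-scale, scale).
    return [[v + v * rng.uniform(-scale, scale) for v in ch] for ch in feat]

def f_drop(feat, rng, low=0.6, high=0.9):
    # ASSUMED F-Drop: zero the positions whose min-max normalized,
    # channel-summed activation reaches a random threshold gamma.
    n_pos = len(feat[0])
    act = [sum(ch[p] for ch in feat) for p in range(n_pos)]
    lo, hi = min(act), max(act)
    span = (hi - lo) or 1.0
    gamma = rng.uniform(low, high)
    keep = [((a - lo) / span) < gamma for a in act]
    return [[v if k else 0.0 for v, k in zip(ch, keep)] for ch in feat]

POOL = (gaussian_noise, f_noise, f_drop)

def perturb(feat, rng, training=True, p=0.5):
    # Identity at inference; during training, apply one randomly chosen
    # pool member with probability p (p is a hypothetical hyperparameter).
    if not training or rng.random() >= p:
        return [ch[:] for ch in feat]
    return rng.choice(POOL)(feat, rng)
```

Because `perturb` is the identity when `training=False`, a pool of this shape would add no parameters and no inference-time work, which is consistent with the paper's no-extra-cost claim.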
If this is right
- A single model suffices for every defect category instead of one model per category.
- Robustness improves against domain shifts and previously unseen defect shapes.
- Both image-level classification and pixel-level localization accuracy rise on standard industrial benchmarks.
- Model size and inference speed stay identical to the baseline architecture.
Where Pith is reading between the lines
- The same perturbation approach could be tested on other anomaly-detection domains such as medical imaging where class variety and domain shift are common.
- Different noise types in the pool may contribute unequally, so ablating each one separately on new datasets could identify the most effective subset.
- Because no extra parameters are added, the method scales naturally to larger production lines with many defect classes.
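The per-noise ablation suggested in the second point could be organized as below. This is only a sketch of the experiment grid: the string labels are hypothetical identifiers, and training and evaluating each configuration is left out.

```python
from itertools import combinations

# Hypothetical labels for the three noise types named in the paper.
NOISE_TYPES = ("gaussian", "f_noise", "f_drop")

def ablation_configs(noise_types=NOISE_TYPES):
    # Every subset of the pool, from the empty pool (plain baseline)
    # up to the full pool.
    configs = []
    for k in range(len(noise_types) + 1):
        configs.extend(combinations(noise_types, k))
    return configs

for cfg in ablation_configs():
    print(cfg or "(no perturbation)")
```

With three noise types this yields eight configurations, so a full grid on a new dataset stays small.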
Load-bearing premise
Randomly adding those specific noise patterns to features during training will increase robustness to inter-class interference and domain shifts without lowering performance on known defects or requiring any extra learnable parameters.
What would settle it
An ablation that removes the perturbation pool, or keeps it in place with all noise disabled, and checks whether the AUROC scores on MVTec-AD and VisA fall below the reported state-of-the-art values.
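Such a comparison only needs AUROC computed the same way for both runs. Below is a small, dependency-free sketch of the rank-based (Mann-Whitney) form of AUROC. The function name and the 0/1 label convention (1 = defective) are assumptions for illustration, and the quadratic loop is written for clarity rather than speed.

```python
def auroc(labels, scores):
    # Probability that a random defective sample receives a higher anomaly
    # score than a random normal sample, counting ties as one half.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The same function covers both metrics: pass one score per image for image-level AUROC, or one score per pixel (flattened across the test set) for pixel-level AUROC.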
Original abstract
Multi-class defect detection constitutes a critical yet challenging task in industrial quality inspection, where existing approaches typically suffer from two fundamental limitations: (i) the necessity of training separate models for each defect category, resulting in substantial computational and memory overhead, and (ii) degraded robustness caused by inter-class feature perturbation when heterogeneous defect categories are jointly modeled. In this paper, we present FPFNet, a Feature Perturbation Pool-based Fusion Network that synergistically integrates a stochastic feature perturbation pool with a multi-layer feature fusion strategy to address these challenges within a unified detection framework. The feature perturbation pool enriches the training distribution by randomly injecting diverse noise patterns (including Gaussian noise, F-Noise, and F-Drop) into the extracted feature representations, thereby strengthening the model's robustness against domain shifts and unseen defect morphologies. Concurrently, the multi-layer feature fusion module aggregates hierarchical feature representations from both the encoder and decoder through residual connections and normalization, enabling the network to capture complex cross-scale relationships while preserving fine-grained spatial details essential for precise defect localization. Built upon the UniAD architecture [65], our method achieves state-of-the-art performance on two widely adopted benchmarks: 97.17% image-level AUROC and 96.93% pixel-level AUROC on MVTec-AD, and 91.08% image-level AUROC and 99.08% pixel-level AUROC on VisA, surpassing existing methods by notable margins while introducing no additional learnable parameters or computational complexity.
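The fusion step described in the abstract can be illustrated with a parameter-free residual add followed by normalization. This is a sketch under stated assumptions, not the paper's module: it treats each feature as a flat vector, assumes every layer shares the same dimension, and uses a layer normalization without learned scale or shift. The names `layer_norm` and `fuse` are hypothetical.

```python
import math

def layer_norm(vec, eps=1e-5):
    # Normalize to zero mean and (approximately) unit variance,
    # with no learned scale or shift.
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def fuse(encoder_feats, decoder_feat):
    # Residual aggregation: add each encoder-layer feature to the running
    # decoder feature and normalize after every addition.
    fused = list(decoder_feat)
    for enc in encoder_feats:
        fused = layer_norm([f + e for f, e in zip(fused, enc)])
    return fused
```

Since neither function holds weights, a fusion path of this kind leaves the parameter count unchanged, which matches the abstract's claim of no additional learnable parameters.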
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes FPFNet, an extension of the UniAD architecture for unified multi-class industrial defect detection. It introduces a stochastic Feature Perturbation Pool that randomly injects Gaussian noise, F-Noise, and F-Drop into feature representations to improve robustness against inter-class perturbations and domain shifts. A multi-layer feature fusion module aggregates encoder and decoder features via residual connections and normalization. The method reports state-of-the-art AUROC scores on MVTec-AD (97.17% image-level, 96.93% pixel-level) and VisA (91.08% image-level, 99.08% pixel-level) while adding no learnable parameters or computational complexity.
Significance. If the reported results hold under the described non-parametric operations, the work provides a practical, efficiency-preserving enhancement to unified anomaly detection frameworks. The parameter-free perturbation strategy and residual fusion address real limitations in multi-class industrial inspection without increasing model size, which could facilitate deployment. The consistent performance on two standard benchmarks is a strength, and the absence of additional learnable parameters is explicitly credited as enabling broader applicability.
minor comments (4)
- Abstract: The terms F-Noise and F-Drop are introduced without a brief definition or pointer to their formulation in the methods section; this reduces immediate readability for readers scanning the abstract.
- §3 (Method): The exact sampling and injection process for the perturbation pool (e.g., at which layers and with what probability) would benefit from a compact equation or pseudocode to support reproducibility.
- §4 (Experiments): While the central claim is supported, the manuscript should explicitly state the number of random seeds used for the reported AUROC values and whether the UniAD baseline was re-implemented or taken from the original paper to allow direct comparison.
- Figure 2: The architecture diagram would be clearer if the residual connections in the fusion module were annotated with the corresponding normalization operation to match the textual description.
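The seed-reporting request in the §4 comment amounts to publishing a mean and spread per metric. A minimal sketch follows; the helper name is hypothetical and the numbers are placeholders for illustration only, not results from the paper.

```python
from statistics import mean, stdev

def summarize(metric_by_seed):
    # Mean and sample standard deviation of one metric across random seeds.
    values = list(metric_by_seed.values())
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread

# Placeholder values keyed by seed; NOT taken from the paper.
runs = {0: 0.90, 1: 0.92, 2: 0.91}
m, s = summarize(runs)
print(f"image-level AUROC: {m:.4f} +/- {s:.4f} over {len(runs)} seeds")
```

Reporting the UniAD baseline through the same helper, with the same seeds, would make the comparison direct.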
Simulated Author's Rebuttal
We thank the referee for the careful reading and positive assessment of our work, including the recognition of its practical value in providing a parameter-free enhancement to unified multi-class defect detection. The recommendation for minor revision is noted.
Circularity Check
No significant circularity detected
full rationale
The paper's central contribution is an empirical architecture extension of the externally cited UniAD model, adding only non-parametric operations (random noise sampling from Gaussian/F-Noise/F-Drop pools plus residual normalization). Performance numbers are reported benchmark results rather than any derived prediction or fitted quantity. No equations, self-definitions, or load-bearing self-citations appear that would reduce the claimed AUROC gains to the method's own inputs by construction; the derivation chain remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The UniAD architecture provides a suitable base that can be extended with perturbation and fusion modules while preserving parameter count and compute.
invented entities (3)
- Feature Perturbation Pool: no independent evidence
- F-Noise: no independent evidence
- F-Drop: no independent evidence
Reference graph
Works this paper leans on
- [1] Yoshua Bengio, Li Yao, Guillaume Alain, and Pascal Vincent. Generalized denoising auto-encoders as generative models. In Proceedings of the 27th International Conference on Neural Information Processing Systems, pages 899–907, 2013.
- [2] Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger. MVTec AD - a comprehensive real-world dataset for unsupervised anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9584–9592, 2019.
- [3] Yunkang Cao, Xiaohao Xu, Zhiguo Liu, and Weiming Shen. Collaborative discrepancy optimization for reliable image anomaly localization. IEEE Transactions on Industrial Informatics, 19(11):10674–10683, 2023.
- [4] Xian Cheng and Jianbo Yu. Retinanet with difference channel attention and adaptively spatial feature fusion for steel surface defect detection. IEEE Transactions on Instrumentation and Measurement, 70:2503911, 2021.
- [5] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
- [6] Thomas Defard, Aleksandr Setkov, Angelique Loesch, and Romaric Audigier. PaDiM: A patch distribution modeling framework for anomaly detection and localization. In Proceedings of the International Conference on Pattern Recognition, pages 475–489, 2021.
- [7] Hanqiu Deng and Xingyu Li. Anomaly detection via reverse distillation from one-class embedding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9727–9736, 2022.
- [8] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
- [9] Hongwen Dong, Kechen Song, Yu He, Jun Xu, Yunhui Yan, and Qingguo Meng. PGA-Net: Pyramid feature fusion and global context attention network for automated surface defect detection. IEEE Transactions on Industrial Informatics, 16(12):7448–7458, 2020.
- [10] Yuxuan Duan, Yi Hong, Li Niu, and Liqing Zhang. Few-shot defect image generation via defect-aware feature manipulation. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 571–578, 2023.
- [11] Yanyan Fang, Shenghui Gao, Jie Li, Wenxuan Luo, Lifang He, and Bin Hu. Multi-level feature fusion based locality-constrained spatial transformer network for video crowd counting. Neurocomputing, 392:98–107, 2020.
- [12] Hao Feng, Qi Liu, Hao Liu, Jingqun Tang, Wengang Zhou, Houqiang Li, and Can Huang. DocPedia: Unleashing the power of large multimodal model in the frequency domain for versatile document understanding. Science China Information Sciences, 2024.
- [13] Hao Feng, Wei Shi, Kairen Zhang, Xiaoyu Fei, Lei Liao, Dingkang Yang, Yue Du, Xuecheng Wu, Jingqun Tang, Yuliang Liu, et al. Dolphin-v2: Universal document parsing via scalable anchor prompting. arXiv preprint arXiv:2602.05384, 2026.
- [14] Hao Feng, Zijian Wang, Jingqun Tang, Jinghui Lu, Wengang Zhou, Houqiang Li, and Can Huang. UniDoc: A universal large multimodal model for simultaneous text detection, recognition, spotting and understanding. arXiv preprint arXiv:2308.11592, 2023.
- [15] Hao Feng, Shuangping Wei, Xiaoyu Fei, Wei Shi, Yuechen Han, Lei Liao, Jinghui Lu, Binghong Wu, Qi Liu, Chunhui Lin, Jingqun Tang, et al. Dolphin: Document image parsing via heterogeneous anchor prompting. 2025.
- [16] Ling Fu, Zhisheng Kuang, Jingkuan Song, Mingxin Huang, Boya Yang, Yuliang Li, Liyan Zhu, Qiao Luo, Xinyu Wang, Jingqun Tang, et al. OCRBench v2: An improved benchmark for evaluating large multimodal models on visual text localization and reasoning. arXiv preprint arXiv:2501.00321, 2024.
- [17] Zhishuai Gao, Guodong Yang, En Li, and Zize Liang. Novel feature fusion module-based detector for small insulator defect detection. IEEE Sensors Journal, 21(15):16807–16814, 2021.
- [18] Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
- [19] Denis Gudovskiy, Shun Ishizaka, and Kazuki Kozuka. CFLOW-AD: Real-time unsupervised anomaly detection with localization via conditional normalizing flows. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 98–107, 2022.
- [20] Jia Guo, Shuai Lu, Lize Jia, Weihang Zhang, and Huiqi Li. ReContrast: Domain-specific anomaly detection via contrastive reconstruction. In Advances in Neural Information Processing Systems, 2023.
- [21] Haoyang He, Jiangning Zhang, Hongxu Chen, Xuhai Chen, Zhensheng Li, Xiaobin Chen, Yabiao Wang, Chengjie Wang, and Lingxi Xie. A diffusion-based framework for multi-class anomaly detection. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 8472–8480, 2024.
- [22] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
- [23] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, 2020.
- [24] Shubham Jain, Garima Seth, Aman Paruthi, Udit Soni, and Girish Kumar. Synthetic data augmentation for surface defect detection and classification using deep learning. Journal of Intelligent Manufacturing, 33(4):1007–1020, 2022.
- [25] Nana Kondo, Minoru Harada, and Yoko Takagi. Efficient training for automatic defect classification by image augmentation. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, pages 226–233, 2018.
- [26] Ashwani Kumar, Deepak Punetha Yadav, Deepak Kumar, Millie Pant, and Gaurav Pant. Multi-scale feature fusion-based lightweight dual stream transformer for detection of paddy leaf disease. Environmental Monitoring and Assessment, 195(9):1020, 2023.
- [27] Chun-Liang Li, Kihyuk Sohn, Jinsung Yoon, and Tomas Pfister. CutPaste: Self-supervised learning for anomaly detection and localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9664–9674, 2021.
- [28] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2117–2125, 2017.
- [29] Fang Liu, Xiaomin Zhu, Pingfa Feng, and Lan Zeng. Anomaly detection via progressive reconstruction and hierarchical feature fusion. Sensors, 23(21):8750, 2023.
- [30] Jiaqi Liu, Guoyang Song, Yong He, et al. Deep industrial image anomaly detection: A survey. Machine Intelligence Research, 21:104–135, 2024.
- [31] Junhui Liu, Changyu Wang, Hai Su, Bo Du, and Dacheng Tao. Multistage GAN for fabric defect detection. IEEE Transactions on Image Processing, 29:3388–3400, 2020.
- [32] Tongkun Liu, Bing Li, Xiao Du, Bingke Jiang, Leqi Geng, Feiyang Wang, and Zhuo Zhao. FAIR: Frequency-aware image restoration for industrial visual anomaly detection. arXiv preprint arXiv:2309.07068, 2023.
- [33] Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Jingqun Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, et al. SPTS v2: Single-point scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(12):15545–15559, 2023.
- [34] Zhikang Liu, Yiming Zhou, Yuansheng Xu, and Zilei Wang. SimpleNet: A simple network for image anomaly detection and localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20402–20411, 2023.
- [35] Jinghui Lu, Haiyang Yu, Yanjie Wang, Yongjie Ye, Jingqun Tang, Ziwei Yang, Binghong Wu, Qi Liu, Hao Feng, Han Wang, et al. A bounding box is worth one token - interleaving layout and text in a large language model for document understanding. 2025.
- [36] Arian Mousakhan, Thomas Brox, and Jawad Tayyub. Anomaly detection with conditioned denoising diffusion models. 2023.
- [37] Jinsong Niu, Yifei Chen, Xiaohua Yu, Zhao Li, and Haijun Gao. Data augmentation on defect detection of sanitary ceramics. In Proceedings of the IECON Annual Conference of the IEEE Industrial Electronics Society, pages 5317–5322, 2020.
- [38] Yassine Ouali, Céline Hudelot, and Myriam Tami. Semi-supervised semantic segmentation with cross-consistency training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12671–12681, 2020.
- [39] Kin-Hong Pong and Kin-Man Lam. Multi-resolution feature fusion for face recognition. Pattern Recognition, 47(2):556–567, 2014.
- [40] Xiaojun Qu, Zhong Liu, Changqi Wu, Aiqing Hou, Xiaoyong Yin, and Zhilong Chen. MFGAN: Multimodal fusion for industrial anomaly detection using attention-based autoencoder and generative adversarial network. Sensors, 24(2):637, 2024.
- [41] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91–99, 2015.
- [42] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241, 2015.
- [43] Karsten Roth, Latha Pemula, Joaquin Zepeda, Bernhard Schölkopf, Thomas Brox, and Peter Gehler. Towards total recall in industrial anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14318–14328, 2022.
- [44] Marco Rudolph, Bastian Wahl, and Bernhard Sick. Same same but DifferNet: Semi-supervised defect detection with normalizing flows. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1907–1916, 2021.
- [45] Wojciech Samek, Grégoire Montavon, Sebastian Lapuschkin, Christopher J Anders, and Klaus-Robert Müller. Explaining deep neural networks and beyond: A review of methods and applications. Proceedings of the IEEE, 109(3):247–278, 2021.
- [46] Biluo Shan, Xiaoyu Fei, Wei Shi, Anlong Wang, Guozhi Tang, Lei Liao, Jingqun Tang, Xiang Bai, and Can Huang. MCTBench: Multimodal cognition towards text-rich visual scenes benchmark. arXiv preprint arXiv:2410.11538, 2024.
- [47] Quansen Sun, Shuguang Zeng, Yan Liu, Pheng-Ann Heng, and Deshen Xia. A new method of feature fusion and its application in image recognition. Pattern Recognition, 38(12):2437–2448, 2005.
- [48] Mingxing Tan and Quoc V Le. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, pages 6105–6114, 2019.
- [49] Jingqun Tang, Weidong Du, Bing Wang, Wengang Zhou, Songlin Mei, Tao Xue, Xiang Xu, and Hao Zhang. Character recognition competition for street view shop signs. National Science Review, 10(6):nwad141, 2023.
- [50] Jingqun Tang, Chunhui Lin, Zhen Zhao, Shuangping Wei, Binghong Wu, Qi Liu, Yong He, Kangcheng Lu, Hao Feng, Yuliang Li, et al. TextSquare: Scaling up text-centric visual instruction tuning. arXiv preprint arXiv:2404.12803, 2024.
- [51] Jingqun Tang, Qi Liu, Yongjie Ye, Jinghui Lu, Shuangping Wei, Anlong Wang, Chunhui Lin, Hao Feng, Zhen Zhao, et al. MTVQA: Benchmarking multilingual text-centric visual question answering. 2025.
- [52] Jingqun Tang, Wenqing Qian, Luchuan Song, Xiena Dong, Lan Li, and Xiang Bai. Optimal boxes: Boosting end-to-end scene text recognition by adjusting annotated bounding boxes via reinforcement learning. In Proceedings of the European Conference on Computer Vision, pages 233–248, 2022.
- [53] Jingqun Tang, Shuya Qiao, Benlei Cui, Yuhang Ma, Shuo Zhang, and Dimitrios Kanoulas. You can even annotate text with voice: Transcription-only-supervised text spotting. In Proceedings of the 30th ACM International Conference on Multimedia, pages 4154–4163, 2022.
- [54] Jingqun Tang, Wenqing Zhang, Hao Liu, MingKun Yang, Bo Jiang, Guangliang Hu, and Xiang Bai. Few could be better than all: Feature sampling and grouping for scene text detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4563–4572, 2022.
- [55] David MJ Tax and Robert PW Duin. Support vector data description. Machine Learning, 54(1):45–66, 2004.
- [56] Tran Dinh Tien, Anh Tuan Nguyen, Nguyen Hoang Tran, Ta Duc Huy, Soan TM Duong, Chanh DT Nguyen, and Steven QH Truong. Revisiting reverse distillation for anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24511–24520, 2023.
- [57] Anlong Wang, Jingqun Tang, Lei Liao, Hao Feng, Qi Liu, Xiaoyu Fei, Jinghui Lu, Han Wang, Hao Liu, Yuliang Liu, et al. WildDoc: How far are we from achieving comprehensive and robust document understanding in the wild? 2025.
- [58] Hao Wang, Ruifang Zhang, Mingyang Feng, Yukun Liu, and Guoping Yang. Global context-based self-similarity feature augmentation and bidirectional feature fusion for surface defect detection. IEEE Transactions on Instrumentation and Measurement, 72:5024712, 2023.
- [59] Qiwei Wu, Hui Li, Chenyu Tian, Long Wen, and Xinyu Li. AEKD: Unsupervised auto-encoder knowledge distillation for industrial anomaly detection. Journal of Manufacturing Systems, 73:159–169, 2024.
- [60] Chuangbiao Xu, Wei Li, Xiaohui Cui, Zhaoyu Wang, Fengling Zheng, Xiaowei Zhang, and Bo Chen. Scarcity-GAN: Scarce data augmentation for defect detection via generative adversarial nets. Neurocomputing, 566:127061, 2024.
- [61] Minghui Yang, Peng Wu, and Hui Feng. MemSeg: A semi-supervised method for image surface defect detection using differences and commonalities. Engineering Applications of Artificial Intelligence, 119:105835, 2023.
- [62] Xincheng Yao, Ruoqi Li, Zhenfang Qian, Ye Luo, and Chengyu Zhang. Focus the discrepancy: Intra- and inter-correlation learning for image anomaly detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6780–6790, 2023.
- [63] Xincheng Yao, Ruoqi Li, Jing Zhang, Jun Sun, and Chengyu Zhang. Explicit boundary guided semi-push-pull contrastive learning for supervised anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24490–24499, 2023.
- [64] Chun Ching Yeung and Kin-Man Lam. Efficient fused-attention model for steel surface defect detection. IEEE Transactions on Instrumentation and Measurement, 71:2510011, 2022.
- [65] Zhiyuan You, Lei Cui, Yujun Shen, Kai Yang, Xin Lu, Yu Zheng, and Xinyi Le. A unified model for multi-class anomaly detection. In Advances in Neural Information Processing Systems, pages 4571–4584, 2022.
- [66] Vitjan Zavrtanik, Matej Kristan, and Danijel Skočaj. DRAEM - a discriminatively trained reconstruction embedding for surface anomaly detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8310–8319, 2021.
- [67] Vitjan Zavrtanik, Matej Kristan, and Danijel Skočaj. DSR - a dual subspace re-projection network for surface anomaly detection. In Proceedings of the European Conference on Computer Vision, pages 539–554, 2022.
- [68] Ke Zeng, Qing Ma, Jiangwu Wu, Shijie Xiang, Tong Shen, and Li Zhang. NLFFTNet: A non-local feature fusion transformer network for multi-scale object detection. Neurocomputing, 493:15–27, 2022.
- [69] Hui Zhang, Zhixiang Wu, Zheng Wang, Zhineng Chen, and Yu-Gang Jiang. Prototypical residual networks for anomaly detection and localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16281–16291, 2023.
- [70] Liangpei Zhang, Lefei Zhang, and Bo Du. Deep learning for remote sensing data: A technical tutorial on the state of the art. IEEE Geoscience and Remote Sensing Magazine, 4(2):22–40, 2016.
- [71] Xuan Zhang, Shiyu Li, Xi Li, Ping Huang, Jiulong Shan, and Ting Chen. DeSTSeg: Segmentation guided denoising student-teacher for anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3914–3923, 2023.
- [72] Ximei Zhang, Min Xu, and Xiuzhuang Zhou. RealNet: A feature selection network with realistic synthetic anomaly for anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16699–16708, 2024.
- [73] Weichao Zhao, Hao Feng, Qi Liu, Jingqun Tang, Shuangping Wei, Binghong Wu, Lei Liao, Yongjie Ye, Hao Liu, Wengang Zhou, et al. TabPedia: Towards comprehensive visual table understanding with concept synergy. In Advances in Neural Information Processing Systems, 2024.
- [74] Zhen Zhao, Jingqun Tang, Binghong Wu, Chunhui Lin, Hao Liu, Zeming Zhang, Xin Tan, Can Huang, and Yuan Xie. Multi-modal in-context learning makes an ego-evolving scene text recognizer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
- [75] Zhen Zhao, Jingqun Tang, Binghong Wu, Chunhui Lin, Shuangping Wei, Hao Liu, Xin Tan, Zhizhong Zhang, Can Huang, et al. Harmonizing visual text comprehension and generation. In Advances in Neural Information Processing Systems, 2024.
- [76] Hao Zheng, Hanqiu Deng, and Xingyu Li. Benchmarking and analyzing multi-class anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.
- [77] Yang Zou, Jongheon Jeong, Latha Pemula, Dongqing Zhang, and Onkar Dabeer. Spot-the-difference self-supervised pre-training for anomaly detection and segmentation. In Proceedings of the European Conference on Computer Vision, pages 392–408, 2022.