arxiv: 2604.19259 · v1 · submitted 2026-04-21 · 💻 cs.CV

Feature Perturbation Pool-based Fusion Network for Unified Multi-Class Industrial Defect Detection

Yuanchan Xu, Wenjun Zang, Ying Wu

Pith reviewed 2026-05-10 03:22 UTC · model grok-4.3

classification 💻 cs.CV

keywords: industrial defect detection · multi-class anomaly detection · feature perturbation · feature fusion · MVTec-AD · VisA · unified detection


The pith

Randomly injecting noise into extracted features lets a single model detect defects across many industrial product categories without separate per-category training or added parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses two core problems in industrial defect detection: the high cost of training a separate model for each defect category, and the loss of robustness when multiple categories are handled together because their features interfere. It introduces a stochastic perturbation pool that randomly injects noise patterns drawn from Gaussian noise, F-Noise, and F-Drop into the extracted features during training, plus a fusion module that combines encoder and decoder layers through residual connections and normalization. Applied on top of the existing UniAD unified architecture, these changes raise both image-level classification and pixel-level localization accuracy on two standard benchmarks (MVTec-AD and VisA) while adding no learnable parameters or computational complexity.
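The fusion step can be pictured with a minimal sketch. It assumes that each encoder layer is paired with a decoder layer of the same width, that fusion is a residual sum followed by layer normalization with no learnable scale or shift (to stay consistent with the no-added-parameters claim), and that features are plain Python lists. `layer_norm` and `fuse_layers` are hypothetical names, not the authors' code.

```python
import math
from typing import List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize a feature vector to zero mean and unit variance.

    Assumption: no learnable scale/shift, so this adds no parameters.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def fuse_layers(encoder_feats: List[Vector], decoder_feats: List[Vector]) -> List[Vector]:
    """Hypothetical multi-layer fusion: residual sum of each matching
    encoder/decoder layer pair, followed by parameter-free normalization."""
    if len(encoder_feats) != len(decoder_feats):
        raise ValueError("encoder and decoder must expose the same number of layers")
    fused = []
    for enc, dec in zip(encoder_feats, decoder_feats):
        residual = [e + d for e, d in zip(enc, dec)]  # residual connection
        fused.append(layer_norm(residual))            # normalization
    return fused
```

Dropping the learnable affine terms is the design choice that keeps such a module parameter-free; a real implementation would operate on tensors and might also need resizing between layers of different resolution, which this sketch omits.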

Core claim

By randomly enriching feature representations with a pool of noise patterns and aggregating hierarchical features across layers via residual connections and normalization, a single network can jointly model heterogeneous defect categories, overcoming inter-class feature perturbation while preserving localization detail and achieving 97.17 percent image-level AUROC and 96.93 percent pixel-level AUROC on MVTec-AD plus 91.08 percent image-level AUROC and 99.08 percent pixel-level AUROC on VisA.

What carries the argument

The stochastic feature perturbation pool that randomly injects Gaussian noise, F-Noise, and F-Drop into extracted representations, combined with residual multi-layer feature fusion and normalization.
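A minimal sketch of how such a pool might sample and apply one perturbation per training sample. The source does not define F-Noise or F-Drop, so `scaled_noise` (magnitude-scaled Gaussian noise) and `feature_drop` (inverted dropout) below are hypothetical stand-ins, and `sigma`, `alpha`, `p`, and `apply_prob` are illustrative values rather than the paper's settings.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]


def gaussian_noise(x: Vector, rng: random.Random, sigma: float = 0.1) -> Vector:
    """Add i.i.d. Gaussian noise N(0, sigma^2) to every feature value."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def scaled_noise(x: Vector, rng: random.Random, alpha: float = 0.1) -> Vector:
    """Hypothetical stand-in for F-Noise: noise scaled by each value's magnitude."""
    return [v + alpha * abs(v) * rng.gauss(0.0, 1.0) for v in x]


def feature_drop(x: Vector, rng: random.Random, p: float = 0.1) -> Vector:
    """Hypothetical stand-in for F-Drop: zero each value with probability p and
    rescale the survivors by 1 / (1 - p), as in inverted dropout."""
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]


PerturbFn = Callable[[Vector, random.Random], Vector]
DEFAULT_POOL: List[PerturbFn] = [gaussian_noise, scaled_noise, feature_drop]


def perturb(x: Vector, rng: random.Random, pool: Optional[List[PerturbFn]] = None,
            apply_prob: float = 0.5) -> Vector:
    """With probability apply_prob, apply one operator drawn uniformly from the pool;
    otherwise return an unperturbed copy."""
    pool = DEFAULT_POOL if pool is None else pool
    if rng.random() >= apply_prob:
        return list(x)
    op = rng.choice(pool)
    return op(x, rng)
```

Every operator here is a fixed random transform with no trainable state, which is the property the review relies on when it says the pool adds no learnable parameters.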

If this is right

A single model suffices for every defect category instead of one model per category.
Robustness improves against domain shifts and previously unseen defect shapes.
Both image-level classification and pixel-level localization accuracy rise on standard industrial benchmarks.
Model size and inference speed stay identical to the baseline architecture.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same perturbation approach could be tested on other anomaly-detection domains such as medical imaging where class variety and domain shift are common.
Different noise types in the pool may contribute unequally, so ablating each one separately on new datasets could identify the most effective subset.
Because no extra parameters are added, the method could extend to production lines with many more defect classes without growing the model, although the two benchmarks do not test that scale.

Load-bearing premise

Randomly adding those specific noise patterns to features during training will increase robustness to inter-class interference and domain shifts without lowering performance on known defects or requiring any extra learnable parameters.

What would settle it

An ablation that removes the perturbation pool (training on unperturbed features) and checks whether AUROC on MVTec-AD and VisA drops back toward the UniAD baseline rather than staying near the reported state-of-the-art values.

Figures

Figures reproduced from arXiv: 2604.19259 by Wenjun Zang, Ying Wu, Yuanchan Xu.

**Figure 1.** Architecture of the proposed FPFNet. The network takes normal images as input, extracts multi-scale features via a pre-trained EfficientNet-b4, applies stochastic perturbation through the Feature Perturbation Pool, and reconstructs the original features through an encoder-decoder architecture augmented with Multi-Layer Feature Fusion modules. The anomaly score map is derived from the L2 distance between or…
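Assuming the truncated caption refers to the distance between the original features and their reconstruction (the caption does state that the network reconstructs the original features), the score map can be sketched as a per-location L2 distance over channels. Taking the maximum of the map as the image-level score is a common choice assumed here, not one the caption specifies; `anomaly_map` and `image_score` are hypothetical helpers.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [row][column][channel]


def anomaly_map(original: FeatureMap, reconstructed: FeatureMap) -> List[List[float]]:
    """Per-location L2 distance between original and reconstructed feature vectors."""
    return [
        [math.dist(f, r) for f, r in zip(row_o, row_r)]
        for row_o, row_r in zip(original, reconstructed)
    ]


def image_score(score_map: List[List[float]]) -> float:
    """Image-level anomaly score, taken here as the maximum of the map (assumed)."""
    return max(max(row) for row in score_map)
```

In practice the map would also be upsampled to the input resolution before pixel-level evaluation; that step is left out of this sketch.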

Original abstract

Multi-class defect detection constitutes a critical yet challenging task in industrial quality inspection, where existing approaches typically suffer from two fundamental limitations: (i) the necessity of training separate models for each defect category, resulting in substantial computational and memory overhead, and (ii) degraded robustness caused by inter-class feature perturbation when heterogeneous defect categories are jointly modeled. In this paper, we present FPFNet, a Feature Perturbation Pool-based Fusion Network that synergistically integrates a stochastic feature perturbation pool with a multi-layer feature fusion strategy to address these challenges within a unified detection framework. The feature perturbation pool enriches the training distribution by randomly injecting diverse noise patterns (including Gaussian noise, F-Noise, and F-Drop) into the extracted feature representations, thereby strengthening the model's robustness against domain shifts and unseen defect morphologies. Concurrently, the multi-layer feature fusion module aggregates hierarchical feature representations from both the encoder and decoder through residual connections and normalization, enabling the network to capture complex cross-scale relationships while preserving fine-grained spatial details essential for precise defect localization. Built upon the UniAD architecture [65], our method achieves state-of-the-art performance on two widely adopted benchmarks: 97.17% image-level AUROC and 96.93% pixel-level AUROC on MVTec-AD, and 91.08% image-level AUROC and 99.08% pixel-level AUROC on VisA, surpassing existing methods by notable margins while introducing no additional learnable parameters or computational complexity.
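Both headline metrics are areas under the ROC curve: image-level AUROC ranks one anomaly score per test image against its normal/defective label, and pixel-level AUROC pools every pixel of every test image against the ground-truth defect masks. The dependency-free sketch below uses the rank-sum (Mann-Whitney U) identity with average ranks for ties; evaluation toolkits may compute the curve differently, so treat it as an illustration of the metric rather than the paper's evaluation code.

```python
from typing import List, Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via the Mann-Whitney U statistic.

    labels: 1 = anomalous (defect), 0 = normal. Tied scores receive average ranks.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of this tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one normal and one anomalous sample")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def pixel_auroc(score_maps: List[List[List[float]]], masks: List[List[List[int]]]) -> float:
    """Pixel-level AUROC: pool every pixel of every test image, then score once."""
    flat_scores = [s for m in score_maps for row in m for s in row]
    flat_labels = [y for m in masks for row in m for y in row]
    return auroc(flat_scores, flat_labels)
```

For example, `auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` returns 0.75: three of the four normal/defective pairs are ordered correctly.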

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

0 major / 4 minor

Summary. The manuscript proposes FPFNet, an extension of the UniAD architecture for unified multi-class industrial defect detection. It introduces a stochastic Feature Perturbation Pool that randomly injects Gaussian noise, F-Noise, and F-Drop into feature representations to improve robustness against inter-class perturbations and domain shifts. A multi-layer feature fusion module aggregates encoder and decoder features via residual connections and normalization. The method reports state-of-the-art AUROC scores on MVTec-AD (97.17% image-level, 96.93% pixel-level) and VisA (91.08% image-level, 99.08% pixel-level) while adding no learnable parameters or computational complexity.

Significance. If the reported results hold under the described non-parametric operations, the work provides a practical, efficiency-preserving enhancement to unified anomaly detection frameworks. The parameter-free perturbation strategy and residual fusion address real limitations in multi-class industrial inspection without increasing model size, which could facilitate deployment. The consistent performance on two standard benchmarks is a strength, and the absence of additional learnable parameters is explicitly credited as enabling broader applicability.

minor comments (4)

Abstract: The terms F-Noise and F-Drop are introduced without a brief definition or pointer to their formulation in the methods section; this reduces immediate readability for readers scanning the abstract.
§3 (Method): The exact sampling and injection process for the perturbation pool (e.g., at which layers and with what probability) would benefit from a compact equation or pseudocode to support reproducibility.
§4 (Experiments): While the central claim is supported, the manuscript should explicitly state the number of random seeds used for the reported AUROC values and whether the UniAD baseline was re-implemented or taken from the original paper to allow direct comparison.
Figure 2: The architecture diagram would be clearer if the residual connections in the fusion module were annotated with the corresponding normalization operation to match the textual description.
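One way to write the compact formulation the second comment asks for, offered only as a hypothetical sketch: a sample's feature map f is perturbed with probability p by an operator drawn uniformly from a pool of K operators (K = 3 for the pool named in the abstract), and a decoder D is trained to reconstruct the clean features. The reconstruction target follows the Figure 1 caption; the squared-error loss, uniform sampling, the dropout form of the F-Drop stand-in, and the symbols p, K, sigma, and q are assumptions. F-Noise is left as an unspecified operator T_2 because the source does not define it.

```latex
\begin{aligned}
\tilde{\mathbf{f}} &=
  \begin{cases}
    \mathcal{T}_k(\mathbf{f}), & u < p,\ u \sim \mathcal{U}(0,1),\ k \sim \mathcal{U}\{1,\dots,K\},\\
    \mathbf{f}, & \text{otherwise},
  \end{cases}\\
\mathcal{T}_1(\mathbf{f}) &= \mathbf{f} + \boldsymbol{\epsilon},\quad
  \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})
  \quad \text{(Gaussian noise)},\\
\mathcal{T}_3(\mathbf{f}) &= \frac{\mathbf{m} \odot \mathbf{f}}{1 - q},\quad
  m_i \sim \mathrm{Bernoulli}(1 - q)
  \quad \text{(dropout-style stand-in for F-Drop)},\\
\mathcal{L} &= \bigl\lVert \mathbf{f} - D(\tilde{\mathbf{f}}) \bigr\rVert_2^2 .
\end{aligned}
```

Stating p, K, and the layers at which the perturbation is applied in this form would answer the reproducibility point directly.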

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful reading and positive assessment of our work, including the recognition of its practical value in providing a parameter-free enhancement to unified multi-class defect detection. The recommendation for minor revision is noted.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's central contribution is an empirical architecture extension of the externally cited UniAD model that adds only non-parametric operations (random perturbations drawn from a pool of Gaussian noise, F-Noise, and F-Drop, plus residual connections with normalization). Performance numbers are reported benchmark results rather than any derived prediction or fitted quantity. No equations, self-definitions, or load-bearing self-citations appear that would reduce the claimed AUROC gains to the method's own inputs by construction; the derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

The central claim rests on empirical benchmark gains rather than a derivation. The abstract introduces two new modules whose benefits are asserted without proof or external validation.

axioms (1)

domain assumption: The UniAD architecture provides a suitable base that can be extended with perturbation and fusion modules while preserving parameter count and compute.
Explicitly stated as the foundation of the method.

invented entities (3)

Feature Perturbation Pool (no independent evidence)
purpose: Randomly injects diverse noise patterns into extracted features to improve robustness
Core new component introduced to solve inter-class perturbation and domain-shift problems.
F-Noise (no independent evidence)
purpose: One of the specific noise patterns used inside the perturbation pool
Named but not defined in the abstract; appears to be a novel variant.
F-Drop (no independent evidence)
purpose: One of the specific drop patterns used inside the perturbation pool
Named but not defined in the abstract; appears to be a novel variant.

pith-pipeline@v0.9.0 · 5573 in / 1521 out tokens · 59508 ms · 2026-05-10T03:22:46.605986+00:00 · methodology


Reference graph

Works this paper leans on

77 extracted references · 6 canonical work pages · 1 internal anchor

[1] Yoshua Bengio, Li Yao, Guillaume Alain, and Pascal Vincent. Generalized denoising auto-encoders as generative models. In Proceedings of the 27th International Conference on Neural Information Processing Systems, pages 899–907, 2013.

[2] Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger. MVTec AD—a comprehensive real-world dataset for unsupervised anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9584–9592, 2019.

[3] Yunkang Cao, Xiaohao Xu, Zhiguo Liu, and Weiming Shen. Collaborative discrepancy optimization for reliable image anomaly localization. IEEE Transactions on Industrial Informatics, 19(11):10674–10683, 2023.

[4] Xian Cheng and Jianbo Yu. Retinanet with difference channel attention and adaptively spatial feature fusion for steel surface defect detection. IEEE Transactions on Instrumentation and Measurement, 70:2503911, 2021.

[5] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.

[6] Thomas Defard, Aleksandr Setkov, Angelique Loesch, and Romaric Audigier. PaDiM: A patch distribution modeling framework for anomaly detection and localization. In Proceedings of the International Conference on Pattern Recognition, pages 475–489, 2021.

[7] Hanqiu Deng and Xingyu Li. Anomaly detection via reverse distillation from one-class embedding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9727–9736, 2022.

[8] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.

[9] Hongwen Dong, Kechen Song, Yu He, Jun Xu, Yunhui Yan, and Qingguo Meng. PGA-Net: Pyramid feature fusion and global context attention network for automated surface defect detection. IEEE Transactions on Industrial Informatics, 16(12):7448–7458, 2020.

[10] Yuxuan Duan, Yi Hong, Li Niu, and Liqing Zhang. Few-shot defect image generation via defect-aware feature manipulation. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 571–578, 2023.

[11] Yanyan Fang, Shenghui Gao, Jie Li, Wenxuan Luo, Lifang He, and Bin Hu. Multi-level feature fusion based locality-constrained spatial transformer network for video crowd counting. Neurocomputing, 392:98–107, 2020.

[12] Hao Feng, Qi Liu, Hao Liu, Jingqun Tang, Wengang Zhou, Houqiang Li, and Can Huang. DocPedia: Unleashing the power of large multimodal model in the frequency domain for versatile document understanding. Science China Information Sciences, 2024.

[13] Hao Feng, Wei Shi, Kairen Zhang, Xiaoyu Fei, Lei Liao, Dingkang Yang, Yue Du, Xuecheng Wu, Jingqun Tang, Yuliang Liu, et al. Dolphin-v2: Universal document parsing via scalable anchor prompting. arXiv preprint arXiv:2602.05384, 2026.

[14] Hao Feng, Zijian Wang, Jingqun Tang, Jinghui Lu, Wengang Zhou, Houqiang Li, and Can Huang. UniDoc: A universal large multimodal model for simultaneous text detection, recognition, spotting and understanding. arXiv preprint arXiv:2308.11592, 2023.

[15] Hao Feng, Shuangping Wei, Xiaoyu Fei, Wei Shi, Yuechen Han, Lei Liao, Jinghui Lu, Binghong Wu, Qi Liu, Chunhui Lin, Jingqun Tang, et al. Dolphin: Document image parsing via heterogeneous anchor prompting. 2025.

[16] Ling Fu, Zhisheng Kuang, Jingkuan Song, Mingxin Huang, Boya Yang, Yuliang Li, Liyan Zhu, Qiao Luo, Xinyu Wang, Jingqun Tang, et al. OCRBench v2: An improved benchmark for evaluating large multimodal models on visual text localization and reasoning. arXiv preprint arXiv:2501.00321, 2024.

[17] Zhishuai Gao, Guodong Yang, En Li, and Zize Liang. Novel feature fusion module-based detector for small insulator defect detection. IEEE Sensors Journal, 21(15):16807–16814, 2021.

[18] Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.

[19] Denis Gudovskiy, Shun Ishizaka, and Kazuki Kozuka. CFLOW-AD: Real-time unsupervised anomaly detection with localization via conditional normalizing flows. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 98–107, 2022.

[20] Jia Guo, Shuai Lu, Lize Jia, Weihang Zhang, and Huiqi Li. ReContrast: Domain-specific anomaly detection via contrastive reconstruction. In Advances in Neural Information Processing Systems, 2023.

[21] Haoyang He, Jiangning Zhang, Hongxu Chen, Xuhai Chen, Zhensheng Li, Xiaobin Chen, Yabiao Wang, Chengjie Wang, and Lingxi Xie. A diffusion-based framework for multi-class anomaly detection. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 8472–8480, 2024.

[22] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[23] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, 2020.

[24] Shubham Jain, Garima Seth, Aman Paruthi, Udit Soni, and Girish Kumar. Synthetic data augmentation for surface defect detection and classification using deep learning. Journal of Intelligent Manufacturing, 33(4):1007–1020, 2022.

[25] Nana Kondo, Minoru Harada, and Yoko Takagi. Efficient training for automatic defect classification by image augmentation. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, pages 226–233, 2018.

[26] Ashwani Kumar, Deepak Punetha Yadav, Deepak Kumar, Millie Pant, and Gaurav Pant. Multi-scale feature fusion-based lightweight dual stream transformer for detection of paddy leaf disease. Environmental Monitoring and Assessment, 195(9):1020, 2023.

[27] Chun-Liang Li, Kihyuk Sohn, Jinsung Yoon, and Tomas Pfister. CutPaste: Self-supervised learning for anomaly detection and localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9664–9674, 2021.

[28] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2117–2125, 2017.

[29] Fang Liu, Xiaomin Zhu, Pingfa Feng, and Lan Zeng. Anomaly detection via progressive reconstruction and hierarchical feature fusion. Sensors, 23(21):8750, 2023.

[30] Jiaqi Liu, Guoyang Song, Yong He, et al. Deep industrial image anomaly detection: A survey. Machine Intelligence Research, 21:104–135, 2024.

[31] Junhui Liu, Changyu Wang, Hai Su, Bo Du, and Dacheng Tao. Multistage GAN for fabric defect detection. IEEE Transactions on Image Processing, 29:3388–3400, 2020.

[32] Tongkun Liu, Bing Li, Xiao Du, Bingke Jiang, Leqi Geng, Feiyang Wang, and Zhuo Zhao. FAIR: Frequency-aware image restoration for industrial visual anomaly detection. arXiv preprint arXiv:2309.07068, 2023.

[33] Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Jingqun Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, et al. SPTS v2: Single-point scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(12):15545–15559, 2023.

[34] Zhikang Liu, Yiming Zhou, Yuansheng Xu, and Zilei Wang. SimpleNet: A simple network for image anomaly detection and localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20402–20411, 2023.

[35] Jinghui Lu, Haiyang Yu, Yanjie Wang, Yongjie Ye, Jingqun Tang, Ziwei Yang, Binghong Wu, Qi Liu, Hao Feng, Han Wang, et al. A bounding box is worth one token—interleaving layout and text in a large language model for document understanding. 2025.

[36] Arian Mousakhan, Thomas Brox, and Jawad Tayyub. Anomaly detection with conditioned denoising diffusion models. 2023.

[37] Jinsong Niu, Yifei Chen, Xiaohua Yu, Zhao Li, and Haijun Gao. Data augmentation on defect detection of sanitary ceramics. In Proceedings of the IECON Annual Conference of the IEEE Industrial Electronics Society, pages 5317–5322, 2020.

[38] Yassine Ouali, Céline Hudelot, and Myriam Tami. Semi-supervised semantic segmentation with cross-consistency training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12671–12681, 2020.

[39] Kin-Hong Pong and Kin-Man Lam. Multi-resolution feature fusion for face recognition. Pattern Recognition, 47(2):556–567, 2014.

[40] Xiaojun Qu, Zhong Liu, Changqi Wu, Aiqing Hou, Xiaoyong Yin, and Zhilong Chen. MFGAN: Multimodal fusion for industrial anomaly detection using attention-based autoencoder and generative adversarial network. Sensors, 24(2):637, 2024.

[41] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91–99, 2015.

[42] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241, 2015.

[43] Karsten Roth, Latha Pemula, Joaquin Zepeda, Bernhard Schölkopf, Thomas Brox, and Peter Gehler. Towards total recall in industrial anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14318–14328, 2022.

[44] Marco Rudolph, Bastian Wahl, and Bernhard Sick. Same same but DifferNet: Semi-supervised defect detection with normalizing flows. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1907–1916, 2021.

[45] Wojciech Samek, Grégoire Montavon, Sebastian Lapuschkin, Christopher J Anders, and Klaus-Robert Müller. Explaining deep neural networks and beyond: A review of methods and applications. Proceedings of the IEEE, 109(3):247–278, 2021.

[46] Biluo Shan, Xiaoyu Fei, Wei Shi, Anlong Wang, Guozhi Tang, Lei Liao, Jingqun Tang, Xiang Bai, and Can Huang. MCTBench: Multimodal cognition towards text-rich visual scenes benchmark. arXiv preprint arXiv:2410.11538, 2024.

[47] Quansen Sun, Shuguang Zeng, Yan Liu, Pheng-Ann Heng, and Deshen Xia. A new method of feature fusion and its application in image recognition. Pattern Recognition, 38(12):2437–2448, 2005.

[48] Mingxing Tan and Quoc V Le. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, pages 6105–6114, 2019.

[49] Jingqun Tang, Weidong Du, Bing Wang, Wengang Zhou, Songlin Mei, Tao Xue, Xiang Xu, and Hao Zhang. Character recognition competition for street view shop signs. National Science Review, 10(6):nwad141, 2023.

[50] Jingqun Tang, Chunhui Lin, Zhen Zhao, Shuangping Wei, Binghong Wu, Qi Liu, Yong He, Kangcheng Lu, Hao Feng, Yuliang Li, et al. TextSquare: Scaling up text-centric visual instruction tuning. arXiv preprint arXiv:2404.12803, 2024.

[51] Jingqun Tang, Qi Liu, Yongjie Ye, Jinghui Lu, Shuangping Wei, Anlong Wang, Chunhui Lin, Hao Feng, Zhen Zhao, et al. MTVQA: Benchmarking multilingual text-centric visual question answering. 2025.

[52] Jingqun Tang, Wenqing Qian, Luchuan Song, Xiena Dong, Lan Li, and Xiang Bai. Optimal boxes: Boosting end-to-end scene text recognition by adjusting annotated bounding boxes via reinforcement learning. In Proceedings of the European Conference on Computer Vision, pages 233–248, 2022.

[53] Jingqun Tang, Shuya Qiao, Benlei Cui, Yuhang Ma, Shuo Zhang, and Dimitrios Kanoulas. You can even annotate text with voice: Transcription-only-supervised text spotting. In Proceedings of the 30th ACM International Conference on Multimedia, pages 4154–4163, 2022.

[54] Jingqun Tang, Wenqing Zhang, Hao Liu, MingKun Yang, Bo Jiang, Guangliang Hu, and Xiang Bai. Few could be better than all: Feature sampling and grouping for scene text detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4563–4572, 2022.

[55] David MJ Tax and Robert PW Duin. Support vector data description. Machine Learning, 54(1):45–66, 2004.

[56] Tran Dinh Tien, Anh Tuan Nguyen, Nguyen Hoang Tran, Ta Duc Huy, Soan TM Duong, Chanh DT Nguyen, and Steven QH Truong. Revisiting reverse distillation for anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24511–24520, 2023.

[57] Anlong Wang, Jingqun Tang, Lei Liao, Hao Feng, Qi Liu, Xiaoyu Fei, Jinghui Lu, Han Wang, Hao Liu, Yuliang Liu, et al. WildDoc: How far are we from achieving comprehensive and robust document understanding in the wild? 2025.

[58] Hao Wang, Ruifang Zhang, Mingyang Feng, Yukun Liu, and Guoping Yang. Global context-based self-similarity feature augmentation and bidirectional feature fusion for surface defect detection. IEEE Transactions on Instrumentation and Measurement, 72:5024712, 2023.

[59] Qiwei Wu, Hui Li, Chenyu Tian, Long Wen, and Xinyu Li. AEKD: Unsupervised auto-encoder knowledge distillation for industrial anomaly detection. Journal of Manufacturing Systems, 73:159–169, 2024.

[60] Chuangbiao Xu, Wei Li, Xiaohui Cui, Zhaoyu Wang, Fengling Zheng, Xiaowei Zhang, and Bo Chen. Scarcity-GAN: Scarce data augmentation for defect detection via generative adversarial nets. Neurocomputing, 566:127061, 2024.

[61] Minghui Yang, Peng Wu, and Hui Feng. MemSeg: A semi-supervised method for image surface defect detection using differences and commonalities. Engineering Applications of Artificial Intelligence, 119:105835, 2023.

[62] Xincheng Yao, Ruoqi Li, Zhenfang Qian, Ye Luo, and Chengyu Zhang. Focus the discrepancy: Intra- and inter-correlation learning for image anomaly detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6780–6790, 2023.

[63] Xincheng Yao, Ruoqi Li, Jing Zhang, Jun Sun, and Chengyu Zhang. Explicit boundary guided semi-push-pull contrastive learning for supervised anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24490–24499, 2023.

[64] Chun Ching Yeung and Kin-Man Lam. Efficient fused-attention model for steel surface defect detection. IEEE Transactions on Instrumentation and Measurement, 71:2510011, 2022.

[65] Zhiyuan You, Lei Cui, Yujun Shen, Kai Yang, Xin Lu, Yu Zheng, and Xinyi Le. A unified model for multi-class anomaly detection. In Advances in Neural Information Processing Systems, pages 4571–4584, 2022.

[66] Vitjan Zavrtanik, Matej Kristan, and Danijel Skočaj. DRAEM—a discriminatively trained reconstruction embedding for surface anomaly detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8310–8319, 2021.

[67] Vitjan Zavrtanik, Matej Kristan, and Danijel Skočaj. DSR—a dual subspace re-projection network for surface anomaly detection. In Proceedings of the European Conference on Computer Vision, pages 539–554, 2022.

[68] Ke Zeng, Qing Ma, Jiangwu Wu, Shijie Xiang, Tong Shen, and Li Zhang. NLFFTNet: A non-local feature fusion transformer network for multi-scale object detection. Neurocomputing, 493:15–27, 2022.

[69] Hui Zhang, Zhixiang Wu, Zheng Wang, Zhineng Chen, and Yu-Gang Jiang. Prototypical residual networks for anomaly detection and localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16281–16291, 2023.

[70] Liangpei Zhang, Lefei Zhang, and Bo Du. Deep learning for remote sensing data: A technical tutorial on the state of the art. IEEE Geoscience and Remote Sensing Magazine, 4(2):22–40, 2016.

[71] Xuan Zhang, Shiyu Li, Xi Li, Ping Huang, Jiulong Shan, and Ting Chen. DeSTSeg: Segmentation guided denoising student-teacher for anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3914–3923, 2023.

2023
[72]

RealNet: A feature selection network with realistic synthetic anomaly for anomaly detection

Ximei Zhang, Min Xu, and Xiuzhuang Zhou. RealNet: A feature selection network with realistic synthetic anomaly for anomaly detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 16699–16708, 2024

2024
[73]

TabPedia: Towards comprehensive visual table understanding with con- cept synergy

Weichao Zhao, Hao Feng, Qi Liu, Jingqun Tang, Shuangping Wei, Binghong Wu, Lei Liao, Yongjie Ye, Hao Liu, Wengang Zhou, et al. TabPedia: Towards comprehensive visual table understanding with con- cept synergy. InAdvances in Neural Information Pro- cessing Systems, 2024

2024
[74]

Multi-modal in-context learning makes an ego-evolving scene text recognizer

Zhen Zhao, Jingqun Tang, Binghong Wu, Chunhui Lin, Hao Liu, Zeming Zhang, Xin Tan, Can Huang, and Yuan Xie. Multi-modal in-context learning makes an ego-evolving scene text recognizer. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2023
[75]

Harmonizing visual text comprehension and generation

Zhen Zhao, Jingqun Tang, Binghong Wu, Chunhui Lin, Shuangping Wei, Hao Liu, Xin Tan, Zhizhong Zhang, Can Huang, et al. Harmonizing visual text comprehension and generation. InAdvances in Neural Information Processing Systems, 2024

2024
[76]

Benchmark- ing and analyzing multi-class anomaly detection

Hao Zheng, Hanqiu Deng, and Xingyu Li. Benchmark- ing and analyzing multi-class anomaly detection. In Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, 2024

2024
[77]

Spot-the-difference self- supervised pre-training for anomaly detection and seg- mentation

Yang Zou, Jongheon Jeong, Latha Pemula, Dongqing Zhang, and Onkar Dabeer. Spot-the-difference self- supervised pre-training for anomaly detection and seg- mentation. InProceedings of the European Conference on Computer Vision, pages 392–408, 2022. 13

2022