Structure-Guided Mixed Masked Pretraining and Spatial Continuity Regularization for Printed Circuit Board Defect Detection

Chengjin Yu; Enxin Qin; Hanyu Xuan; Nuo Wang; Peitong Wang; Yuanting Yan

arxiv: 2606.03508 · v1 · pith:7GQTJ6FXnew · submitted 2026-06-02 · 💻 cs.CV

Pith reviewed 2026-06-28 10:17 UTC · model grok-4.3

keywords PCB defect detection · masked pretraining · spatial continuity regularization · automated optical inspection · sparse convolution · structural priors · object detection · DsPCBSD dataset

The pith

A two-phase framework of structure-guided masked pretraining on unlabeled PCB images and spatial continuity regularization during fine-tuning improves detection of tiny defects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a method for PCB defect detection that first uses unlabeled images in a pretraining phase where structure-guided mixed masking and sparse convolutions help the model learn structural priors of circuit boards. Then in fine-tuning, a regularization term is added to encourage compact predictions for defect instances. This combination aims to handle challenges like low-contrast and dense backgrounds better than standard detectors. A sympathetic reader would care because it potentially allows leveraging abundant unlabeled data to enhance performance in industrial inspection settings.

Core claim

The proposed framework combines structure-guided mixed masked pretraining with spatial continuity regularization. In pretraining, structure-guided mixed masking constructs informative masked inputs for sparse convolutional reconstruction, which suppresses invalid responses from masked regions and lets the backbone infer missing PCB structures from visible conductive patterns, thereby learning structural priors. In fine-tuning, the spatial continuity regularization term constrains dispersed positive predictions assigned to the same defect instance, promoting compact localization on elongated defect regions. On the DsPCBSD+ dataset, the method achieves 85.5% mAP@0.5 and 52.3% mAP@0.5:0.95, outperforming several strong baseline detectors.
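To make the regularization idea concrete, here is a minimal sketch of one way a spatial continuity penalty could be written. The paper's actual formulation is not given in the text above, so everything here is an assumption: the function name `spatial_continuity_penalty`, the use of a confidence-weighted variance around the centroid, and the normalization by box width and height are illustrative choices only.

```python
def spatial_continuity_penalty(points, scores, box):
    """Confidence-weighted spread of positive locations assigned to one defect.

    Hypothetical formulation, not the paper's loss.
    points: list of (x, y) centres of positive predictions for one instance
    scores: list of non-negative confidences, same length as points
    box:    (x1, y1, x2, y2) ground-truth box of that instance
    """
    x1, y1, x2, y2 = box
    # Normalise offsets by box size so an elongated box is not
    # penalised merely for being long.
    bw = max(x2 - x1, 1e-6)
    bh = max(y2 - y1, 1e-6)
    total = sum(scores)
    if total <= 0:
        return 0.0
    weights = [s / total for s in scores]
    # Confidence-weighted centroid of the positive locations.
    cx = sum(w * x for w, (x, _) in zip(weights, points))
    cy = sum(w * y for w, (_, y) in zip(weights, points))
    # Weighted mean squared (normalised) distance to the centroid.
    return sum(
        w * (((x - cx) / bw) ** 2 + ((y - cy) / bh) ** 2)
        for w, (x, y) in zip(weights, points)
    )
```

Under this sketch, three positives clustered in the middle of a long, thin box score much lower than three positives scattered along its length, which is the behaviour the core claim describes as "compact localization". In training, such a term would be added to the detection loss with a weighting coefficient that the text above does not specify.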

What carries the argument

Structure-guided mixed masked pretraining scheme using sparse convolutional reconstruction to learn PCB structural priors from visible patterns.
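The masking step can be sketched as follows, under loudly stated assumptions. The text above does not say how structure is measured or how the mix is composed, so this example assumes that "structure" means mean gradient magnitude per patch and that "mixed" means part of the mask budget goes to the most structured patches while the rest is drawn at random. The function name and all parameters (`patch`, `mask_ratio`, `guided_frac`) are hypothetical.

```python
import numpy as np

def structure_guided_mixed_mask(image, patch=8, mask_ratio=0.6,
                                guided_frac=0.5, seed=0):
    """Return a boolean patch grid; True marks a masked patch.

    Hypothetical sketch: structure score = mean gradient magnitude per patch.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape
    gh, gw = h // patch, w // patch
    img = image[:gh * patch, :gw * patch].astype(float)

    # Per-pixel gradient magnitude as a crude proxy for conductive-pattern edges.
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    score = mag.reshape(gh, patch, gw, patch).mean(axis=(1, 3)).ravel()

    n_total = int(round(mask_ratio * gh * gw))
    n_guided = int(round(guided_frac * n_total))

    # Guided part: the patches with the highest structure score.
    guided = np.argsort(-score)[:n_guided]
    # Random part: drawn uniformly from the patches not already chosen.
    remaining = np.setdiff1d(np.arange(gh * gw), guided)
    random_part = rng.choice(remaining, size=n_total - n_guided, replace=False)

    mask = np.zeros(gh * gw, dtype=bool)
    mask[guided] = True
    mask[random_part] = True
    return mask.reshape(gh, gw)
```

For a 32×32 image containing a single vertical trace, calling this with `mask_ratio=0.5` masks every patch the trace passes through and fills the rest of the budget at random. Whether the paper masks structured regions preferentially, avoids them, or blends several strategies is not stated in the text above; the sketch only shows how a structure score can steer the choice.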

Load-bearing premise

The structure-guided mixed masking constructs informative masked inputs such that the sparse convolutional reconstruction pipeline suppresses invalid responses from masked regions and enables the detector backbone to infer missing PCB structures from visible conductive patterns, thereby learning PCB structural priors.
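The "suppresses invalid responses from masked regions" part of this premise can be illustrated with a small sketch. It assumes the common mask-aware trick of zeroing masked positions both before and after each convolution; the paper's actual sparse convolution is not described in the text above. A fixed 3×3 mean filter stands in for a learned kernel, and both function names are hypothetical.

```python
import numpy as np

def box_filter3(x):
    """3x3 mean filter with zero padding (stand-in for a learned convolution)."""
    h, w = x.shape
    padded = np.pad(x, 1)
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0

def masked_conv(x, visible):
    """Zero masked inputs, convolve, then zero masked outputs again.

    visible: boolean array, True where the pixel was not masked.
    """
    keep = visible.astype(float)
    return box_filter3(x * keep) * keep
```

With a plain dense convolution, visible neighbours leak a non-zero response into masked positions, so deeper layers gradually erase the mask pattern. Re-applying the mask after the filter keeps masked positions at exactly zero while leaving responses at visible positions unchanged, which is the property the premise relies on.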

What would settle it

An experiment showing that removing the structure-guided masking or the spatial continuity regularization does not decrease the mAP scores on DsPCBSD+ would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.03508 by Chengjin Yu, Enxin Qin, Hanyu Xuan, Nuo Wang, Peitong Wang, Yuanting Yan.

**Figure 2.** Overall framework and core designs of the proposed method. (a) Structure …

**Figure 3.** Overview of the pretraining stage with sparse-convolutional structure-guided …

**Figure 4.** Illustration of the sparse convolutional masked reconstruction process in the …

**Figure 5.** Qualitative comparison between random masking and structure-guided masking …

**Figure 6.** Overview of the fine-tuning stage with spatial continuity regularization. (a) The …

**Figure 7.** Statistical analysis of the DsPCBSD+ dataset, indicating imbalanced defect …

**Figure 8.** Validation curves of the proposed method on the DsPCBSD+ dataset.

**Figure 9.** Qualitative comparison of different detectors on representative samples from the …

**Figure 10.** Qualitative comparison on typical challenging samples before and after im…

**Figure 11.** Qualitative visualization of output heat maps for different component settings …

**Figure 12.** Visualization comparison of feature-response heatmaps produced by different …

Original abstract

Printed circuit board (PCB) defect detection is an essential part of automated optical inspection (AOI); yet it remains challenging in practice because many defects are tiny, low-contrast, and embedded in dense circuit backgrounds. To address these issues, this paper presents a two-phase PCB defect detection framework that combines structure-guided mixed masked pretraining with spatial continuity regularization. In the pretraining stage, we design a sparse convolutional masked pretraining scheme to exploit unlabeled PCB images, where structure-guided mixed masking is used to construct informative masked inputs. The sparse convolutional reconstruction pipeline suppresses invalid responses from masked regions and enables the detector backbone to infer missing PCB structures from visible conductive patterns, thereby learning PCB structural priors. In the fine-tuning stage, the pretrained backbone is transferred to the downstream defect detection task. For the task, a spatial continuity regularization term is introduced during fine-tuning. This term constrains dispersed positive predictions assigned to the same defect instance and promotes more compact localization on elongated defect regions. Experiments on the DsPCBSD+ dataset show that the proposed method achieves 85.5% mAP0.5 and 52.3% mAP0.5:0.95, outperforming several strong baseline detectors. Ablation studies and qualitative results further confirm the effectiveness of the proposed framework for robust PCB defect detection in industrial AOI scenarios.
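For readers unfamiliar with the two reported metrics, the following sketch shows the IoU test that underlies them. It assumes the paper follows the usual COCO-style convention, in which mAP0.5 counts a detection as correct at IoU ≥ 0.5 and mAP0.5:0.95 averages AP over the ten thresholds 0.50, 0.55, …, 0.95; the abstract does not spell this out. The helper names are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Ten IoU thresholds from 0.50 to 0.95 in steps of 0.05 (assumed convention).
IOU_THRESHOLDS = [0.50 + 0.05 * i for i in range(10)]

def thresholds_passed(pred, gt):
    """Thresholds at which pred would count as a true positive for gt."""
    iou = box_iou(pred, gt)
    return [t for t in IOU_THRESHOLDS if iou >= t]
```

A thin 20×2 defect box whose prediction is shifted by a single pixel across its short side has an IoU of only 1/3 and passes none of the thresholds. This is one reason the stricter mAP0.5:0.95 figure is so much lower than mAP0.5 for tiny or elongated defects.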

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a narrow but coherent engineering paper that adapts masked pretraining and adds a simple regularization term for PCB defect detection, with reported gains on one dataset that still need full experimental backing to evaluate.

The paper's core move is to pretrain a detector backbone on unlabeled PCB images using structure-guided mixed masking inside a sparse convolutional reconstruction setup, then fine-tune with an added spatial continuity term that penalizes dispersed predictions on the same defect. That combination is the actual new piece; it is not a general method but a domain-specific tweak aimed at tiny, low-contrast defects in dense circuit backgrounds.

What works is the problem framing and the two-stage structure. Using unlabeled data for pretraining makes sense in manufacturing where labels are expensive, and the regularization idea directly targets the elongated shape of many PCB defects. The reported numbers (85.5% mAP@0.5, 52.3% mAP@0.5:0.95 on DsPCBSD+) are concrete, and the abstract mentions ablations and qualitative results, which is better than many applied papers.

The soft spots are in the evidence. The abstract gives performance claims but no baseline definitions, no table of ablations that isolate the masking versus the regularization, and no error analysis. Without those, it is hard to tell whether the lift comes from the proposed components or from tuning or dataset specifics. The description of how the structure-guided masking actually constructs inputs and how the sparse conv suppresses masked regions stays high-level, so the mechanistic claim is plausible but not yet verifiable from the text.

This is for readers who build AOI systems for electronics, not for people looking for new self-supervised principles. It is worth a serious referee because the approach is internally consistent, the application is real, and the empirical claim is falsifiable even if the gains turn out modest. I would send it to review rather than desk-reject, with the expectation that the authors supply the missing experimental details.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes a two-phase framework for printed circuit board (PCB) defect detection. It combines structure-guided mixed masked pretraining, which uses sparse convolutional reconstruction on unlabeled PCB images to learn structural priors, with a spatial continuity regularization term during fine-tuning that constrains dispersed predictions and improves localization of elongated defects. On the DsPCBSD+ dataset, it reports 85.5% mAP@0.5 and 52.3% mAP@0.5:0.95, outperforming several strong baseline detectors, and cites ablation studies and qualitative results in support of the framework's effectiveness.

Significance. If the empirical results hold, this work could contribute to improving automated optical inspection (AOI) systems by better handling tiny, low-contrast defects in dense circuit backgrounds through self-supervised pretraining and task-specific regularization. The use of unlabeled data and the regularization for compact localization are potentially valuable for industrial applications.

major comments (1)

[Abstract] The abstract states performance numbers and claims outperformance but supplies no experimental details, baseline definitions, ablation tables, or error analysis; without these it is impossible to verify whether the reported gains are attributable to the proposed components (structure-guided mixed masking and spatial continuity regularization).

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the single major comment below and will revise the manuscript accordingly.

Referee: [Abstract] The abstract states performance numbers and claims outperformance but supplies no experimental details, baseline definitions, ablation tables, or error analysis; without these it is impossible to verify whether the reported gains are attributable to the proposed components (structure-guided mixed masking and spatial continuity regularization).

Authors: We agree that the abstract would benefit from additional context to substantiate the reported gains. In the revised manuscript we will expand the abstract to briefly reference the DsPCBSD+ dataset, note that comparisons are performed against multiple strong detectors (including YOLO variants and two-stage detectors), and state that ablation studies isolate the contributions of structure-guided mixed masked pretraining and spatial continuity regularization. Full tables, baseline definitions, and error analysis will continue to appear in the main body due to length constraints, but the added abstract phrasing will allow readers to better attribute the improvements to the proposed components.

revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

The paper describes an empirical two-stage method: structure-guided mixed masked pretraining on unlabeled PCB images to learn structural priors via sparse convolutional reconstruction, followed by fine-tuning on the defect detection task with an added spatial continuity regularization term. The reported results (85.5% mAP0.5 and 52.3% mAP0.5:0.95 on DsPCBSD+) are measured empirical outcomes on a held-out dataset, not quantities obtained by fitting parameters to a subset and then renaming the fit as a prediction, nor by self-definitional equations, nor by load-bearing self-citations that reduce the central claim to prior author work. No derivation chain, uniqueness theorem, or ansatz is presented that collapses to the inputs by construction. The framework is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review prevents extraction of concrete free parameters, axioms, or invented entities; the approach implicitly relies on standard transfer-learning assumptions common to masked pretraining methods.

axioms (1)

domain assumption: Pretraining on unlabeled PCB images via structure-guided masking transfers useful structural priors to the downstream supervised defect detection task.
The entire two-phase pipeline depends on successful transfer from the pretraining stage to fine-tuning.

[1] [1]

J. Tang, Z. Wang, H. Zhang, H. Li, P. Wu, N. Zeng, A lightweight sur- face defect detection framework combined with dual-domain attention mechanism, Expert Systems with Applications 238 (2024) 121726

2024

[2] [2]

Angelopoulos, E

A. Angelopoulos, E. T. Michailidis, N. Nomikos, P. Trakadas, A. Hatziefremidis, S. Voliotis, T. Zahariadis, Tackling faults in the in- dustry 4.0 era—a survey of machine-learning solutions and key aspects, Sensors 20 (1) (2019) 109. 31

2019

[3] [3]

Q. Tan, L. Liu, M. Yu, J. Li, An innovative method of recycling metals in printed circuit board (pcb) using solutions from pcb production, Journal of Hazardous Materials 390 (2020) 121892

2020

[4] [4]

Moganti, F

M. Moganti, F. Ercal, C. H. Dagli, S. Tsunekawa, Automatic pcb inspec- tion algorithms: A survey, Computer Vision and Image Understanding 63 (2) (1996) 287–313

1996

[5] [5]

Y. Zhou, M. Yuan, J. Zhang, G. Ding, S. Qin, Review of vision-based defect detection research and its perspectives for printed circuit board, Journal of Manufacturing Systems 70 (2023) 557–578

2023

[6] [6]

Q. Ling, N. A. M. Isa, Printed circuit board defect detection methods based on image processing, machine learning and deep learning: A sur- vey, IEEE Access 11 (2023) 15921–15944

2023

[7] [7]

D. Kang, J. Lai, Y. Han, Improving surface defect detection with context-guided asymmetric modulation networks and confidence- boosting loss, Expert Systems with Applications 225 (2023) 120121

2023

[8] [8]

P. Sun, C. Hua, W. Ding, C. Hua, A real–time detection framework for surface defects in ceramic tableware based on deep learning, Expert Systems with Applications 286 (2025) 128101

2025

[9] [9]

S. Meng, S. Zhang, X. Liang, J. Hu, Automatic extraction of scale infor- mation for interactive measurement of anything in microscopy images, Knowledge-Based Systems 324 (2025) 113578

2025

[10] [10]

S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: Towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (6) (2017) 1137–1149

2017

[11] [11]

C. Song, J. Chen, Z. Lu, F. Li, Y. Liu, Steel surface defect detection via deformableconvolutionandbackgroundsuppression, IEEETransactions on Instrumentation and Measurement 72 (2023) 1–9

2023

[12] [12]

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A. C. Berg, Ssd: Single shot multibox detector, in: European conference on computer vision, Springer, 2016, pp. 21–37. 32

2016

[13] [13]

K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, Q. Tian, Centernet: Key- point triplets for object detection, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019

2019

[14] [14]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., An image is worth 16x16 words: Transformers for image recognition at scale, arXiv preprint arXiv:2010.11929 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2010

[15] [15]

Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, Swin transformer: Hierarchical vision transformer using shifted windows, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 10012–10022

2021

[16] [16]

N.Carion, F.Massa, G.Synnaeve, N.Usunier, A.Kirillov, S.Zagoruyko, End-to-end object detection with transformers, in: European conference on computer vision, Springer, 2020, pp. 213–229

2020

[17] [17]

Z. Zong, G. Song, Y. Liu, Detrs with collaborative hybrid assignments training, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 19609–19619

2023

[18] [18]

Y. Ma, J. Yin, F. Huang, Q. Li, Surface defect inspection of industrial products with object detection deep networks: A systematic review, Artificial Intelligence Review 57 (12) (2024) 333

2024

[19] [19]

L. Zhu, R. Zhao, A novel pcb surface defect detection method based on separated global context attention to guide residual context aggregation, Scientific Reports 15 (1) (2025) 9620

2025

[20] [20]

A. Khan, Z. Rauf, A. Sohail, A. R. Khan, H. Asif, A. Asif, U. Farooq, A survey of the vision transformers and their cnn-transformer based variants, Artificial Intelligence Review 56 (Suppl 3) (2023) 2917–2970

2023

[21] [21]

Q. Yuan, Y. Shi, M. Li, A review of computer vision-based crack detec- tion methods in civil infrastructure: Progress and challenges, Remote Sensing 16 (16) (2024)

2024

[22] [22]

Y. He, S. Li, X. Wen, J. Xu, A survey on surface defect inspection based on generative models in manufacturing, Applied Sciences 14 (15) (2024)

2024

[23] [23]

M. Sohan, T. Sai Ram, C. V. Rami Reddy, A review on yolov8 and its advancements, in: International conference on data intelligence and cognitive informatics, Springer, 2024, pp. 529–545

2024

[24] [24]

M. Yaseen, What is yolov8: An in-depth exploration of the internal features of the next-generation object detector, arXiv preprint arXiv:2408.15857 (2024)

[25] [25]

Q. Zhao, T. Ji, S. Liang, W. Yu, Pcb surface defect fast detection method based on attention and multi-source fusion, Multimedia Tools and Applications 83 (2) (2024) 5451–5472

2024

[26] [26]

G. Liu, H. Wen, Printed circuit board defect detection based on MobileNet-Yolo-Fast, Journal of Electronic Imaging 30 (4) (2021) 043004

2021

[27] [27]

J. Tang, S. Liu, D. Zhao, L. Tang, W. Zou, B. Zheng, Pcb-yolo: An improved detection algorithm of pcb surface defects based on yolov5, Sustainability 15 (7) (2023) 5963

2023

[28] [28]

W. Xuan, G. Jian-She, H. Bo-Jie, W. Zong-Shan, D. Hong-Wei, W. Jie, A lightweight modified yolox network using coordinate attention mechanism for pcb surface defect detection, IEEE Sensors Journal 22 (21) (2022) 20910–20920

2022

[29] [29]

X. Liu, J. Hu, H. Wang, Z. Zhang, X. Lu, C. Sheng, S. Song, J. Nie, Gaussian-iou loss: Better learning for bounding box regression on pcb component detection, Expert Systems with Applications 190 (2022) 116178

2022

[30] [30]

M. Yuan, Y. Zhou, X. Ren, H. Zhi, J. Zhang, H. Chen, Yolo-hmc: An improved method for pcb surface defect detection, IEEE Transactions on Instrumentation and Measurement 73 (2024) 1–11

2024

[31] [31]

Q. Ling, N. A. M. Isa, Printed circuit board defect detection methods based on image processing, machine learning and deep learning: A survey, IEEE Access 11 (2023) 15921–15944

2023

[32] [32]

Y. Zhou, M. Yuan, J. Zhang, G. Ding, S. Qin, Review of vision-based defect detection research and its perspectives for printed circuit board, Journal of Manufacturing Systems 70 (2023) 557–578

2023

[33] [33]

X. Tao, X. Gong, X. Zhang, S. Yan, C. Adak, Deep learning for unsupervised anomaly localization in industrial images: A survey, IEEE Transactions on Instrumentation and Measurement 71 (2022) 1–21

2022

[34] [34]

L. Jing, Y. Tian, Self-supervised visual feature learning with deep neural networks: A survey, IEEE transactions on pattern analysis and machine intelligence 43 (11) (2020) 4037–4058

2020

[35] [35]

A. v. d. Oord, Y. Li, O. Vinyals, Representation learning with contrastive predictive coding, arXiv preprint arXiv:1807.03748 (2018)

[36] [36]

K. He, H. Fan, Y. Wu, S. Xie, R. Girshick, Momentum contrast for unsupervised visual representation learning, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 9729–9738

2020

[37] [37]

T. Chen, S. Kornblith, M. Norouzi, G. Hinton, A simple framework for contrastive learning of visual representations, in: International conference on machine learning, PMLR, 2020, pp. 1597–1607

2020

[38] [38]

J.-B. Grill, F. Strub, F. Altché, C. Tallec, P. Richemond, E. Buchatskaya, C. Doersch, B. Avila Pires, Z. Guo, M. Gheshlaghi Azar, et al., Bootstrap your own latent - a new approach to self-supervised learning, Advances in neural information processing systems 33 (2020) 21271–21284

2020

[39] [39]

M. Caron, I. Misra, J. Mairal, P. Goyal, P. Bojanowski, A. Joulin, Unsupervised learning of visual features by contrasting cluster assignments, Advances in neural information processing systems 33 (2020) 9912–9924

2020

[40] [40]

X. Chen, K. He, Exploring simple siamese representation learning, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 15750–15758

2021

[41] [41]

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in neural information processing systems 30 (2017)

2017

[42] [42]

J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), 2019, pp. 4171–4186

2019

[43] [43]

H. Bao, L. Dong, S. Piao, F. Wei, Beit: Bert pre-training of image transformers, arXiv preprint arXiv:2106.08254 (2021)

[44] [44]

K. He, X. Chen, S. Xie, Y. Li, P. Dollár, R. Girshick, Masked autoencoders are scalable vision learners, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 16000–16009

2022

[45] [45]

Z. Xie, Z. Zhang, Y. Cao, Y. Lin, J. Bao, Z. Yao, Q. Dai, H. Hu, Simmim: A simple framework for masked image modeling, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 9653–9663

2022

[46] [46]

L. Zhou, H. Liu, J. Bae, J. He, D. Samaras, P. Prasanna, Self pre-training with masked autoencoders for medical image classification and segmentation, in: 2023 IEEE 20th international symposium on biomedical imaging (ISBI), IEEE, 2023, pp. 1–6

2023

[47] [47]

V. Hondru, F. A. Croitoru, S. Minaee, R. T. Ionescu, N. Sebe, Masked image modeling: A survey, International Journal of Computer Vision 133 (10) (2025) 7154–7200

2025

[48] [48]

K. Tian, Y. Jiang, Q. Diao, C. Lin, L. Wang, Z. Yuan, Designing bert for convolutional networks: Sparse and hierarchical masked modeling, arXiv preprint arXiv:2301.03580 (2023)

[49] [49]

J. Canny, A computational approach to edge detection, IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-8 (6) (1986) 679–698

1986

[50] [50]

D. Marr, E. Hildreth, Theory of edge detection, Proceedings of the Royal Society of London. Series B. Biological Sciences 207 (1167) (1980) 187–217

1980

[51] [51]

R. M. Haralick, K. Shanmugam, I. Dinstein, Textural features for image classification, IEEE Transactions on Systems, Man, and Cybernetics SMC-3 (6) (1973) 610–621

1973

[52] [52]

W. T. Freeman, E. H. Adelson, et al., The design and use of steerable filters, IEEE Transactions on Pattern analysis and machine intelligence 13 (9) (1991) 891–906

1991

[53] [53]

J. Bigun, G. H. Granlund, J. Wiklund, Multidimensional orientation estimation with applications to texture analysis and optical flow, IEEE Transactions on pattern analysis and machine intelligence 13 (8) (1991) 775–790

1991

[54] [54]

S. Lv, B. Ouyang, Z. Deng, T. Liang, S. Jiang, K. Zhang, J. Chen, Z. Li, A dataset for deep learning based detection of printed circuit board surface defect, Scientific Data 11 (1) (2024) 811

2024

[55] [55]

P.-Y. Chen, M.-C. Chang, J.-W. Hsieh, Y.-S. Chen, Parallel residual bi-fusion feature pyramid network for accurate single-shot object detection, IEEE Transactions on Image Processing 30 (2021) 9099–9111

2021

[56] [56]

S. Xu, X. Wang, W. Lv, Q. Chang, C. Cui, K. Deng, G. Wang, Q. Dang, S. Wei, Y. Du, et al., Pp-yoloe: An evolved version of yolo, arXiv preprint arXiv:2203.16250 (2022)

[57] [57]

G. Jocher, Ultralytics yolov5, GitHub repository (2020)

2020

[58] [58]

X. Xu, Y. Jiang, W. Chen, Y. Huang, Y. Zhang, X. Sun, Damo-yolo: A report on real-time object detection design, arXiv preprint arXiv:2211.15444 (2022)

[59] [59]

C. Lyu, W. Zhang, H. Huang, Y. Zhou, Y. Wang, Y. Liu, S. Zhang, K. Chen, Rtmdet: An empirical study of designing real-time object detectors, arXiv preprint arXiv:2212.07784 (2022)

[60] [60]

Y. Zhao, W. Lv, S. Xu, J. Wei, G. Wang, Q. Dang, Y. Liu, J. Chen, Detrs beat yolos on real-time object detection, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 16965–16974

2024

[61] [61]

S. Li, F. Kong, R. Wang, T. Luo, Z. Shi, Efd-yolov4: A steel surface defect detection network with encoder-decoder residual block and feature alignment module, Measurement 220 (2023) 113359

2023

[62] [62]

Ultralytics, Yolov8 documentation, Ultralytics official documentation (2023)

2023

[63] [63]

Ultralytics, Ultralytics yolo, GitHub repository (2023)

2023

[64] [64]

A. Wang, H. Chen, L. Liu, K. Chen, Z. Lin, J. Han, G. Ding, Yolov10: Real-time end-to-end object detection, arXiv preprint arXiv:2405.14458 (2024)

[65] [65]

C. Li, L. Li, H. Jiang, K. Weng, Y. Geng, L. Li, Z. Ke, Q. Li, M. Cheng, W. Nie, Y. Li, B. Zhang, Y. Liang, L. Zhou, X. Xu, X. Chu, X. Wei, X. Wei, Yolov6: A single-stage object detection framework for industrial applications, arXiv preprint arXiv:2209.02976 (2022)

[66] [66]

J. Zhou, C. Wei, H. Wang, W. Shen, C. Xie, A. Yuille, T. Kong, ibot: Image bert pre-training with online tokenizer, arXiv preprint arXiv:2111.07832 (2021)