arxiv: 2606.03508 · v1 · pith:7GQTJ6FXnew · submitted 2026-06-02 · 💻 cs.CV

Structure-Guided Mixed Masked Pretraining and Spatial Continuity Regularization for Printed Circuit Board Defect Detection

Pith reviewed 2026-06-28 10:17 UTC · model grok-4.3

classification 💻 cs.CV
keywords PCB defect detection · masked pretraining · spatial continuity regularization · automated optical inspection · sparse convolution · structural priors · object detection · DsPCBSD dataset

The pith

A two-phase framework of structure-guided masked pretraining on unlabeled PCB images and spatial continuity regularization during fine-tuning improves detection of tiny defects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a method for PCB defect detection that first uses unlabeled images in a pretraining phase where structure-guided mixed masking and sparse convolutions help the model learn structural priors of circuit boards. Then in fine-tuning, a regularization term is added to encourage compact predictions for defect instances. This combination aims to handle challenges like low-contrast and dense backgrounds better than standard detectors. A sympathetic reader would care because it potentially allows leveraging abundant unlabeled data to enhance performance in industrial inspection settings.
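This page does not say how patches are scored or how the "mixed" mask is assembled. The sketch below is one plausible reading, not the authors' procedure: it assumes structure is measured by the mean absolute gradient inside each patch, and that the mask combines a structure-guided share (highest-scoring patches) with a uniformly random share. The names `patch_scores`, `mixed_mask`, and `guided_frac` are hypothetical.

```python
import random

def patch_scores(img, patch):
    """Mean absolute gradient per patch (assumed structure score)."""
    h, w = len(img), len(img[0])
    rows, cols = h // patch, w // patch
    scores = []
    for pr in range(rows):
        for pc in range(cols):
            total = 0.0
            for y in range(pr * patch, (pr + 1) * patch):
                for x in range(pc * patch, (pc + 1) * patch):
                    # Forward differences, clamped at the image border.
                    gx = img[y][min(x + 1, w - 1)] - img[y][x]
                    gy = img[min(y + 1, h - 1)][x] - img[y][x]
                    total += abs(gx) + abs(gy)
            scores.append(total / (patch * patch))
    return scores, rows, cols

def mixed_mask(img, patch=2, mask_ratio=0.5, guided_frac=0.5, seed=0):
    """Return a rows x cols grid: 1 = masked patch, 0 = visible patch."""
    rng = random.Random(seed)
    scores, rows, cols = patch_scores(img, patch)
    n = rows * cols
    n_mask = round(n * mask_ratio)
    n_guided = round(n_mask * guided_frac)
    # Guided share: patches with the strongest structure score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    masked = set(order[:n_guided])
    # Random share: uniform sample from the remaining patches.
    rest = [i for i in range(n) if i not in masked]
    masked.update(rng.sample(rest, n_mask - n_guided))
    return [[1 if r * cols + c in masked else 0 for c in range(cols)]
            for r in range(rows)]
```

With `guided_frac=0` this reduces to plain random masking, which is the natural baseline for the comparison the paper draws between random and structure-guided masks.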

Core claim

The proposed framework combines structure-guided mixed masked pretraining with spatial continuity regularization. In pretraining, structure-guided mixed masking constructs informative masked inputs for sparse convolutional reconstruction that suppresses invalid responses from masked regions and enables inference of missing PCB structures from visible conductive patterns to learn structural priors. In the fine-tuning stage, the spatial continuity regularization term constrains dispersed positive predictions assigned to the same defect instance to promote compact localization on elongated defect regions. On the DsPCBSD+ dataset, it achieves 85.5% mAP0.5 and 52.3% mAP0.5:0.95, outperforming several strong baseline detectors.

What carries the argument

Structure-guided mixed masked pretraining scheme using sparse convolutional reconstruction to learn PCB structural priors from visible patterns.

Load-bearing premise

The structure-guided mixed masking constructs informative masked inputs such that the sparse convolutional reconstruction pipeline suppresses invalid responses from masked regions and enables the detector backbone to infer missing PCB structures from visible conductive patterns, thereby learning PCB structural priors.
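One common way to keep masked regions from producing "invalid responses" in a convolutional encoder is to skip masked sites on both the input and output side of each convolution. The sketch below illustrates that idea on a single channel; it is an assumption about the general mechanism, not the paper's layer, and `masked_conv2d` is a hypothetical name.

```python
def masked_conv2d(x, kernel, visible):
    """'Same'-size convolution that ignores masked sites.

    x, visible: H x W lists; visible[y][x] is 1 for visible pixels, 0 for masked.
    kernel: k x k list with odd k.
    """
    h, w, k = len(x), len(x[0]), len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for xx in range(w):
            if not visible[y][xx]:
                continue  # masked output sites stay exactly zero
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xs = y + dy, xx + dx
                    # Only in-bounds, visible neighbours contribute.
                    if 0 <= yy < h and 0 <= xs < w and visible[yy][xs]:
                        acc += kernel[dy + r][dx + r] * x[yy][xs]
            out[y][xx] = acc
    return out
```

A dense convolution would let values at masked positions (for example a fill colour) leak into neighbouring features; here the mask pattern is preserved through the layer, so the reconstruction target has to be inferred from visible context only.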

What would settle it

An experiment showing that removing the structure-guided masking or the spatial continuity regularization does not decrease the mAP scores on DsPCBSD+ would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.03508 by Chengjin Yu, Enxin Qin, Hanyu Xuan, Nuo Wang, Peitong Wang, Yuanting Yan.

Figure 1: Overview of the motivation and key observations for industrial AOI PCB defect …
Figure 2: Overall framework and core designs of the proposed method. (a) Structure …
Figure 3: Overview of the pretraining stage with sparse-convolutional structure-guided …
Figure 4: Illustration of the sparse convolutional masked reconstruction process in the …
Figure 5: Qualitative comparison between random masking and structure-guided masking …
Figure 6: Overview of the fine-tuning stage with spatial continuity regularization. (a) The …
Figure 7: Statistical analysis of the DsPCBSD+ dataset, indicating imbalanced defect …
Figure 8: Validation curves of the proposed method on the DsPCBSD+ dataset.
Figure 9: Qualitative comparison of different detectors on representative samples from the …
Figure 10: Qualitative comparison on typical challenging samples before and after im…
Figure 11: Qualitative visualization of output heat maps for different component settings …
Figure 12: Visualization comparison of feature-response heatmaps produced by different …
Original abstract

Printed circuit board (PCB) defect detection is an essential part of automated optical inspection (AOI); yet it remains challenging in practice because many defects are tiny, low-contrast, and embedded in dense circuit backgrounds. To address these issues, this paper presents a two-phase PCB defect detection framework that combines structure-guided mixed masked pretraining with spatial continuity regularization. In the pretraining stage, we design a sparse convolutional masked pretraining scheme to exploit unlabeled PCB images, where structure-guided mixed masking is used to construct informative masked inputs. The sparse convolutional reconstruction pipeline suppresses invalid responses from masked regions and enables the detector backbone to infer missing PCB structures from visible conductive patterns, thereby learning PCB structural priors. In the fine-tuning stage, the pretrained backbone is transferred to the downstream defect detection task. For the task, a spatial continuity regularization term is introduced during fine-tuning. This term constrains dispersed positive predictions assigned to the same defect instance and promotes more compact localization on elongated defect regions. Experiments on the DsPCBSD+ dataset show that the proposed method achieves 85.5% mAP0.5 and 52.3% mAP0.5:0.95, outperforming several strong baseline detectors. Ablation studies and qualitative results further confirm the effectiveness of the proposed framework for robust PCB defect detection in industrial AOI scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes a two-phase framework for printed circuit board (PCB) defect detection. It combines structure-guided mixed masked pretraining using sparse convolutional reconstruction on unlabeled PCB images to learn structural priors, with a spatial continuity regularization term during fine-tuning to constrain dispersed predictions and improve localization of elongated defects. On the DsPCBSD+ dataset, it reports achieving 85.5% mAP0.5 and 52.3% mAP0.5:0.95, outperforming several strong baseline detectors, with ablation studies and qualitative results supporting the framework's effectiveness.
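For readers unfamiliar with the two reported metrics: mAP0.5 counts a detection as correct when its box overlaps the ground truth with IoU of at least 0.5, while mAP0.5:0.95 averages over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05 (the COCO-style convention). The sketch below shows only the IoU test and the threshold grid, not a full mAP computation; `matched_thresholds` is a hypothetical helper.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

# IoU thresholds averaged over for mAP0.5:0.95.
THRESHOLDS = [t / 100 for t in range(50, 100, 5)]

def matched_thresholds(pred, gt):
    """Thresholds at which `pred` would count as a true positive for `gt`."""
    v = iou(pred, gt)
    return [t for t in THRESHOLDS if v >= t]
```

A prediction with IoU 0.72 passes the 0.5 test but fails every threshold from 0.75 upward, which is one reason the stricter averaged metric is typically well below mAP0.5, and why it is the more sensitive of the two to localization quality on tiny defects.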

Significance. If the empirical results hold, this work could contribute to improving automated optical inspection (AOI) systems by better handling tiny, low-contrast defects in dense circuit backgrounds through self-supervised pretraining and task-specific regularization. The use of unlabeled data and the regularization for compact localization are potentially valuable for industrial applications.

major comments (1)
  1. [Abstract] Abstract: The abstract states performance numbers and claims outperformance but supplies no experimental details, baseline definitions, ablation tables, or error analysis; without these it is impossible to verify whether the reported gains are attributable to the proposed components (structure-guided mixed masking and spatial continuity regularization).

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the single major comment below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The abstract states performance numbers and claims outperformance but supplies no experimental details, baseline definitions, ablation tables, or error analysis; without these it is impossible to verify whether the reported gains are attributable to the proposed components (structure-guided mixed masking and spatial continuity regularization).

    Authors: We agree that the abstract would benefit from additional context to substantiate the reported gains. In the revised manuscript we will expand the abstract to briefly reference the DsPCBSD+ dataset, note that comparisons are performed against multiple strong detectors (including YOLO variants and two-stage detectors), and state that ablation studies isolate the contributions of structure-guided mixed masked pretraining and spatial continuity regularization. Full tables, baseline definitions, and error analysis will continue to appear in the main body due to length constraints, but the added abstract phrasing will allow readers to better attribute the improvements to the proposed components.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

Full rationale

The paper describes an empirical two-stage method: structure-guided mixed masked pretraining on unlabeled PCB images to learn structural priors via sparse convolutional reconstruction, followed by fine-tuning on the defect detection task with an added spatial continuity regularization term. The reported results (85.5% mAP0.5 and 52.3% mAP0.5:0.95 on DsPCBSD+) are measured empirical outcomes on a held-out dataset, not quantities obtained by fitting parameters to a subset and then renaming the fit as a prediction, nor by self-definitional equations, nor by load-bearing self-citations that reduce the central claim to prior author work. No derivation chain, uniqueness theorem, or ansatz is presented that collapses to the inputs by construction. The framework is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review prevents extraction of concrete free parameters, axioms, or invented entities; the approach implicitly relies on standard transfer-learning assumptions common to masked pretraining methods.

axioms (1)
  • domain assumption: Pretraining on unlabeled PCB images via structure-guided masking transfers useful structural priors to the downstream supervised defect detection task.
    The entire two-phase pipeline depends on successful transfer from the pretraining stage to fine-tuning.

pith-pipeline@v0.9.1-grok · 5788 in / 1214 out tokens · 36223 ms · 2026-06-28T10:17:32.672536+00:00 · methodology

