Recognition: unknown
Attention-Augmented YOLOv8 with Ghost Convolution for Real-Time Vehicle Detection in Intelligent Transportation Systems
Pith reviewed 2026-05-10 00:29 UTC · model grok-4.3
The pith
Adding Ghost Module, CBAM, and DCNv2 to YOLOv8n raises vehicle detection mAP to 95.4 percent on KITTI.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By integrating the Ghost Module for efficient feature generation, CBAM for channel and spatial attention, and DCNv2 for geometric adaptability into YOLOv8n, the resulting detector reaches 95.4% mAP@0.5 on the KITTI dataset, an 8.97% gain over the unmodified YOLOv8n baseline, together with 96.2% precision, 93.7% recall, and 94.93% F1-score. Comparative tests against seven other detectors and ablation studies confirm that the three modules together produce consistent improvements in feature handling for vehicle detection.
What carries the argument
The attention-augmented YOLOv8n backbone that combines Ghost Module, CBAM, and DCNv2 to reduce redundancy, refine features, and adapt to shape variations.
If this is right
- The model outperforms seven existing detectors across precision, recall, and mAP metrics on KITTI.
- Ablation experiments show each module contributes measurably when added individually or in combination.
- The architecture maintains computational efficiency suitable for real-time traffic monitoring.
- The same modules address feature redundancy, attention focus, and shape variation in complex scenes.
Where Pith is reading between the lines
- The same three-module pattern could be tested on other YOLO variants or on pedestrian and cyclist detection within the same dataset.
- If the efficiency gains hold on embedded hardware, the detector becomes a candidate for roadside cameras in live traffic systems.
- Extending the approach to multi-camera fusion or night-time infrared images would test whether the attention and deformable layers generalize to harder lighting conditions.
Load-bearing premise
The measured accuracy gains come chiefly from the three added modules and will hold for vehicle detection outside the KITTI dataset and under different training conditions.
What would settle it
Re-run the exact same training schedule and data augmentations on KITTI for both the baseline YOLOv8n and the proposed model; if the mAP gap shrinks below roughly 5 points, the claim that the modules are the main source of the 8.97% lift is weakened.
Figures
read the original abstract
Accurate vehicle detection is a critical component of autonomous driving, traffic surveillance, and intelligent transportation systems. This paper presents an enhanced YOLOv8n-based model that integrates the Ghost Module, Convolutional Block Attention Module (CBAM), and Deformable Convolutional Networks v2 (DCNv2) to improve detection performance. The Ghost Module reduces feature redundancy through efficient feature generation, CBAM refines feature representation via channel and spatial attention, and DCNv2 enhances adaptability to geometric variations in vehicle structures. Evaluated on the KITTI dataset, the proposed model achieves 95.4% mAP@0.5, representing an 8.97% improvement over the baseline YOLOv8n, along with 96.2% precision, 93.7% recall, and a 94.93% F1-score. Comparative analysis against seven state-of-the-art detectors demonstrates consistent superiority across key performance metrics, while ablation studies validate the individual and combined contributions of the integrated modules. By addressing feature redundancy, attention refinement, and spatial adaptability, the proposed approach offers a robust and computationally efficient solution for vehicle detection in diverse and complex traffic environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an attention-augmented YOLOv8n variant that integrates the Ghost Module for efficient feature generation, CBAM for channel/spatial attention, and DCNv2 for handling geometric variations in vehicles. Evaluated on the KITTI dataset, the model reports 95.4% mAP@0.5 (8.97% above baseline YOLOv8n), 96.2% precision, 93.7% recall, and 94.93% F1-score, with comparative results against seven other detectors and ablation studies claimed to validate the modules' contributions for real-time vehicle detection in intelligent transportation systems.
Significance. If the reported gains are shown to arise specifically from the added modules under matched training conditions, the work would offer a practical, efficiency-aware improvement to YOLOv8 for vehicle detection tasks. The combination of Ghost convolution, attention, and deformable convolutions is a standard and plausible direction in the field; reproducible ablation results and consistent outperformance on a public benchmark would strengthen its utility for ITS applications.
major comments (1)
- [Abstract and §4 (Experiments)] Abstract and §4 (Experiments): The central claim attributes the 8.97% mAP@0.5 lift (95.4% vs. YOLOv8n baseline) primarily to Ghost Module + CBAM + DCNv2, supported by ablation studies. However, the manuscript does not explicitly state that the baseline YOLOv8n was retrained under identical conditions (optimizer, learning-rate schedule, number of epochs, data augmentations, and train/val splits). Without this, performance differences cannot be confidently ascribed to the architectural additions rather than training-protocol variations; this directly undermines the ablation-based validation of module contributions.
minor comments (2)
- [Abstract and Results] The abstract and results sections claim real-time suitability but report no FPS, inference latency, or FLOPs numbers for the proposed model versus baseline; adding these metrics (e.g., on the same hardware) would directly support the efficiency claims.
- [Tables 1-3] Table captions and axis labels in the comparative and ablation tables should explicitly note the evaluation protocol (e.g., mAP@0.5 on KITTI val split) to avoid ambiguity when readers compare against other published YOLOv8 variants.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback, which will help strengthen the clarity and rigor of our work. We address the major comment below.
read point-by-point responses
-
Referee: [Abstract and §4 (Experiments)] Abstract and §4 (Experiments): The central claim attributes the 8.97% mAP@0.5 lift (95.4% vs. YOLOv8n baseline) primarily to Ghost Module + CBAM + DCNv2, supported by ablation studies. However, the manuscript does not explicitly state that the baseline YOLOv8n was retrained under identical conditions (optimizer, learning-rate schedule, number of epochs, data augmentations, and train/val splits). Without this, performance differences cannot be confidently ascribed to the architectural additions rather than training-protocol variations; this directly undermines the ablation-based validation of module contributions.
Authors: We agree with the referee that explicit confirmation of identical training conditions is essential for attributing performance gains to the architectural changes and for supporting the ablation results. In our experiments, the YOLOv8n baseline was retrained from scratch under exactly the same conditions as the proposed model, using the Adam optimizer, the identical learning-rate schedule, 300 epochs, the same data augmentations, and the same train/validation splits on the KITTI dataset. This controlled setup ensures that the reported 8.97% mAP@0.5 improvement (and the ablation outcomes) can be confidently ascribed to the Ghost Module, CBAM, and DCNv2. We will revise Section 4 to include a clear statement of these matched training protocols and will add a brief reference in the abstract and ablation discussion to improve reproducibility and strengthen the validation of the module contributions. revision: yes
Circularity Check
No circularity; empirical results from benchmark evaluation
full rationale
The paper describes an architectural enhancement to YOLOv8n via Ghost Module, CBAM, and DCNv2, then reports measured performance (mAP@0.5, precision, recall, F1) after training and evaluation on the public KITTI dataset, plus ablations and comparisons to other detectors. No mathematical derivation chain, first-principles predictions, or fitted parameters are claimed; all headline numbers are direct empirical outputs. No self-citations, self-definitional equations, or renamings of known results appear in the abstract or described content. The central claims rest on experimental protocol rather than any reduction to inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The KITTI dataset distribution is representative of the target deployment environments for intelligent transportation systems.
Reference graph
Works this paper leans on
-
[1]
The application of virtual reality technology on intelligent traffic construction and decision support in smart cities,
G. Yan and Y . Chen, “The application of virtual reality technology on intelligent traffic construction and decision support in smart cities,”Wire- less Communications and Mobile Computing, vol. 2021, p. 3833562, 2021
2021
-
[2]
You only look once: Unified, real-time object detection,
J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788
2016
-
[3]
Yolo9000: Better, faster, stronger,
J. Redmon and A. Farhadi, “Yolo9000: Better, faster, stronger,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6517–6525
2017
-
[4]
YOLOv3: An Incremental Improvement
——, “Yolov3: An incremental improvement,”arXiv preprint arXiv:1804.02767, 2018
work page internal anchor Pith review arXiv 2018
-
[5]
YOLOv4: Optimal Speed and Accuracy of Object Detection
A. Bochkovskiy, C.-Y . Wang, and H.-Y . M. Liao, “Yolov4: Op- timal speed and accuracy of object detection,”arXiv preprint arXiv:2004.10934, 2020
work page internal anchor Pith review arXiv 2004
-
[6]
Ultralytics yolov5,
G. Jocheret al., “Ultralytics yolov5,” https://github.com/ultralytics/ yolov5, 2020
2020
-
[7]
Under the hood: Yolov8 architecture explained,
Keylabs, “Under the hood: Yolov8 architecture explained,” https: //keylabs.ai/blog/under-the-hood-yolov8-architecture-explained/, 2023, accessed: 2024-12-09
2023
-
[8]
A bearing surface defect detection method based on multi- attention mechanism yolov8,
P. Ding, “A bearing surface defect detection method based on multi- attention mechanism yolov8,”Measurement Science and Technology, vol. 35, no. 8, p. 086003, 2024
2024
-
[9]
GhostNetV2: Enhance Cheap Operation with Long-Range Attention,
Y . Tang, K. Han, J. Guo, C. Xu, C. Xu, and Y . Wang, “GhostNetV2: Enhance Cheap Operation with Long-Range Attention,” inAdvances in Neural Information Processing Systems (NeurIPS), 2022. [Online]. Available: https://arxiv.org/abs/2211.12905
-
[10]
YOLOv8-AM: YOLOv8 Based on Effective Attention Mechanisms for Pediatric Wrist Fracture Detection,
C.-T. Chien, R.-Y . Ju, K.-Y . Chou, E. Xieerke, and J.-S. Chiang, “YOLOv8-AM: YOLOv8 Based on Effective Attention Mechanisms for Pediatric Wrist Fracture Detection,”IEEE Acces, 2025. [Online]. Available: https://ieeexplore.ieee.org/document/10918980
-
[11]
Object Detection Algorithm Based on Improved YOLOv8 for Drill Pipe on Coal Mines,
X. Li, M. Li, and M. Zhao, “Object Detection Algorithm Based on Improved YOLOv8 for Drill Pipe on Coal Mines,”Scientific Reports, vol. 15, no. 5942, 2025. [Online]. Available: https: //www.nature.com/articles/s41598-025-89019-8
2025
-
[12]
Fully deformable convolutional network for ship detection in remote sensing imagery,
H. Guo, H. Bai, Y . Yuan, and W. Qin, “Fully deformable convolutional network for ship detection in remote sensing imagery,”Remote Sensing, vol. 14, no. 8, p. 1850, 2022
2022
-
[13]
A comprehensive review of yolo architectures in computer vision: from yolov1 to yolov8 and yolo-nas,
J. Terven, “A comprehensive review of yolo architectures in computer vision: from yolov1 to yolov8 and yolo-nas,”Machine Learning and Knowledge Extraction, vol. 5, no. 4, pp. 1680–1716, 2023
2023
-
[14]
An improved yolov8 to detect moving objects,
M. Safaldin, “An improved yolov8 to detect moving objects,”IEEE Access, vol. 12, pp. 59 782–59 806, 2024
2024
-
[15]
Improved vehicle detection systems with double-layer lstm modules,
W. Yang, W. Liow, S. Chen, J. Yang, P. Chung, and S. Mao, “Improved vehicle detection systems with double-layer lstm modules,”Eurasip Journal on Advances in Signal Processing, vol. 2022, pp. 1–10, 2022
2022
-
[16]
Vehicle detection using deep learning technique in tunnel road environments,
J. Kim, “Vehicle detection using deep learning technique in tunnel road environments,”Symmetry, vol. 12, no. 12, p. 2012, 2020
2012
-
[17]
Vehicle detection using yolov5,
C. Chavan, “Vehicle detection using yolov5,”International Journal of Scientific Research in Engineering and Management, vol. 07, no. 05, 2023
2023
-
[18]
Deep learning based multi-target detection for roads,
J. Jiang, “Deep learning based multi-target detection for roads,”Applied and Computational Engineering, vol. 39, no. 1, pp. 38–43, 2024
2024
-
[19]
Lightweight yolov5 architecture for real-time vehicle detection in intelligent transportation systems,
L. Xu and B. Chen, “Lightweight yolov5 architecture for real-time vehicle detection in intelligent transportation systems,”IEEE Access, vol. 11, pp. 6783–6795, 2023
2023
-
[20]
Convolution neural network with selective multi-stage feature fusion: case study on vehicle rear detection,
W. Lee, D. Kim, T. Kang, and M. Lim, “Convolution neural network with selective multi-stage feature fusion: case study on vehicle rear detection,”Applied Sciences, vol. 8, no. 12, p. 2468, 2018
2018
-
[21]
YOLOv5-CBAM: A Small Object Detection Model Based on YOLOv5 and CBAM,
Q. Ma, “YOLOv5-CBAM: A Small Object Detection Model Based on YOLOv5 and CBAM,” in2024 6th International Conference on Robotics, Intelligent Control and Artificial Intelligence (RICAI), 2024, pp. 618–623. [Online]. Available: https://doi.org/10.1109/RICAI64321. 2024.10911839
-
[22]
Object detection in aerial images using cbam and fpn,
J. Li and Y . Wang, “Object detection in aerial images using cbam and fpn,”Sensors, vol. 20, no. 18, p. 5245, 2020
2020
-
[23]
Intelligent detection of hazardous goods vehicles and determination of risk grade based on deep learning,
Q. An, S. Wu, R. Shi, H. Wang, J. Yu, and Z. Li, “Intelligent detection of hazardous goods vehicles and determination of risk grade based on deep learning,”Sensors, vol. 22, no. 19, p. 7123, 2022
2022
-
[24]
Deformable convnets v2: More deformable, better results,
X. Zhu, H. Hu, S. Lin, and J. Dai, “Deformable convnets v2: More deformable, better results,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 9308– 9316
2019
-
[25]
Yolov8 based novel approach for object detection on lidar point cloud,
S. Behera, B. Anandet al., “Yolov8 based novel approach for object detection on lidar point cloud,” in2024 IEEE 99th Vehicular Technology Conference (VTC2024-Spring), 2024, pp. 1–5
2024
-
[26]
A visual de- tection algorithm for autonomous driving road environment perception,
P. Cong, H. Feng, S. Li, T. Li, Y . Xu, and X. Zhang, “A visual de- tection algorithm for autonomous driving road environment perception,” Engineering Applications of Artificial Intelligence, vol. 133, p. 108034, 2024
2024
-
[27]
Road object detection algorithm based on improved yolov8,
J. Peng, C. Li, A. Jiang, B. Mou, Y . Luo, and W. Chen, “Road object detection algorithm based on improved yolov8,” in2024 IEEE 19th Conference on Industrial Electronics and Applications (ICIEA), 2024, pp. 1–6
2024
-
[28]
An improved lightweight network for real-time detection of potential risks for autonomous vehicles,
X. Shen and V . V . Lukyanov, “An improved lightweight network for real-time detection of potential risks for autonomous vehicles,” in2024 International Russian Automation Conference (RusAutoCon). IEEE, 2024, pp. 583–588
2024
-
[29]
Gbforkdet: A lightweight object detector for forklift safety driving,
L. Ye and S. Chen, “Gbforkdet: A lightweight object detector for forklift safety driving,”IEEE Access, vol. 11, pp. 86 509–86 521, 2023
2023
-
[30]
Z- yolov8s-based approach for road object recognition in complex traffic scenarios,
R. Zhao, S. H. Tang, E. E. B. Supeni, S. A. Rahim, and L. Fan, “Z- yolov8s-based approach for road object recognition in complex traffic scenarios,”Alexandria Engineering Journal, vol. 106, pp. 298–311, 2024
2024
-
[31]
Real-time vehicle detection algorithm based on vision and lidar point cloud fusion,
H. Wang, X. Lou, Y . Cai, Y . Li, and L. Chen, “Real-time vehicle detection algorithm based on vision and lidar point cloud fusion,” Journal of Sensors, vol. 2019, pp. 1–9, 2019
2019
-
[32]
Ghostnet: More features from cheap operations,
K. Han, Y . Wang, Q. Tian, J. Guo, C. Xu, and C. Xu, “Ghostnet: More features from cheap operations,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 1580–1589
2020
-
[33]
Cbam: Convolutional block attention module,
S. Woo, J. Park, J.-Y . Lee, and I. S. Kweon, “Cbam: Convolutional block attention module,” inProceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 3–19
2018
-
[34]
A multi-objective dynamic detection model in autonomous driving based on an improved yolov8,
C. Li, Y . Zhu, and M. Zheng, “A multi-objective dynamic detection model in autonomous driving based on an improved yolov8,”Alexandria Engineering Journal, vol. 122, pp. 453–464, 2025
2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.