Small Object Detection in Industrial Recycling: A New Dataset and YOLO Performance Evaluation
Pith reviewed 2026-06-29 18:05 UTC · model grok-4.3
The pith
A new dataset of over 10k images and 120k instances enables direct comparison of YOLO detectors for small dense objects in recycling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes a benchmark dataset of more than 10,000 images and 120,000 instances and uses it to compare supervised deep learning detectors, particularly YOLO variants, on accuracy, computational efficiency, and robustness for small-object detection in recycling. It reports that data augmentation and synthetic images improve results, identifies the most reliable current systems, and demonstrates an anomaly detection approach that remains stable across resolution and zoom variations while extending the scope to length measurement.
What carries the argument
The new recycling-specific dataset paired with a side-by-side evaluation of YOLO and other deep learning detectors on accuracy, speed, and augmentation effects.
If this is right
- Data augmentation and synthetic images measurably raise detection accuracy on the provided dataset.
- Specific YOLO configurations deliver the best trade-off between precision and runtime for dense small-object scenes.
- The anomaly detection component continues to function reliably when image resolution or zoom changes.
- Object detection outputs can be combined with length measurement within the same recycling workflow.
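The precision versus runtime trade-off named above can be made concrete with a Pareto filter: a detector is worth considering only if no other detector is at least as accurate and at least as fast. The sketch below is illustrative only. The `Result` type, the detector names, and every number are hypothetical placeholders, not values reported in the paper.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    map50: float       # accuracy metric, higher is better
    latency_ms: float  # per-image inference time, lower is better


def pareto_front(results: list[Result]) -> list[Result]:
    """Return the results that no other result dominates, fastest first."""
    front = []
    for r in results:
        # o dominates r if it is no worse on both axes and strictly better on one.
        dominated = any(
            o.map50 >= r.map50
            and o.latency_ms <= r.latency_ms
            and (o.map50 > r.map50 or o.latency_ms < r.latency_ms)
            for o in results
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r.latency_ms)


# Placeholder values for illustration only; NOT results from the paper.
runs = [
    Result("detector_a", 0.80, 5.0),
    Result("detector_b", 0.85, 9.0),
    Result("detector_c", 0.78, 7.0),   # slower and less accurate than detector_a
    Result("detector_d", 0.90, 20.0),
]
front_names = [r.name for r in pareto_front(runs)]
print(front_names)
```

Reporting the whole front rather than a single winner avoids fixing an arbitrary weighting between accuracy and speed; a plant with a strict latency budget can then pick the most accurate detector that fits it.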
Where Pith is reading between the lines
- The dataset and evaluation protocol could be replicated for other manufacturing inspection tasks that involve small dense items.
- Cross-plant testing would reveal whether the current top-ranked detectors generalize beyond the original data source.
- Integrating the detectors with on-line length measurement hardware could create an end-to-end quality-control pipeline.
Load-bearing premise
The collected images represent the full range of object sizes, densities, overlaps, and lighting conditions encountered in operating recycling facilities.
What would settle it
Run the same set of detectors on a fresh collection of images sampled from a different recycling plant and verify whether the accuracy and efficiency rankings stay the same.
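One way to quantify whether the rankings "stay the same" is a rank correlation between the per-detector scores at the two plants. The following is a minimal sketch using Spearman's coefficient in its simple closed form, which assumes no tied scores. The function names, detector names, and scores are hypothetical placeholders, not data from the paper.

```python
def ranks(scores: dict[str, float]) -> dict[str, int]:
    """Rank detectors from best (1) to worst by score; assumes no tied scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def spearman(scores_a: dict[str, float], scores_b: dict[str, float]) -> float:
    """Spearman rank correlation between two score tables over the same detectors."""
    if scores_a.keys() != scores_b.keys():
        raise ValueError("both plants must evaluate the same detectors")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two detectors to compare rankings")
    ra, rb = ranks(scores_a), ranks(scores_b)
    d2 = sum((ra[k] - rb[k]) ** 2 for k in ra)
    # Closed form valid when there are no ties in either ranking.
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Placeholder accuracy scores for illustration only; NOT results from the paper.
plant_a = {"det_a": 0.90, "det_b": 0.85, "det_c": 0.80, "det_d": 0.70}
plant_b = {"det_a": 0.82, "det_b": 0.84, "det_c": 0.75, "det_d": 0.60}

rho = spearman(plant_a, plant_b)
print(round(rho, 3))
```

A coefficient near 1 would indicate that the ordering of detectors carries over to the second plant; a low or negative value would indicate that the benchmark ranking is specific to the original data source. The same function can be applied to inference-time tables to check the efficiency ranking.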
In this paper, we address the problem of detecting small, dense, and overlapping objects, a major challenge in computer vision. Our focus is on reviewing proposed methods based on deep learning supervised approaches. We provide a detailed comparison of these systems on a new dataset of more than 10k images and 120k instances, highlighting their performance, accuracy, and computational efficiency in the industrial recycling process use case. Through this comparative analysis, we identify the most reliable systems currently available and the specific challenges they are designed to tackle. Furthermore, we explore the benefits of data augmentation and synthetic images. Based on our analysis, we also propose potential future directions and innovative solutions that could enhance the effectiveness of small, dense and overlapped object detection systems. The scope of our investigations encompasses object detection, length measurement, and anomaly detection within the context of the recycling process. The anomaly detection strategy is robust against variations in image resolution and zoom levels, ensuring reliable performance in industrial applications. The repository of the proposed dataset, methods and evaluation codes can be found at: https://github.com/o-messai/SDOOD
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a new dataset of more than 10k images and 120k instances focused on small, dense, and overlapping objects in industrial recycling. It performs a comparative evaluation of deep-learning detectors (with emphasis on YOLO variants) for performance, accuracy, and computational efficiency, investigates data augmentation and synthetic images, and addresses length measurement plus anomaly detection claimed to be robust against resolution and zoom variations. Future directions are proposed and code/dataset are released via GitHub.
Significance. If the dataset is shown to be representative of real recycling streams, the benchmark could usefully inform model selection for industrial small-object tasks where efficiency matters. Public release of data and code supports reproducibility. The anomaly-detection robustness claim, if backed by quantitative cross-condition tests, would add practical value.
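A quantitative cross-condition test of that kind could look like the sketch below. It assumes, purely for illustration, that anomalies are flagged by Tukey's interquartile-range rule applied to measured object lengths; the text available here does not describe the paper's actual anomaly-detection method. Under that assumption, uniformly rescaling every length (as a zoom or resolution change would) rescales the quartiles and fences by the same factor, so the set of flagged objects does not change. The function name and the length values are hypothetical.

```python
import statistics


def iqr_outliers(lengths: list[float], k: float = 1.5) -> list[int]:
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(lengths, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, x in enumerate(lengths) if x < low or x > high]


# Placeholder lengths in pixels for illustration only; NOT data from the paper.
lengths_px = [20.0, 21.0, 19.5, 20.5, 22.0, 20.0, 21.5, 45.0, 19.0, 20.5]

flagged = iqr_outliers(lengths_px)
# Simulate a 2x zoom: every measured length doubles.
flagged_zoomed = iqr_outliers([2.0 * x for x in lengths_px])

print(flagged, flagged_zoomed)
```

This invariance holds only when all lengths scale by the same factor. A real zoom or resolution change also alters which objects the detector finds and how precisely it localizes them, which is why measured cross-condition results would still be needed.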
major comments (2)
- [Abstract and Dataset section] The central claim that the comparison reliably identifies the best systems for the industrial recycling use case requires the dataset to be representative of varied real streams (plants, lighting, densities, zoom). No sampling strategy, acquisition protocol, number of source facilities, camera/setup details, or cross-plant/domain-shift validation is described.
- [Abstract] The text states that a detailed comparison of performance, accuracy, and efficiency was performed yet supplies no quantitative results, tables, error bars, or exclusion criteria, so the soundness of the ranking claims cannot be verified from the provided text.
minor comments (1)
- [Title and Abstract] The title specifies YOLO evaluation while the abstract refers to broader deep-learning methods; clarify the exact set of detectors compared.
Simulated Authors' Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment below and commit to revisions that strengthen the clarity and verifiability of our claims without altering the core contributions.
- Referee: [Abstract and Dataset section] The central claim that the comparison reliably identifies the best systems for the industrial recycling use case requires the dataset to be representative of varied real streams (plants, lighting, densities, zoom). No sampling strategy, acquisition protocol, number of source facilities, camera/setup details, or cross-plant/domain-shift validation is described.
  Authors: We agree that explicit documentation of acquisition details is required to support claims of representativeness. In the revised manuscript we will add a dedicated 'Data Acquisition' subsection that specifies the sampling strategy, acquisition protocol, number of source facilities, camera models, lighting conditions, and zoom/density variations covered. We will also report any available cross-facility consistency checks; where full domain-shift experiments were not performed we will explicitly note this as a limitation and outline it as future work.
  Revision: yes
- Referee: [Abstract] The text states that a detailed comparison of performance, accuracy, and efficiency was performed yet supplies no quantitative results, tables, error bars, or exclusion criteria, so the soundness of the ranking claims cannot be verified from the provided text.
  Authors: We accept that the abstract must contain concise quantitative support. We will revise the abstract to include the principal numerical outcomes (e.g., mAP ranges and inference speeds for the leading YOLO variants, augmentation gains, and anomaly-detection metrics) together with a brief statement of the evaluation protocol and exclusion criteria used.
  Revision: yes
Circularity Check
No circularity: purely empirical dataset and benchmark paper
The paper introduces a new dataset (>10k images, 120k instances) and reports empirical performance numbers for existing YOLO-family detectors on held-out test splits. No derivations, first-principles results, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The central claim is a straightforward comparison of off-the-shelf models; the evaluation protocol does not reduce to its own inputs by construction. Dataset representativeness is an external validity question, not a circularity issue.