Evaluation of Convolutional and Transformer-Based Detectors for Weed Detection in Tomato Plantations

Alcides Toledo Espinosa; Ángel Eduardo Zamora-Suárez; Gerardo Antonio Álvarez Hernández; Juan Irving Vásquez; Miguel Bolaños

arxiv: 2605.00908 · v2 · pith:HCPJE4IAnew · submitted 2026-04-29 · 💻 cs.CV

Pith reviewed 2026-07-01 08:58 UTC · model grok-4.3

classification 💻 cs.CV

keywords: weed detection, tomato plantations, object detection, convolutional neural networks, transformer models, YOLO, precision agriculture, computational efficiency

The pith

CNN-based detectors achieve high weed detection performance at lower computational cost than transformer models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares convolutional and transformer-based object detectors for identifying weeds in tomato plantations. It evaluates YOLOv26-nano against RT-DETR Large and RF-DETR Medium on a dataset containing six weed classes and unidentified plants. Results indicate that CNN models deliver strong accuracy with faster inference, while transformers capture more context but use more resources. This matters for selecting models suitable for real-time applications in agriculture where computing power may be limited.

Core claim

On the GROUNDBASED_WEED dataset, CNN-based detectors like YOLOv26-nano achieve high performance in terms of precision, recall, and average precision at lower computational cost, whereas transformer-based approaches like RT-DETR Large and RF-DETR Medium offer better global context capture but demand higher resources, as confirmed by statistical tests.
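As a rough illustration of how precision and recall figures of this kind are typically obtained for a detector, the sketch below matches predicted boxes to ground-truth boxes by intersection over union (IoU). The 0.5 threshold, the corner-coordinate box format, and the function names are assumptions made for illustration; the paper's own evaluation protocol is not described in the text above.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[Box], truths: List[Box], thr: float = 0.5):
    """Greedy one-to-one matching for a single class.

    preds is assumed to be sorted by descending confidence. Each
    ground-truth box can be claimed by at most one prediction.
    """
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = -1, thr
        for j, t in enumerate(truths):
            if j in matched:
                continue
            v = iou(p, t)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # predictions with no ground-truth match
    fn = len(truths) - tp  # ground-truth boxes nobody detected
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall
```

With three predictions of which two overlap a ground-truth box well enough, and two ground-truth boxes in total, this returns a precision of 2/3 and a recall of 1.0.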

What carries the argument

The trade-off between detection accuracy and computational efficiency when evaluating convolutional versus transformer-based object detection models on the GROUNDBASED_WEED dataset.

If this is right

CNN models are suitable for resource-constrained environments in precision agriculture.
Transformer models may be chosen when higher accuracy from global context is needed and resources allow.
Model selection should consider both accuracy metrics and inference speed for practical deployment.
Non-parametric statistical tests can validate observed performance differences between architectures.
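A minimal sketch of what such a validation might look like: an exact Wilcoxon signed-rank test on paired per-class scores from two detectors. The paper's reference list includes Wilcoxon's test, but which tests were applied, and to what data, is not stated above. Pairing the scores by class and the function name `wilcoxon_signed_rank` are assumptions for illustration.

```python
from itertools import product
from typing import Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W, p) with W = min(W+, W-). Zero differences are dropped
    and tied absolute differences receive average ranks. The exact
    enumeration is only practical for small samples.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    # Under the null hypothesis every sign pattern is equally likely.
    total = sum(ranks)
    hits = 0
    for signs in product((0, 1), repeat=n):
        s = sum(r for r, keep in zip(ranks, signs) if keep)
        if min(s, total - s) <= w:
            hits += 1
    return w, hits / 2 ** n
```

With seven paired categories (six weed classes plus unidentified plants), the smallest two-sided p-value this exact test can produce is 2/128, about 0.016, so a per-class comparison of this size has limited resolving power.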

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Deploying CNN models on edge devices or field robots could enable automated weeding with lower hardware costs.
The efficiency findings may apply to weed detection in other crops if similar image datasets are collected.
Hybrid models combining convolutional efficiency with transformer context could address the observed trade-off.

Load-bearing premise

The GROUNDBASED_WEED dataset with its six weed classes plus unidentified plants is representative enough of real tomato plantations to support general claims about detector trade-offs.

What would settle it

Performance results on a larger dataset with more diverse field conditions or additional weed species where transformer models achieve higher accuracy at comparable inference speeds.
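Comparing inference speeds fairly depends on how latency is measured. The sketch below shows one common pattern (warm-up runs, repeated timing, median latency) using only the standard library. The `benchmark` helper and the `fake_detector` stand-in are hypothetical; the paper's timing procedure and hardware are not given in the text above.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(infer: Callable[[], object], warmup: int = 5, runs: int = 50) -> Dict[str, float]:
    """Time a zero-argument inference callable and summarize latency."""
    for _ in range(warmup):
        infer()  # warm-up runs are discarded (caches, lazy initialization)
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    median_ms = statistics.median(samples_ms)
    return {
        "median_ms": median_ms,
        "mean_ms": statistics.fmean(samples_ms),
        "fps": 1000.0 / median_ms if median_ms > 0 else float("inf"),
    }


def fake_detector() -> int:
    # Hypothetical stand-in for a real model call on one image.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(benchmark(fake_detector))
```

The median is reported alongside the mean because occasional slow runs skew the mean. On a GPU, where execution is asynchronous, the callable would also need to wait for the device to finish before the clock is read; that step is omitted here.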

Figures

Figures reproduced from arXiv: 2605.00908 by Alcides Toledo Espinosa, Ángel Eduardo Zamora-Suárez, Gerardo Antonio Álvarez Hernández, Juan Irving Vásquez, Miguel Bolaños.

**Figure 1.** Distribution of annotated instances in the GROUNDBASED_WEED dataset, highlighting class imbalance and ambiguous samples. These characteristics make GROUNDBASED_WEED a realistic benchmark for evaluating model robustness and generalization, bridging the gap between controlled experiments and real-world deployment.

**Figure 1.** Examples of dataset classes: LYPES: Solanum …

**Figure 2.** Example of YOLOv26-nano inference trained on the GROUNDBASED_WEED dataset. It is concluded that hierarchical CNN optimization provides a more robust approach for real-time precision agriculture applications, particularly for handling small, dispersed objects. Consequently, YOLOv26-nano emerges as the most viable architecture for integration into embedded robotic systems operating under real-world field c…

**Figure 2.** Distribution of annotated instances in the …

**Figure 3.** Representative inference examples generated by YOLOv26-nano, RF-DETR Medium, and RT-DETR Large on the …

Original abstract

This paper presents a comparative evaluation of convolutional and transformer-based object detection architectures for early weed detection in tomato plantations. Representative models from each paradigm are considered, including YOLOv26-nano, a recent variant of the YOLO family, and RT-DETR Large and RF-DETR Medium as transformer-based architectures. The evaluation was conducted on the GROUNDBASED_WEED dataset, considering six weed classes and an additional category corresponding to unidentified plants, which allowed for the assessment of performance in terms of detection accuracy and computational efficiency using metrics such as precision, recall, average precision, and inference speed, as well as non-parametric statistical tests. The results highlight a clear trade-off between efficiency and contextual modeling: CNN-based detectors achieve high performance at a lower computational cost, while transformer-based approaches offer better global context capture at the expense of higher resource demands. These results provide practical criteria for model selection in precision agriculture applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This is a basic side-by-side run of three existing detectors on one named agricultural dataset that adds no new methods or verifiable numbers.

This paper runs YOLOv26-nano, RT-DETR Large, and RF-DETR Medium on the GROUNDBASED_WEED dataset for weed detection in tomato fields. It reports the usual metrics and concludes that CNNs deliver high performance at lower cost while transformers handle global context at higher resource cost.

Nothing here is new. The models are already published, the dataset is given, and no architecture, loss, or theoretical point is introduced. The work simply applies the three detectors to six weed classes plus an unidentified-plants category and measures precision, recall, average precision, inference speed, and non-parametric tests.
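For readers less familiar with average precision, the sketch below shows one standard way to compute it for a single class: rank detections by confidence, trace the precision-recall curve, and take the area under its non-increasing envelope (all-point interpolation). Whether the paper uses this variant, and at which IoU thresholds, is not stated above; the function name and input format are assumptions.

```python
from typing import List, Tuple


def average_precision(detections: List[Tuple[float, bool]], num_truths: int) -> float:
    """All-point interpolated AP for one class.

    detections: (confidence, is_true_positive) pairs that have already
    been matched against ground truth at a fixed IoU threshold.
    num_truths: number of ground-truth boxes for this class.
    """
    if num_truths == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    tp = 0
    for k, (_, hit) in enumerate(ranked, start=1):
        tp += int(hit)
        precisions.append(tp / k)
        recalls.append(tp / num_truths)
    # Replace each precision by the best precision at equal or higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas between consecutive recall levels.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

Averaging this value over all classes gives the mean average precision that detection benchmarks usually report.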

The evaluation follows standard practice and the inclusion of the unidentified category is a sensible practical choice. The citation pattern is ordinary for an applied comparison.

The soft spots are clear. The abstract contains no numerical results, no image counts, no train-test split details, and no code or data link, so the central trade-off claim cannot be checked. The dataset description is also thin on collection conditions, geography, or variation, which leaves the representativeness question open. If the images come from limited sites or seasons, the reported efficiency versus context difference may not generalize beyond this collection.

The paper is mainly for engineers who need a quick reference point when picking a detector for similar tomato-weed tasks. A reader seeking new techniques or broad claims will find little of use.

I would not bring it to a reading group and would not cite it. It does not look like it deserves peer review because the evidence supplied is too limited to support the application guidance offered.

Referee Report

2 major / 1 minor

Summary. This paper presents a comparative evaluation of CNN-based (YOLOv26-nano) and transformer-based (RT-DETR Large, RF-DETR Medium) object detectors for weed detection in tomato plantations. Using the GROUNDBASED_WEED dataset (six weed classes plus unidentified plants), it assesses detection accuracy and computational efficiency via precision, recall, average precision, inference speed, and non-parametric statistical tests. The central claim is a practical trade-off: CNN detectors deliver high performance at lower computational cost, while transformers capture global context better but incur higher resource demands, providing model-selection guidance for precision agriculture.

Significance. If the empirical trade-off is substantiated with verifiable numbers and the dataset proves representative, the work supplies actionable criteria for balancing efficiency and context modeling in agricultural computer vision. The explicit use of non-parametric tests for comparing paradigms is a methodological strength. No code, data, or reproducibility artifacts are referenced, limiting immediate utility.
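Since three detectors are compared, a Friedman test over per-class scores is one plausible form the non-parametric comparison could take; the reference list cites both Friedman and Demšar. The sketch below is an illustration under that assumption, restricted to exactly three models so that the chi-square tail (2 degrees of freedom) has the closed form exp(-Q/2). The function name is hypothetical, and no tie correction is applied.

```python
import math
from typing import Sequence, Tuple


def friedman_three(scores: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Friedman test for exactly three treatments (here: models).

    scores[i] holds the three models' scores on block i, for example
    the per-class AP of each model on class i.
    Returns (Q, p), with p taken from the chi-square approximation.
    """
    k = 3
    n = len(scores)
    if n == 0:
        raise ValueError("at least one block is required")
    rank_sums = [0.0] * k
    for row in scores:
        if len(row) != k:
            raise ValueError("each block needs exactly three scores")
        for j, v in enumerate(row):
            # Average rank of v within its block; rank 1 is the lowest score.
            less = sum(1 for u in row if u < v)
            equal = sum(1 for u in row if u == v)
            rank_sums[j] += less + (equal + 1) / 2
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, math.exp(-q / 2.0)
```

If one model ranks first on all seven categories and another ranks last on all seven, Q is 14 and the approximate p-value is exp(-7), roughly 0.0009. With only seven blocks the chi-square approximation is coarse, so the result is best read as indicative.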

major comments (2)

[Abstract] Abstract: despite stating that 'standard metrics and non-parametric tests were used,' the text supplies no numerical results for precision, recall, AP, inference speed, or test outcomes. This absence makes the reported CNN-transformer trade-off impossible to evaluate or reproduce.
[Dataset description] Dataset section (presumed §2–3): the GROUNDBASED_WEED dataset is described only by class count; no image totals, train-test split, geographic/seasonal coverage, camera parameters, or class-imbalance statistics are given. Without these, the general claim that observed efficiency-context differences constitute 'practical criteria for model selection' cannot be assessed for representativeness beyond this single collection.
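The class-imbalance statistics requested here are cheap to produce from an annotation file. A minimal sketch, assuming the annotations have already been flattened into one class label per annotated instance; the function name and the placeholder labels in the usage note are hypothetical and do not reflect the dataset's actual classes or counts.

```python
from collections import Counter
from typing import Dict, Iterable


def imbalance_report(labels: Iterable[str]) -> Dict[str, object]:
    """Per-class instance counts, shares, and the max/min count ratio."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels provided")
    total = sum(counts.values())
    return {
        "counts": dict(counts),
        "shares": {name: n / total for name, n in counts.items()},
        # Ratio of the most frequent to the least frequent class.
        "imbalance_ratio": max(counts.values()) / min(counts.values()),
    }
```

For example, `imbalance_report(["a"] * 6 + ["b"] * 3 + ["c"])` reports shares of 0.6, 0.3, and 0.1 and an imbalance ratio of 6.0.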

minor comments (1)

[Introduction or Methods] Clarify the exact YOLO variant referenced as 'YOLOv26-nano' and provide citations for RT-DETR and RF-DETR implementations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We provide point-by-point responses below and have revised the manuscript to address the concerns raised.

Referee: [Abstract] Abstract: despite stating that 'standard metrics and non-parametric tests were used,' the text supplies no numerical results for precision, recall, AP, inference speed, or test outcomes. This absence makes the reported CNN-transformer trade-off impossible to evaluate or reproduce.

Authors: We agree that the abstract would be strengthened by the inclusion of key numerical results to allow immediate evaluation of the trade-off. Although the detailed metrics and test outcomes appear in the experimental section, we have revised the abstract to report the primary precision, recall, AP, inference speed values, and statistical test results for the models compared. revision: yes

Referee: [Dataset description] Dataset section (presumed §2–3): the GROUNDBASED_WEED dataset is described only by class count; no image totals, train-test split, geographic/seasonal coverage, camera parameters, or class-imbalance statistics are given. Without these, the general claim that observed efficiency-context differences constitute 'practical criteria for model selection' cannot be assessed for representativeness beyond this single collection.

Authors: We acknowledge that additional dataset details are needed to support claims of practical applicability. We have expanded the dataset section to include the total number of images, train-test split, geographic and seasonal coverage, camera parameters, and class-imbalance statistics. revision: yes

Circularity Check

0 steps flagged

Empirical evaluation paper contains no derivation chain or circular steps

The manuscript is a comparative benchmark of off-the-shelf detectors (YOLOv26-nano, RT-DETR, RF-DETR) on the GROUNDBASED_WEED dataset using standard metrics (precision, recall, AP, inference speed) and non-parametric tests. No equations, fitted parameters, predictions derived from inputs, or self-citation load-bearing arguments appear in the provided text. Claims about CNN vs. transformer trade-offs rest on direct measurements rather than any reduction to prior results by construction. Dataset representativeness is a validity concern, not a circularity issue.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical benchmark study; contains no free parameters, mathematical axioms, or postulated entities beyond the standard assumption that the chosen dataset and metrics are appropriate for the task.

Reference graph

Works this paper leans on

28 extracted references · 4 canonical work pages · 2 internal anchors

[1] A. Kamilaris and F. X. Prenafeta-Boldú, “Deep learning in agriculture: A survey,” Computers and Electronics in Agriculture, vol. 147, pp. 70–90, 2018.
[2] K. G. Liakos, P. Busato, D. Moshou, S. Pearson, and D. Bochtis, “Machine learning in agriculture: A review,” Sensors, vol. 18, no. 8, p. 2674, 2018.
[3] I. Heap, “Global perspective of herbicide-resistant weeds,” Pest Management Science, vol. 70, no. 9, pp. 1306–1315, 2014.
[4] B. S. Chauhan et al., “Weed management challenges in modern agriculture,” Crop Protection, 2024.
[5] P. Dentika, H. Ozier-Lafontaine, and L. Penet, “Weeds as pathogen hosts and disease risk for crops in the wake of a reduced use of herbicides,” Journal of Fungi, vol. 7, no. 4, p. 283, 2021.
[6] G. Kazinczi, “Herbicide resistance: Managing weeds in a changing world,” Agronomy, vol. 13, no. 6, p. 1595, 2023.
[7] D. C. Slaughter, D. Giles, and D. Downey, “Autonomous robotic weed control systems: A review,” Computers and Electronics in Agriculture, vol. 61, no. 1, pp. 63–78, 2008.
[8] A. dos Santos Ferreira, D. M. Freitas, G. G. Da Silva, H. Pistori, and M. T. Folhes, “Weed detection in soybean crops using convnets,” Computers and Electronics in Agriculture, vol. 143, pp. 314–324, 2017.
[9] A. Milioto, P. Lottes, and C. Stachniss, “Real-time semantic segmentation of crop and weed for precision agriculture robots leveraging background knowledge in CNNs,” in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 2229–2235.
[10] H.-R. Qu and W.-H. Su, “Deep learning-based weed–crop recognition for smart agricultural equipment: A review,” Agronomy, vol. 14, no. 2, p. 363, 2024.
[11] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
[12] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” in European Conference on Computer Vision. Springer, 2020, pp. 213–229.
[13] M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby et al., “DINOv2: Learning robust visual features without supervision,” arXiv preprint arXiv:2304.07193, 2023.
[14] A. Gómez, H. Moreno, C. Valero, and D. Andújar, “Spatio-temporal stability of intelligent modeling for weed detection in tomato fields,” Agricultural Systems, vol. 228, p. 104394, 2025.
[15] X. Wu, S. Aravecchia, and C. Pradalier, “Design and implementation of computer vision based in-row weeding system,” in 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 4218–4224.
[16] N. Y. Murad, T. Mahmood, A. R. M. Forkan, A. Morshed, P. P. Jayaraman, and M. S. Siddiqui, “Weed detection using deep learning: A systematic literature review,” Sensors, vol. 23, no. 7, p. 3670, 2023.
[17] H. He and E. A. Garcia, “Learning from imbalanced data,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 9, pp. 1263–1284, 2009.
[18] B. Frénay and M. Verleysen, “Classification in the presence of label noise: A survey,” IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 5, pp. 845–869, 2013.
[19] H. Moreno, G. Rivera, and D. Andújar, “Ground-based imagery dataset for early weed classification in tomato crops,” Data in Brief, vol. 63, p. 112249, 2025. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2352340925009709
[20] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.
[21] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” Advances in Neural Information Processing Systems, vol. 28, 2015.
[22] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[23] R. Sapkota, R. H. Cheppally, A. Sharda, and M. Karkee, “YOLO26: Key architectural enhancements and performance benchmarking for real-time object detection,” arXiv preprint arXiv:2509.25164, 2025.
[24] Y. Zhao, W. Lv, S. Xu, J. Wei, G. Wang, Q. Dang, Y. Liu, and J. Chen, “DETRs beat YOLOs on real-time object detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 16965–16974.
[25] I. Robicheaux et al., “RF-DETR: Neural architecture search for real-time detection transformers,” in International Conference on Learning Representations (ICLR), 2026. [Online]. Available: https://arxiv.org/abs/2511.09554
[26] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.
[27] M. Friedman, “A comparison of alternative tests of significance for the problem of m rankings,” The Annals of Mathematical Statistics, vol. 11, no. 1, pp. 86–92, 1940.
[28] F. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.
