Evaluation of Convolutional and Transformer-Based Detectors for Weed Detection in Tomato Plantations
Pith reviewed 2026-07-01 08:58 UTC · model grok-4.3
The pith
CNN-based detectors achieve high weed detection performance at lower computational cost than transformer models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
On the GROUNDBASED_WEED dataset, CNN-based detectors such as YOLOv26-nano achieve high precision, recall, and average precision at lower computational cost. Transformer-based detectors such as RT-DETR Large and RF-DETR Medium capture global context better but demand more resources. The paper supports these differences with non-parametric statistical tests.
What carries the argument
The trade-off between detection accuracy and computational efficiency when evaluating convolutional versus transformer-based object detection models on the GROUNDBASED_WEED dataset.
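One way to make this trade-off concrete is a Pareto filter over accuracy and latency. The sketch below is illustrative only: the `Candidate` type, the detector names, and every number are placeholders invented for the example, not results from the paper.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    ap: float          # average precision, higher is better
    latency_ms: float  # per-image inference time, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.ap >= b.ap and a.latency_ms <= b.latency_ms
    strictly_better = a.ap > b.ap or a.latency_ms < b.latency_ms
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Placeholder values for illustration only; they are NOT results from the paper.
candidates = [
    Candidate("detector_a", ap=0.80, latency_ms=5.0),
    Candidate("detector_b", ap=0.85, latency_ms=30.0),
    Candidate("detector_c", ap=0.78, latency_ms=12.0),
]
front = pareto_front(candidates)
print([c.name for c in front])
```

A detector that stays on the front is a defensible choice for some budget; one that falls off it is both slower and less accurate than an alternative.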
If this is right
- CNN models are suitable for resource-constrained environments in precision agriculture.
- Transformer models may be chosen when higher accuracy from global context is needed and resources allow.
- Model selection should consider both accuracy metrics and inference speed for practical deployment.
- Non-parametric statistical tests can validate observed performance differences between architectures.
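The summary does not say which non-parametric tests the paper ran. As a minimal sketch of one common choice for comparing three models across repeated blocks (classes, folds, or image subsets), the Friedman test can be written in plain Python. The function names and the block layout are assumptions made for this example, and no tie correction is applied.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1 (largest) downward, sharing the mean rank across ties."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def friedman_three_models(scores: list[tuple[float, float, float]]) -> tuple[float, float]:
    """Friedman chi-square statistic and asymptotic p-value for exactly three models.

    Each row holds the three models' scores on one block.
    With k = 3 the statistic has 2 degrees of freedom, and the chi-square
    survival function for 2 degrees of freedom is exp(-x / 2).
    """
    n, k = len(scores), 3
    rank_sums = [0.0] * k
    for row in scores:
        for j, rank in enumerate(average_ranks(list(row))):
            rank_sums[j] += rank
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return stat, math.exp(-stat / 2.0)
```

The chi-square approximation is rough when there are only a few blocks, so exact tables or a permutation test may be preferable for small samples.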
Where Pith is reading between the lines
- Deploying CNN models on edge devices or field robots could enable automated weeding with lower hardware costs.
- The efficiency findings may apply to weed detection in other crops if similar image datasets are collected.
- Hybrid models combining convolutional efficiency with transformer context could address the observed trade-off.
Load-bearing premise
The GROUNDBASED_WEED dataset with its six weed classes plus unidentified plants is representative enough of real tomato plantations to support general claims about detector trade-offs.
What would settle it
Results from a larger dataset covering more diverse field conditions or additional weed species, in which transformer models reach higher accuracy at comparable inference speeds.
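Comparing inference speeds fairly requires the same timing procedure for every model. The following is a minimal timing harness under stated assumptions: `measure_latency_ms` and `fake_inference` are hypothetical names, and the workload is a stand-in rather than a real detector.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(
    run_once: Callable[[], object], warmup: int = 5, repeats: int = 30
) -> dict[str, float]:
    """Time a zero-argument inference call and summarize per-call latency in milliseconds."""
    for _ in range(warmup):  # discard warm-up runs (caches, lazy initialization)
        run_once()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    median = statistics.median(samples)
    return {
        "median_ms": median,
        "mean_ms": statistics.fmean(samples),
        "fps": 1000.0 / median if median > 0 else float("inf"),
    }


def fake_inference() -> int:
    """Stand-in workload; replace with a real model's forward pass on one image."""
    return sum(i * i for i in range(20_000))


print(measure_latency_ms(fake_inference))
```

If the model runs asynchronously on a GPU, `run_once` has to block until the result is ready, otherwise the timer only measures how long it takes to queue the work.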
Original abstract
This paper presents a comparative evaluation of convolutional and transformer-based object detection architectures for early weed detection in tomato plantations. Representative models from each paradigm are considered, including YOLOv26-nano, a recent variant of the YOLO family, and RT-DETR Large and RF-DETR Medium as transformer-based architectures. The evaluation was conducted on the GROUNDBASED_WEED dataset, considering six weed classes and an additional category corresponding to unidentified plants, which allowed for the assessment of performance in terms of detection accuracy and computational efficiency using metrics such as precision, recall, average precision, and inference speed, as well as non-parametric statistical tests. The results highlight a clear trade-off between efficiency and contextual modeling: CNN-based detectors achieve high performance at a lower computational cost, while transformer-based approaches offer better global context capture at the expense of higher resource demands. These results provide practical criteria for model selection in precision agriculture applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper presents a comparative evaluation of CNN-based (YOLOv26-nano) and transformer-based (RT-DETR Large, RF-DETR Medium) object detectors for weed detection in tomato plantations. Using the GROUNDBASED_WEED dataset (six weed classes plus unidentified plants), it assesses detection accuracy and computational efficiency via precision, recall, average precision, inference speed, and non-parametric statistical tests. The central claim is a practical trade-off: CNN detectors deliver high performance at lower computational cost, while transformers capture global context better but incur higher resource demands, providing model-selection guidance for precision agriculture.
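For readers who want the accuracy metrics spelled out, here is a simplified sketch of precision, recall, and average precision for one class on one image. It assumes greedy matching at a fixed IoU threshold and all-point interpolation; benchmark toolkits pool detections across all images and differ in matching details, so this is not presented as the paper's evaluation code. The function names are chosen for the example.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall_ap(
    preds: list[tuple[float, Box]], truths: list[Box], iou_thr: float = 0.5
) -> tuple[float, float, float]:
    """Return (precision, recall, AP); preds are (confidence, box) pairs."""
    matched: set[int] = set()
    hits: list[bool] = []
    # Visit predictions from most to least confident; each ground truth matches once.
    for _, box in sorted(preds, key=lambda p: -p[0]):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(truths):
            overlap = iou(box, gt)
            if j not in matched and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
        hits.append(best_j >= 0)

    true_pos = sum(hits)
    precision = true_pos / len(hits) if hits else 0.0
    recall = true_pos / len(truths) if truths else 0.0

    # Precision and recall after each ranked prediction.
    precisions, recalls, running = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        running += hit
        precisions.append(running / rank)
        recalls.append(running / len(truths) if truths else 0.0)
    # Make precision non-increasing from right to left (the interpolation envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return precision, recall, ap
```

With two ground-truth boxes and three predictions of which the second-ranked is a false positive, this yields precision 2/3, recall 1, and AP 5/6.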
Significance. If the empirical trade-off is substantiated with verifiable numbers and the dataset proves representative, the work supplies actionable criteria for balancing efficiency and context modeling in agricultural computer vision. The explicit use of non-parametric tests for comparing paradigms is a methodological strength. No code, data, or reproducibility artifacts are referenced, limiting immediate utility.
major comments (2)
- [Abstract] Abstract: despite stating that 'standard metrics and non-parametric tests were used,' the text supplies no numerical results for precision, recall, AP, inference speed, or test outcomes. This absence makes the reported CNN-transformer trade-off impossible to evaluate or reproduce.
- [Dataset description] Dataset section (presumed §2–3): the GROUNDBASED_WEED dataset is described only by class count; no image totals, train-test split, geographic/seasonal coverage, camera parameters, or class-imbalance statistics are given. Without these, the general claim that observed efficiency-context differences constitute 'practical criteria for model selection' cannot be assessed for representativeness beyond this single collection.
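The class-imbalance statistics requested above are cheap to produce from annotation labels. The sketch below assumes the labels are available as a flat list of class-name strings, one per annotated object; `class_balance_report` is a hypothetical helper, and the dataset's actual annotation format is not described in this summary.

```python
from collections import Counter


def class_balance_report(labels: list[str]) -> dict:
    """Summarize per-class counts, shares, and the majority-to-minority ratio."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        "counts": dict(counts),
        "shares": {name: n / total for name, n in counts.items()},
        # Ratio of the most frequent class to the least frequent one.
        "imbalance_ratio": max(counts.values()) / min(counts.values()),
    }
```

Reporting the same summary separately for the training and test splits would also show whether the split preserved the class proportions.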
minor comments (1)
- [Introduction or Methods] Clarify the exact YOLO variant referenced as 'YOLOv26-nano' and provide citations for RT-DETR and RF-DETR implementations.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We provide point-by-point responses below and have revised the manuscript to address the concerns raised.
- Referee: [Abstract] Abstract: despite stating that 'standard metrics and non-parametric tests were used,' the text supplies no numerical results for precision, recall, AP, inference speed, or test outcomes. This absence makes the reported CNN-transformer trade-off impossible to evaluate or reproduce.
  Authors: We agree that the abstract would be strengthened by including key numerical results so that the trade-off can be evaluated immediately. Although the detailed metrics and test outcomes appear in the experimental section, we have revised the abstract to report the primary precision, recall, AP, and inference speed values, along with the statistical test results, for the models compared.
  Revision: yes
- Referee: [Dataset description] Dataset section (presumed §2–3): the GROUNDBASED_WEED dataset is described only by class count; no image totals, train-test split, geographic/seasonal coverage, camera parameters, or class-imbalance statistics are given. Without these, the general claim that observed efficiency-context differences constitute 'practical criteria for model selection' cannot be assessed for representativeness beyond this single collection.
  Authors: We acknowledge that additional dataset details are needed to support claims of practical applicability. We have expanded the dataset section to include the total number of images, train-test split, geographic and seasonal coverage, camera parameters, and class-imbalance statistics.
  Revision: yes
Circularity Check
Empirical evaluation paper contains no derivation chain or circular steps
The manuscript is a comparative benchmark of off-the-shelf detectors (YOLOv26-nano, RT-DETR, RF-DETR) on the GROUNDBASED_WEED dataset using standard metrics (precision, recall, AP, inference speed) and non-parametric tests. No equations, fitted parameters, predictions derived from inputs, or self-citation load-bearing arguments appear in the provided text. Claims about CNN vs. transformer trade-offs rest on direct measurements rather than any reduction to prior results by construction. Dataset representativeness is a validity concern, not a circularity issue.