
arxiv: 2605.00908 · v2 · pith:HCPJE4IAnew · submitted 2026-04-29 · 💻 cs.CV

Evaluation of Convolutional and Transformer-Based Detectors for Weed Detection in Tomato Plantations

Pith reviewed 2026-07-01 08:58 UTC · model grok-4.3

classification 💻 cs.CV
keywords: weed detection, tomato plantations, object detection, convolutional neural networks, transformer models, YOLO, precision agriculture, computational efficiency

The pith

CNN-based detectors achieve high weed detection performance at lower computational cost than transformer models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares convolutional and transformer-based object detectors for identifying weeds in tomato plantations. It evaluates YOLOv26-nano against RT-DETR Large and RF-DETR Medium on a dataset containing six weed classes and unidentified plants. Results indicate that CNN models deliver strong accuracy with faster inference, while transformers capture more context but use more resources. This matters for selecting models suitable for real-time applications in agriculture where computing power may be limited.

Core claim

On the GROUNDBASED_WEED dataset, CNN-based detectors like YOLOv26-nano achieve high performance in terms of precision, recall, and average precision at lower computational cost, whereas transformer-based approaches like RT-DETR Large and RF-DETR Medium offer better global context capture but demand higher resources, as confirmed by statistical tests.
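
Inference speed, one of the axes in this claim, is usually reported as per-image latency or frames per second. The following is a minimal sketch of such a measurement, assuming a hypothetical `predict` callable that stands in for any of the detectors. The function name `benchmark`, the warm-up count, and the repeat count are illustrative choices, not the paper's protocol.

```python
import statistics
import time


def benchmark(predict, inputs, warmup=3, repeats=10):
    """Time a callable on a list of inputs and report median latency.

    `predict` is a hypothetical stand-in for a detector's forward pass.
    """
    # Warm-up calls are excluded from timing (caches, lazy initialisation).
    for x in inputs[:warmup]:
        predict(x)

    latencies_ms = []
    for _ in range(repeats):
        for x in inputs:
            start = time.perf_counter()
            predict(x)
            latencies_ms.append((time.perf_counter() - start) * 1000.0)

    median_ms = statistics.median(latencies_ms)
    # Guard against a zero median on very coarse clocks.
    fps = 1000.0 / median_ms if median_ms > 0 else float("inf")
    return {"median_ms": median_ms, "fps": fps, "samples": len(latencies_ms)}


if __name__ == "__main__":
    # Dummy workload in place of a real model; the numbers mean nothing.
    print(benchmark(lambda n: sum(range(n)), [1000, 2000, 3000]))
```

When the model runs on a GPU, kernels are typically launched asynchronously, so the device would need to be synchronized before each clock read for the latencies to be meaningful.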

What carries the argument

The trade-off between detection accuracy and computational efficiency when evaluating convolutional versus transformer-based object detection models on the GROUNDBASED_WEED dataset.
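
One way to make this trade-off operational is to keep only the models that are not beaten on both accuracy and latency, then pick the most accurate one that fits a latency budget. The sketch below illustrates that idea. The `Candidate` type, the model names, and every number in the example are placeholders invented for illustration; none of them are results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    mean_ap: float     # higher is better
    latency_ms: float  # lower is better


def pareto_front(candidates):
    """Return candidates that no other candidate dominates."""
    front = []
    for c in candidates:
        dominated = any(
            o.mean_ap >= c.mean_ap
            and o.latency_ms <= c.latency_ms
            and (o.mean_ap > c.mean_ap or o.latency_ms < c.latency_ms)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front


def pick_under_budget(candidates, max_latency_ms):
    """Most accurate non-dominated candidate within the latency budget."""
    feasible = [c for c in pareto_front(candidates) if c.latency_ms <= max_latency_ms]
    return max(feasible, key=lambda c: c.mean_ap) if feasible else None


# Placeholder values only, NOT measurements from the paper.
models = [
    Candidate("small_cnn", 0.80, 5.0),
    Candidate("medium_transformer", 0.83, 20.0),
    Candidate("large_transformer", 0.82, 40.0),
]
choice = pick_under_budget(models, max_latency_ms=10.0)
```

With these placeholder values, a tight budget selects the small model, while a looser budget admits the more accurate but slower one.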

If this is right

  • CNN models are suitable for resource-constrained environments in precision agriculture.
  • Transformer models may be chosen when higher accuracy from global context is needed and resources allow.
  • Model selection should consider both accuracy metrics and inference speed for practical deployment.
  • Non-parametric statistical tests can validate observed performance differences between architectures.
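
The last point can be illustrated with a paired non-parametric test. This summary does not say which tests the paper applied or what the paired units were, so the sketch below assumes a two-sided exact Wilcoxon signed-rank test on paired scores (for example, per-class average precision for two detectors). The function names and the sample values are hypothetical.

```python
from itertools import product


def average_ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(a, b):
    """Two-sided exact Wilcoxon signed-rank test for paired samples.

    Zero differences are discarded. The p-value enumerates all sign
    patterns, so this is only practical for small samples.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    if not diffs:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    total_rank = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    statistic = min(w_plus, total_rank - w_plus)

    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total_rank - wp) <= statistic:
            extreme += 1
    return statistic, extreme / 2 ** len(ranks)


# Hypothetical paired scores (in percent), NOT values from the paper.
scores_a = [10, 12, 14, 16, 18, 20]
scores_b = [9, 10, 11, 12, 13, 14]
stat, p_value = wilcoxon_signed_rank(scores_a, scores_b)
```

Because this test compares two models at a time, a comparison of three detectors would usually apply it per pair, with a correction for multiple comparisons.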

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Deploying CNN models on edge devices or field robots could enable automated weeding with lower hardware costs.
  • The efficiency findings may apply to weed detection in other crops if similar image datasets are collected.
  • Hybrid models combining convolutional efficiency with transformer context could address the observed trade-off.

Load-bearing premise

The GROUNDBASED_WEED dataset with its six weed classes plus unidentified plants is representative enough of real tomato plantations to support general claims about detector trade-offs.

What would settle it

Performance results on a larger dataset with more diverse field conditions or additional weed species where transformer models achieve higher accuracy at comparable inference speeds.

Figures

Figures reproduced from arXiv: 2605.00908 by Alcides Toledo Espinosa, Ángel Eduardo Zamora-Suárez, Gerardo Antonio Álvarez Hernández, Juan Irving Vásquez, Miguel Bolaños.

Figure 1. Distribution of annotated instances in the GROUNDBASED_WEED dataset, highlighting class imbalance and ambiguous samples. These characteristics make GROUNDBASED_WEED a realistic benchmark for evaluating model robustness and generalization, bridging the gap between controlled experiments and real-world deployment.

Figure 1. Examples of dataset classes: LYPES: Solanum …

Figure 2. Example of YOLOv26-nano inference trained on the GROUNDBASED_WEED dataset. It is concluded that hierarchical CNN optimization provides a more robust approach for real-time precision agriculture applications, particularly for handling small, dispersed objects. Consequently, YOLOv26-nano emerges as the most viable architecture for integration into embedded robotic systems operating under real-world field c…

Figure 2. Distribution of annotated instances in the …

Figure 3. Representative inference examples generated by YOLOv26-nano, RF-DETR Medium, and RT-DETR Large on the …
Original abstract

This paper presents a comparative evaluation of convolutional and transformer-based object detection architectures for early weed detection in tomato plantations. Representative models from each paradigm are considered, including YOLOv26-nano, a recent variant of the YOLO family, and RT-DETR Large and RF-DETR Medium as transformer-based architectures. The evaluation was conducted on the GROUNDBASED_WEED dataset, considering six weed classes and an additional category corresponding to unidentified plants, which allowed for the assessment of performance in terms of detection accuracy and computational efficiency using metrics such as precision, recall, average precision, and inference speed, as well as non-parametric statistical tests. The results highlight a clear trade-off between efficiency and contextual modeling: CNN-based detectors achieve high performance at a lower computational cost, while transformer-based approaches offer better global context capture at the expense of higher resource demands. These results provide practical criteria for model selection in precision agriculture applications.
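
For readers less familiar with the detection metrics named in the abstract, the sketch below shows one common way to compute average precision for a single class on a single image. It assumes greedy matching of detections to unmatched ground-truth boxes at an IoU threshold of 0.5 and all-point interpolation of the precision-recall curve. The paper's exact evaluation protocol is not given here, and the function names and example boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """AP for one class; detections are (score, box) pairs."""
    if not ground_truth:
        return 0.0
    matched = set()
    tp_flags = []
    # Visit detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue  # each ground-truth box is matched at most once
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        is_tp = best_idx is not None and best_iou >= iou_threshold
        if is_tp:
            matched.add(best_idx)
        tp_flags.append(is_tp)

    # Precision and recall after each detection in ranked order.
    precisions, recalls, tp = [], [], 0
    for rank, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truth))

    # Interpolate: precision becomes non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Hypothetical boxes for illustration only.
truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
ap = average_precision(dets, truth)
```

A full evaluation would pool detections across all images per class and average the per-class values, which this single-image sketch leaves out.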

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. This paper presents a comparative evaluation of CNN-based (YOLOv26-nano) and transformer-based (RT-DETR Large, RF-DETR Medium) object detectors for weed detection in tomato plantations. Using the GROUNDBASED_WEED dataset (six weed classes plus unidentified plants), it assesses detection accuracy and computational efficiency via precision, recall, average precision, inference speed, and non-parametric statistical tests. The central claim is a practical trade-off: CNN detectors deliver high performance at lower computational cost, while transformers capture global context better but incur higher resource demands, providing model-selection guidance for precision agriculture.

Significance. If the empirical trade-off is substantiated with verifiable numbers and the dataset proves representative, the work supplies actionable criteria for balancing efficiency and context modeling in agricultural computer vision. The explicit use of non-parametric tests for comparing paradigms is a methodological strength. No code, data, or reproducibility artifacts are referenced, limiting immediate utility.

major comments (2)
  1. [Abstract] Abstract: the abstract names the metrics used (precision, recall, average precision, inference speed) and mentions non-parametric statistical tests, but it supplies no numerical results for any of them and no test outcomes. Without these numbers, the reported CNN-transformer trade-off cannot be evaluated or reproduced from the abstract.
  2. [Dataset description] Dataset section (presumed §2–3): the GROUNDBASED_WEED dataset is described only by class count; no image totals, train-test split, geographic/seasonal coverage, camera parameters, or class-imbalance statistics are given. Without these, the general claim that observed efficiency-context differences constitute 'practical criteria for model selection' cannot be assessed for representativeness beyond this single collection.
minor comments (1)
  1. [Introduction or Methods] Clarify the exact YOLO variant referenced as 'YOLOv26-nano' and provide citations for RT-DETR and RF-DETR implementations.
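
As an illustration of the class-imbalance statistics requested in the second major comment, the sketch below summarizes a flat list of per-instance class labels. The function name, the label names, and the counts are hypothetical, and how labels are stored in GROUNDBASED_WEED is not specified in this report.

```python
from collections import Counter


def class_distribution(labels):
    """Return per-class counts, shares, and the max/min imbalance ratio."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels provided")
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return counts, shares, imbalance_ratio


# Hypothetical labels, NOT taken from the dataset.
example_labels = ["weed_a"] * 6 + ["weed_b"] * 3 + ["unidentified"] * 1
counts, shares, ratio = class_distribution(example_labels)
```

Reporting the counts alongside the ratio lets a reader see both which classes are rare and how severe the skew is.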

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We provide point-by-point responses below and have revised the manuscript to address the concerns raised.

  1. Referee: [Abstract] Abstract: the abstract names the metrics used (precision, recall, average precision, inference speed) and mentions non-parametric statistical tests, but it supplies no numerical results for any of them and no test outcomes. Without these numbers, the reported CNN-transformer trade-off cannot be evaluated or reproduced from the abstract.

    Authors: We agree that the abstract would be strengthened by the inclusion of key numerical results to allow immediate evaluation of the trade-off. Although the detailed metrics and test outcomes appear in the experimental section, we have revised the abstract to report the primary precision, recall, AP, inference speed values, and statistical test results for the models compared. revision: yes

  2. Referee: [Dataset description] Dataset section (presumed §2–3): the GROUNDBASED_WEED dataset is described only by class count; no image totals, train-test split, geographic/seasonal coverage, camera parameters, or class-imbalance statistics are given. Without these, the general claim that observed efficiency-context differences constitute 'practical criteria for model selection' cannot be assessed for representativeness beyond this single collection.

    Authors: We acknowledge that additional dataset details are needed to support claims of practical applicability. We have expanded the dataset section to include the total number of images, train-test split, geographic and seasonal coverage, camera parameters, and class-imbalance statistics. revision: yes

Circularity Check

0 steps flagged

Empirical evaluation paper contains no derivation chain or circular steps


The manuscript is a comparative benchmark of off-the-shelf detectors (YOLOv26-nano, RT-DETR, RF-DETR) on the GROUNDBASED_WEED dataset using standard metrics (precision, recall, AP, inference speed) and non-parametric tests. No equations, fitted parameters, predictions derived from inputs, or load-bearing self-citations appear in the provided text. Claims about CNN vs. transformer trade-offs rest on direct measurements rather than on any reduction to prior results by construction. Dataset representativeness is a validity concern, not a circularity issue.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical benchmark study; contains no free parameters, mathematical axioms, or postulated entities beyond the standard assumption that the chosen dataset and metrics are appropriate for the task.


