pith. machine review for the scientific record.

arxiv: 2605.09768 · v1 · submitted 2026-05-10 · 💻 cs.MA

Recognition: 2 theorem links · Lean Theorem

SAGE: Scalable Agentic Grounded Evaluation for Crop Disease Diagnosis

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 03:17 UTC · model grok-4.3

classification 💻 cs.MA
keywords crop disease diagnosis · vision-language agent · symptom knowledge · training-free evaluation · reference image comparison · explainable diagnosis · plant pathology dataset

The pith

An agent using symptom knowledge and reference images raises crop disease diagnosis accuracy by 16.2 percentage points without any retraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to demonstrate that a vision-language agent can diagnose plant diseases more accurately when it is given structured, source-grounded symptom descriptions alongside reference images. This would matter for food security because labeled disease images are scarce, while symptom knowledge can be collected from existing web sources and extended to new crops. The authors build a large dataset of images paired with validated symptom claims and then test an autonomous agent that first identifies plant anatomy, narrows candidates with symptoms, compares references sequentially, and writes an explainable trace. The measured gain holds across four crops at full reference budget and requires no model training.

Core claim

The paper introduces a training-free agentic framework in which an autonomous visual reasoning agent identifies anatomical context, narrows candidate diseases using symptom knowledge, sequentially compares reference images, and produces a fully explainable reasoning trace. Adding the symptom knowledge component raises average accuracy by 16.2 percentage points across the four evaluation crops while the entire pipeline remains extendable to new crops by supplying only crop-specific reference images and symptom descriptions.

What carries the argument

The autonomous visual reasoning agent that sequentially applies symptom knowledge to narrow candidates before comparing reference images.
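
The sequential narrowing described above can be sketched as a plain loop. This is a hedged illustration, not the authors' implementation: `identify_organ` and `compare` stand in for vision-language model calls, and the KB and reference schemas are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Trace:
    """Accumulates the step-by-step reasoning trace."""
    steps: list = field(default_factory=list)

    def log(self, msg):
        self.steps.append(msg)

def diagnose(test_image, kb, references, identify_organ, compare, budget=8):
    trace = Trace()
    # Step 1: anatomical context.
    organ = identify_organ(test_image)
    trace.log(f"organ={organ}")
    # Step 2: narrow candidates to diseases affecting that organ.
    candidates = [d for d, info in kb.items() if organ in info["organs"]]
    trace.log(f"candidates={candidates}")
    # Step 3: sequentially compare one reference image per candidate.
    support = {}
    for disease in candidates:
        if budget == 0:
            break
        ref = references[disease][0]
        score = compare(test_image, ref, kb[disease]["symptoms"])
        support[disease] = score
        trace.log(f"compared {disease}: score={score:.2f}")
        budget -= 1
    # Step 4: prediction plus the full trace that produced it.
    best = max(support, key=support.get)
    return best, trace
```

In practice the paper's agent interleaves comparison and candidate elimination; the loop here flattens that into one pass for clarity.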

If this is right

  • Accuracy gains remain consistent across different crops when symptom knowledge is supplied.
  • New crops can be added by collecting only reference images and symptom descriptions with no retraining required.
  • The agent produces a complete, step-by-step reasoning trace that links each conclusion to specific symptoms and reference images.
  • Future improvements in the underlying vision-language model can be plugged in directly to raise baseline performance.
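
The no-retraining extension in the second bullet amounts to adding data rather than weights. A minimal sketch, assuming a simple registry shape (the field names are illustrative, not the dataset's schema):

```python
def register_crop(registry, crop, diseases):
    """Add a crop by supplying data only: per-disease symptom text and
    reference image paths. No model weights are touched.
    diseases: {name: {"symptoms": str, "references": [paths], ...}}"""
    for name, entry in diseases.items():
        # The framework needs both ingredients for every disease class.
        missing = {"symptoms", "references"} - entry.keys()
        if missing:
            raise ValueError(f"{crop}/{name} missing {sorted(missing)}")
    registry[crop] = diseases
    return registry
```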

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same reference-plus-symptom structure could support agentic diagnosis in other image domains where expert knowledge is textual but images are limited.
  • Grounding every symptom claim to a verbatim web quote offers a built-in audit trail that could reduce hallucinated explanations.
  • The dataset size and coverage make it possible to test whether agentic reasoning scales better than end-to-end classification when the number of classes exceeds one thousand.
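
The verbatim-quote grounding in the second bullet suggests a mechanical audit: a symptom claim passes only if its supporting quote occurs word-for-word in the cited source. A minimal sketch, assuming `claim`/`quote`/`source_text` fields invented for illustration:

```python
def audit_claims(claims):
    """Return the claims whose quote is not found verbatim in its source."""
    failures = []
    for c in claims:
        # Normalize whitespace so line wrapping in the source doesn't matter.
        src = " ".join(c["source_text"].split())
        quote = " ".join(c["quote"].split())
        if quote not in src:
            failures.append(c["claim"])
    return failures
```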

Load-bearing premise

The curated symptom descriptions are accurate, complete, and free of conflicting information so the agent's sequential reasoning actually improves diagnosis.

What would settle it

Run the identical agent on the same test images once with symptom knowledge enabled and once disabled, then measure whether the accuracy difference disappears or reverses on a held-out crop.
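
That proposed experiment is a paired ablation, and its arithmetic is simple. A sketch, where `run_agent` is a hypothetical callable wrapping the agent with the symptom KB toggled:

```python
def ablation_delta(test_set, run_agent):
    """Accuracy difference (percentage points) between KB-on and KB-off
    runs of the same agent on the same held-out images."""
    hits_kb = hits_no_kb = 0
    for image, label in test_set:
        hits_kb += run_agent(image, use_kb=True) == label
        hits_no_kb += run_agent(image, use_kb=False) == label
    n = len(test_set)
    return (hits_kb - hits_no_kb) / n * 100
```

A positive delta on a crop never seen during KB curation is what would settle the attribution question; a delta near zero or negative would undercut it.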

Figures

Figures reproduced from arXiv: 2605.09768 by Arti Singh, Asheesh K. Singh, Baskar Ganapathysubramanian, Chinmay Hegde, Dinakaran Elango, Muhammad Arbab Arshad, Shivani Chiranjeevi, Soumik Sarkar, Tirtho Roy, Yanben Shen.

Figure 1. System overview. Curation (top): web pages become a source-cited KB (335 crops, 1,251 diseases) with an expert audit; raw images are deduped, filtered against the KB, and split into reference/test sets with an anatomical index. Demonstrated agentic evaluation (bottom): the agent observes the organ, narrows candidates, consults KB symptoms, and sequentially compares references, producing a prediction with a…
Figure 2. (Left) Image distribution across 335 crops and 1,251 disease classes (…
Figure 3. Sources backing the disease registry across all 10 crops with KBs released to date. Left: …
Figure 4. Per-crop accuracy across reference budgets. Right semicircle = with internet KB; left = no KB. Concentric polygons (light to dark) correspond to k = 0, 1, 4, 8.
Figure 5. Cost-accuracy tradeoff (mean accuracy across all four crops, internet KB). Small dots show…
Figure 6. Confusion matrices for Soybean (Sonnet, 25 classes). Left: baseline,…
Figure 7. Expert agronomist verdicts on KB-sourced claims across five crops (soybean, mango,…
Figure 8. Diagnostic accuracy vs. reference budget.
read the original abstract

Plant disease diagnosis is critical for food security, yet training disease-recognition models that generalize across crops, pathogens, and field conditions remains challenging because labeled disease images are far less abundant and standardized than data for other biotic stresses such as insects or weeds. Frontier vision-language models offer new opportunities through improved visual reasoning, but they still struggle with fine-grained disease identification due to the lack of structured, crop-specific symptom knowledge. To address this gap, we curate the largest plant disease image–symptom dataset to date, covering 335 crops, 1,251 disease classes, and approximately 839K images, designed to support training-free, agentic disease prediction. A scalable automated pipeline generates source-grounded symptom descriptions in which each claim is linked to a verbatim web quote; domain experts validate sampled crops and reconcile disease-name variants across sources. As a baseline, we introduce an autonomous visual reasoning agent that identifies anatomical context, narrows candidate diseases using symptom knowledge, sequentially compares reference images, and produces a fully explainable reasoning trace. Incorporating symptom knowledge improves accuracy by 16.2 percentage points on average at the full reference budget, with consistent gains across all four evaluation crops. Because the framework only requires crop-specific reference images and symptom knowledge, it can be extended to new crops without retraining, while the agentic baseline can directly benefit from future improvements in foundation model capabilities. Dataset and code are available at: https://sage-dataset.github.io/.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, simulated author's rebuttal, circularity check, and an axiom-and-free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces SAGE, a training-free agentic framework for crop disease diagnosis. It curates the largest plant disease image-symptom dataset to date (335 crops, 1,251 classes, ~839K images) with source-grounded symptom descriptions generated via an automated pipeline and sampled expert validation. An autonomous visual reasoning agent identifies anatomical context, narrows candidates using symptom knowledge, performs sequential reference-image comparison, and outputs an explainable trace. The central empirical result is a 16.2 percentage point average accuracy improvement when symptom knowledge is incorporated, with consistent gains across four evaluation crops at full reference budget; the framework is claimed to extend to new crops without retraining.

Significance. If the accuracy gains are robustly attributable to symptom grounding rather than curation artifacts, the work offers a scalable, extensible approach to fine-grained plant disease diagnosis that leverages existing reference images and foundation-model improvements without retraining. The public release of the dataset and code is a clear strength that could support follow-on research in agricultural AI.

major comments (2)
  1. [Abstract / curation pipeline] Abstract and curation pipeline description: the 16.2 pp gain and the no-retraining scalability claim rest on the symptom descriptions being accurate, complete, and non-conflicting for the four evaluation crops. The manuscript states that domain experts validate only sampled crops after automated web-source grounding and disease-name reconciliation. It is unclear what fraction of crops (or of the four evaluation crops) received full validation, what quantitative quality metrics were used, or how residual errors were ruled out as an alternative explanation for the observed lift.
  2. [Abstract / experimental results] Evaluation protocol: the headline numerical claim lacks reported details on the four crops chosen, the baseline models or agents compared, the exact reference budget, statistical significance testing, or cross-validation procedure. Without these, it is impossible to assess whether the consistent gains across crops are robust or sensitive to crop selection and evaluation design.
minor comments (2)
  1. [Abstract] The abstract mentions 'approximately 839K images' but does not state the exact count or breakdown by crop/disease; a precise table or figure would improve reproducibility.
  2. [Abstract] The link to the dataset (https://sage-dataset.github.io/) is provided, but the manuscript should include a brief summary of dataset statistics (e.g., images per disease class) to allow readers to gauge coverage without visiting the site.
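
The significance-testing gap raised in major comment 2 has a natural remedy: the same test images are diagnosed with and without the KB, so per-image outcomes are paired, and an exact McNemar test on the discordant pairs applies. A stdlib sketch (all counts in the usage note are invented for illustration):

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value for paired binary outcomes.
    b = images correct only with the KB, c = correct only without it.
    Under the null, discordant pairs split 50/50 between b and c."""
    n = b + c
    k = min(b, c)
    # Double the binomial tail P(X <= k) under p = 0.5, capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, if 40 images were correct only with the KB and 12 only without, `mcnemar_exact(40, 12)` comes out well below 0.05, so a gain of that shape would survive a paired significance test.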

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to provide the requested clarifications on validation and evaluation details.

read point-by-point responses
  1. Referee: [Abstract / curation pipeline] Abstract and curation pipeline description: the 16.2 pp gain and the no-retraining scalability claim rest on the symptom descriptions being accurate, complete, and non-conflicting for the four evaluation crops. The manuscript states that domain experts validate only sampled crops after automated web-source grounding and disease-name reconciliation. It is unclear what fraction of crops (or of the four evaluation crops) received full validation, what quantitative quality metrics were used, or how residual errors were ruled out as an alternative explanation for the observed lift.

    Authors: We agree that additional detail on the expert validation process is required to substantiate the claims. The manuscript currently states only that domain experts validate sampled crops; we will revise the methods section to report the exact sampling fraction, confirm that the four evaluation crops received complete validation, describe the quantitative quality metrics (e.g., agreement with source quotes), and include a sensitivity analysis showing that accuracy gains persist under controlled perturbations of the symptom descriptions. This addresses the possibility of residual errors as an alternative explanation. revision: yes

  2. Referee: [Abstract / experimental results] Evaluation protocol: the headline numerical claim lacks reported details on the four crops chosen, the baseline models or agents compared, the exact reference budget, statistical significance testing, or cross-validation procedure. Without these, it is impossible to assess whether the consistent gains across crops are robust or sensitive to crop selection and evaluation design.

    Authors: We acknowledge that the evaluation protocol description is insufficiently detailed in the current version. We will expand the experimental results section to explicitly name the four evaluation crops, list the baseline models and agents, state the reference budget used, report statistical significance tests, and describe the cross-validation or repeated-trial procedure. These additions will allow readers to evaluate the robustness of the reported 16.2 pp average improvement. revision: yes

Circularity Check

0 steps flagged

No circularity: central claim is empirical accuracy delta on held-out images

full rationale

The paper's load-bearing result is the measured 16.2 pp accuracy lift obtained by comparing the agent with versus without symptom knowledge on held-out test images across four crops. This is a direct experimental outcome, not a quantity derived by definition, by fitting a parameter to the target metric, or by a self-citation chain. The curation pipeline and agent architecture are described as engineering choices whose performance is then evaluated externally; no equation or step reduces the reported improvement to its own inputs by construction. The framework's extensibility claim follows from the same empirical setup rather than from any self-referential uniqueness theorem.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work rests on standard assumptions about vision-language model capabilities and the reliability of web-sourced symptom text; no free parameters are fitted inside the central claim, and no new physical or mathematical entities are postulated.

axioms (2)
  • domain assumption Vision-language models can follow multi-step instructions that include visual comparison and symptom-based narrowing.
    Invoked when describing the autonomous visual reasoning agent.
  • domain assumption Expert validation of sampled crops is sufficient to ensure overall dataset quality.
    Stated in the description of the curation pipeline.

pith-pipeline@v0.9.0 · 5601 in / 1488 out tokens · 35904 ms · 2026-05-12T03:17:52.903828+00:00 · methodology


    Compare and reason: ri =CompareAndReason(I test, Iref,S di ) Update reasoning traceτ←τ∪ {r i} Remove rejected candidates until confident. Prediction: d∗ = arg max d∈Drank Support(d, τ), c=Confidence(d ∗, τ) Algorithm 1SAGE: Agentic Inference Require:Test imageI test, KB, Reference setR, Anatomical Index, Reference budgetk Ensure:Predicted diseased ∗, conf...