SAGE: Scalable Agentic Grounded Evaluation for Crop Disease Diagnosis
Pith reviewed 2026-05-12 03:17 UTC · model grok-4.3
The pith
An agent using symptom knowledge and reference images raises crop disease diagnosis accuracy by 16.2 percentage points without any retraining.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper introduces a training-free agentic framework in which an autonomous visual reasoning agent identifies anatomical context, narrows candidate diseases using symptom knowledge, sequentially compares reference images, and produces a fully explainable reasoning trace. Adding the symptom knowledge component raises average accuracy by 16.2 percentage points across the four evaluation crops while the entire pipeline remains extendable to new crops by supplying only crop-specific reference images and symptom descriptions.
What carries the argument
The autonomous visual reasoning agent that sequentially applies symptom knowledge to narrow candidates before comparing reference images.
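The three-stage narrowing described above can be sketched as a simple loop. This is an illustrative reconstruction, not the paper's code: `identify_part`, `matches_symptoms`, and `compare` are hypothetical stand-ins for vision-language model calls.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    disease: str
    affected_parts: set   # anatomical parts this disease affects
    symptoms: str         # curated, source-grounded symptom description

def diagnose(test_image, candidates, reference_images, budget,
             identify_part, matches_symptoms, compare):
    """Training-free agentic loop: anatomical filtering, symptom-based
    narrowing, then sequential reference-image comparison. The three
    callables stand in for vision-language model queries."""
    trace = []
    # Step 1: identify the anatomical context of the test image.
    part = identify_part(test_image)
    pool = [c for c in candidates if part in c.affected_parts]
    trace.append(f"anatomy={part}: {len(pool)} candidates remain")
    # Step 2: narrow further using symptom knowledge.
    pool = [c for c in pool if matches_symptoms(test_image, c.symptoms)]
    trace.append(f"symptom narrowing: {len(pool)} candidates remain")
    # Step 3: compare reference images sequentially within the budget.
    scores = {}
    for c in pool[:budget]:
        s = compare(test_image, reference_images[c.disease])
        scores[c.disease] = s
        trace.append(f"{c.disease}: visual similarity {s:.2f}")
    best = max(scores, key=scores.get) if scores else None
    return best, trace
```

Because every step appends to `trace`, the returned list is exactly the explainable reasoning trace the claim describes.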
If this is right
- Accuracy gains remain consistent across different crops when symptom knowledge is supplied.
- New crops can be added by collecting only reference images and symptom descriptions with no retraining required.
- The agent produces a complete, step-by-step reasoning trace that links each conclusion to specific symptoms and reference images.
- Future improvements in the underlying vision-language model can be plugged in directly to raise baseline performance.
Where Pith is reading between the lines
- The same reference-plus-symptom structure could support agentic diagnosis in other image domains where expert knowledge is textual but images are limited.
- Grounding every symptom claim to a verbatim web quote offers a built-in audit trail that could reduce hallucinated explanations.
- The dataset size and coverage make it possible to test whether agentic reasoning scales better than end-to-end classification when the number of classes exceeds one thousand.
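The audit-trail idea above (each symptom claim linked to a verbatim web quote) can be checked mechanically. A minimal sketch, assuming whitespace-normalized substring matching is an acceptable operationalization of "verbatim"; the paper's actual grounding check may differ:

```python
import re

def is_grounded(claim_quote: str, source_text: str) -> bool:
    """Return True if the claim's supporting quote appears verbatim in
    the cited source text, up to whitespace and case normalization.
    The normalization choice is an assumption, not the paper's rule."""
    norm = lambda s: re.sub(r"\s+", " ", s).strip().lower()
    return norm(claim_quote) in norm(source_text)
```

A claim whose quote cannot be located this way would be flagged for expert review rather than silently kept.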
Load-bearing premise
The curated symptom descriptions are accurate, complete, and free of conflicting information so the agent's sequential reasoning actually improves diagnosis.
What would settle it
Run the identical agent on the same test images once with symptom knowledge enabled and once disabled, then measure whether the accuracy difference disappears or reverses on a held-out crop.
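The proposed experiment reduces to a paired accuracy delta on the same images. A minimal sketch, with the input format (parallel prediction and label lists) assumed rather than taken from the paper's evaluation code:

```python
def ablation_delta(preds_with_kb, preds_without_kb, labels):
    """Accuracy difference between the same agent run with and without
    symptom knowledge on the same held-out images. A positive value
    means symptom knowledge helped on this crop."""
    acc = lambda ps: sum(p == y for p, y in zip(ps, labels)) / len(labels)
    return acc(preds_with_kb) - acc(preds_without_kb)
```

Running this per held-out crop shows whether the reported gap persists, disappears, or reverses.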
Original abstract
Plant disease diagnosis is critical for food security, yet training disease-recognition models that generalize across crops, pathogens, and field conditions remains challenging because labeled disease images are far less abundant and standardized than data for other biotic stresses such as insects or weeds. Frontier vision-language models offer new opportunities through improved visual reasoning, but they still struggle with fine-grained disease identification due to the lack of structured, crop-specific symptom knowledge. To address this gap, we curate the largest plant disease image–symptom dataset to date, covering 335 crops, 1,251 disease classes, and approximately 839K images, designed to support training-free, agentic disease prediction. A scalable automated pipeline generates source-grounded symptom descriptions in which each claim is linked to a verbatim web quote; domain experts validate sampled crops and reconcile disease-name variants across sources. As a baseline, we introduce an autonomous visual reasoning agent that identifies anatomical context, narrows candidate diseases using symptom knowledge, sequentially compares reference images, and produces a fully explainable reasoning trace. Incorporating symptom knowledge improves accuracy by 16.2 percentage points on average at the full reference budget, with consistent gains across all four evaluation crops. Because the framework only requires crop-specific reference images and symptom knowledge, it can be extended to new crops without retraining, while the agentic baseline can directly benefit from future improvements in foundation model capabilities. Dataset and code are available at: https://sage-dataset.github.io/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SAGE, a training-free agentic framework for crop disease diagnosis. It curates the largest plant disease image-symptom dataset to date (335 crops, 1,251 classes, ~839K images) with source-grounded symptom descriptions generated via an automated pipeline and sampled expert validation. An autonomous visual reasoning agent identifies anatomical context, narrows candidates using symptom knowledge, performs sequential reference-image comparison, and outputs an explainable trace. The central empirical result is a 16.2 percentage point average accuracy improvement when symptom knowledge is incorporated, with consistent gains across four evaluation crops at full reference budget; the framework is claimed to extend to new crops without retraining.
Significance. If the accuracy gains are robustly attributable to symptom grounding rather than curation artifacts, the work offers a scalable, extensible approach to fine-grained plant disease diagnosis that leverages existing reference images and foundation-model improvements without retraining. The public release of the dataset and code is a clear strength that could support follow-on research in agricultural AI.
major comments (2)
- [Abstract / curation pipeline] Abstract and curation pipeline description: the 16.2 pp gain and the no-retraining scalability claim rest on the symptom descriptions being accurate, complete, and non-conflicting for the four evaluation crops. The manuscript states that domain experts validate only sampled crops after automated web-source grounding and disease-name reconciliation. It is unclear what fraction of crops (or of the four evaluation crops) received full validation, what quantitative quality metrics were used, or how residual errors were ruled out as an alternative explanation for the observed lift.
- [Abstract / experimental results] Evaluation protocol: the headline numerical claim lacks reported details on the four crops chosen, the baseline models or agents compared, the exact reference budget, statistical significance testing, or cross-validation procedure. Without these, it is impossible to assess whether the consistent gains across crops are robust or sensitive to crop selection and evaluation design.
minor comments (2)
- [Abstract] The abstract mentions 'approximately 839K images' but does not state the exact count or breakdown by crop/disease; a precise table or figure would improve reproducibility.
- [Abstract] The link to the dataset (https://sage-dataset.github.io/) is provided, but the manuscript should include a brief summary of dataset statistics (e.g., images per disease class) to allow readers to gauge coverage without visiting the site.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to provide the requested clarifications on validation and evaluation details.
Point-by-point responses
-
Referee: [Abstract / curation pipeline] Abstract and curation pipeline description: the 16.2 pp gain and the no-retraining scalability claim rest on the symptom descriptions being accurate, complete, and non-conflicting for the four evaluation crops. The manuscript states that domain experts validate only sampled crops after automated web-source grounding and disease-name reconciliation. It is unclear what fraction of crops (or of the four evaluation crops) received full validation, what quantitative quality metrics were used, or how residual errors were ruled out as an alternative explanation for the observed lift.
Authors: We agree that additional detail on the expert validation process is required to substantiate the claims. The manuscript currently states only that domain experts validate sampled crops; we will revise the methods section to report the exact sampling fraction, confirm that the four evaluation crops received complete validation, describe the quantitative quality metrics (e.g., agreement with source quotes), and include a sensitivity analysis showing that accuracy gains persist under controlled perturbations of the symptom descriptions. This addresses the possibility of residual errors as an alternative explanation. revision: yes
-
Referee: [Abstract / experimental results] Evaluation protocol: the headline numerical claim lacks reported details on the four crops chosen, the baseline models or agents compared, the exact reference budget, statistical significance testing, or cross-validation procedure. Without these, it is impossible to assess whether the consistent gains across crops are robust or sensitive to crop selection and evaluation design.
Authors: We acknowledge that the evaluation protocol description is insufficiently detailed in the current version. We will expand the experimental results section to explicitly name the four evaluation crops, list the baseline models and agents, state the reference budget used, report statistical significance tests, and describe the cross-validation or repeated-trial procedure. These additions will allow readers to evaluate the robustness of the reported 16.2 pp average improvement. revision: yes
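One standard choice for the significance test the referee requests, given paired per-image outcomes of the same agent with and without symptom knowledge, is McNemar's exact test. The paper does not state which test it uses, so this is illustrative only:

```python
from math import comb

def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar test on paired boolean outcomes over
    the same test images (True = correct prediction). Only discordant
    pairs carry information; under the null they split 50/50."""
    n01 = sum(x and not y for x, y in zip(correct_a, correct_b))  # A right, B wrong
    n10 = sum(y and not x for x, y in zip(correct_a, correct_b))  # B right, A wrong
    n = n01 + n10
    if n == 0:
        return 1.0  # no discordant pairs: no evidence either way
    k = min(n01, n10)
    # exact binomial tail probability under p = 0.5, doubled for two sides
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With per-crop outcome lists, this yields one p-value per evaluation crop, directly addressing the robustness question.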
Circularity Check
No circularity: central claim is empirical accuracy delta on held-out images
Full rationale
The paper's load-bearing result is the measured 16.2 pp accuracy lift obtained by comparing the agent with versus without symptom knowledge on held-out reference images across four crops. This is a direct experimental outcome, not a quantity derived by definition, by fitting a parameter to the target metric, or by a self-citation chain. The curation pipeline and agent architecture are described as engineering choices whose performance is then evaluated externally; no equation or step reduces the reported improvement to its own inputs by construction. The framework's extensibility claim follows from the same empirical setup rather than from any self-referential uniqueness theorem.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Vision-language models can follow multi-step instructions that include visual comparison and symptom-based narrowing.
- Domain assumption: Expert validation of sampled crops is sufficient to ensure overall dataset quality.
Reference graph
Works this paper leans on
-
[1]
David P. Hughes and Marcel Salathé. An open access repository of images on plant health to enable the development of mobile disease diagnostics. arXiv preprint arXiv:1511.08060, 2015. URL https://arxiv.org/abs/1511.08060
-
[2]
Shivani Chiranjeevi, Mojdeh Saadati, Zi K. Deng, Jayanth Koushik, Talukder Z. Jubery, Daren S. Mueller, Matthew O'Neal, Nirav Merchant, Aarti Singh, Asheesh K. Singh, Soumik Sarkar, Arti Singh, and Baskar Ganapathysubramanian. InsectNet: Real-time identification of insects using an end-to-end machine learning pipeline. PNAS Nexus, 4(1):pgae575, January 202…
-
[3]
Yanben Shen, Timilehin T. Ayanlade, Venkata Naresh Boddepalli, Mojdeh Saadati, Ashlyn Rairdin, Zi K. Deng, Muhammad Arbab Arshad, Aditya Balu, Daren Mueller, Asheesh K. Singh, Wesley Everman, Nirav Merchant, Baskar Ganapathysubramanian, Meaghan Anderson, Soumik Sarkar, and Arti Singh. WeedNet: A foundation model-based global-to-local AI approach for real-…
-
[5]
Khang Nguyen Quoc, Phuong D. Dao, and Luyl-Da Quach. LeafNet: A large-scale dataset and comprehensive benchmark for foundational vision-language understanding of plant diseases. arXiv preprint arXiv:2602.13662, 2026. doi: 10.48550/arXiv.2602.13662. URL https://arxiv.org/abs/2602.13662
-
[6]
Xiang Liu, Zhaoxiang Liu, Huan Hu, Zezhou Chen, Kohou Wang, Kai Wang, and Shiguo Lian. A multimodal benchmark dataset and model for crop disease diagnosis. In European Conference on Computer Vision, pages 157–170. Springer, 2024. doi: 10.1007/978-3-031-73016-0_10. URL https://doi.org/10.1007/978-3-031-73016-0_10
-
[7]
Sambuddha Ghosal, David Blystone, Asheesh K. Singh, Baskar Ganapathysubramanian, Arti Singh, and Soumik Sarkar. An explainable deep machine vision framework for plant stress phenotyping. Proceedings of the National Academy of Sciences of the United States of America, 115(18):4613–4618, 2018. doi: 10.1073/pnas.1716999115. URL https://doi.org/10.1073/pnas.1716999115
-
[8]
Muhammad Arbab Arshad, Talukder Zaki Jubery, Tirtho Roy, Rim Nassiri, Asheesh K. Singh, Arti Singh, Chinmay Hegde, Baskar Ganapathysubramanian, Aditya Balu, Adarsh Krishnamurthy, et al. Leveraging vision language models for specialized agricultural tasks. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 6320–6329. IEEE, …
-
[10]
Jiandong Pan, Renhai Zhong, Fulin Xia, Jingfeng Huang, Linchao Zhu, Yi Yang, and Tao Lin. ChatLeafDisease: a chain-of-thought prompting approach for crop disease classification using large language models. Plant Phenomics, page 100094, 2025. doi: 10.1016/j.plaphe.2025.100094. URL https://doi.org/10.1016/j.plaphe.2025.100094
-
[11]
Kunpeng Zhang, Li Ma, Beibei Cui, Xin Li, Boqiang Zhang, and Na Xie. Visual large language model for wheat disease diagnosis in the wild. Computers and Electronics in Agriculture, 227:109587, 2024. doi: 10.1016/j.compag.2024.109587. URL https://doi.org/10.1016/j.compag.2024.109587
-
[12]
Lufu Qin, Xingcai Wu, Xinyu Dong, Huan Wang, Tingwei Yang, and Qi Wang. PDD-Agent: Multimodal large language model-driven AI agent for enhanced plant disease diagnosis. In 2025 IEEE International Conference on Image Processing (ICIP), pages 1271–1276. IEEE, 2025. doi: 10.1109/ICIP55913.2025.11084359. URL https://doi.org/10.1109/ICIP55913.2025.11084359
-
[13]
Aruna Gauba, Irene Pi, Yunze Man, Ziqi Pang, Vikram S. Adve, and Yu-Xiong Wang. AgMMU: A comprehensive agricultural multimodal understanding benchmark. In Advances in Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2025. URL https://openreview.net/forum?id=MQPZPtv8GG
-
[14]
Muhammad Awais, Ali Husain Salem Abdulla Alharthi, Amandeep Kumar, Hisham Cholakkal, and Rao Muhammad Anwer. AgroGPT: Efficient agricultural vision-language model with expert tuning. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 5687–5696. IEEE, 2025. doi: 10.1109/WACV61041.2025.00555. URL https://doi.org/10.1109/WACV…
-
[15]
Liqiong Wang, Teng Jin, Jinyu Yang, Ales Leonardis, Fangyi Wang, and Feng Zheng. Agri-LLaVA: Knowledge-infused large multimodal assistant on agricultural pests and diseases. arXiv preprint arXiv:2412.02158, 2024. doi: 10.48550/arXiv.2412.02158. URL https://arxiv.org/abs/2412.02158
-
[16]
Hossein Zaremehrjerdi, Shreyan Ganguly, Ashlyn Rairdin, Elizabeth Tranel, Benjamin Feuer, Juan Ignacio Di Salvo, Srikanth Panthulugiri, Hernan Torres Pacin, Victoria Moser, Sarah Jones, et al. Towards large reasoning models for agriculture. arXiv preprint arXiv:2505.19259, 2025. doi: 10.48550/arXiv.2505.19259. URL https://arxiv.org/abs/2505.19259
-
[18]
arXiv preprint arXiv:2604.23701. URL https://arxiv.org/abs/2604.23701
-
[20]
Isaac Ritharson. Severity-based rice disease classification. https://www.kaggle.com/datasets/isaacritharson/severity-based-rice-leaf-diseases-dataset, 2021. Kaggle
-
[21]
Tolga Hayit. YellowRust19: Yellow rust disease in wheat. https://www.kaggle.com/datasets/tolgahayit/yellowrust19-yellow-rust-disease-in-wheat, 2020. Kaggle
-
[22]
Gimril Lozarita. Banana leaf disease dataset v1.1. https://www.kaggle.com/datasets/gimrillozarita/banana-leaf-disease-dataset-v1-1, 2022. Kaggle
-
[23]
Marquis03. Bean leaf lesions classification dataset. https://www.kaggle.com/datasets/marquis03/bean-leaf-lesions-classification, 2023. Kaggle
-
[24]
Ashish Jena. Lettuce diseases dataset. https://www.kaggle.com/datasets/ashishjstar/lettuce-diseases, 2024. Kaggle
-
[25]
Karim Negm. Cucumber plant diseases dataset. https://www.kaggle.com/datasets/kareem3egm/cucumber-plant-diseases-dataset, 2020. Kaggle
-
[26]
Cthng123. Durian leaf disease dataset. https://www.kaggle.com/datasets/cthng123/durian-leaf-disease-dataset, 2025. Kaggle
-
[27]
Kamalmoha. Eggplant disease recognition dataset. https://www.kaggle.com/datasets/kamalmoha/eggplant-disease-recognition-dataset, 2023. Kaggle
-
[28]
Shuvo Kumar Basak. Cotton disease multi transformation dataset. https://www.kaggle.com/datasets/shuvokumarbasak2030/cotton-disease-multi-transformation-dataset, 2026. Kaggle
-
[29]
Shuvo Kumar Basak. Pumpkin leaf disease multi transformation dataset. https://www.kaggle.com/datasets/shuvokumarbasak2030/pumpkin-leaf-disease-multi-transformation-dataset, 2024. Kaggle
-
[30]
Shuvo Kumar Basak. Rose leaf disease multi transformation dataset. https://www.kaggle.com/datasets/shuvokumarbasak2030/rose-leaf-disease-multi-transformation-dataset, 2026. Kaggle
-
[31]
Usman Afzaal. Strawberry disease detection dataset. https://www.kaggle.com/datasets/usmanafzaal/strawberry-disease-detection-dataset, 2021. Kaggle
- [32]
-
[33]
Tolga Hayit. Fusarium wilt disease in chickpea dataset. https://www.kaggle.com/datasets/tolgahayit/fusarium-wilt-disease-in-chickpea-dataset, 2022. Kaggle
-
[34]
Shuvo Kumar Basak. Cauliflower disease multi transformation dataset. https://www.kaggle.com/datasets/shuvokumarbasak2030/cauliflower-disease-multi-transformation-dataset, 2024. Kaggle
-
[35]
Shuvo Kumar Basak. Coconut disease multi transformation sttv dataset. https://www.kaggle.com/datasets/shuvokumarbasak2030/coconut-disease-multi-transformation-sttv-dataset, 2024. Kaggle
-
[36]
Muhammad Ihsan Permana. Vanilla plant disease image dataset. https://www.kaggle.com/datasets/mihsanpermana/vanilla-plant-disease-image-dataset, 2024. Kaggle
-
[37]
Cucumber disease and freshness classification dataset — curated annotations. https://zenodo.org/records/16816441, 2025. Zenodo, DOI: 10.5281/zenodo.16816441
-
[38]
Vipoooool. New plant diseases dataset (augmented). https://www.kaggle.com/datasets/vipoooool/new-plant-diseases-dataset, 2018. Kaggle, augmented version of PlantVillage
-
[39]
Rady10. Plant diseases image-text pairs. https://huggingface.co/datasets/Rady10/Plant-Diseases-Image-Text-Pairs, 2024. HuggingFace Datasets
-
[40]
A2H0H0R1. Plant disease (new) dataset. https://huggingface.co/datasets/A2H0H0R1/plant-disease-new, 2024. HuggingFace Datasets
-
[41]
Avinashhm. Plant disease classification complete. https://huggingface.co/datasets/avinashhm/plant-disease-classification-complete, 2024. HuggingFace Datasets
-
[42]
Sakethdevx. Plant disease dataset. https://huggingface.co/datasets/sakethdevx/plant-disease-dataset, 2024. HuggingFace Datasets
-
[43]
Raghavendrad60. VQA plant-disease classification (merged) dataset. https://huggingface.co/datasets/raghavendrad60/vqa_plant-disease-classification-merged-dataset, 2024. HuggingFace Datasets
-
[44]
Saon110. Bangladesh crop & vegetable plant disease dataset. https://huggingface.co/datasets/Saon110/bd-crop-vegetable-plant-disease-dataset, 2024. HuggingFace Datasets
-
[45]
The Bugwood Network and Center for Invasive Species and Ecosystem Health. Bugwood image database system. https://www.bugwood.org/, 2024. University of Georgia. Per-image attribution required
-
[46]
Risa Shinoda, Nakamasa Inoue, Hirokatsu Kataoka, Masaki Onishi, and Yoshitaka Ushiku. AgroBench: Vision-language model benchmark in agriculture. arXiv preprint arXiv:2507.20519, 2025. URL https://arxiv.org/abs/2507.20519
-
[47]
DOI: 10.1145/3371158.3371196. License notes: Rady10, Plant-Diseases-Image-Text-Pairs (HuggingFace default license): no explicit license declared on the HuggingFace dataset card; contact the author before redistribution. A2H0H0R1, Plant Disease (New) (HuggingFace default license): no explicit license declare…
-
[48]
DOI: 10.1145/3664647.3680599. License notes: CDDM (Crop Disease Domain Multimodal), CC BY-NC-ND 4.0: no commercial use; no derivatives permitted (Liu et al., arXiv:2503.06973, 2025). Bugwood Image Database, per-image attribution (Bugwood ToU): image rights remain with the individual photographers/contributors; attribution and Bugwood acknowledgement required for each image u…
-
[49]
Read the test image first. Note the affected plant part (leaf, stem, pod, root, whole plant) and key visual features (color, shape, pattern, texture)
-
[50]
Read the part index file `<PART_INDEX_PATH>` and find the plant part you identified. This narrows the candidate classes to only those that affect that part. Focus on these candidates. Stay within the part-narrowed set. Only view classes outside it if you have exhausted all candidates within it and still have budget
-
[51]
Review the symptom descriptions below to narrow further
-
[52]
View reference images one at a time. Read ONE image, analyze how it compares to the test image, then decide which class to check next. Do NOT read multiple images in parallel. Explore before confirming: view one reference from EACH of your top candidates before viewing a second from any class
-
[53]
IMPORTANT: Make your final prediction based on VISUAL SIMILARITY to reference images, not KB descriptions. The symptom descriptions help you understand what to look for, but when deciding between candidates, the reference image that most closely matches the test image wins. Do NOT let a text description override what you see in the images. - Submit your p...
-
[54]
Select candidate disease d_i = NextCandidate(D_rank)
-
[55]
Fetch reference image I_ref = FetchReferenceImage(R, d_i, o_test)
-
[56]
Compare and reason: r_i = CompareAndReason(I_test, I_ref, S_{d_i}). Update reasoning trace τ ← τ ∪ {r_i}. Remove rejected candidates until confident. Prediction: d* = argmax_{d ∈ D_rank} Support(d, τ), c = Confidence(d*, τ). (From Algorithm 1, SAGE: Agentic Inference. Require: test image I_test, KB, reference set R, anatomical index, reference budget k. Ensure: predicted disease d*, confidence c.)
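The exploration policy quoted in [52] (view one reference from each top candidate before a second from any class) amounts to a breadth-first schedule over the reference budget. A sketch with illustrative names; the paper does not publish this as code:

```python
def viewing_order(candidates, refs_per_class, budget):
    """Breadth-first reference-viewing schedule: round-robin one image
    per candidate class per round, stopping once the reference budget
    is exhausted. Function and argument names are illustrative."""
    order = []
    rnd = 0
    while len(order) < budget:
        progressed = False
        for c in candidates:
            imgs = refs_per_class[c]
            if rnd < len(imgs) and len(order) < budget:
                order.append(imgs[rnd])
                progressed = True
        if not progressed:
            break  # every class's references are exhausted
        rnd += 1
    return order
```

This ordering guarantees each top candidate is visually checked at least once before any class consumes a second slot of the budget.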