AgroBench: Vision-Language Model Benchmark in Agriculture
4 Pith papers cite this work.
fields
cs.CV 4
representative citing papers
AgroVG is a new multi-source benchmark for agricultural visual grounding formulated as generalized set prediction, with protocols for box and mask grounding across single-target, multi-target, and target-absent queries from six object families.
CropVLM is a domain-adapted vision-language model that achieves 72.51% zero-shot crop classification accuracy and superior open-set detection performance on novel species without retraining.
Zero-shot VLMs reach at most 62% accuracy on agricultural classification tasks, while supervised models like YOLO11 perform markedly higher, indicating that zero-shot VLMs are not ready to replace task-specific systems.
citing papers explorer
- Sci-Rho: A Multilingual Visually-Grounded Symbolic Benchmark for STEM Problems
  Sci-Rho is a dynamic multilingual visually-grounded symbolic benchmark for STEM problems that reveals robustness gaps in current VLMs between average and worst-case performance.
- CropVLM: A Domain-Adapted Vision-Language Model for Open-Set Crop Analysis
  CropVLM is a domain-adapted vision-language model that achieves 72.51% zero-shot crop classification accuracy and superior open-set detection performance on novel species without retraining.
- Are vision-language models ready to zero-shot replace supervised classification models in agriculture?
  Zero-shot VLMs reach at most 62% accuracy on agricultural classification tasks, while supervised models like YOLO11 perform markedly higher, indicating that zero-shot VLMs are not ready to replace task-specific systems.