arxiv: 2602.22959 · v2 · pith:XHUMT35Wnew · submitted 2026-02-26 · 💻 cs.CV

Can Agents Distinguish Visually Hard-to-Separate Diseases in a Zero-Shot Setting? A Pilot Study

classification 💻 cs.CV
keywords clinical, setting, performance, visually, zero-shot, agents, confounded, diagnostic

The rapid progress of multimodal large language models (MLLMs) has led to increasing interest in agent-based systems. While most prior work in medical imaging concentrates on automating routine clinical workflows, we study an underexplored yet clinically significant setting: distinguishing visually hard-to-separate diseases in a zero-shot setting. We benchmark representative agents on two imaging-only proxy diagnostic tasks, (1) melanoma vs. atypical nevus and (2) pulmonary edema vs. pneumonia, where visual features are highly confounded despite substantial differences in clinical management. We introduce a multi-agent framework based on contrastive adjudication. Experimental results show improved diagnostic performance (an 11-percentage-point gain in accuracy on dermoscopy data) and reduced unsupported claims on qualitative samples, although overall performance remains insufficient for clinical deployment. We acknowledge the inherent uncertainty in human annotations and the absence of clinical context, which further limit the translation to real-world settings. Within this controlled setting, this pilot study provides preliminary insights into zero-shot agent performance in visually confounded scenarios.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. From Clinical Intent to Clinical Model: Autonomous Coding-Agents for Clinician-driven AI Development

    cs.CV 2026-04 unverdicted novelty 6.0

    An autonomous coding-agent framework allows clinicians to independently develop clinical AI models via natural language, achieving promising results on lesion classification, fracture detection, and debiased pneumotho...