arxiv: 2606.03416 · v1 · pith:AAWP3CGMnew · submitted 2026-06-02 · 💻 cs.MA

MeDxAgent: Multi-Agent Consultation for Interactive Medical Diagnosis

classification 💻 cs.MA
keywords diagnosis, interactive, medxagent, accuracy, choices, clinical, consultation, design

Large language models (LLMs) are increasingly used for health-related decision support. Yet most evaluations treat diagnosis as a single-shot task with complete information provided upfront, often as a multiple-choice selection. This diverges from clinical practice, where diagnosis is interactive and open-ended, involving sequential hypothesis refinement through targeted questioning. We address this gap. We build MeDxBench, a large-scale benchmark of 4,421 clinical cases across 20 specialties. We further propose MeDxAgent, a multi-agent consultation system for interactive diagnosis, and systematically study its prompt-, flow-, and agent-level design choices. MeDxAgent achieves a 10.3% accuracy gain over the baseline on MeDxBench, closing 52.3% of the gap to a full-information oracle. We find that three specific design choices (collecting demographics first, passing summarized dialogue for diagnosis, and feeding candidate diagnoses for targeted questioning) improve accuracy, mirroring how physicians reason, though their effect emerges fully only in combination. Code and dataset will be released upon publication.
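
The two headline numbers in the abstract can be related with a short calculation. The sketch below is illustrative only and rests on two assumptions that the abstract does not confirm: that the 10.3% gain is measured in absolute accuracy points, and that "gap closed" means the gain divided by the oracle-minus-baseline difference. The function names are hypothetical and not taken from the paper's code.

```python
def gap_closed(baseline: float, system: float, oracle: float) -> float:
    """Fraction of the baseline-to-oracle accuracy gap recovered by the system.

    Assumed definition: (system - baseline) / (oracle - baseline).
    """
    if oracle <= baseline:
        raise ValueError("oracle accuracy must exceed baseline accuracy")
    return (system - baseline) / (oracle - baseline)


def implied_gap(gain: float, fraction_closed: float) -> float:
    """Oracle-minus-baseline gap implied by a gain and the fraction of the gap it closes."""
    if fraction_closed <= 0:
        raise ValueError("fraction_closed must be positive")
    return gain / fraction_closed


# Figures from the abstract: a 10.3-point gain that closes 52.3% of the gap.
# Under the assumptions above, the full-information oracle would sit about
# 19.7 accuracy points above the baseline.
print(round(implied_gap(10.3, 0.523), 1))
```

If the gain is instead relative to the baseline accuracy, the implied gap would differ, so this figure should be read as a consistency check rather than a reported result.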

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MedBench v5: A Dynamic, Process-Oriented, and Hallucination-Aware Benchmark for Clinical Multimodal Models

    cs.CL 2026-06 unverdicted novelty 5.0

    MedBench v5 is a new dynamic benchmark framework for clinical multimodal models that adds process auditing, factorized stressors, and hallucination propagation tracking across 63 tasks.