Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering

Dinesh Raghu; Ganesh Ramakrishnan; Kshitij Sharad Jadhav; Ojas Gramopadhye; Prateek Chanda; Sachindra Joshi; Saeel Sandeep Nachane; Yatin Nandwani

arxiv: 2403.04890 · v4 · pith:G5QDK62Snew · submitted 2024-03-07 · 💻 cs.CL


Saeel Sandeep Nachane, Ojas Gramopadhye, Prateek Chanda, Ganesh Ramakrishnan, Kshitij Sharad Jadhav, Yatin Nandwani, Dinesh Raghu, Sachindra Joshi


keywords: medical, prompt, reasoning, clinical, clinicr, driven, mcq-eliminative, process



In this paper, we propose a modified version of the MedQA-USMLE dataset, named MEDQA-OPEN, which contains open-ended medical questions without options, to mimic clinical scenarios, along with clinician-approved reasoned answers. Additionally, we implement a prompt driven by Chain of Thought (CoT) reasoning, CLINICR, that mirrors the prospective, incremental reasoning process leading to a correct response to a medical question. We empirically demonstrate that CLINICR outperforms the state-of-the-art 5-shot CoT-based prompt (Liévin et al., 2022). We also present an approach that mirrors real-life clinical practice by first exploring multiple differential diagnoses through MCQ-CLINICR and subsequently narrowing down to a final diagnosis using MCQ-ELIMINATIVE. Finally, emphasizing the importance of response verification in medical settings, we use a reward model mechanism to replace the elimination process performed by MCQ-ELIMINATIVE.
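The abstract describes two mechanisms: a few-shot CoT prompt (CLINICR) in which worked examples show step-by-step reasoning before an answer, and a final step in which a reward model, rather than elimination, picks among candidate diagnoses. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `Exemplar` type, the `Question:/Reasoning:/Answer:` template, `build_cot_prompt`, and `select_with_reward` are all hypothetical, and the reward model is represented by any caller-supplied scoring function.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Exemplar:
    """Hypothetical few-shot demonstration: question, step-by-step reasoning, answer."""
    question: str
    reasoning: str
    answer: str


def build_cot_prompt(exemplars: Sequence[Exemplar], question: str) -> str:
    """Concatenate worked exemplars, then the new open-ended question.

    The template is an assumption; the target question ends with an open
    "Reasoning:" slot so the model continues with its own chain of thought.
    """
    blocks = [
        f"Question: {ex.question}\nReasoning: {ex.reasoning}\nAnswer: {ex.answer}"
        for ex in exemplars
    ]
    blocks.append(f"Question: {question}\nReasoning:")
    return "\n\n".join(blocks)


def select_with_reward(
    candidates: Sequence[str], reward: Callable[[str], float]
) -> str:
    """Return the candidate with the highest score from a stand-in reward model."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=reward)
```

In use, the differential diagnoses produced from the prompt would be passed to `select_with_reward` together with a scoring function; here a trivial toy scorer such as `len` stands in for a trained reward model.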



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation
cs.CL 2025-07 accept novelty 8.0

MediQAl is a new French medical QA benchmark with 32k exam-sourced questions in three formats and cognitive labels, evaluated on 14 LLMs to reveal gaps between factual recall and reasoning performance.

AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments
cs.HC 2024-05 conditional novelty 8.0

AgentClinic is a multimodal agent benchmark demonstrating that LLM diagnostic accuracy on MedQA drops to below one-tenth in sequential clinical simulations, with Claude-3.5 leading and large tool-use differences acros...