arxiv: 2403.04890 · v4 · pith:G5QDK62S · submitted 2024-03-07 · 💻 cs.CL

Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering

classification 💻 cs.CL
keywords medical, prompt, reasoning, clinical, clinicr, driven, mcq-eliminative, process

In this paper, we propose a modified version of the MedQA-USMLE dataset, named MEDQA-OPEN, which contains open-ended medical questions without options to mimic clinical scenarios, along with clinician-approved reasoned answers. Additionally, we implement a prompt driven by Chain of Thought (CoT) reasoning, CLINICR, to mirror the prospective process of incremental reasoning, reaching a correct response to medical questions. We empirically demonstrate how CLINICR outperforms the state-of-the-art 5-shot CoT-based prompt (Liévin et al., 2022). We also present an approach that mirrors real-life clinical practice by first exploring multiple differential diagnoses through MCQ-CLINICR and subsequently narrowing down to a final diagnosis using MCQ-ELIMINATIVE. Finally, emphasizing the importance of response verification in medical settings, we utilize a reward model mechanism, replacing the elimination process performed by MCQ-ELIMINATIVE.
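
The abstract describes a two-stage flow: a few-shot CoT prompt first elicits several differential diagnoses, and a second step narrows them to a single answer, either by elimination (MCQ-ELIMINATIVE) or by a reward model. The Python sketch below illustrates only that general shape, under stated assumptions. The prompt template, `Exemplar`, `build_cot_prompt`, `select_diagnosis`, `answer_open_question`, and the toy `propose`/`reward` callables are all hypothetical; they are not the authors' CLINICR prompts, their MEDQA-OPEN exemplars, or their reward model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Exemplar:
    """One few-shot demonstration: a question, a step-by-step rationale, the answer."""
    question: str
    reasoning: str
    answer: str


def build_cot_prompt(exemplars: Sequence[Exemplar], question: str) -> str:
    """Assemble a few-shot chain-of-thought prompt (hypothetical template)."""
    blocks = [
        f"Question: {ex.question}\nReasoning: {ex.reasoning}\nAnswer: {ex.answer}"
        for ex in exemplars
    ]
    # The target question ends in an open "Reasoning:" slot, so the model
    # is nudged to write its rationale before committing to an answer.
    blocks.append(f"Question: {question}\nReasoning:")
    return "\n\n".join(blocks)


def select_diagnosis(
    question: str,
    differentials: List[str],
    reward: Callable[[str, str], float],
) -> str:
    """Stage 2: keep the differential with the highest verifier (reward) score."""
    if not differentials:
        raise ValueError("need at least one candidate diagnosis")
    return max(differentials, key=lambda dx: reward(question, dx))


def answer_open_question(
    question: str,
    exemplars: Sequence[Exemplar],
    propose: Callable[[str], List[str]],
    reward: Callable[[str, str], float],
) -> str:
    """Stage 1 proposes differentials from the CoT prompt; stage 2 narrows to one."""
    prompt = build_cot_prompt(exemplars, question)
    differentials = propose(prompt)
    return select_diagnosis(question, differentials, reward)


# Toy stand-ins for an LLM and a reward model (hypothetical, illustration only).
def toy_propose(prompt: str) -> List[str]:
    return ["diagnosis A", "diagnosis B", "diagnosis C"]


toy_scores = {"diagnosis A": 0.2, "diagnosis B": 0.9, "diagnosis C": 0.5}


def toy_reward(question: str, dx: str) -> float:
    return toy_scores[dx]


exemplars = [
    Exemplar("Example case?", "Step 1: note the findings. Step 2: weigh the options.", "Example answer")
]
print(answer_open_question("New case?", exemplars, toy_propose, toy_reward))  # diagnosis B
```

Passing `propose` and `reward` in as plain callables keeps the sketch independent of any particular model API; swapping the reward-based `select_diagnosis` for an elimination prompt would correspond to the MCQ-ELIMINATIVE variant the abstract mentions.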

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation

    cs.CL 2025-07 accept novelty 8.0

    MediQAl is a new French medical QA benchmark with 32k exam-sourced questions in three formats and cognitive labels, evaluated on 14 LLMs to reveal gaps between factual recall and reasoning performance.

  2. AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments

    cs.HC 2024-05 conditional novelty 8.0

    AgentClinic is a multimodal agent benchmark demonstrating that LLM diagnostic accuracy on MedQA drops to below one-tenth in sequential clinical simulations, with Claude-3.5 leading and large tool-use differences acros...