MedRedQA for Medical Consumer Question Answering: Dataset, Tasks, and Neural Baselines

Vincent Nguyen, Sarvnaz Karimi, Maciej Rybinski, Zhenchang Xing · 2023 · DOI 10.18653/v1/2023.ijcnlp-main.42

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

When Retrieval Doesn't Help: A Large-Scale Study of Biomedical RAG

cs.CL · 2026-06-02 · accept · novelty 6.0

Large-scale evaluation shows retrieval-augmented generation yields only marginal and inconsistent gains (1-2 points) over no-retrieval baselines in biomedical QA, with model choice dominating retriever or corpus effects.

The Validity Gap in Health AI Evaluation: A Cross-Sectional Analysis of Benchmark Composition

cs.AI · 2026-03-18 · unverdicted · novelty 6.0

Health AI benchmarks exhibit a validity gap, with only 42% referencing objective data (mostly wellness wearables), rare complex inputs like labs or imaging, and minimal coverage of vulnerable groups or chronic care.
