Large-scale evaluation shows retrieval-augmented generation yields only marginal and inconsistent gains (1-2 points) over no-retrieval baselines in biomedical QA, with model choice dominating retriever or corpus effects.
MedRedQA for Medical Consumer Question Answering: Dataset, Tasks, and Neural Baselines
2 Pith papers cite this work.
Representative citing papers (2026: 2)
Health AI benchmarks exhibit a validity gap, with only 42% referencing objective data (mostly wellness wearables), rare complex inputs like labs or imaging, and minimal coverage of vulnerable groups or chronic care.