Contradictions in Context: Challenges for Retrieval-Augmented Generation in Healthcare
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
In high-stakes information domains such as healthcare, where large language models (LLMs) can produce hallucinations or misinformation, retrieval-augmented generation (RAG) has been proposed as a mitigation strategy, grounding model outputs in external, domain-specific documents. Yet, this approach can introduce errors when source documents contain outdated or contradictory information. This work investigates the performance of five LLMs in generating RAG-based responses to medicine-related queries. Our contributions are three-fold: i) the creation of a benchmark dataset using consumer medicine information documents from the Australian Therapeutic Goods Administration (TGA), where headings are repurposed as natural language questions, ii) the retrieval of PubMed abstracts using TGA headings, stratified across multiple publication years, to enable controlled temporal evaluation of outdated evidence, and iii) a comparative analysis of the frequency and impact of outdated or contradictory content on model-generated responses, assessing how LLMs integrate and reconcile temporally inconsistent information. Our findings show that contradictions between highly similar abstracts do, in fact, degrade performance, leading to inconsistencies and reduced factual accuracy in model answers. These results highlight that retrieval similarity alone is insufficient for reliable medical RAG and underscore the need for contradiction-aware filtering strategies to ensure trustworthy responses in high-stakes domains.
fields: cs.CL (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
- Neuro-Symbolic Resolution of Recommendation Conflicts in Multimorbidity Clinical Guidelines
Neuro-symbolic pipeline using multi-agent translation and SAT solving detects conflicts in multimorbidity guidelines with 0.861 F1, finding 90.6% are local conflicts on 12 SGLT2 guidelines.