Consensus or conflict? Fine-grained evaluation of conflicting answers in question-answering
2 Pith papers cite this work.
Citing papers by year: 2026 (2).
Representative citing papers
Disentangling input ambiguity from uncertainty quantification improves error prediction for LLMs on QA tasks, yielding gains of over 10 PRR points across models and datasets.
Citing papers explorer
Selective QA over Conflicting Multi-Source Personal Memory: A Diagnostic Testbed and Method Comparison
Introduces a benchmark with 34,560 instances for selective QA over conflicting multi-source personal memory and compares fusion methods against LLMs.