PAVE evaluates LLM verifiers across four pre-evidence epistemic states in RAG fact-checking, revealing model-dependent unreliable arbitration and proposing a JSD-based test-time method to improve reliability.
Yuxi Sun, Aoqi Zuo, Wei Gao, and Jing Ma
4 Pith papers cite this work.
citing papers explorer
- Diagnosing LLM Arbitration Behavior over Pre-evidence Epistemic States in RAG-based Fact-Checking
  PAVE evaluates LLM verifiers across four pre-evidence epistemic states in RAG fact-checking, revealing model-dependent unreliable arbitration and proposing a JSD-based test-time method to improve reliability.
- Trust or Abstain? A Self-Aware RAG Approach
  SABER combines self-prior with multi-trace PK and CK reasoning representations to estimate reliability beliefs and drive trust-or-abstain decisions in knowledge-conflict RAG, improving accuracy over baselines.
- The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation
  Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering raise deep-conflict accuracy to 71-72.5% on Gemma-2B and Mistral-7B.
- Beyond Math: Stories as a Testbed for Memorization-Constrained Reasoning in LLMs
  Introduces a mitigation technique that drops LLM accuracy on popular fiction character tasks from 96% to 72% by limiting verbatim memorization while retaining gist cues.