Uncertainty trace profiles from LM reasoning traces predict correct final answers with AUROC up to 0.807 and enable early error detection using only initial tokens.
Verifying chain-of-thought reasoning via its computational graph. CoRR, abs/2510.09312
5 Pith papers cite this work.
years: 2026 (5)
representative citing papers
Attribution graphs reveal that RAG failures arise from shallow fragmented evidence flow in LLMs, enabling topology-based detection and targeted interventions that reinforce question-guided routing.
Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistical comparison.
Token-level contrastive attribution yields informative signals for some LLM benchmark failures but is not universally applicable across datasets and models.
MEDS improves LLM RL performance by up to 4.13 pass@1 and 4.37 pass@128 points by dynamically penalizing rollouts matching prevalent historical error clusters identified via memory-stored representations and density clustering.
citing papers explorer
- Compared to What? Baselines and Metrics for Counterfactual Prompting
Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistical comparison.
- Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks
Token-level contrastive attribution yields informative signals for some LLM benchmark failures but is not universally applicable across datasets and models.