The probabilities also matter: A more faithful metric for faithfulness of free-text explanations in large language models
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth
  Introduces the BonaFide benchmark of 3,066 ground-truth-labeled CoTs, showing that most faithfulness metrics perform near chance, exhibit biases, and scale poorly to longer chains.
- Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models
  LLMs prioritize task-appropriate reasoning over conflicting instructions, but reasoning types are linearly encoded in middle-to-late layers, allowing activation steering to raise instruction compliance by up to 29%.
- Measuring AI Reasoning: A Guide for Researchers
  Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not by final-answer accuracy.