Transactions of the Association for Computational Linguistics
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers:
- Revisiting Non-Verbatim Memorization in Large Language Models: The Role of Entity Surface Forms
  LLMs display inconsistent factual recall across different surface forms of the same entity, with greater robustness to minor spelling changes than to aliases or abbreviations.
- Measuring AI Reasoning: A Guide for Researchers
  Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not final-answer accuracy.