Why do large language models (LLMs) struggle to count letters?
2 Pith papers cite this work.
Citing papers:
- The Open-Box Fallacy: Why AI Deployment Needs a Calibrated Verification Regime
  Argues that AI deployment in high-stakes areas requires domain-scoped, calibrated verification with monitoring and revocation, and proposes a six-component Verification Coverage standard in place of mechanistic interpretability.
- The Position Curse: LLMs Struggle to Locate the Last Few Items in a List
  Shows that LLMs exhibit a "Position Curse": retrieving items by backward position in a list lags far behind forward retrieval, with only partial gains from PosBench fine-tuning.