Systematic testing of prompt engineering for LLM equational reasoning finds a performance ceiling of 60-79% accuracy that extensive engineering cannot exceed, driven by undecidability and model capacity limits.
Mathematics distillation challenge – equational theories. https://terrytao.wordpress.com/2026/03/13/mathematics-distillation-challenge-equational-theories/
1 Pith paper cites this work.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Less Is More: Cognitive Load and the Single-Prompt Ceiling in LLM Mathematical Reasoning