Mathematical methods and human thought in the age of AI. arXiv preprint arXiv:2603.26524, March
5 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (5)
verdicts: UNVERDICTED (5)
roles: background (2)
polarities: background (2)
citing papers explorer
- Not All Proofs Are Equal: Evaluating LLM Proof Quality Beyond Correctness
  ProofRank benchmark shows substantial differences in LLM proof quality not captured by correctness, with trade-offs between quality metrics and accuracy.
- Beyond Accuracy: Evaluating Strategy Diversity in LLM Mathematical Reasoning
  Frontier LLMs achieve 95-100% accuracy on AMC/AIME problems but recover far fewer distinct valid strategies than human references, while collectively generating 50 novel strategies.
- NOVA: Fundamental Limits of Knowledge Discovery Through AI
  NOVA models the generate-verify-accumulate-retrain loop and proves cumulative discovery cost scales as Theta(c_gen D^alpha) under Zipf tail equivalence with alpha greater than 1.
- Algorithmic algorithm development with LLMs: A Case Study on LLM-Usage for Contraction Order Optimization in Tensor Networks
  Case study applies verifier-guided LLM evolutionary agents to contraction-order optimization in tensor networks and concludes that human validation remains essential.
- The Human-Machine Knowledge Spiral
  Extends Nonaka's tacit-explicit knowledge spiral to include AI-generated tacit machine knowledge, asserting that the company's role in fostering shared context for innovation remains unchanged.