TheoremBench is a Lean4 benchmark of classical theorems in main and premised forms that evaluates LLM provers on partial progress, coverage, and token efficiency rather than binary success on competition problems.
Rabe, Talia Ringer, and Yuriy Brun
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3representative citing papers
A Lean library called Palamedes uses synthesis rules from generator semantics and catamorphism-anamorphism rewriting to automatically produce correct constrained random generators.
Continued pretraining of Code Llama on Proof-Pile-2 yields Llemma, an open math-specialized LLM that beats known open base models on MATH and supports tool use plus formal proving out of the box.
citing papers explorer
-
TheoremBench: Evaluating LLMs on Theorem Proving in Formal Mathematics
TheoremBench is a Lean4 benchmark of classical theorems in main and premised forms that evaluates LLM provers on partial progress, coverage, and token efficiency rather than binary success on competition problems.
-
The Search for Constrained Random Generators
A Lean library called Palamedes uses synthesis rules from generator semantics and catamorphism-anamorphism rewriting to automatically produce correct constrained random generators.
-
Llemma: An Open Language Model For Mathematics
Continued pretraining of Code Llama on Proof-Pile-2 yields Llemma, an open math-specialized LLM that beats known open base models on MATH and supports tool use plus formal proving out of the box.