Title resolution pending

Wu, Y · 2022 · arXiv 2205.12615

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

dataset 1

citation-polarity summary

background 1

representative citing papers

A Machine-Verified Proof of a Quantum-Optimization Conjecture

quant-ph · 2026-06-29 · accept · novelty 8.0

A Lean 4 machine-verified proof establishes that depth-p QAOA on the ring of disagrees attains approximation ratio (2p+1)/(2p+2) exactly.

TheoremBench: Evaluating LLMs on Theorem Proving in Formal Mathematics

cs.AI · 2026-06-08 · unverdicted · novelty 8.0

TheoremBench is a Lean4 benchmark of classical theorems in main and premised forms that evaluates LLM provers on partial progress, coverage, and token efficiency rather than binary success on competition problems.

MathAtlas: A Benchmark for Autoformalization in the Wild

cs.AI · 2026-05-13 · accept · novelty 8.0

MathAtlas is the first large-scale benchmark for autoformalizing graduate mathematics, where even strong models reach only 9.8% correctness on theorem statements and drop to 2.6% on the hardest dependency-deep subset.

PAL: Program-aided Language Models

cs.CL · 2022-11-18 · conditional · novelty 8.0

PAL improves few-shot reasoning accuracy by having LLMs generate executable programs rather than text-based chains of thought, outperforming much larger models on math and logic benchmarks.

LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization

cs.AI · 2026-06-03 · unverdicted · novelty 7.0

LeanMarathon uses four contract-scoped agents on an evolving blueprint coordinated by a two-stage orchestrator to formalize seven theorems from Erdős problems in Lean, proving 258 lemmas with no sorry across three runs.

A Learning Method for Symbolic Systems Using Large Language Models

cs.SE · 2026-05-09 · unverdicted · novelty 7.0

LLM2Ltac mines symbolic tactics from 11,725 Coq theorems using LLMs and integrates them into CoqHammer, improving proof rates by 23.87% on 6,199 theorems from four large verification projects.

Evaluating the Formal Reasoning Capabilities of Large Language Models through Chomsky Hierarchy

cs.CL · 2026-04-03 · unverdicted · novelty 7.0

LLMs display clear performance stratification on formal language tasks aligned with Chomsky hierarchy complexity levels, limited by severe efficiency barriers rather than absolute capability.

Lean Atlas: An Integrated Proof Environment for Scalable Human-AI Collaborative Formalization

cs.HC · 2026-03-16 · conditional · novelty 7.0

Lean Atlas visualizes Lean 4 dependency graphs and applies Lean Compass to reduce the nodes needing human semantic review by 27-99% across six evaluated projects.

Automating Formal Verification with Reinforcement Learning and Recursive Inference

cs.LG · 2026-05-29 · unverdicted · novelty 5.0

RLVR training raises verified Dafny pass rates from 9.7% to 31.1% on a filtered benchmark while a Lean proof scaffold lifts success from 46.2% to 69.2% on a pilot set and solves 7 of 42 prior unsolved tasks.

citing papers explorer

Showing 3 of 3 citing papers after filters.

TheoremBench: Evaluating LLMs on Theorem Proving in Formal Mathematics cs.AI · 2026-06-08 · unverdicted · none · ref 34
TheoremBench is a Lean4 benchmark of classical theorems in main and premised forms that evaluates LLM provers on partial progress, coverage, and token efficiency rather than binary success on competition problems.
MathAtlas: A Benchmark for Autoformalization in the Wild cs.AI · 2026-05-13 · accept · none · ref 37
MathAtlas is the first large-scale benchmark for autoformalizing graduate mathematics, where even strong models reach only 9.8% correctness on theorem statements and drop to 2.6% on the hardest dependency-deep subset.
LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization cs.AI · 2026-06-03 · unverdicted · none · ref 31
LeanMarathon uses four contract-scoped agents on an evolving blueprint coordinated by a two-stage orchestrator to formalize seven theorems from Erdős problems in Lean, proving 258 lemmas with no sorry across three runs.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer