DAP achieves SOTA on Hard Mode ATP by having LLMs discover answers then prove them formally, solving 10 CombiBench and 36 PutnamBench problems while exposing that LLMs exceed 80% answer accuracy where formal provers stay under 10%.
Prover agent: An agent-based framework for formal mathematical proofs
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
Lean Atlas visualizes Lean 4 dependency graphs and applies Lean Compass to reduce the nodes needing human semantic review by 27-99% across six evaluated projects.
OProver-32B achieves top Pass@32 scores on MiniF2F, ProverBench, and PutnamBench by combining continued pretraining with iterative agentic proving, retrieval, SFT on repairs, and RL on unresolved cases using a 6.86M-proof dataset.
A minimal agentic system achieves competitive performance in automated theorem proving with a simpler design and lower cost than state-of-the-art methods.
MARL-Rad trains region-specific and global agents with reinforcement learning on clinical rewards to produce more accurate radiology reports than prior methods on MIMIC-CXR and IU X-ray datasets.
Ax-Prover is a tool-using multi-agent LLM system that matches state-of-the-art provers on public math benchmarks and outperforms them on new abstract-algebra and quantum-theory benchmarks while also assisting an expert with a cryptography proof.
GenoMAS deploys six specialized LLM agents with guided planning to preprocess transcriptomic data and identify genes, reaching 89.13% composite similarity and 60.48% F1 on the GenoTEX benchmark while outperforming prior methods.
citing papers explorer
-
Discover and Prove: An Open-source Agentic Framework for Hard Mode Automated Theorem Proving in Lean 4
DAP achieves SOTA on Hard Mode ATP by having LLMs discover answers then prove them formally, solving 10 CombiBench and 36 PutnamBench problems while exposing that LLMs exceed 80% answer accuracy where formal provers stay under 10%.
-
Lean Atlas: An Integrated Proof Environment for Scalable Human-AI Collaborative Formalization
Lean Atlas visualizes Lean 4 dependency graphs and applies Lean Compass to reduce the nodes needing human semantic review by 27-99% across six evaluated projects.
-
OProver: A Unified Framework for Agentic Formal Theorem Proving
OProver-32B achieves top Pass@32 scores on MiniF2F, ProverBench, and PutnamBench by combining continued pretraining with iterative agentic proving, retrieval, SFT on repairs, and RL on unresolved cases using a 6.86M-proof dataset.
-
A Minimal Agent for Automated Theorem Proving
A minimal agentic system achieves competitive performance in automated theorem proving with a simpler design and lower cost than state-of-the-art methods.
-
Multi-Modal Multi-Agent Reinforcement Learning for Radiology Report Generation
MARL-Rad trains region-specific and global agents with reinforcement learning on clinical rewards to produce more accurate radiology reports than prior methods on MIMIC-CXR and IU X-ray datasets.
-
Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics
Ax-Prover is a tool-using multi-agent LLM system that matches state-of-the-art provers on public math benchmarks and outperforms them on new abstract-algebra and quantum-theory benchmarks while also assisting an expert with a cryptography proof.
-
GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis
GenoMAS deploys six specialized LLM agents with guided planning to preprocess transcriptomic data and identify genes, reaching 89.13% composite similarity and 60.48% F1 on the GenoTEX benchmark while outperforming prior methods.