Aria: An Agent For Retrieval and Iterative Auto-Formalization via Dependency Graph

Bin Dong; Guoxiong Gao; Hanyu Wang; Ruohan Xie; Xintao Yu; Yutong Wang

arXiv: 2510.04520 · v2 · submitted 2025-10-06 · cs.AI

Keywords: Aria, accuracy, final, agent, auto-formalization, definitions, dependency, graph

Abstract

Accurate auto-formalization of theorem statements is essential for advancing automated discovery and verification of research-level mathematics, yet remains a major bottleneck for LLMs due to hallucinations, semantic mismatches, and their inability to synthesize new definitions. To tackle these issues, we present Aria (Agent for Retrieval and Iterative Autoformalization), a system for conjecture-level formalization in Lean that emulates human expert reasoning via a two-phase Graph-of-Thought process: recursively decomposing statements into a dependency graph and then constructing formalizations from grounded concepts. To ensure semantic correctness, we introduce AriaScorer, a checker that retrieves definitions from Mathlib for term-level grounding, enabling rigorous and reliable verification. We evaluate Aria on diverse benchmarks. On ProofNet, it achieves a 91.6% compilation success rate and 68.5% final accuracy, surpassing previous methods. On FATE-X, a suite of challenging algebra problems from research literature, it outperforms the best baseline with 44.0% vs. 24.0% final accuracy. On a dataset of homological conjectures, Aria reaches 42.9% final accuracy while all other models score 0%.
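The two-phase idea described in the abstract (decompose a statement into a dependency graph, then formalize from grounded concepts upward) can be illustrated with a small, hypothetical sketch. This is not Aria's implementation: the `DECOMPOSITION` table stands in for the LLM-driven decomposition step, and `KNOWN_IN_LIBRARY` is a toy stand-in for Mathlib retrieval, not a claim about what Mathlib actually contains. All concept names and helper functions here are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins (assumptions, not Aria's components):
# KNOWN_IN_LIBRARY plays the role of "found via library retrieval",
# DECOMPOSITION plays the role of an LLM splitting a concept into its dependencies.
KNOWN_IN_LIBRARY = {"Ring", "Ideal", "Module"}
DECOMPOSITION = {
    "MyConjecture": ["GorensteinRing", "Module"],
    "GorensteinRing": ["Ring", "InjectiveDimension"],
    "InjectiveDimension": ["Module", "Ring"],
}


@dataclass
class Node:
    name: str
    grounded: bool  # True if the concept already exists in the (toy) library
    children: list = field(default_factory=list)


def build_graph(concept, cache=None):
    """Phase 1: recursively decompose until every leaf is grounded (or has no known parts)."""
    if cache is None:
        cache = {}
    if concept in cache:  # shared dependencies become one node, so this is a DAG, not a tree
        return cache[concept]
    node = Node(concept, grounded=concept in KNOWN_IN_LIBRARY)
    cache[concept] = node  # register before recursing so cycles terminate
    if not node.grounded:
        for dep in DECOMPOSITION.get(concept, []):
            node.children.append(build_graph(dep, cache))
    return node


def formalize(root):
    """Phase 2: post-order traversal, so each new definition only uses names already available."""
    done, order = set(), []

    def visit(node):
        if node.name in done:
            return
        done.add(node.name)
        for child in node.children:
            visit(child)
        if not node.grounded:  # grounded concepts are reused, never redefined
            order.append((node.name, [c.name for c in node.children]))

    visit(root)
    return order


if __name__ == "__main__":
    for name, deps in formalize(build_graph("MyConjecture")):
        print(f"define {name} using {', '.join(deps)}")
```

Visiting children before their parent yields a topological order of the dependency graph: `InjectiveDimension` is produced before `GorensteinRing`, which is produced before the top-level statement. Concepts already grounded in the library appear only as dependencies and are never redefined, which mirrors the abstract's distinction between retrieving existing definitions and synthesizing new ones.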
