Mixed citations

First proof

Mohammed Abouzaid, Andrew J · 2026 · arXiv 2602.05192

Mixed citation behavior. Most common role is background (60%).

10 Pith papers citing it

Background 60% of classified citations

read on arXiv browse 10 citing papers

citation-role summary

background 3 dataset 2

citation-polarity summary

background 3 use dataset 2

representative citing papers

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

cs.CL · 2026-05-09 · unverdicted · novelty 8.0 · 2 refs

Soohak is a 439-problem mathematician-curated benchmark where frontier LLMs reach at most 30.4% on research math challenges and no model exceeds 50% on refusal for ill-posed problems.

Formalizing Mathematics at Scale

cs.AI · 2026-05-28 · accept · novelty 7.0

A multi-agent framework called AutoformBot autoformalized 26 textbooks spanning analysis, algebra, topology, combinatorics and probability into a verified Lean 4 library of 45k declarations, demonstrating scalable formalization of graduate math.

AI co-mathematician: Accelerating mathematicians with agentic AI

cs.AI · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

An interactive AI workbench for mathematicians achieves 48% on FrontierMath Tier 4 and helped solve open problems in early tests.

$k$-server-bench: Automating Potential Discovery for the $k$-Server Conjecture

cs.MS · 2026-04-08 · accept · novelty 7.0

k-server-bench formulates potential-function discovery for the k-server conjecture as a code-based inequality-satisfaction task; current agents fully solve the resolved k=3 case and reduce violations on the open k=4 case.

RMA: an Agentic System for Research-Level Mathematical Problems

cs.AI · 2026-05-20 · unverdicted · novelty 6.0

RMA, a multi-agent system with structured memory and iterative feedback loops, solves 8 out of 10 research-level math problems on the new First Proof benchmark and outperforms GPT-5.2R and Aletheia according to expert evaluation.

Spectral Structure in Finite Free Information Inequalities and $p$-Stam Phase Transitions

math.PR · 2026-04-13 · unverdicted · novelty 6.0 · 2 refs

Computational discovery via FlowBoost supports conjectures on the singular values of the coupling matrix E_n being 2^{-k/2} independent of n, a sharp p=2 critical exponent for p-Stam inequalities, and bifurcation of extremals for p<2.

pAI/MSc: ML Theory Research with Humans on the Loop

cs.AI · 2026-04-22 · unverdicted · novelty 5.0

pAI/MSc is a customizable multi-agent system that reduces human steering by orders of magnitude when turning a hypothesis into a literature-grounded, mathematically established, experimentally supported manuscript draft in ML theory.

Forage V2: Knowledge Evolution and Transfer in Autonomous Agent Organizations

cs.AI · 2026-04-21 · unverdicted · novelty 5.0

Forage V2 enables agent organizations to grow knowledge from 0 to 54 entries over runs and transfer it so weaker models nearly match stronger ones in coverage, cost, and speed on open-world tasks.

Artificial Intelligence and the Structure of Mathematics

cs.AI · 2026-04-07 · unverdicted · novelty 4.0

AI agents exploring Platonic mathematical structures via proof hypergraphs may reveal the overall architecture of formal mathematics and what makes parts of it human-accessible.

Automated Conjecture Resolution with Formal Verification

cs.LG · 2026-04-04

citing papers explorer

Showing 3 of 3 citing papers after filters.

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs cs.CL · 2026-05-09 · unverdicted · none · ref 1 · 2 links
Soohak is a 439-problem mathematician-curated benchmark where frontier LLMs reach at most 30.4% on research math challenges and no model exceeds 50% on refusal for ill-posed problems.
$k$-server-bench: Automating Potential Discovery for the $k$-Server Conjecture cs.MS · 2026-04-08 · accept · none · ref 1
k-server-bench formulates potential-function discovery for the k-server conjecture as a code-based inequality-satisfaction task; current agents fully solve the resolved k=3 case and reduce violations on the open k=4 case.
pAI/MSc: ML Theory Research with Humans on the Loop cs.AI · 2026-04-22 · unverdicted · none · ref 9
pAI/MSc is a customizable multi-agent system that reduces human steering by orders of magnitude when turning a hypothesis into a literature-grounded, mathematically established, experimentally supported manuscript draft in ML theory.

First proof

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer