hub Mixed citations

First proof

2026 · arXiv 2602.05192

Mixed citation behavior. Most common role is background (60%).

15 Pith papers citing it

Background 60% of classified citations


citation-role summary

background: 3 · dataset: 2

citation-polarity summary

background: 3 · use dataset: 2

representative citing papers

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

cs.CL · 2026-05-09 · unverdicted · novelty 8.0 · 2 refs

Soohak is a 439-problem mathematician-curated benchmark where frontier LLMs reach at most 30.4% on research math challenges and no model exceeds 50% on refusal for ill-posed problems.

Failure Modes of Large Language Models on Research-Level Mathematics: A Taxonomy and an Empirical Characterisation

cs.DL · 2026-06-12 · conditional · novelty 7.0

This paper introduces a taxonomy of four LLM failure modes on research math proofs and empirically shows premise smuggling in all eight audited Gemini outputs, with a new audit instrument achieving 100% precision.

Formalizing Mathematics at Scale

cs.AI · 2026-05-28 · accept · novelty 7.0

A multi-agent framework called AutoformBot autoformalized 26 textbooks spanning analysis, algebra, topology, combinatorics and probability into a verified Lean 4 library of 45k declarations, demonstrating scalable formalization of graduate math.

AI co-mathematician: Accelerating mathematicians with agentic AI

cs.AI · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

An interactive AI workbench for mathematicians achieves 48% on FrontierMath Tier 4 and helped solve open problems in early tests.

$k$-server-bench: Automating Potential Discovery for the $k$-Server Conjecture

cs.MS · 2026-04-08 · accept · novelty 7.0

k-server-bench formulates potential-function discovery for the k-server conjecture as a code-based inequality-satisfaction task; current agents fully solve the resolved k=3 case and reduce violations on the open k=4 case.

An Enigma of Artificial Reason: Investigating the Production-Evaluation Gap in Large Reasoning Models

cs.AI · 2026-05-31 · conditional · novelty 6.0

LRMs show a large production-evaluation gap on the VAIR dataset with valid answers but invalid reasoning, driven by answer confirmation bias as evidenced by CoT analysis, linear probes, and causal patching.

RMA: an Agentic System for Research-Level Mathematical Problems

cs.AI · 2026-05-20 · unverdicted · novelty 6.0

RMA, a multi-agent system with structured memory and iterative feedback loops, solves 8 out of 10 research-level math problems on the new First Proof benchmark and outperforms GPT-5.2R and Aletheia according to expert evaluation.

Spectral Structure in Finite Free Information Inequalities and $p$-Stam Phase Transitions

math.PR · 2026-04-13 · unverdicted · novelty 6.0 · 2 refs

Computational discovery via FlowBoost supports conjectures on the singular values of the coupling matrix E_n being 2^{-k/2} independent of n, a sharp p=2 critical exponent for p-Stam inequalities, and bifurcation of extremals for p<2.

First Proof Second Batch

cs.AI · 2026-06-16 · unverdicted · novelty 5.0

Reports methodology and results from evaluating multiple AI systems on ten new research-level mathematics problems from active researchers.

Iteris: Agentic Research Loops for Computational Mathematics

cs.AI · 2026-06-01 · unverdicted · novelty 5.0

Iteris, an agentic research system, produced evidence and drafts for two open computational math problems that were verified after human correction.

pAI/MSc: ML Theory Research with Humans on the Loop

cs.AI · 2026-04-22 · unverdicted · novelty 5.0

pAI/MSc is a customizable multi-agent system that reduces human steering by orders of magnitude when turning a hypothesis into a literature-grounded, mathematically established, experimentally supported manuscript draft in ML theory.

Forage V2: Knowledge Evolution and Transfer in Autonomous Agent Organizations

cs.AI · 2026-04-21 · unverdicted · novelty 5.0

Forage V2 enables agent organizations to grow knowledge from 0 to 54 entries over runs and transfer it so weaker models nearly match stronger ones in coverage, cost, and speed on open-world tasks.

Artificial Intelligence and the Structure of Mathematics

cs.AI · 2026-04-07 · unverdicted · novelty 4.0

AI agents exploring Platonic mathematical structures via proof hypergraphs may reveal the overall architecture of formal mathematics and what makes parts of it human-accessible.

How AI settled the complexity of the oldest SGD algorithm

cs.LG · 2026-06-28 · unverdicted · novelty 3.0

AI models discovered the worst-case complexity of the Kaczmarz algorithm for solving linear systems.

Automated Conjecture Resolution with Formal Verification

cs.LG · 2026-04-04

citing papers explorer

Showing 15 of 15 citing papers.

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs cs.CL · 2026-05-09 · unverdicted · none · ref 1 · 2 links
Soohak is a 439-problem mathematician-curated benchmark where frontier LLMs reach at most 30.4% on research math challenges and no model exceeds 50% on refusal for ill-posed problems.
Failure Modes of Large Language Models on Research-Level Mathematics: A Taxonomy and an Empirical Characterisation cs.DL · 2026-06-12 · conditional · none · ref 1
This paper introduces a taxonomy of four LLM failure modes on research math proofs and empirically shows premise smuggling in all eight audited Gemini outputs, with a new audit instrument achieving 100% precision.
Formalizing Mathematics at Scale cs.AI · 2026-05-28 · accept · none · ref 2
A multi-agent framework called AutoformBot autoformalized 26 textbooks spanning analysis, algebra, topology, combinatorics and probability into a verified Lean 4 library of 45k declarations, demonstrating scalable formalization of graduate math.
AI co-mathematician: Accelerating mathematicians with agentic AI cs.AI · 2026-05-07 · unverdicted · none · ref 52 · 2 links
An interactive AI workbench for mathematicians achieves 48% on FrontierMath Tier 4 and helped solve open problems in early tests.
$k$-server-bench: Automating Potential Discovery for the $k$-Server Conjecture cs.MS · 2026-04-08 · accept · none · ref 1
k-server-bench formulates potential-function discovery for the k-server conjecture as a code-based inequality-satisfaction task; current agents fully solve the resolved k=3 case and reduce violations on the open k=4 case.
An Enigma of Artificial Reason: Investigating the Production-Evaluation Gap in Large Reasoning Models cs.AI · 2026-05-31 · conditional · none · ref 1
LRMs show a large production-evaluation gap on the VAIR dataset with valid answers but invalid reasoning, driven by answer confirmation bias as evidenced by CoT analysis, linear probes, and causal patching.
RMA: an Agentic System for Research-Level Mathematical Problems cs.AI · 2026-05-20 · unverdicted · none · ref 16
RMA, a multi-agent system with structured memory and iterative feedback loops, solves 8 out of 10 research-level math problems on the new First Proof benchmark and outperforms GPT-5.2R and Aletheia according to expert evaluation.
Spectral Structure in Finite Free Information Inequalities and $p$-Stam Phase Transitions math.PR · 2026-04-13 · unverdicted · none · ref 2 · 2 links
Computational discovery via FlowBoost supports conjectures on the singular values of the coupling matrix E_n being 2^{-k/2} independent of n, a sharp p=2 critical exponent for p-Stam inequalities, and bifurcation of extremals for p<2.
First Proof Second Batch cs.AI · 2026-06-16 · unverdicted · none · ref 3
Reports methodology and results from evaluating multiple AI systems on ten new research-level mathematics problems from active researchers.
Iteris: Agentic Research Loops for Computational Mathematics cs.AI · 2026-06-01 · unverdicted · none · ref 1
Iteris, an agentic research system, produced evidence and drafts for two open computational math problems that were verified after human correction.
pAI/MSc: ML Theory Research with Humans on the Loop cs.AI · 2026-04-22 · unverdicted · none · ref 9
pAI/MSc is a customizable multi-agent system that reduces human steering by orders of magnitude when turning a hypothesis into a literature-grounded, mathematically established, experimentally supported manuscript draft in ML theory.
Forage V2: Knowledge Evolution and Transfer in Autonomous Agent Organizations cs.AI · 2026-04-21 · unverdicted · none · ref 32
Forage V2 enables agent organizations to grow knowledge from 0 to 54 entries over runs and transfer it so weaker models nearly match stronger ones in coverage, cost, and speed on open-world tasks.
Artificial Intelligence and the Structure of Mathematics cs.AI · 2026-04-07 · unverdicted · none · ref 2
AI agents exploring Platonic mathematical structures via proof hypergraphs may reveal the overall architecture of formal mathematics and what makes parts of it human-accessible.
How AI settled the complexity of the oldest SGD algorithm cs.LG · 2026-06-28 · unverdicted · none · ref 1
AI models discovered the worst-case complexity of the Kaczmarz algorithm for solving linear systems.
Automated Conjecture Resolution with Formal Verification cs.LG · 2026-04-04 · unreviewed · ref 1
