Soohak is a 439-problem mathematician-curated benchmark where frontier LLMs reach at most 30.4% on research math challenges and no model exceeds 50% on refusal for ill-posed problems.
Canonical reference
Semi-autonomous mathematics discovery with gemini: A case study on the erd\h{o}s problems
Canonical reference. 100% of citing Pith papers cite this work as background.
citation-role summary
citation-polarity summary
years
2026 8roles
background 6polarities
background 6representative citing papers
A SAT-plus-LLM method discovers infinite families of doubly saturated Ramsey-good graphs, answering Grinstead and Roberts' 1982 question.
k-server-bench formulates potential-function discovery for the k-server conjecture as a code-based inequality-satisfaction task; current agents fully solve the resolved k=3 case and reduce violations on the open k=4 case.
GRAFT-ATHENA projects combinatorial method choices into factored trees that embed as fingerprints in a metric space, enabling an agentic system to accumulate experience across domains and autonomously discover new numerical techniques for physics-informed problems.
SCALAR generates conjectures linking optimal QAOA parameters to graph invariants, recovers known periodicity and parameter-transfer properties, and identifies correlations with optimization landscapes across thousands of graphs up to 77 qubits.
Five improved inequalities were found with AI help: better Gaussian perimeter bounds for convex sets, sharper L2-L1 moments on the Hamming cube, a strengthened autoconvolution inequality, improved g-Sidon set bounds, and an optimal balanced Szarek inequality.
citing papers explorer
-
$k$-server-bench: Automating Potential Discovery for the $k$-Server Conjecture
k-server-bench formulates potential-function discovery for the k-server conjecture as a code-based inequality-satisfaction task; current agents fully solve the resolved k=3 case and reduce violations on the open k=4 case.