{"total":10,"items":[{"citing_arxiv_id":"2605.18004","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"RL4RLA: Teaching ML to Discover Randomized Linear Algebra Algorithms Through Curriculum Design and Graph-Based Search","primary_cat":"cs.LG","submitted_at":"2026-05-18T07:57:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"RL4RLA is a reinforcement learning framework that discovers interpretable symbolic randomized linear algebra algorithms by combining curriculum learning and graph-based search to overcome sparse rewards and large search spaces.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10598","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Budget-Efficient Automatic Algorithm Design via Code Graph","primary_cat":"cs.AI","submitted_at":"2026-05-11T14:01:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A code-graph and correction-based LLM search framework outperforms full-algorithm generation at equal token budgets on three combinatorial optimization problems.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"discovery, and EoH [3, 11] refined it for general heuristic design by incorporating natural-language descriptions of heuristics into the generation prompts. Subsequent work has extended the loop along several axes: ReEvo [4] adds reflective self-critique between generations, EvoCut [6] targets cutting-plane generation for integer programming, and EvoTune [7] and TIDE [12] integrate online tuning of the generator into the evolutionary loop. 
These methods share a common limitation: each generation produces full algorithms, and population diversity is maintained only through prompt-level mechanisms, leaving the search prone to paradigm collapse when the LLM's prior is concentrated around a few algorithmic features."},{"citing_arxiv_id":"2605.08756","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"AHD Agent: Agentic Reinforcement Learning for Automatic Heuristic Design","primary_cat":"cs.AI","submitted_at":"2026-05-09T07:36:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AHD Agent trains a 4B-parameter LLM via agentic RL to actively use tools for automatic heuristic design, matching or exceeding larger baselines across eight domains with fewer evaluations.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"arXiv:2507.15615, 2025. [20] M. Chen and G. Li, \"Dasathco: Data-aware sat heuristics combinations optimization via large language models,\" arXiv preprint arXiv:2509.12602, 2025. [21] S. Zhang, S. Liu, N. Lu, J. Wu, J. Liu, Y.-S. Ong, and K. Tang, \"Llm-driven instance-specific heuristic generation and selection,\" arXiv preprint arXiv:2506.00490, 2026. [22] A. Surina, A. Mansouri, L. Quaedvlieg, A. Seddas, M. Viazovska, E. Abbe, and C. Gulcehre, \"Algorithm discovery with llms: Evolutionary search meets reinforcement learning,\" arXiv preprint arXiv:2504.05108, 2025. [23] R. Zhu, C. Zhang, and Z. 
Cao, \"Refining hybrid genetic search for CVRP via reinforcement learning-finetuned LLM,\" in The Fourteenth International Conference on Learning"},{"citing_arxiv_id":"2605.08678","ref_index":90,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI","primary_cat":"cs.LG","submitted_at":"2026-05-09T04:29:46+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Paperbench: Evaluating ai's ability to replicate ai research. In Forty-second International Conference on Machine Learning, 2025. [89] Anja Surina, Amin Mansouri, Lars Quaedvlieg, Amal Seddas, Maryna Viazovska, Emmanuel Abbe, and Caglar Gulcehre. Algorithm discovery with llms: Evolutionary search meets reinforcement learning. arXiv preprint arXiv:2504.05108, 2025. [90] Kyle Swanson, Wesley Wu, Nash L. Bulaong, John E. Pak, and James Zou. The virtual lab: Ai agents design new sars-cov-2 nanobodies with experimental validation. 2024. [91] Jiabin Tang, Lianghao Xia, Zhonghang Li, and Chao Huang. Ai-researcher: Autonomous scientific innovation. arXiv preprint arXiv:2505.18705, 2025. 
[92] Xiangru Tang, Yuliang Liu, Zefan Cai, Yanjun Shao, Junjie Lu, Yichi Zhang, Zexuan Deng,"},{"citing_arxiv_id":"2605.06123","ref_index":54,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Back to the Beginning of Heuristic Design: Bridging Code and Knowledge with LLMs","primary_cat":"cs.AI","submitted_at":"2026-05-07T12:30:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A knowledge-first approach to LLM-driven automatic heuristic design in combinatorial optimization yields better discovery efficiency, transfer, and generalization than code-centric baselines by formalizing a distortion-compression trade-off.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.07240","ref_index":39,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"$k$-server-bench: Automating Potential Discovery for the $k$-Server Conjecture","primary_cat":"cs.MS","submitted_at":"2026-04-08T16:06:43+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"k-server-bench formulates potential-function discovery for the k-server conjecture as a code-based inequality-satisfaction task; current agents fully solve the resolved k=3 case and reduce violations on the open k=4 case.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[37] Nat Sothanaphan. Resolution of Erdős Problem #728: a writeup of Aristotle's Lean proof. 2026. arXiv:2601.07421 [math.NT]. url: https://arxiv.org/abs/2601.07421. [38] Anja Surina et al. Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning. 2025. arXiv:2504.05108 [cs.AI]. url: https://arxiv.org/abs/2504.05108. [39] Adam Zsolt Wagner. Constructions in combinatorics via neural networks. 2021. arXiv:2104.14516 [math.CO]. 
[40] Chunhui Wan et al. LoongFlow: Directed Evolutionary Search via a Cognitive Plan-Execute-Summarize Paradigm. 2025. arXiv:2512.24077 [cs.AI]. url: https://arxiv.org/abs/2512.24077. [41] Yiping Wang et al. ThetaEvolve: Test-time Learning on Open Problems."},{"citing_arxiv_id":"2604.02721","ref_index":28,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning","primary_cat":"cs.AI","submitted_at":"2026-04-03T04:26:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GrandCode is the first AI system to consistently beat all human participants and place first in live Codeforces competitive programming contests.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"CodeElo: Benchmarking competition-level code generation of LLMs with human-comparable elo ratings. arXiv preprint arXiv:2501.01257, 2025. [27] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y.K. Li, Y. Wu, and Daya Guo. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024. [28] Anja Surina, Amin Mansouri, Lars Quaedvlieg, Amal Seddas, Maryna Viazovska, Emmanuel Abbe, and Caglar Gulcehre. Algorithm discovery with LLMs: Evolutionary search meets reinforcement learning. arXiv preprint arXiv:2504.05108, 2025. [29] Thinking Machines Lab. Tinker. https://github.com/thinking-machines-lab/tinker, 2025. Open-source training API. 
[30] THUDM."},{"citing_arxiv_id":"2601.16175","ref_index":69,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Learning to Discover at Test Time","primary_cat":"cs.LG","submitted_at":"2026-01-22T18:24:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TTT-Discover applies test-time RL to set new state-of-the-art results on math inequalities, GPU kernels, algorithm contests, and single-cell denoising using an open model and public code.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.16978","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Lark: Biologically Inspired Neuroevolution for Multi-Stakeholder LLM Agents","primary_cat":"cs.MA","submitted_at":"2025-10-19T19:43:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Lark is a biologically inspired neuroevolution framework for multi-stakeholder LLM agents that iteratively generates, refines, and selects strategies using plasticity, duplication/maturation, influence-weighted Borda scoring, and token penalties, achieving top-3 performance in 80% of 30-round trials","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.13131","ref_index":98,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"AlphaEvolve: A coding agent for scientific and algorithmic discovery","primary_cat":"cs.AI","submitted_at":"2025-06-16T06:37:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AlphaEvolve is an LLM-orchestrated evolutionary coding agent that discovered a 4x4 complex matrix multiplication algorithm using 48 scalar multiplications, the first improvement over Strassen's algorithm in 56 years, 
plus optimizations for Google data centers and hardware.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"AlphaEvolve: A coding agent for scientific and algorithmic discovery [96] A. Surina, A. Mansouri, L. Quaedvlieg, A. Seddas, M. Viazovska, E. Abbe, and C. Gulcehre. Algorithm discovery with LLMs: Evolutionary search meets reinforcement learning. In arXiv preprint arXiv:2504.05108, 2025. [97] R. Tanese. Distributed genetic algorithms for function optimization. University of Michigan, 1989. [98] A. Thakur, G. Tsoukalas, Y. Wen, J. Xin, and S. Chaudhuri. An in-context learning agent for formal theorem-proving. In Conference on Language Models, 2024. [99] T. H. Trinh, Y. Wu, Q. V. Le, H. He, and T. Luong. Solving olympiad geometry without human demonstrations. Nature, 625(7995):476-482, 2024. [100] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L."}],"limit":50,"offset":0}