{"total":11,"items":[{"citing_arxiv_id":"2605.21160","ref_index":7,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Learning First Integrals via Backward-Generated Data and Guided Reinforcement Learning","primary_cat":"cs.LG","submitted_at":"2026-05-20T13:27:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"FISolver trains a compact LLM on backward-generated (differential equation, first integral) pairs and uses guided reinforcement learning to outperform larger models and Mathematica on first-integral benchmarks at lower cost.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22875","ref_index":3,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"RMA: an Agentic System for Research-Level Mathematical Problems","primary_cat":"cs.AI","submitted_at":"2026-05-20T04:54:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RMA, a multi-agent system with structured memory and iterative feedback loops, solves 8 out of 10 research-level math problems on the new First Proof benchmark and outperforms GPT-5.2R and Aletheia according to expert evaluation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20244","ref_index":41,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search","primary_cat":"cs.LO","submitted_at":"2026-05-18T04:19:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Lean Refactor uses retrieval from a curated multi-objective strategy database to guide frozen LLMs in refactoring Lean proofs, reporting over 70% token compression on benchmarks 
and improved version transfer.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10754","ref_index":35,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"The Agent Use of Agent Beings: Agent Cybernetics Is the Missing Science of Foundation Agents","primary_cat":"cs.AI","submitted_at":"2026-05-11T15:53:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Agent Cybernetics reframes foundation agent design by adapting classical cybernetics laws into three engineering desiderata for reliable, long-running, self-improving agents.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Cradle: Empowering foundation agents towards general computer control. In ICML, 2025. [33] Naftali Tishby, Fernando C Pereira, and William Bialek. The information bottleneck method. arXiv preprint physics/0004057, 2000. [34] Naftali Tishby and Noga Zaslavsky. Deep learning and the information bottleneck principle. In 2015 IEEE Information Theory Workshop (ITW), pages 1-5. IEEE, 2015. [35] Trieu H Trinh, Yuhuai Wu, Quoc V Le, He He, and Thang Luong. Solving olympiad geometry without human demonstrations. Nature, 625(7995):476-482, 2024. [36] Hsue Shen Tsien. Engineering Cybernetics. McGraw-Hill, New York, 1954. [37] Alexander Matt Turner and Prasad Tadepalli. Parametrically retargetable decision-makers tend to seek power. 
In NeurIPS, pages 31391-31401, 2022."},{"citing_arxiv_id":"2605.06123","ref_index":66,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Back to the Beginning of Heuristic Design: Bridging Code and Knowledge with LLMs","primary_cat":"cs.AI","submitted_at":"2026-05-07T12:30:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A knowledge-first approach to LLM-driven automatic heuristic design in combinatorial optimization yields better discovery efficiency, transfer, and generalization than code-centric baselines by formalizing a distortion-compression trade-off.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.19715","ref_index":63,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Neuro-Symbolic Proof Generation for Scaling Systems Software Verification","primary_cat":"cs.AI","submitted_at":"2026-03-20T07:45:49+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A neuro-symbolic system using LLM-guided best-first search and Isabelle tools proves up to 77.6% of theorems on the seL4 benchmark, outperforming prior LLM methods and Sledgehammer.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.01070","ref_index":23,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"How RL Unlocks the Aha Moment in Geometric Interleaved Reasoning","primary_cat":"cs.CL","submitted_at":"2026-03-01T12:18:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Reinforcement learning with three causal constraints enables multimodal models to internalize diagram-reasoning links in geometry, unlike SFT which only mimics surface format and harms 
performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.08167","ref_index":64,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning","primary_cat":"cs.RO","submitted_at":"2026-02-09T00:10:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"R&B-EnCoRe uses self-supervised importance-weighted variational inference to distill action-predictive reasoning datasets that improve VLA performance on manipulation, navigation, and driving tasks without external verifiers.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"20094, 2024. [62] Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, and Tatsunori B Hashimoto. Stanford alpaca: An instruction-following llama model, 2023. [63] Trieu H Trinh, Yuhuai Wu, Quoc V Le, He He, and Thang Luong. Solving olympiad geometry without human demonstrations. Nature, 625(7995):476-482, 2024. [64] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025. 
[65] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian"},{"citing_arxiv_id":"2601.12538","ref_index":29,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Agentic Reasoning for Large Language Models","primary_cat":"cs.AI","submitted_at":"2026-01-18T18:58:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Post-training Reasoning targets capability internalization: it consolidates successful reasoning patterns or tool-use strategies into the model's weights via reinforcement learning and fine-tuning. Together, they provide an actionable roadmap for designing agents. Building on the three-layer taxonomy, agentic reasoning has begun to underpin a wide range of practical applications, from mathematical exploration [29, 30] and vibe coding [11, 31, 32] to scientific discovery 3 Agentic Reasoning for Large Language Models Survey Scope This survey reviews reasoning-empowered agentic systems where reasoning drives adaptive behavior. 
We analyze these systems through two complementary optimization modes: • In-context Reasoning: scales inference-time interaction through structured orchestration and plan-"},{"citing_arxiv_id":"2508.08636","ref_index":43,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"InternBootcamp Technical Report: Boosting LLM Reasoning with Verifiable Task Scaling","primary_cat":"cs.CL","submitted_at":"2025-08-12T05:00:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"InternBootcamp supplies 1000+ verifiable, auto-generated task environments across domains that enable task scaling to improve LLM reasoning, producing a 32B model with state-of-the-art results on the new Bootcamp-EVAL benchmark.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2505.10465","ref_index":10,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Superposition Yields Robust Neural Scaling","primary_cat":"cs.LG","submitted_at":"2025-05-15T16:18:13+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Strong superposition causes neural loss to scale as the inverse of model dimension due to geometric feature overlaps, explaining scaling laws for broad frequency distributions.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[8] Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, and Robert Stojnic. Galactica: A large language model for science. arXiv preprint arXiv:2211.09085, 2022. [9] Stephen Wolfram. Wolfram|alpha as the computation engine for gpt models, 2023. https://www.wolfram.com/wolfram-alpha-openai-plugin. [10] Trieu H Trinh, Yuhuai Wu, Quoc V Le, He He, and Thang Luong. 
Solving olympiad geometry without human demonstrations. Nature, 625(7995):476-482, 2024. [11] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde De Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. Evaluating large language models trained on code."}],"limit":50,"offset":0}