{"total":11,"items":[{"citing_arxiv_id":"2605.13165","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"STOP: Structured On-Policy Pruning of Long-Form Reasoning in Low-Data Regimes","primary_cat":"cs.CL","submitted_at":"2026-05-13T08:28:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"STOP uses structured on-policy analysis to prune long reasoning traces to their earliest correct node, cutting token usage 19-42% with little accuracy loss on math benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08221","ref_index":39,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"NoisyCoconut: Counterfactual Consensus via Latent Space Reasoning","primary_cat":"cs.LG","submitted_at":"2026-05-06T13:58:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Injecting noise into LLM latent trajectories creates diverse reasoning paths whose agreement acts as a confidence signal for selective abstention, cutting error rates from 40-70% to under 15% on math tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.21611","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Process Supervision via Verbal Critique Improves Reasoning in Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-04-23T12:36:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Verbal Process Supervision uses structured critiques from stronger models in an iterative loop to improve LLM reasoning, reaching 94.9% on GPQA Diamond and large gains on AIME 
2025.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17433","ref_index":62,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning","primary_cat":"cs.CL","submitted_at":"2026-04-19T13:26:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CoT-PoT ensembling achieves self-consistency accuracy in LLMs with only two samples for 78.6% of tasks, reducing computation by 9.3x compared to standard methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.12717","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Transferable Expertise for Autonomous Agents via Real-World Case-Based Learning","primary_cat":"cs.AI","submitted_at":"2026-04-14T13:31:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A case-based learning framework extracts reusable knowledge from past tasks to improve LLM agents' structured performance on complex real-world tasks, outperforming standard prompting baselines especially as task complexity grows.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2412.06769","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Training Large Language Models to Reason in a Continuous Latent Space","primary_cat":"cs.CL","submitted_at":"2024-12-09T18:55:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Coconut lets LLMs perform reasoning directly in continuous latent space by recycling hidden states as inputs, outperforming standard chain-of-thought on 
search-intensive logical tasks with better accuracy-efficiency trade-offs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2406.06592","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Improve Mathematical Reasoning in Language Models by Automated Process Supervision","primary_cat":"cs.CL","submitted_at":"2024-06-05T19:25:40+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"OmegaPRM automates collection of 1.5 million process supervision labels via binary-search MCTS, raising Gemini Pro math accuracy from 51% to 69.4% on MATH500 and Gemma2 27B from 42.3% to 58.2%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2406.00515","ref_index":93,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Survey on Large Language Models for Code Generation","primary_cat":"cs.CL","submitted_at":"2024-06-01T17:48:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A systematic literature review that organizes recent work on LLMs for code generation into a taxonomy covering data curation, model advances, evaluations, ethics, environmental impact, and applications, with benchmark comparisons.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2402.02716","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Understanding the planning of LLM agents: A survey","primary_cat":"cs.AI","submitted_at":"2024-02-05T04:25:24+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A survey that provides a taxonomy of methods for improving planning in LLM-based agents across task 
decomposition, plan selection, external modules, reflection, and memory.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2309.07864","ref_index":185,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Rise and Potential of Large Language Model Based Agents: A Survey","primary_cat":"cs.AI","submitted_at":"2023-09-14T17:12:03+00:00","verdict":"ACCEPT","verdict_confidence":"HIGH","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Interactive retrieval Memory Sandbox [176], ChatDB [175], etc. Reasoning & Planning §3.1.4 Reasoning CoT [95], Zero-shot-CoT [96], Self-Consistency [97], Self-Polish [99], Selection-Inference [177], Self-Refine [178], etc. Planning Plan formulation Least-to-Most [98], SayCan [179], HuggingGPT [180], ToT [181], PET [182], DEPS [183], RAP [184], SwiftSage [185], LLM+P [125], MRKL [186], etc. Plan reflection LLM-Planner [101], Inner Monologue [187], ReAct [91], ChatCoT [188], AI Chains [189], Voyager [190], Zhao et al. [191], SelfCheck [192], etc. Transferability & Generalization §3.1.5 Unseen task generalization T0 [106], FLAN [105], InstructGPT [24], Chung et al. [107], etc. 
In-context learning GPT-3 [41], Wang et al."},{"citing_arxiv_id":"2305.10601","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Tree of Thoughts: Deliberate Problem Solving with Large Language Models","primary_cat":"cs.CL","submitted_at":"2023-05-17T23:16:17+00:00","verdict":"ACCEPT","verdict_confidence":"HIGH","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Tree of Thoughts enables language models to solve complex planning tasks by generating, evaluating, and searching over coherent intermediate thoughts in a tree, raising Game of 24 success from 4% to 74% with GPT-4.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}