{"total":19,"items":[{"citing_arxiv_id":"2605.22511","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning","primary_cat":"cs.AI","submitted_at":"2026-05-21T14:00:57+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21027","ref_index":3,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Beyond Text-to-SQL: An Agentic LLM System for Governed Enterprise Analytics APIs","primary_cat":"cs.CL","submitted_at":"2026-05-20T11:00:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Analytic Agent is an agentic LLM system that translates natural language intents into governed enterprise analytics API interactions, evaluated on 90 expert-constructed real-world use cases.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18396","ref_index":28,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"NEWTON: Agentic Planning for Physically Grounded Video Generation","primary_cat":"cs.CV","submitted_at":"2026-05-18T13:42:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"NEWTON improves physical accuracy in video generation by deploying a trainable planner that coordinates physics-aware tools and a verifier, raising joint accuracy on VideoPhy-2 without altering the base generators.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14126","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Reinforcement Learning for Tool-Calling 
Agents in Fast Healthcare Interoperability Resources (FHIR)","primary_cat":"cs.LG","submitted_at":"2026-05-13T21:27:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"RL post-training lifts answer correctness on FHIR-AgentBench from 50% (o4-mini) to 77% with a cheaper Qwen3-8B CodeAct agent.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14051","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks","primary_cat":"cs.AI","submitted_at":"2026-05-13T19:12:24+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SPIN enforces DAG-valid plans and prefix-based stopping for LLM agents, cutting executed tasks from 1061 to 623 and tool calls from 11.81 to 6.82 per run on AssetOpsBench while raising success from 0.638 to 0.706.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13579","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Position: Assistive Agents Need Accessibility Alignment","primary_cat":"cs.AI","submitted_at":"2026-05-13T14:13:53+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Assistive agents for BVI users need accessibility alignment as a core design goal, with a proposed lifecycle pipeline, because sighted assumptions cause unfixable failures in verification, risk, and interaction.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09931","ref_index":35,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"PruneTIR: Inference-Time Tool Call Pruning for Effective 
yet Efficient Tool-Integrated Reasoning","primary_cat":"cs.CL","submitted_at":"2026-05-11T03:28:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PruneTIR prunes erroneous tool-call trajectories during LLM inference via three trigger-based components to raise Pass@1 accuracy and efficiency while shortening context.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.07725","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"SOD: Step-wise On-policy Distillation for Small Language Model Agents","primary_cat":"cs.CL","submitted_at":"2026-05-08T13:30:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SOD reweights on-policy distillation strength step-by-step using divergence to stabilize tool use in small language model agents, yielding up to 20.86% gains and 26.13% on AIME 2025 for a 0.6B model.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Existing post-training methods for enhancing TIR abilities are largely based on reinforcement learning (RL) [ 17-19], particularly policy optimization algorithms such as group relative policy optimization (GRPO) [20]. However, in SLM-based TIR settings, RL often suffers from unstable optimization [13, 14]. TIR tasks typically involve long-horizon trajectories [17], multi-step decision making [18], and interactions with external tools [5], whereas RL commonly provides only sparse outcome-level rewards [21]. For small models with limited capacity and weaker exploration ability, such sparse supervision can further exacerbate exploration failure, leaving the policy in a cold-start regime with few informative reasoning signals [22]. ∗Equal contribution. 
†This work was done during an internship at Tencent."},{"citing_arxiv_id":"2605.06534","ref_index":65,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RL","primary_cat":"cs.DC","submitted_at":"2026-05-07T16:33:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ROSE is a system for cooperative elasticity that co-locates serving and rollout models on shared GPUs, delivering 1.3-3.3x higher end-to-end throughput than fixed-resource baselines while preserving serving SLOs.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Microsoft trace [60] at minute granularity alongside three zoomed-in 5-minute windows at per-second granularity. At the minute level, the peak rate reaches 1.7× the 24-hour average. At the second level, burstiness is far more pronounced: per-second peaks reach 4.22×, 1.58×, and 1.73× their respective window averages, consistent with second-level spikes reported by BurstGPT [66]. To absorb such spikes, providers often statically overprovision for peak demand [88], resulting in substantial GPU underutilization. We quantify this by replaying a 24-hour production trace from [47] (preserving original prompt lengths, response lengths, and arrival process) on Qwen3-8B with 8 H800 GPUs. 
Figure 3b shows the GPU utilization sampled at 1-second intervals and smoothed"},{"citing_arxiv_id":"2605.00737","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"To Call or Not to Call: A Framework to Assess and Optimize LLM Tool Calling","primary_cat":"cs.AI","submitted_at":"2026-05-01T15:38:13+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.21138","ref_index":56,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Navigating the Clutter: Waypoint-Based Bi-Level Planning for Multi-Robot Systems","primary_cat":"cs.RO","submitted_at":"2026-04-22T22:58:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Waypoint-based bi-level planning with curriculum RLVR improves multi-robot task success rates in dense-obstacle benchmarks over motion-agnostic and VLA baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.18936","ref_index":222,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Fine-Tuning Small Reasoning Models for Quantum Field Theory","primary_cat":"cs.LG","submitted_at":"2026-04-21T00:21:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Small 7B reasoning models were fine-tuned on synthetic and curated QFT problems using RL and SFT, yielding performance gains, error analysis, and public release of data and traces.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.16804","ref_index":61,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"AutoOR: Scalably Post-training LLMs to Autoformalize 
Operations Research Problems","primary_cat":"cs.LG","submitted_at":"2026-04-18T03:24:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AutoOR uses synthetic data generation and RL post-training with solver feedback to enable 8B LLMs to autoformalize linear, mixed-integer, and non-linear OR problems, matching larger models on benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09455","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"E3-TIR: Enhanced Experience Exploitation for Tool-Integrated Reasoning","primary_cat":"cs.AI","submitted_at":"2026-04-10T16:14:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"E3-TIR integrates expert prefixes, guided branches, and self-exploration via mix policy optimization to deliver 6% better tool-use performance with under 10% of the usual synthetic data and 1.46x ROI.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.13064","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Red Skills or Blue Skills? 
A Dive Into Skills Published on ClawHub","primary_cat":"cs.CL","submitted_at":"2026-03-19T14:31:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Analysis of ClawHub shows language-based functional divides in agent skills, with over 30% flagged suspicious and submission-time documentation enabling 73% accurate risk prediction.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.16876","ref_index":103,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Multi-Modal Multi-Agent Reinforcement Learning for Radiology Report Generation","primary_cat":"cs.CV","submitted_at":"2026-02-17T12:48:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MARL-Rad trains region-specific and global agents with reinforcement learning on clinical rewards to produce more accurate radiology reports than prior methods on MIMIC-CXR and IU X-ray datasets.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"a realistic workflow. This workflow mirrors the real practice of radiologists, who meticulously examine each anatomical region before composing the comprehensive diagnostic report. Experiments on the MIMIC-CXR and IU X-ray datasets demonstrate that MARL-Rad consistently improves clinical efficacy metrics such as RadGraph F1 [43], CheXbert F1 [103], and GREEN scores [89], achieving state-of-the-art performance. Moreover, deeper analyses show that MARL-Rad improves laterality consistency and produces more detailed and clinically accurate descriptions. 
Our key contributions are summarized as follows: • End-to-end optimization of collaborative agents: Unlike prior training-free approaches or single-agent RL"},{"citing_arxiv_id":"2601.14287","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Chain-of-Memory: Lightweight Memory Construction with Dynamic Evolution for LLM Agents","primary_cat":"cs.LG","submitted_at":"2026-01-14T04:42:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CoM organizes memory fragments into evolving inference paths with adaptive truncation, delivering 7.5-10.4% accuracy gains on long-memory benchmarks at 2.7% token cost and 6% latency of complex alternatives.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.08980","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Training Multi-Image Vision Agents via End2End Reinforcement Learning","primary_cat":"cs.CV","submitted_at":"2025-12-05T10:02:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"IMAgent trains a multi-image vision agent via pure end-to-end RL with visual reflection tools and a two-layer motion trajectory masking strategy, reaching SOTA on single- and multi-image benchmarks while revealing tool-use effects on attention.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.00739","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Towards Understanding, Analyzing, and Optimizing Agentic AI Execution: A CPU-Centric Perspective","primary_cat":"cs.AI","submitted_at":"2025-11-01T23:46:44+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The paper analyzes CPU bottlenecks 
in agentic AI serving, selects representative workloads, and demonstrates that CPU-aware scheduling optimizations COMB and MAS can reduce P50 latency by up to 1.7x and total latency by up to 2.49x on two hardware systems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}