{"total":11,"items":[{"citing_arxiv_id":"2606.32025","ref_index":41,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Generative Skill Composition for LLM Agents","primary_cat":"cs.CL","submitted_at":"2026-06-30T17:53:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SkillComposer performs task-conditioned skill sequence prediction with a constrained autoregressive decoder to jointly output skill subset, count, and order, raising pass rates by 23.1 and 18.2 percentage points on two production coding agents over no-skill baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08224","ref_index":123,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering","primary_cat":"cs.SE","submitted_at":"2026-04-09T13:19:41+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08033","ref_index":59,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"IoT-Brain: Grounding LLMs for Semantic-Spatial Sensor Scheduling","primary_cat":"cs.AI","submitted_at":"2026-04-09T09:38:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"IoT-Brain uses a neuro-symbolic Spatial Trajectory Graph to ground LLMs for verifiable semantic-spatial sensor scheduling, achieving 37.6% higher task success with lower resource use on a campus-scale 
benchmark.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"MobiCom '26, October 26-30, 2026, Austin, TX, USA Zhou et al. Table 2: Real-World Scheduling Paradigm Comparison. Paradigm TCR (%) Latency (s) Bandwidth (GB) TFP (frames) Static Scheduling 3.61 403.42 0.138 179 Naive Parallel 65.64 927.99 0.540 704 IoT-Brain (Ours) 49.84 413.69 0.131 166 conventional security practice by triggering downstream sensors using a constant-velocity pedestrian model[59], and (2) Naive Parallel Scheduling, a resource-agnostic upper bound that activates all potentially relevant cameras simultaneously. To ensure a controlled comparison, all vision-language queries were handled by a locally deployed Qwen-VL-Chat model[7]. Performance was measured using a comprehensive suite of metrics, including task completion rate (TCR), end-to-end la-"},{"citing_arxiv_id":"2604.05568","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Beyond Tools and Persons: Who Are They? 
Classifying Robots and AI Agents for Proportional Governance","primary_cat":"cs.ET","submitted_at":"2026-04-07T08:08:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A CPST-based taxonomy sorts autonomous systems into Confined Actors, Socially-Aware Interactors, and CPST-Integrated Agents to enable proportional governance from enhanced liability to qualified personhood.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.18847","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Failure Makes the Agent Stronger: Enhancing Accuracy through Structured Reflection for Reliable Tool Interactions","primary_cat":"cs.CV","submitted_at":"2025-09-23T09:35:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Structured reflection makes error diagnosis and repair an explicit trainable step that improves reliability and reduces redundant calls in tool-using LLM agents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.04565","ref_index":137,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems","primary_cat":"cs.MA","submitted_at":"2025-06-05T02:34:43+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A survey that defines Compound AI Systems, proposes a multi-dimensional taxonomy based on component roles and orchestration strategies, reviews four foundational paradigms, and identifies key challenges for future research.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI 
Systems 25 Table 1. Summary of evaluation dimensions, benchmarks, and metrics for Compound AI Systems. Dimension Task Datasets/Benchmarks Evaluation Metrics RAG Reasoning QA Natural Questions [76], TriviaQA [65], HotpotQA [201], WebQuestions [9] Accuracy, F1 Score, EM Passage Retrieval WikiAsp [54], MS MARCO [7], SQUAD [137], TruthfulQA [91] MRR, nDCG, Precision, Recall, F1 Score Multi-Doc Summarization OpenBookQA [122], PopQA [115] ROUGE, BLEU, F1 Score Open-domain QA Multi-News [41], NarrativeQA [72], MuSiQue [176], BEIR [172], RealTimeQA [67], UniEval [221] EM, MRR, nDCG, F1 Score, Accuracy Extractive QA RAGAs [39], KILT [129] EM, F1 Score, Precision, Recall Fact Verification FEVER [173], DROP [37], TREC-DL [25] Accuracy, Precision, Recall, F1"},{"citing_arxiv_id":"2504.19793","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Prompt Injection Attack to Tool Selection in LLM Agents","primary_cat":"cs.CR","submitted_at":"2025-04-28T13:36:43+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ToolHijacker optimizes malicious tool documents via a two-phase strategy to hijack LLM agents' tool selection in no-box settings.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2410.07283","ref_index":32,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems","primary_cat":"cs.MA","submitted_at":"2024-10-09T11:01:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Prompt injection attacks can self-replicate across LLM agents in multi-agent systems, enabling data theft, misinformation, and system disruption while propagating 
silently.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2409.00557","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Learning to Ask: When LLM Agents Meet Unclear Instruction","primary_cat":"cs.CL","submitted_at":"2024-08-31T23:06:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces NoisyToolBench benchmark and Ask-when-Needed framework to improve LLM tool-use performance when user instructions are unclear or incomplete.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2408.08435","ref_index":196,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Automated Design of Agentic Systems","primary_cat":"cs.AI","submitted_at":"2024-08-15T21:59:23+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2408.03314","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters","primary_cat":"cs.LG","submitted_at":"2024-08-06T17:35:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"An adaptive compute-optimal strategy for scaling LLM test-time compute achieves over 4x efficiency gains versus best-of-N and lets smaller models outperform 14x larger ones on some 
problems.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"OpenAI, 2024. [25] OpenAI. Gpt-4 technical report, 2024. [26] Y. Qin, S. Liang, Y. Ye, K. Zhu, L. Yan, Y. Lu, Y. Lin, X. Cong, X. Tang, B. Qian, S. Zhao, L. Hong, R. Tian, R. Xie, J. Zhou, M. Gerstein, D. Li, Z. Liu, and M. Sun. Toolllm: Facilitating large language models to master 16000+ real-world apis, 2023. URL https://arxiv.org/abs/2307.16789. [27] C. Qu, S. Dai, X. Wei, H. Cai, S. Wang, D. Yin, J. Xu, and J.-R. Wen. Tool learning with large language models: A survey, 2024. URL https://arxiv.org/abs/2405.17935. [28] Y. Qu, T. Zhang, N. Garg, and A. Kumar. Recursive introspection: Teaching foundation models how to self-improve. 2024. 18 [29] N. Sardana and J. Frankle. Beyond chinchilla-optimal: Accounting for inference in language model"}],"limit":50,"offset":0}