{"total":10,"items":[{"citing_arxiv_id":"2605.22177","ref_index":57,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles","primary_cat":"cs.LG","submitted_at":"2026-05-21T08:47:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Maestro uses outcome-based RL to train a lightweight policy that orchestrates ensembles of frozen expert models and skills, reporting 70.1% average accuracy across ten multimodal benchmarks and outperforming GPT-5 and Gemini-2.5-Pro while generalizing to unseen components.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12015","ref_index":84,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces","primary_cat":"cs.CR","submitted_at":"2026-05-12T12:03:54+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11169","ref_index":50,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents","primary_cat":"cs.AI","submitted_at":"2026-05-11T19:28:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on four 
benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.07358","ref_index":104,"ref_count":4,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications","primary_cat":"cs.IR","submitted_at":"2026-05-08T07:10:26+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":2,"top_context_role":"background","top_context_polarity":"background","context_text":"Feedback-Driven: SkillRL [87], CUA-Skill [47], ToolExpNet [89], ExpeL [95], SMART [96]. Skill Evolution (§4). Skill Revision: EvoSkill [97], Memento-Skills [85], AutoSkill [81], XSkill [98]. Skill Validation: SkillWeaver [84], ASI [99], TroVE [100], PSN [54], Audited Skill-Graph [101]. Policy Coupling: SkillRL [87], ARISE [102]. Repository Evolution: Uni-Skill [103], SkillX [104], SkillNet [64], SkillClaw [105]. Runtime Governance: SkillRouter [106], PoisonedSkills [107]. Fig. 3: The taxonomy for agent skills in this survey. 
Skills differ from raw tools and MCP servers in that they encode situated procedural knowledge (triggers, sequencing, fallbacks, pitfalls) and appear as bounded, reusable artifacts that can be loaded, inspected, shared, and revised without"},{"citing_arxiv_id":"2604.17503","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SkillGraph: Self-Evolving Multi-Agent Collaboration with Multimodal Graph Topology","primary_cat":"cs.AI","submitted_at":"2026-04-19T15:46:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SkillGraph jointly evolves agent skills and collaboration topologies in multi-agent vision-language systems using a multimodal graph transformer and a skill designer, yielding consistent performance gains on benchmarks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"through explicit skill libraries rather than relying solely on gradient updates. Early approaches store reusable experience as natural-language memory summaries [39,53], executable code skills [36], and abstracted workflow templates [52]. Building on this line, recent skill-centric agent frameworks have shown that reusable skills can improve performance across web navigation [7,38,56], computer control [22], and long-horizon planning [25,49]. Building upon prior passive experience reuse, recent studies have driven the dynamic injection and deep co-evolution of agentic skills through reinforcement learning and closed-loop analysis [1,37,41,43,51]. 
Concurrently, frequent skill iteration demands auditable verification to ensure lifecycle safety [17,18]."},{"citing_arxiv_id":"2604.08224","ref_index":153,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering","primary_cat":"cs.SE","submitted_at":"2026-04-09T13:19:41+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.06811","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SkillTrojan: Backdoor Attacks on Skill-Based Agent Systems","primary_cat":"cs.CR","submitted_at":"2026-04-08T08:24:48+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.02913","ref_index":130,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning","primary_cat":"cs.LG","submitted_at":"2026-04-08T00:53:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning 
LLMs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.14287","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Chain-of-Memory: Lightweight Memory Construction with Dynamic Evolution for LLM Agents","primary_cat":"cs.LG","submitted_at":"2026-01-14T04:42:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CoM organizes memory fragments into evolving inference paths with adaptive truncation, delivering 7.5-10.4% accuracy gains on long-memory benchmarks at 2.7% token cost and 6% latency of complex alternatives.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.06477","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents","primary_cat":"cs.AI","submitted_at":"2025-09-08T09:43:48+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MAS-Bench introduces 139 tasks, 88 predefined shortcuts, and 9 metrics to evaluate hybrid GUI-shortcut mobile agents, reporting up to 68.3% success and 39% efficiency gains over GUI-only baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}