hub

arXiv preprint arXiv:2603.02176 , year=

Organizing, Orchestrating, Benchmarking Agent Skills at Ecosystem Scale , author= · 2026 · arXiv 2603.02176

15 Pith papers cite this work. Polarity classification is still indexing.

15 Pith papers citing it

read on arXiv browse 15 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

representative citing papers

From Context to Skills: Can Language Models Learn from Context Skillfully?

cs.AI · 2026-04-30 · unverdicted · novelty 8.0

Ctx2Skill lets language models autonomously evolve context-specific skills via multi-agent self-play, improving performance on context learning tasks without human supervision.

HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?

cs.CR · 2026-04-16 · unverdicted · novelty 8.0

Harmful skills in open agent ecosystems raise average harm scores from 0.27 to 0.76 across six LLMs by lowering refusal rates when tasks are presented via pre-installed skills.

OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on four benchmarks.

Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.

SkillRet: A Large-Scale Benchmark for Skill Retrieval in LLM Agents

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

SkillRet benchmark shows fine-tuned retrievers improve NDCG@10 by 13+ points over prior models on large-scale skill retrieval for LLM agents.

SkillEvolver: Skill Learning as a Meta-Skill

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

A meta-skill authors and refines prose-and-code skills for agents by learning from post-deployment failures with an overfit audit, achieving 56.8% accuracy on SkillsBench tasks versus 43.6% for human-curated skills.

SkillRAE: Agent Skill-Based Context Compilation for Retrieval-Augmented Execution

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

SkillRAE organizes skills into a graph and compiles compact, grounded contexts for LLM agents, yielding 11.7% gains on SkillsBench over prior RAE methods.

Skill-R1: Agent Skill Evolution via Reinforcement Learning

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

Skill-R1 applies bi-level group-relative policy optimization to evolve skills recurrently from verified outcomes, yielding gains over baselines on multi-step tasks.

SearchSkill: Teaching LLMs to Use Search Tools with Evolving Skill Banks

cs.AI · 2026-05-09 · unverdicted · novelty 6.0

SearchSkill introduces an evolving SkillBank and two-stage SFT to make LLM search query planning explicit via skill selection, improving exact match on QA benchmarks and retrieval behavior.

Group of Skills: Group-Structured Skill Retrieval for Agent Skill Libraries

cs.CL · 2026-05-07 · unverdicted · novelty 6.0

GoSkills converts flat skill lists into role-labeled execution contexts via anchor-centered groups and graph expansion, preserving coverage and improving rewards on SkillsBench and ALFWorld under small skill budgets.

Toward Scalable Terminal Task Synthesis via Skill Graphs

cs.AI · 2026-04-28 · unverdicted · novelty 6.0

SkillSynth uses a scenario-mediated skill graph to sample workflow paths and generate executable terminal tasks, enabling controlled diversity in training trajectories for agents.

Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

cs.AI · 2026-05-07 · unverdicted · novelty 5.0 · 3 refs

Skill1 trains a single RL policy to co-evolve skill selection, utilization, and distillation in language model agents from one task-outcome reward, using low-frequency trends to credit selection and high-frequency variation to credit distillation, outperforming baselines on ALFWorld and WebShop.

Bilevel Optimization of Agent Skills via Monte Carlo Tree Search

cs.AI · 2026-04-17 · unverdicted · novelty 5.0

Bilevel optimization with outer-loop MCTS for skill structure and inner-loop LLM refinement improves agent accuracy on an operations-research question-answering dataset.

Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

cs.SE · 2026-04-09 · accept · novelty 5.0

LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.

A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications

cs.IR · 2026-05-08 · unverdicted · novelty 4.0

The paper surveys agent skills for LLM agents, organizing the literature into a four-stage lifecycle of representation, acquisition, retrieval, and evolution while highlighting their role in system scalability.

citing papers explorer

Showing 15 of 15 citing papers.

From Context to Skills: Can Language Models Learn from Context Skillfully? cs.AI · 2026-04-30 · unverdicted · none · ref 19
Ctx2Skill lets language models autonomously evolve context-specific skills via multi-agent self-play, improving performance on context learning tasks without human supervision.
HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents? cs.CR · 2026-04-16 · unverdicted · none · ref 35
Harmful skills in open agent ecosystems raise average harm scores from 0.27 to 0.76 across six LLMs by lowering refusal rates when tasks are presented via pre-installed skills.
OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents cs.AI · 2026-05-11 · unverdicted · none · ref 39
OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on four benchmarks.
Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck cs.LG · 2026-05-08 · unverdicted · none · ref 15
CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.
SkillRet: A Large-Scale Benchmark for Skill Retrieval in LLM Agents cs.AI · 2026-05-07 · unverdicted · none · ref 17
SkillRet benchmark shows fine-tuned retrievers improve NDCG@10 by 13+ points over prior models on large-scale skill retrieval for LLM agents.
SkillEvolver: Skill Learning as a Meta-Skill cs.AI · 2026-05-11 · unverdicted · none · ref 4
A meta-skill authors and refines prose-and-code skills for agents by learning from post-deployment failures with an overfit audit, achieving 56.8% accuracy on SkillsBench tasks versus 43.6% for human-curated skills.
SkillRAE: Agent Skill-Based Context Compilation for Retrieval-Augmented Execution cs.CL · 2026-05-11 · unverdicted · none · ref 10
SkillRAE organizes skills into a graph and compiles compact, grounded contexts for LLM agents, yielding 11.7% gains on SkillsBench over prior RAE methods.
Skill-R1: Agent Skill Evolution via Reinforcement Learning cs.LG · 2026-05-10 · unverdicted · none · ref 3
Skill-R1 applies bi-level group-relative policy optimization to evolve skills recurrently from verified outcomes, yielding gains over baselines on multi-step tasks.
SearchSkill: Teaching LLMs to Use Search Tools with Evolving Skill Banks cs.AI · 2026-05-09 · unverdicted · none · ref 16
SearchSkill introduces an evolving SkillBank and two-stage SFT to make LLM search query planning explicit via skill selection, improving exact match on QA benchmarks and retrieval behavior.
Group of Skills: Group-Structured Skill Retrieval for Agent Skill Libraries cs.CL · 2026-05-07 · unverdicted · none · ref 8
GoSkills converts flat skill lists into role-labeled execution contexts via anchor-centered groups and graph expansion, preserving coverage and improving rewards on SkillsBench and ALFWorld under small skill budgets.
Toward Scalable Terminal Task Synthesis via Skill Graphs cs.AI · 2026-04-28 · unverdicted · none · ref 6
SkillSynth uses a scenario-mediated skill graph to sample workflow paths and generate executable terminal tasks, enabling controlled diversity in training trajectories for agents.
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning cs.AI · 2026-05-07 · unverdicted · none · ref 64 · 3 links
Skill1 trains a single RL policy to co-evolve skill selection, utilization, and distillation in language model agents from one task-outcome reward, using low-frequency trends to credit selection and high-frequency variation to credit distillation, outperforming baselines on ALFWorld and WebShop.
Bilevel Optimization of Agent Skills via Monte Carlo Tree Search cs.AI · 2026-04-17 · unverdicted · none · ref 17
Bilevel optimization with outer-loop MCTS for skill structure and inner-loop LLM refinement improves agent accuracy on an operations-research question-answering dataset.
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering cs.SE · 2026-04-09 · accept · none · ref 77
LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications cs.IR · 2026-05-08 · unverdicted · none · ref 61
The paper surveys agent skills for LLM agents, organizing the literature into a four-stage lifecycle of representation, acquisition, retrieval, and evolution while highlighting their role in system scalability.

arXiv preprint arXiv:2603.02176 , year=

hub tools

fields

years

verdicts

representative citing papers

citing papers explorer