Memento-skills: Let agents design agents
13 Pith papers cite this work.
Citing papers by year: 2026 (13)
citing papers explorer
- OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents
  OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on four benchmarks.
- Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents
  Evolving-RL jointly optimizes experience extraction and utilization in LLM agents via RL with separate evaluation signals, delivering up to 98.7% relative gains on out-of-distribution tasks in ALFWorld and Mind2Web.
- Skill Drift Is Contract Violation: Proactive Maintenance for LLM Agent Skill Libraries
  SkillGuard extracts executable environment contracts from LLM skill documents to detect only relevant drifts, reporting zero false positives on 599 cases, 100% precision in known-drift tests, and raising one-round repair success from 10% to 78%.
- AgentPSO: Evolving Agent Reasoning Skill via Multi-agent Particle Swarm Optimization
  AgentPSO evolves reusable multi-agent reasoning skills via PSO-inspired natural-language updates, outperforming static agents and test-time multi-agent baselines on math and general reasoning tasks with cross-benchmark transfer.
- Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck
  CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.
- SkillRet: A Large-Scale Benchmark for Skill Retrieval in LLM Agents
  SkillRet benchmark shows fine-tuned retrievers improve NDCG@10 by 13+ points over prior models on large-scale skill retrieval for LLM agents.
- Agentic-imodels: Evolving agentic interpretability tools via autoresearch
  Agentic-imodels evolves scikit-learn regressors via an autoresearch loop to jointly boost predictive performance and LLM-simulatability, improving downstream agentic data science tasks by up to 73% on the BLADE benchmark.
- SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents
  SkillFlow benchmark shows lifelong skill evolution yields modest gains for some models like Claude Opus 4.6 but limited or negative utility for others despite high skill usage.
- CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment
  CASCADE enables LLMs to continually adapt at deployment via case-based episodic memory and contextual bandits, improving macro-averaged success by 20.9% over zero-shot on 16 tasks spanning medicine, law, code, and robotics.
- Ace-Skill: Bootstrapping Multimodal Agents with Prioritized and Clustered Evolution
  Ace-Skill boosts multimodal agent self-evolution via prioritized rollouts with lazy-decay tracking and semantic knowledge clustering, yielding up to 35% relative gains on tool-use benchmarks and zero-shot transfer to smaller models.
- Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction
  Web2BigTable introduces a bi-level multi-agent system that achieves new state-of-the-art results on wide-coverage and deep web-to-table search benchmarks through orchestration, coordination, and closed-loop reflection.
- From Procedural Skills to Strategy Genes: Towards Experience-Driven Test-Time Evolution
  Compact Gene representations of experience outperform documentation-oriented Skill packages for test-time control and iterative evolution in code-solving tasks, with measured gains on CritPt from 9.1% to 18.57% and 17.7% to 27.14%.
- A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications
  The paper surveys agent skills for LLM agents, organizing the literature into a four-stage lifecycle of representation, acquisition, retrieval, and evolution while highlighting their role in system scalability.