OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on four benchmarks.
hub Canonical reference
Xskill: Continual learning from experience and skills in multimodal agents
Canonical reference. 100% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
years
2026 12roles
background 5polarities
background 5representative citing papers
CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.
SkillRet benchmark shows fine-tuned retrievers improve NDCG@10 by 13+ points over prior models on large-scale skill retrieval for LLM agents.
GeoBrowse is a two-level geolocation benchmark combining visual cue composition with knowledge-intensive multi-hop queries, paired with the GATE agent workflow that outperforms no-tool, search-only, and image-only baselines.
EmbodiSkill uses skill-aware reflection on execution trajectories to update skills in embodied agents, achieving 93.28% success on ALFWorld with a frozen Qwen3.5-27B model, outperforming direct GPT-5.2 use by 31.58%.
ORACLE is a new agentic framework using adaptive context consolidation and teacher-student distillation to detect emerging scam patterns from incomplete, long-horizon app usage streams across 12 scam types.
SCOPE maintains semantic commitments via structured specifications and conditional skill orchestration, achieving 0.60 EGIP on the new Gen-Arena benchmark while outperforming baselines on WISE-V and MindBench.
Agent-World autonomously synthesizes verifiable real-world tasks and uses continuous self-evolution to train 8B and 14B agents that outperform proprietary models on 23 benchmarks.
AtlasVA organizes VLM agent memory into spatial heatmaps, visual exemplars, and symbolic skills, evolving atlases from trajectories to act as potential-based shaping rewards in teacher-free reinforcement learning.
Skill1 trains a single RL policy to co-evolve skill selection, utilization, and distillation in language model agents from one task-outcome reward, using low-frequency trends to credit selection and high-frequency variation to credit distillation, outperforming baselines on ALFWorld and WebShop.
citing papers explorer
-
OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents
OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on four benchmarks.
-
Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck
CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.
-
SkillRet: A Large-Scale Benchmark for Skill Retrieval in LLM Agents
SkillRet benchmark shows fine-tuned retrievers improve NDCG@10 by 13+ points over prior models on large-scale skill retrieval for LLM agents.
-
GeoBrowse: A Geolocation Benchmark for Agentic Tool Use with Expert-Annotated Reasoning Traces
GeoBrowse is a two-level geolocation benchmark combining visual cue composition with knowledge-intensive multi-hop queries, paired with the GATE agent workflow that outperforms no-tool, search-only, and image-only baselines.
-
EmbodiSkill: Skill-Aware Reflection for Self-Evolving Embodied Agents
EmbodiSkill uses skill-aware reflection on execution trajectories to update skills in embodied agents, achieving 93.28% success on ALFWorld with a frozen Qwen3.5-27B model, outperforming direct GPT-5.2 use by 31.58%.
-
ORACLE: Anticipating Scams from Partial Trajectories in Streaming App Usage
ORACLE is a new agentic framework using adaptive context consolidation and teacher-student distillation to detect emerging scam patterns from incomplete, long-horizon app usage streams across 12 scam types.
-
SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation
SCOPE maintains semantic commitments via structured specifications and conditional skill orchestration, achieving 0.60 EGIP on the new Gen-Arena benchmark while outperforming baselines on WISE-V and MindBench.
-
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
Agent-World autonomously synthesizes verifiable real-world tasks and uses continuous self-evolution to train 8B and 14B agents that outperform proprietary models on 23 benchmarks.
-
AtlasVA: Self-Evolving Visual Skill Memory for Teacher-Free VLM Agents
AtlasVA organizes VLM agent memory into spatial heatmaps, visual exemplars, and symbolic skills, evolving atlases from trajectories to act as potential-based shaping rewards in teacher-free reinforcement learning.
-
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning
Skill1 trains a single RL policy to co-evolve skill selection, utilization, and distillation in language model agents from one task-outcome reward, using low-frequency trends to credit selection and high-frequency variation to credit distillation, outperforming baselines on ALFWorld and WebShop.
- MMSkills: Towards Multimodal Skills for General Visual Agents
- A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications