Inducing programmatic skills for agentic tasks

Zora Zhiruo Wang, Apurva Gandhi, Graham Neubig, Daniel Fried · 2025 · arXiv 2504.06821

20 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

WebChallenger: A Reliable and Efficient Generalist Web Agent

cs.CL · 2026-06-09 · conditional · novelty 7.0

WebChallenger introduces PageMem and three architecture mechanisms to achieve competitive web navigation with open-weight LLMs on WebArena, VisualWebArena, Online-Mind2Web, and WorkArena without fine-tuning or site adapters.

Co-Evolving Skill Generation and Policy Optimization

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

The framework estimates the context-dependent marginal utility of candidate skills via reward gaps between matched base and skill-augmented rollouts, then uses that estimate to filter skills and co-train the policy as the skill generator.

Skill or Skip? Learning Selective Skill Invocation in Agentic Tasks via Dual-Granularity Preference Learning

cs.CL · 2026-05-30 · unverdicted · novelty 7.0

SelSkill applies dual-granularity preference learning to selective skill-or-skip decisions, improving task success by 10.9 points and execution precision by 29.1 points on ALFWorld with Qwen3-8B.

SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

cs.CR · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

SkillSafetyBench is a benchmark of 155 cases across 47 tasks and 6 risk domains showing that non-user attacks via skills, artifacts, or environments can consistently induce unsafe agent behavior.

OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on four benchmarks.

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

cs.LG · 2026-04-08 · unverdicted · novelty 7.0

This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.

MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents

cs.AI · 2025-09-08 · conditional · novelty 7.0

MAS-Bench introduces 139 tasks, 88 predefined shortcuts, and 9 metrics to evaluate hybrid GUI-shortcut mobile agents, reporting up to 68.3% success and 39% efficiency gains over GUI-only baselines.

VISUALSKILL: Multimodal Skills for Computer-Use Agents

cs.CL · 2026-06-16 · unverdicted · novelty 6.0

Multimodal skills retaining visual figures improve CUA benchmark scores by 8.3 points over text-only equivalents generated from the same source content.

Beyond Domains: Reusing Web Skills via Transferable Interaction Patterns

cs.AI · 2026-06-16 · unverdicted · novelty 6.0

SkillMigrator reduces LLM-action counts by 8-10% on WebArena and Mind2Web by transferring web skills via layout-matched transferable interaction patterns.

Harnessing Agent Skills: Architectural Patterns and a Reference Architecture for Skill-Mediated LLM Agents

cs.AI · 2026-05-29 · unverdicted · novelty 6.0

Catalogs ten patterns and synthesizes a four-layer reference architecture for skill harnessing in LLM agents, evaluated via cross-instantiation on eight systems.

Learnable Assessment Skills for LLM-based Automated Scoring: Rubric Construction via Iterative Optimization

cs.CL · 2026-05-28 · unverdicted · novelty 6.0

An iterative framework lets LLMs learn procedural assessment skills for rubric construction, improving automated scoring on all ten ASAP-SAS items and often exceeding expert rubrics while showing cross-item transfer.

Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Maestro uses outcome-based RL to train a lightweight policy that orchestrates ensembles of frozen expert models and skills, reporting 70.1% average accuracy across ten multimodal benchmarks and outperforming GPT-5 and Gemini-2.5-Pro while generalizing to unseen components.

SkillGraph: Self-Evolving Multi-Agent Collaboration with Multimodal Graph Topology

cs.AI · 2026-04-19 · unverdicted · novelty 6.0

SkillGraph jointly evolves agent skills and collaboration topologies in multi-agent vision-language systems using a multimodal graph transformer and a skill designer, yielding consistent performance gains on benchmarks.

Chain-of-Memory: Lightweight Memory Construction with Dynamic Evolution for LLM Agents

cs.LG · 2026-01-14 · unverdicted · novelty 6.0

CoM organizes memory fragments into evolving inference paths with adaptive truncation, delivering 7.5-10.4% accuracy gains on long-memory benchmarks at 2.7% of the token cost and 6% of the latency of complex alternatives.

SKILL-DISCO: Distilling and Compiling Agent Traces into Reusable Procedural Skills

cs.AI · 2026-06-25 · unverdicted · novelty 5.0

SkillDisCo distills reusable PFSM subgraphs from successful agent traces and compiles them into callable procedural skills, improving success rates and reducing turns on ALFWorld and WebArena.

Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application

cs.CL · 2026-06-10 · unverdicted · novelty 5.0

This survey categorizes agentic environments for LLMs by eight attributes and domains, introduces symbolic and neural synthesis paradigms with evaluation, and outlines four agent evolution pathways plus three environment evolution paradigms.

Unsupervised Skill Discovery for Agentic Data Analysis

cs.AI · 2026-06-04 · unverdicted · novelty 5.0 · 2 refs

DataCOPE uses verifier-guided contrastive distillation from agent trajectories to discover skills, yielding average gains of 9.71% on report-style and 32.30% on reasoning-style data analysis tasks across four model settings.

Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

cs.SE · 2026-04-09 · accept · novelty 5.0

LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.

A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications

cs.IR · 2026-05-08 · unverdicted · novelty 3.0 · 3 refs

A survey that defines agent skills as reusable procedural artifacts and reviews methods, resources, and applications across their representation, acquisition, retrieval, and evolution stages.

SkillTrojan: Backdoor Attacks on Skill-Based Agent Systems

cs.CR · 2026-04-08

citing papers explorer

Showing 3 of 3 citing papers after filters.

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

cs.LG · 2026-04-08 · unverdicted · none · ref 130

Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

cs.LG · 2026-05-21 · unverdicted · none · ref 57

Chain-of-Memory: Lightweight Memory Construction with Dynamic Evolution for LLM Agents

cs.LG · 2026-01-14 · unverdicted · none · ref 3
