pith. machine review for the scientific record.

arxiv: 2604.24026 · v4 · submitted 2026-04-27 · 💻 cs.CL · cs.AI

Recognition: unknown

From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills

Hansi Wang, Qiliang Liang, Yang Liu, Zhong Liang

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 04:01 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords: agent skills · structured representation · LLM agents · skill discovery · risk assessment · scheduling-structural-logical · knowledge disentanglement · skill reuse

The pith

An explicit Scheduling-Structural-Logical representation disentangles skill text into scheduling, structure, and logic for better machine use.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes the Scheduling-Structural-Logical (SSL) representation to turn dense text skill documents into explicit, machine-readable structures for LLM agents. It draws on classical cognitive models to separate skill-level scheduling signals, scene-level execution structure, and logic-level action and resource evidence. An LLM normalizer creates these structures from existing text, and tests on skill discovery and risk assessment show higher performance than plain text baselines. The gains indicate that explicit separation helps agents search collections and inspect side effects without losing details buried in natural language. A sympathetic reader would care because reusable skills in agent systems currently require heavy reasoning over entangled text, limiting reliability and scalability.

Core claim

The Scheduling-Structural-Logical (SSL) representation disentangles skill-level scheduling signals, scene-level execution structure, and logic-level action/resource-use evidence from text-heavy skill artifacts. Instantiated with an LLM-based normalizer, SSL yields MRR@50 of 0.729 on Skill Discovery (up from 0.649) and macro F1 of 0.509 on Risk Assessment (up from 0.409), showing that source-grounded structure improves search and review over text-only methods.

What carries the argument

The Scheduling-Structural-Logical (SSL) representation, which separates scheduling signals, execution structure, and action logic from skill text using principles from Memory Organization Packets, Script Theory, and Conceptual Dependency to make capabilities easier for machines to acquire and leverage.
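As an illustration, the three layers could be rendered as an explicit data structure. The sketch below is hypothetical: the field names (`triggers`, `preconditions`, `scene_type`, `actions`, `resources`) are illustrative stand-ins, not the paper's actual SSL schema.

```python
from dataclasses import dataclass, field

@dataclass
class Scene:
    """One scene-level execution phase of a skill (hypothetical fields)."""
    scene_type: str                                     # e.g. a phase label such as "ACQUIRE" or "ACT"
    actions: list[str] = field(default_factory=list)    # logic-level atomic actions
    resources: list[str] = field(default_factory=list)  # resources the scene touches

@dataclass
class SSLSkill:
    """A skill disentangled into scheduling, structure, and logic (sketch)."""
    triggers: list[str]       # skill-level scheduling: when the skill is invoked
    preconditions: list[str]  # skill-level scheduling: what must hold first
    scenes: list[Scene]       # scene-level structure, ordered

skill = SSLSkill(
    triggers=["user asks to summarize a PDF"],
    preconditions=["pdf_reader tool available"],
    scenes=[
        Scene("ACQUIRE", actions=["read_file"], resources=["filesystem"]),
        Scene("ACT", actions=["summarize"], resources=["llm"]),
    ],
)

# With the layers explicit, side effects are enumerable rather than buried in prose:
touched = {r for s in skill.scenes for r in s.resources}
```

The point of the sketch is only that each of the three kinds of evidence gets its own addressable slot, so tools can query one layer without re-reading the others.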

If this is right

  • Agent skill collections become searchable by explicit interfaces and constraints rather than full-text similarity.
  • Risk assessment gains precision because reviewers can examine scheduling, structure, and logic separately.
  • Skill reuse in execution becomes more reliable since control flow and side effects are no longer hidden in prose.
  • Skill management systems can support inspection and auditing without requiring full natural-language reasoning at every step.
  • The approach positions skills as operationally actionable packages rather than static documents.
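The first bullet can be made concrete: with explicit fields, discovery becomes filtering over structured constraints rather than ranking raw prose. A minimal sketch, with hypothetical skill records and field names:

```python
# Illustrative skill records; the keys are assumptions, not the paper's schema.
skills = [
    {"name": "pdf_summarize", "triggers": ["summarize pdf"], "resources": ["filesystem", "llm"]},
    {"name": "web_scrape", "triggers": ["fetch page"], "resources": ["network"]},
]

def discover(skills, required_trigger=None, forbidden_resource=None):
    """Select skills by explicit scheduling/resource fields instead of
    full-text similarity over the original documents."""
    out = []
    for s in skills:
        if required_trigger and not any(required_trigger in t for t in s["triggers"]):
            continue
        if forbidden_resource and forbidden_resource in s["resources"]:
            continue
        out.append(s["name"])
    return out
```

A query like "skills triggered by PDFs that never touch the network" is then a pair of field constraints, which is the kind of inspection the text-only representation cannot support directly.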

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • SSL structures could support automated composition of skills into larger workflows by matching scheduling and logic components.
  • The separation may help detect conflicts or redundancies across skill libraries that text similarity alone misses.
  • Classical cognitive representations might be combined with modern LLM agents to create hybrid systems that retain inspectability.
  • Extending the normalizer to live execution traces could test whether SSL remains stable when skills are applied in real environments.

Load-bearing premise

The LLM-based normalizer accurately and consistently extracts scheduling, structural, and logical components from text skill descriptions without introducing hallucinations, biases, or loss of critical information.

What would settle it

Apply the normalizer to a benchmark set of skills that have manually annotated ground-truth SSL structures, then measure extraction fidelity and check whether the reported performance lifts disappear when using noisy or incomplete extractions.
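A minimal fidelity check of this kind could score the normalizer's extracted elements against the gold annotations as sets. This is an illustrative harness, not the paper's evaluation code:

```python
def extraction_fidelity(pred: set[str], gold: set[str]) -> dict[str, float]:
    """Set-level precision/recall/F1 of normalizer-extracted SSL elements
    against manually annotated ground truth (sketch)."""
    tp = len(pred & gold)                     # elements both extracted and annotated
    prec = tp / len(pred) if pred else 0.0    # penalizes hallucinated elements
    rec = tp / len(gold) if gold else 0.0     # penalizes dropped information
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"precision": prec, "recall": rec, "f1": f1}

# pred: what the normalizer emitted; gold: what a human annotator marked
m = extraction_fidelity({"read_file", "write_log", "call_api"},
                        {"read_file", "call_api"})
```

Low precision here would flag hallucination, low recall would flag information loss, and either would undercut attributing the downstream gains to the disentanglement itself.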

Figures

Figures reproduced from arXiv: 2604.24026 by Hansi Wang, Qiliang Liang, Yang Liu, Zhong Liang.

Figure 1. Overview of the SSL representation: a text-heavy skill artifact is converted by a normalizer.
Original abstract

Large language model (LLM) agents increasingly rely on reusable skills: capability packages that combine instructions, control flow, constraints, and tool calls. In current agent systems, however, skills are still represented by text-heavy artifacts, mainly SKILL{.}md-style documents whose machine-usable evidence remains embedded largely in natural-language descriptions. As a result, skill-centered agent systems face a representation problem: both managing skill collections and using skills during agent execution require reasoning over invocation interfaces, execution structure, and concrete side effects, but these signals are often entangled in a single textual surface. An explicit representation of skill knowledge may therefore help make these artifacts easier for machines to acquire and leverage. Drawing on Memory Organization Packets, Script Theory, and Conceptual Dependency from Schank and Abelson's classical work on cognitive linguistic representation, we introduce what is, to our knowledge, the first structured representation for agent skill artifacts that disentangles skill-level scheduling signals, scene-level execution structure, and logic-level action/resource-use evidence: the Scheduling-Structural-Logical (SSL) representation. We instantiate SSL with an LLM-based normalizer and evaluate SSL-derived representations in two tasks, Skill Discovery and Risk Assessment. The experiment shows that SSL significantly outperforms the text-only baselines: in Skill Discovery, MRR@50 improves from 0.649 to 0.729; in Risk Assessment, macro F1 improves from 0.409 to 0.509. These findings suggest that an explicit, source-grounded structure can make agent skills easier to search and review, positioning SSL as a practical step toward more inspectable, reusable, and operationally actionable skill representations, rather than a finished standard or end-to-end skill-management mechanism.
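The two headline metrics have standard definitions; a minimal sketch, assuming per-query relevance sets for Skill Discovery and per-class labels for Risk Assessment:

```python
def mrr_at_k(ranked_lists, relevant, k=50):
    """Mean reciprocal rank truncated at k: 1/rank of the first relevant
    item in each ranking, contributing 0 if none appears in the top k."""
    total = 0.0
    for ranking, rel in zip(ranked_lists, relevant):
        for i, item in enumerate(ranking[:k], start=1):
            if item in rel:
                total += 1.0 / i
                break
    return total / len(ranked_lists)

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, as commonly reported when
    risk categories are imbalanced."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)
```

Under these definitions, the reported lifts (0.649 to 0.729 MRR@50; 0.409 to 0.509 macro F1) mean the first relevant skill tends to appear nearer the top of the ranking, and per-class risk predictions improve even for rare categories.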

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces the Scheduling-Structural-Logical (SSL) representation for agent skills, drawing on Schank and Abelson's Memory Organization Packets, Script Theory, and Conceptual Dependency. Skills are converted from text (e.g., SKILL.md) to this disentangled form via an LLM-based normalizer. The representation is evaluated on Skill Discovery (MRR@50 improves from 0.649 to 0.729) and Risk Assessment (macro F1 improves from 0.409 to 0.509), outperforming text-only baselines. The authors position SSL as a practical step toward more inspectable and reusable skill artifacts for LLM agents.

Significance. If the results hold after proper validation, SSL offers a principled, source-grounded structure that could improve skill search, review, and operational use in agent systems. The explicit grounding in classical cognitive representations is a notable strength, providing external theoretical grounding rather than ad-hoc invention. The quantitative gains on two distinct tasks suggest potential for broader applicability in skill management. However, the significance is limited by the absence of controls that would confirm the gains arise from the SSL disentanglement itself.

major comments (2)
  1. [Experimental Evaluation] The headline improvements (MRR@50 0.649→0.729; macro F1 0.409→0.509) are reported without details on experimental setup, baseline implementations, skill data sources, statistical significance testing, or controls for LLM variability. This is load-bearing for the central claim that SSL outperforms text-only baselines, as unstated factors could explain the modest deltas.
  2. [SSL Instantiation] SSL Instantiation via LLM Normalizer: No quantitative validation (accuracy, hallucination rate, information loss) of the LLM normalizer is provided. Because SSL representations are generated by this normalizer, any systematic augmentation, rephrasing of constraints, or addition of scheduling cues would make the downstream comparisons non-equivalent; the performance delta cannot be attributed to the Scheduling-Structural-Logical disentanglement.
minor comments (2)
  1. [Introduction] The abstract and introduction could more explicitly define the three SSL components (scheduling signals, scene-level structure, logic-level evidence) with a small illustrative example to aid reader comprehension.
  2. [Evaluation] Notation for the SSL tuple or components is introduced but not consistently referenced in the evaluation sections; a table mapping original text elements to SSL fields would improve traceability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions we will incorporate to improve the manuscript.

Point-by-point responses
  1. Referee: [Experimental Evaluation] The headline improvements (MRR@50 0.649→0.729; macro F1 0.409→0.509) are reported without details on experimental setup, baseline implementations, skill data sources, statistical significance testing, or controls for LLM variability. This is load-bearing for the central claim that SSL outperforms text-only baselines, as unstated factors could explain the modest deltas.

    Authors: We agree that additional experimental details are necessary to substantiate the reported gains. In the revised manuscript we will expand the Experimental Evaluation section with: a complete description of the setup including LLM prompting strategies and hyperparameters; explicit implementation details and code references for all baselines; characteristics and provenance of the skill datasets; results of statistical significance testing (e.g., bootstrap confidence intervals or paired tests); and controls for LLM variability such as repeated runs across seeds and temperatures with reported variance. These additions will allow readers to evaluate the robustness of the MRR@50 and macro-F1 improvements. revision: yes

  2. Referee: [SSL Instantiation] No quantitative validation (accuracy, hallucination rate, information loss) of the LLM normalizer is provided. Because SSL representations are generated by this normalizer, any systematic augmentation, rephrasing of constraints, or addition of scheduling cues would make the downstream comparisons non-equivalent; the performance delta cannot be attributed to the Scheduling-Structural-Logical disentanglement.

    Authors: We acknowledge the importance of validating the normalizer to isolate the contribution of the SSL structure. The current manuscript does not contain such quantitative checks. In revision we will add an evaluation subsection that measures normalizer fidelity on a human-annotated subset of skills, reporting accuracy, hallucination rate, and information-preservation metrics (semantic similarity and manual review of scheduling, structural, and logical elements). We will also clarify that text-only baselines operate directly on the original skill documents without LLM processing, ensuring the comparison isolates the effect of the disentangled representation rather than any incidental augmentation by the normalizer. revision: yes
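The statistical controls promised in the rebuttal (e.g. bootstrap confidence intervals over per-query metric differences) could look like the following sketch; the harness and its parameters are illustrative, not the authors' code:

```python
import random

def bootstrap_ci(deltas, n_boot=10000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval on the mean of paired
    per-query metric differences (SSL minus baseline). If the interval
    excludes zero, the lift is unlikely to be resampling noise."""
    rng = random.Random(seed)
    n = len(deltas)
    means = sorted(
        sum(rng.choice(deltas) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

A paired test like this uses per-query deltas rather than the two aggregate scores, which is the control the referee's first comment is asking for.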

Circularity Check

0 steps flagged

SSL representation grounded in external classical work; no derivation reduces to self-inputs

Full rationale

The paper defines the Scheduling-Structural-Logical (SSL) representation by explicit reference to Schank and Abelson's Memory Organization Packets, Script Theory, and Conceptual Dependency, which are independent external sources. No equations, fitted parameters, or predictions are present; the LLM normalizer is described as an instantiation method rather than a learned component whose outputs are then re-used as evidence. Downstream evaluations (Skill Discovery MRR@50 and Risk Assessment F1) compare against text baselines without any self-citation chain or renaming of known results as novel derivations. The central claim therefore remains non-circular and self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the applicability of Schank and Abelson's classical cognitive theories to modern LLM agent skills and the assumption that an LLM normalizer can faithfully produce the SSL structure from text.

axioms (1)
  • Domain assumption: Memory Organization Packets, Script Theory, and Conceptual Dependency from Schank and Abelson provide a suitable foundation for representing agent skills.
    Invoked to motivate the three-layer SSL structure in the abstract.
invented entities (1)
  • Scheduling-Structural-Logical (SSL) representation (no independent evidence)
    purpose: disentangle scheduling signals, scene-level execution structure, and logic-level action/resource evidence in agent skills.
    Newly introduced structured format for skill artifacts.

pith-pipeline@v0.9.0 · 5618 in / 1396 out tokens · 65464 ms · 2026-05-08T04:01:42.468242+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ObjectGraph: From Document Injection to Knowledge Traversal -- A Native File Format for the Agentic Era

    cs.AI · 2026-04 · unverdicted · novelty 6.0

    ObjectGraph is a Markdown superset file format that represents documents as traversable knowledge graphs, achieving up to 95.3% token reduction for agents with no significant accuracy loss.

Reference graph

Works this paper leans on

37 extracted references · 35 canonical work pages · cited by 1 Pith paper · 19 internal anchors
