Recognition: 1 theorem link
· Lean TheoremSkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces
Pith reviewed 2026-05-13 05:04 UTC · model grok-4.3
The pith
SkillSafetyBench shows that attacks on reusable skills can induce unsafe actions in LLM agents even from benign user requests.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SkillSafetyBench includes 155 adversarial cases across 47 tasks, 6 risk domains, and 30 safety categories, each with a case-specific rule-based verifier. Experiments with multiple CLI agents and model backends show that localized non-user attacks can consistently induce unsafe behavior, with distinct failure patterns across domains, attack methods, and scaffold-model pairings. The findings indicate that agent safety depends not only on model-level alignment, but also on how agents interpret skills, trust workflow context, and act through executable environments.
What carries the argument
SkillSafetyBench, a runnable benchmark for evaluating skill-mediated safety failures using adversarial cases and rule-based verifiers.
If this is right
- Agent safety evaluations need to include tests for skill-facing attacks in addition to direct user prompts.
- Distinct failure patterns suggest that safety improvements must be tailored to specific agent scaffolds and model backends.
- Trust in workflow context from skills can be exploited to bypass safety measures in executable environments.
- Reusable skills should be designed with safeguards against local adversarial artifacts to maintain agent safety.
Where Pith is reading between the lines
- Extending the benchmark to include more diverse agent types beyond CLI could reveal additional vulnerabilities in deployed systems.
- Skill providers might need to incorporate validation mechanisms for skill content to reduce attack surfaces.
- The results imply that future agent designs could benefit from isolated execution environments for skills to limit the impact of compromised context.
Load-bearing premise
The constructed adversarial cases and rule-based verifiers in SkillSafetyBench correctly identify and measure real-world skill-mediated safety failures without missing important cases or introducing errors in verification.
What would settle it
Re-running the experiments on the 155 cases with new agent-model combinations and observing that no or very few unsafe behaviors are triggered according to the verifiers would challenge the claim of consistent induction of unsafe behavior.
Figures
read the original abstract
Reusable skills are becoming a common interface for extending large language model agents, packaging procedural guidance with access to files, tools, memory, and execution environments. However, this modularity introduces attack surfaces that are largely missed by existing safety evaluations: even when the user request is benign, task-relevant skill materials or local artifacts can steer an agent toward unsafe actions. We present SkillSafetyBench, a runnable benchmark for evaluating such skill-mediated safety failures. SkillSafetyBench includes 155 adversarial cases across 47 tasks, 6 risk domains, and 30 safety categories, each evaluated with a case-specific rule-based verifier. Experiments with multiple CLI agents and model backends show that localized non-user attacks can consistently induce unsafe behavior, with distinct failure patterns across domains, attack methods, and scaffold-model pairings. Our findings suggest that agent safety depends not only on model-level alignment, but also on how agents interpret skills, trust workflow context, and act through executable environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces SkillSafetyBench, a runnable benchmark for evaluating safety failures in LLM agents induced by reusable skills that grant access to files, tools, memory, and execution environments. It comprises 155 adversarial cases across 47 tasks, 6 risk domains, and 30 safety categories, each paired with a case-specific rule-based verifier. Experiments with multiple CLI agents and model backends demonstrate that localized non-user attacks can consistently induce unsafe behavior, with distinct failure patterns varying by domain, attack method, and scaffold-model pairing. The authors argue that agent safety requires attention to skill interpretation, workflow context, and executable environments beyond model-level alignment.
Significance. If the benchmark's cases and verifiers hold up under validation, the work is significant for identifying an overlooked attack surface in modular LLM agents. It supplies empirical evidence of how benign user requests combined with adversarial skill materials can steer agents toward unsafe actions, highlighting the need for skill-aware safety mechanisms. The runnable design and multi-domain coverage are strengths that could aid reproducibility and future extensions.
major comments (2)
- [Benchmark Design] Benchmark Design section (around the description of the 155 cases and verifiers): The central claim of consistent unsafe behavior induction depends on the case-specific rule-based verifiers correctly identifying safety failures. However, no details are provided on rule development, validation against human judgments, inter-rater agreement, or checks that rules capture intent/context rather than surface keywords (e.g., file writes or tool calls). This is load-bearing, as overfitting or misclassification could artifactually generate the reported distinct failure patterns across domains and scaffolds.
- [Experimental Results] Experimental Results section (around the experiments with CLI agents and model backends): The abstract reports consistent induction of unsafe behavior but omits information on case construction (e.g., independence from tested agents' failure modes), statistical significance, controls for prompt sensitivity, or confounding factors. Without these, the generalizability of the distinct failure patterns across domains, attack methods, and pairings cannot be assessed reliably.
minor comments (2)
- [Abstract] The abstract would be clearer if it specified the exact number and identities of CLI agents and model backends tested.
- [Benchmark Design] Consider adding a summary table or figure showing the distribution of the 155 cases across the 6 risk domains and 30 safety categories to aid reader comprehension.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review of our manuscript introducing SkillSafetyBench. The comments identify areas where additional methodological transparency will strengthen the presentation of the benchmark and results. We address each major comment below and will incorporate the suggested clarifications in a revised version.
read point-by-point responses
-
Referee: [Benchmark Design] Benchmark Design section (around the description of the 155 cases and verifiers): The central claim of consistent unsafe behavior induction depends on the case-specific rule-based verifiers correctly identifying safety failures. However, no details are provided on rule development, validation against human judgments, inter-rater agreement, or checks that rules capture intent/context rather than surface keywords (e.g., file writes or tool calls). This is load-bearing, as overfitting or misclassification could artifactually generate the reported distinct failure patterns across domains and scaffolds.
Authors: We agree that greater detail on verifier construction is warranted to support the central claims. The case-specific rules were authored to detect observable violations of the safety categories within each task's defined context, rather than relying on isolated keywords; for example, a rule for unauthorized file access checks both the target path and the absence of required permissions given the workflow state. In the revision we will add a dedicated subsection describing the rule development process, including how rules were derived from the 30 safety categories and 47 tasks. We will also report results from a human validation study on a representative subset of cases, including inter-annotator agreement metrics and alignment between automated verdicts and expert judgments. These additions will directly address concerns about potential misclassification and allow readers to assess the reliability of the observed failure patterns. revision: yes
-
Referee: [Experimental Results] Experimental Results section (around the experiments with CLI agents and model backends): The abstract reports consistent induction of unsafe behavior but omits information on case construction (e.g., independence from tested agents' failure modes), statistical significance, controls for prompt sensitivity, or confounding factors. Without these, the generalizability of the distinct failure patterns across domains, attack methods, and pairings cannot be assessed reliably.
Authors: We acknowledge the value of these additional details for evaluating generalizability. The 155 cases were constructed from domain-specific risk scenarios and common agent workflow patterns prior to selecting the evaluation scaffolds, ensuring independence from any particular agent's failure modes. In the revised manuscript we will expand the experimental section to include: (1) a description of the case construction methodology and its separation from the tested CLI agents and model backends; (2) statistical significance testing and confidence intervals for the reported unsafe behavior rates; and (3) discussion of controls for prompt sensitivity (e.g., template variations) and other potential confounders such as environment initialization and temperature settings. These changes will provide a clearer basis for interpreting the distinct failure patterns across domains, attack methods, and scaffold-model pairings. revision: yes
Circularity Check
No circularity in derivation chain
full rationale
The paper presents SkillSafetyBench as an empirical benchmark consisting of 155 adversarial cases across tasks, domains, and categories, each paired with a case-specific rule-based verifier. It reports experimental outcomes from running multiple CLI agents and model backends under localized non-user attacks. No mathematical derivations, equations, fitted parameters, predictions, or self-citations appear in the abstract or described structure. The central claim—that such attacks induce unsafe behavior with distinct patterns—is a direct reporting of benchmark results rather than any reduction to inputs by construction, self-definition, or load-bearing self-citation. The evaluation is self-contained as an observational study of agent behavior.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanreality_from_one_distinction unclearSkillSafetyBench includes 155 adversarial cases across 47 tasks, 6 risk domains, and 30 safety categories, each evaluated with a case-specific rule-based verifier.
Reference graph
Works this paper leans on
-
[9]
Advances in Neural Information Processing Systems , volume=
Taskbench: Benchmarking large language models for task automation , author=. Advances in Neural Information Processing Systems , volume=
-
[11]
Advances in Neural Information Processing Systems , volume=
GTA: a benchmark for general tool agents , author=. Advances in Neural Information Processing Systems , volume=
-
[19]
Advances in Neural Information Processing Systems , volume=
Osworld: Benchmarking multimodal agents for open-ended tasks in real computer environments , author=. Advances in Neural Information Processing Systems , volume=
-
[21]
Findings of the Association for Computational Linguistics: EMNLP 2023 , pages=
Creator: Tool creation for disentangling abstract and concrete reasoning of large language models , author=. Findings of the Association for Computational Linguistics: EMNLP 2023 , pages=
work page 2023
-
[22]
Proceedings of the 16th ACM workshop on artificial intelligence and security , pages=
Not what you've signed up for: Compromising real-world llm-integrated applications with indirect prompt injection , author=. Proceedings of the 16th ACM workshop on artificial intelligence and security , pages=
-
[23]
33rd USENIX Security Symposium (USENIX Security 24) , pages=
Formalizing and benchmarking prompt injection attacks and defenses , author=. 33rd USENIX Security Symposium (USENIX Security 24) , pages=
-
[24]
Advances in Neural Information Processing Systems , volume=
Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents , author=. Advances in Neural Information Processing Systems , volume=
-
[27]
Findings of the Association for Computational Linguistics: ACL 2024 , pages=
Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents , author=. Findings of the Association for Computational Linguistics: ACL 2024 , pages=
work page 2024
-
[28]
Advances in Neural Information Processing Systems , volume=
Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases , author=. Advances in Neural Information Processing Systems , volume=
-
[35]
Yao, Shunyu and Shinn, Noah and Razavi, Pedram and Narasimhan, Karthik , journal=
-
[36]
Findings of the Association for Computational Linguistics: NAACL 2025 , pages=
Toolsandbox: A stateful, conversational, interactive evaluation benchmark for llm tool use capabilities , author=. Findings of the Association for Computational Linguistics: NAACL 2025 , pages=
work page 2025
-
[37]
Appworld: A controllable world of apps and people for benchmarking interactive coding agents , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
-
[43]
CREATOR: Tool creation for disentangling abstract and concrete reasoning of large language models,
CREATOR: disentangling abstract and concrete reasonings of large language models through tool creation. CoRR, abs/2305.14318, 2023b. doi: 10.48550 , author=
-
[44]
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V
Benchmarking and defending against indirect prompt injection attacks on large language models , author=. Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1 , pages=
-
[46]
34th USENIX Security Symposium (USENIX Security 25) , pages=
\ StruQ \ : Defending against prompt injection with structured queries , author=. 34th USENIX Security Symposium (USENIX Security 25) , pages=
-
[47]
34th USENIX Security Symposium (USENIX Security 25) , pages=
\ PoisonedRAG \ : Knowledge corruption attacks to \ Retrieval-Augmented \ generation of large language models , author=. 34th USENIX Security Symposium (USENIX Security 25) , pages=
-
[49]
Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Zou, Zico Kolter, Matt Fredrikson, and 1 others. 2024. Agentharm: A benchmark for measuring harmfulness of llm agents. arXiv preprint arXiv:2410.09024
work page internal anchor Pith review arXiv 2024
-
[50]
Jun Shern Chan, Neil Chowdhury, Oliver Jaffe, James Aung, Dane Sherburn, Evan Mays, Giulio Starace, Kevin Liu, Leon Maksin, Tejal Patwardhan, and 1 others. 2024. Mle-bench: Evaluating machine learning agents on machine learning engineering. arXiv preprint arXiv:2410.07095
work page Pith review arXiv 2024
-
[51]
Sizhe Chen, Julien Piet, Chawin Sitawarin, and David Wagner. 2025. \ StruQ \ : Defending against prompt injection with structured queries. In 34th USENIX Security Symposium (USENIX Security 25), pages 2383--2400
work page 2025
-
[52]
Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. 2024. Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases. Advances in Neural Information Processing Systems, 37:130185--130213
work page 2024
-
[53]
Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tram \`e r. 2024. Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents. Advances in Neural Information Processing Systems, 37:82895--82920
work page 2024
-
[54]
Zenghao Duan, Yuxin Tian, Zhiyi Yin, Liang Pang, Jingcheng Deng, Zihao Wei, Shicheng Xu, Yuyao Ge, and Xueqi Cheng. 2026. Skillattack: Automated red teaming of agent skills through attack path refinement. arXiv preprint arXiv:2604.04989
work page internal anchor Pith review Pith/arXiv arXiv 2026
- [55]
-
[56]
Yunhao Feng, Yifan Ding, Yingshui Tan, Boren Zheng, Yanming Guo, Xiaolong Li, Kun Zhai, Yishan Li, and Wenke Huang. 2026. Skilltrojan: Backdoor attacks on skill-based agent systems. arXiv preprint arXiv:2604.06811
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[57]
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. 2023. Not what you've signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM workshop on artificial intelligence and security, pages 79--90
work page 2023
-
[58]
Yinghan Hou and Zongyou Yang. 2026. Skillsieve: A hierarchical triage framework for detecting malicious ai agent skills. arXiv preprint arXiv:2604.06550
work page internal anchor Pith review Pith/arXiv arXiv 2026
- [59]
-
[60]
Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, and Ion Stoica. 2024. Livecodebench: Holistic and contamination free evaluation of large language models for code. arXiv preprint arXiv:2403.07974
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[61]
Yanna Jiang, Delong Li, Haiyu Deng, Baihe Ma, Xu Wang, Qin Wang, and Guangsheng Yu. 2026. Sok: Agentic skills--beyond tool use in llm agents. arXiv preprint arXiv:2602.20867
work page internal anchor Pith review arXiv 2026
-
[62]
Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. 2023. Swe-bench: Can language models resolve real-world github issues? arXiv preprint arXiv:2310.06770
work page internal anchor Pith review Pith/arXiv arXiv 2023
- [63]
-
[64]
Xiangyi Li, Wenbo Chen, Yimin Liu, Shenghan Zheng, Xiaokun Chen, Yifeng He, Yubo Li, Bingran You, Haotian Shen, Jiankai Sun, and 1 others. 2026 a . Skillsbench: Benchmarking how well agent skills work across diverse tasks. arXiv preprint arXiv:2602.12670
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[65]
Zhiyuan Li, Jingzheng Wu, Xiang Ling, Xing Cui, and Tianyue Luo. 2026 b . Towards secure agent skills: Architecture, threat taxonomy, and security analysis. arXiv preprint arXiv:2604.02837
work page internal anchor Pith review Pith/arXiv arXiv 2026
- [66]
-
[67]
Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, and 1 others. 2023 a . Agentbench: Evaluating llms as agents. arXiv preprint arXiv:2308.03688
work page internal anchor Pith review Pith/arXiv arXiv 2023
- [68]
-
[69]
Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Zihao Wang, Xiaofeng Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, and 1 others. 2023 b . Prompt injection attack against llm-integrated applications. arXiv preprint arXiv:2306.05499
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[70]
Yi Liu, Weizhe Wang, Ruitao Feng, Yao Zhang, Guangquan Xu, Gelei Deng, Yuekang Li, and Leo Zhang. 2026 b . Agent skills in the wild: An empirical study of security vulnerabilities at scale. arXiv preprint arXiv:2601.10338
work page internal anchor Pith review arXiv 2026
-
[71]
Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. 2024. Formalizing and benchmarking prompt injection attacks and defenses. In 33rd USENIX Security Symposium (USENIX Security 24), pages 1831--1847
work page 2024
-
[72]
Jiarui Lu, Thomas Holleis, Yizhe Zhang, Bernhard Aumayer, Feng Nan, Haoping Bai, Shuang Ma, Shen Ma, Mengyu Li, Guoli Yin, and 1 others. 2025. Toolsandbox: A stateful, conversational, interactive evaluation benchmark for llm tool use capabilities. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 1160--1183
work page 2025
-
[73]
Cheng Qian, Chi Han, Yi Fung, Yujia Qin, Zhiyuan Liu, and Heng Ji. 2023. Creator: Tool creation for disentangling abstract and concrete reasoning of large language models. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 6922--6939
work page 2023
-
[74]
Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, and 1 others. 2023. Toolllm: Facilitating large language models to master 16000+ real-world apis. arXiv preprint arXiv:2307.16789
work page internal anchor Pith review Pith/arXiv arXiv 2023
- [75]
-
[76]
Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J Maddison, and Tatsunori Hashimoto. 2023. Identifying the risks of lm agents with an lm-emulated sandbox. arXiv preprint arXiv:2309.15817
work page internal anchor Pith review Pith/arXiv arXiv 2023
- [77]
-
[78]
Yongliang Shen, Kaitao Song, Xu Tan, Wenqi Zhang, Kan Ren, Siyu Yuan, Weiming Lu, Dongsheng Li, and Yueting Zhuang. 2024. Taskbench: Benchmarking large language models for task automation. Advances in Neural Information Processing Systems, 37:4540--4574
work page 2024
-
[79]
Guiyao Tie, Jiawen Shi, Pan Zhou, and Lichao Sun. 2026. Badskill: Backdoor attacks on agent skills via model-in-skill poisoning. arXiv preprint arXiv:2604.09378
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[80]
Harsh Trivedi, Tushar Khot, Mareike Hartmann, Ruskin Manku, Vinty Dong, Edward Li, Shashank Gupta, Ashish Sabharwal, and Niranjan Balasubramanian. 2024. Appworld: A controllable world of apps and people for benchmarking interactive coding agents. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Pap...
work page 2024
-
[81]
Chenxi Wang, Zhuoyun Yu, Xin Xie, Wuguannan Yao, Runnan Fang, Shuofei Qiao, Kexin Cao, Guozhou Zheng, Xiang Qi, Peng Zhang, and 1 others. 2026. Skillx: Automatically constructing skill knowledge bases for agents. arXiv preprint arXiv:2604.04804
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[82]
Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2023. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[83]
Jize Wang, Zerun Ma, Yining Li, Songyang Zhang, Cailian Chen, Kai Chen, and Xinyi Le. 2024. Gta: a benchmark for general tool agents. Advances in Neural Information Processing Systems, 37:75749--75790
work page 2024
- [84]
-
[85]
Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh J Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, and 1 others. 2024. Osworld: Benchmarking multimodal agents for open-ended tasks in real computer environments. Advances in Neural Information Processing Systems, 37:52040--52094
work page 2024
-
[86]
Frank F Xu, Yufan Song, Boxuan Li, Yuxuan Tang, Kritanjali Jain, Mengxue Bao, Zora Z Wang, Xuhui Zhou, Zhitong Guo, Murong Cao, and 1 others. 2024. Theagentcompany: benchmarking llm agents on consequential real world tasks. arXiv preprint arXiv:2412.14161
work page internal anchor Pith review arXiv 2024
-
[87]
Renjun Xu and Yang Yan. 2026. Agent skills for large language models: Architecture, acquisition, security, and the path forward. arXiv preprint arXiv:2602.12430
work page internal anchor Pith review arXiv 2026
-
[88]
Shunyu Yao, Noah Shinn, Pedram Razavi, and Karthik Narasimhan. 2024. -bench : A benchmark for tool-agent-user interaction in real-world domains. arXiv preprint arXiv:2406.12045
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[89]
Asaf Yehudai, Lilach Eden, Alan Li, Guy Uziel, Yilun Zhao, Roy Bar-Haim, Arman Cohan, and Michal Shmueli-Scheuer. 2025. Survey on evaluation of llm-based agents. arXiv preprint arXiv:2503.16416
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[90]
Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kiciman, Guangzhong Sun, Xing Xie, and Fangzhao Wu. 2025. Benchmarking and defending against indirect prompt injection attacks on large language models. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1, pages 1809--1820
work page 2025
-
[91]
Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. 2024. Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents. In Findings of the Association for Computational Linguistics: ACL 2024, pages 10471--10506
work page 2024
-
[92]
Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. 2024. Agent security bench (asb): Formalizing and benchmarking attacks and defenses in llm-based agents. arXiv preprint arXiv:2410.02644
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[93]
Boyuan Zheng, Michael Y Fatemi, Xiaolong Jin, Zora Zhiruo Wang, Apurva Gandhi, Yueqi Song, Yu Gu, Jayanth Srinivasa, Gaowen Liu, Graham Neubig, and 1 others. 2025. Skillweaver: Web agents can self-improve by discovering and honing skills. arXiv preprint arXiv:2504.07079
work page internal anchor Pith review arXiv 2025
-
[94]
Shuyan Zhou, Frank F Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, and 1 others. 2023. Webarena: A realistic web environment for building autonomous agents. arXiv preprint arXiv:2307.13854
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[95]
Wei Zou, Runpeng Geng, Binghui Wang, and Jinyuan Jia. 2025. \ PoisonedRAG \ : Knowledge corruption attacks to \ Retrieval-Augmented \ generation of large language models. In 34th USENIX Security Symposium (USENIX Security 25), pages 3827--3844
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.