This paper delivers the first systematic taxonomy and cross-benchmark consistency analysis of 40 agent safety benchmarks. It finds broad but shallow risk coverage, no ranking concordance across evaluations, and a systematic effect of benchmark choice on reported safety.
arXiv preprint arXiv:2506.15253
5 Pith papers cite this work.
Citing papers by year: 2026 (5)
citing papers explorer
- Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents
  This paper delivers the first systematic taxonomy and cross-benchmark consistency analysis of 40 agent safety benchmarks. It finds broad but shallow risk coverage, no ranking concordance across evaluations, and a systematic effect of benchmark choice on reported safety.
- Do Coding Agents Understand Least-Privilege Authorization?
  Coding agents struggle to infer least-privilege file permissions: they omit needed accesses while granting unused or sensitive ones. Sufficiency-Tightness Decomposition improves sensitive-task success by up to 15.8% and reduces attacks.
- Evaluating Privilege Usage of Agents with Real-World Tools
  GrantBox evaluates LLM agents using real-world tools and finds they remain vulnerable to sophisticated prompt injection attacks, which achieve an 84.80% average success rate.
- ADR: An Agentic Detection System for Enterprise Agentic AI Security
  ADR is a three-component detection system for AI agents that combines telemetry sensors, red teaming, and two-tier detection. It achieved 97.2% precision in a ten-month Uber deployment and outperforms baselines on the new ADR-Bench.
- Content-Aware Attack Detection in LLM Agent Tool-Call Traffic: An Empirical Study of Features, Architectures, and Evaluation Protocols
  SBERT content embeddings yield AUROC above 0.89 for attack detection in MCP tool-call sessions. Tree ensembles on pooled embeddings reach 0.975 and outperform GNNs when evaluated with task-stratified rather than random splits.