{"total":13,"items":[{"citing_arxiv_id":"2606.29225","ref_index":5,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"PolicyGuard: A Dialogue-Grounded Sub-Agent Verifier for Policy Adherence in LLM Agents","primary_cat":"cs.AI","submitted_at":"2026-06-28T06:27:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PolicyGuard is a dialogue-grounded sub-agent verifier that raises PASS4 scores by 6-12 points on an airline benchmark while catching more violations with fewer blocks than argument-level guards.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18991","ref_index":43,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Agent Security is a Systems Problem","primary_cat":"cs.CR","submitted_at":"2026-05-18T18:11:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper argues that agent security is best addressed as a systems problem by applying principles from operating systems, networks, and formal methods rather than relying solely on model robustness improvements.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18414","ref_index":5,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Prompts Don't Protect: Architectural Enforcement via MCP Proxy for LLM Tool Access Control","primary_cat":"cs.CR","submitted_at":"2026-05-18T13:52:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MCP proxy enforces ABAC for LLM tool access by filtering discovery and invocation, achieving 0% unauthorized invocation rate across tested models and attacks where prompts reduce risk by only 11-18 
points.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10907","ref_index":40,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Engineering Robustness into Personal Agents with the AI Workflow Store","primary_cat":"cs.CR","submitted_at":"2026-05-11T17:46:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Position paper advocating a shift from on-the-fly AI agent synthesis to reusable hardened workflows in an AI Workflow Store to improve robustness and security.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"URL: https://chatgpt.com/features/agent/ (visited on 04/29/2026). [38] OpenClaw. OpenClaw - Personal AI Assistant. OpenClaw. URL: https://openclaw.ai/ (visited on 04/29/2026). [39] Nils Palumbo, Sarthak Choudhary, Jihye Choi, Prasad Chalasani, and Somesh Jha. \"Policy Compiler for Secure Agentic Systems\". In: (2026). arXiv:2602.16708 [cs.CR]. [40] Fábio Perez and Ian Ribeiro. \"Ignore Previous Prompt: Attack Techniques For Language Models\". In: arXiv (2022). eprint: 2211.09527 (cs.CL). [41] Neil Perry, Megha Srivastava, Deepak Kumar, and Dan Boneh. \"Do users write more insecure code with ai assistants?\" In: Proceedings of the 2023 ACM SIGSAC conference on computer and communications security. 
2023, pages 2785-2799."},{"citing_arxiv_id":"2605.06393","ref_index":35,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation","primary_cat":"cs.CR","submitted_at":"2026-05-07T15:08:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"modeling Trusted classification / decision path Remote terminal verification SHCUA/OpenClaw security analysis [12], [13], [16], [25] Yes Partial No No Computer-use agent attack benchmarks [14], [19], [20], [22], [24] Mostly yes Partial No No Input/model-side defenses [27]-[30] Partial No No No Policy/runtime enforcement for agents [31]-[33], [35]-[39] Partial Partial Usually no No Sandboxing and constrained execution [11], [41], [42] Partial No / partial No No General system-level protection [43]-[46] No No Partial No This work-Yes Yes Yes Yes if the REE-side runtime is compromised. 
Our design applies this principle to agent operations by placing classification, authorization, binding, and evidence generation in a TEE-"},{"citing_arxiv_id":"2605.05501","ref_index":36,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"SOCpilot: Verifying Policy Compliance for LLM-Assisted Incident Response","primary_cat":"cs.CR","submitted_at":"2026-05-06T22:51:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SOCpilot supplies a fixed verifier and public artifact that removes 466 non-compliant approval-gated actions from LLM plans on 200 real incidents while preserving task recall.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05440","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Authorization Propagation in Multi-Agent AI Systems: Identity Governance as Infrastructure","primary_cat":"cs.AI","submitted_at":"2026-05-06T20:56:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Multi-agent AI creates an authorization propagation problem not solved by prompt injection defenses or classical access control, requiring identity governance as continuously enforced infrastructure.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.19657","ref_index":53,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"An AI Agent Execution Environment to Safeguard User Data","primary_cat":"cs.CR","submitted_at":"2026-04-21T16:45:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GAAP guarantees confidentiality of private user data for AI agents by enforcing user-specified permissions deterministically through persistent information flow 
tracking, without trusting the agent or requiring attack-free models.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[51] Nils Palumbo, Sarthak Choudhary, Jihye Choi, Prasad Chalasani, and Somesh Jha. 2026. Policy Compiler for Secure Agentic Systems. (2026). arXiv:2602.16708 [cs.CR] https://arxiv.org/abs/2602.16708 [52] Chetan Pathade. 2025. Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs. arXiv:2505.04806 [cs.CR] https://arxiv.org/abs/2505.04806 [53] Ethan Perez et al. 2022. Red Teaming Language Models with Language Models. arXiv preprint arXiv:2202.03286 (2022). [54] Niels Provos. 2026. IronCurtain: A Personal AI Assistant Built Secure from the Ground Up. Niels Provos Blog. https://www.provos.org/p/ironcurtain-secure-personal-assistant/ [55] PulseMCP. 2026. PulseMCP: Model Context Protocol Community"},{"citing_arxiv_id":"2604.18658","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Owner-Harm: A Missing Threat Model for AI Agent Safety","primary_cat":"cs.CR","submitted_at":"2026-04-20T10:11:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Owner-Harm is a new threat model with eight categories of agent behavior that harms the deployer, and existing defenses achieve only 14.8% true positive rate on injection-based owner-harm tasks versus 100% on generic criminal harm.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.15579","ref_index":53,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing 
Utility","primary_cat":"cs.SE","submitted_at":"2026-04-16T23:18:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Symbolic guardrails enforce 74% of specified safety policies in agent benchmarks and boost safety without hurting utility.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"explore LLM for detecting prompt injection. Some work aims to provide deterministic, rule-based guardrails, but still relies partly on LLMs to generate guardrail rules, decide when to trigger them, or execute them. These approaches, therefore, remain probabilistic and cannot provide guarantees. For example, GuardAgent [68] uses LLMs to generate guardrail code; NeMo Guardrails [53] provides programmable guardrails while execution still involves LLMs; ShieldAgent [11] relies on LLMs to retrieve and execute rule-based policy checks; AgentGuardian [1] uses LLMs to generate control policies. A key limitation of neural guardrails is their inherently probabilistic nature. 
Because their generation or execution depends on LLMs, they can be error-prone or circumvented by attackers."},{"citing_arxiv_id":"2604.16524","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Anumati: Proof of Adherence as a Formal Consent Model for Autonomous Agent Protocols","primary_cat":"cs.CR","submitted_at":"2026-04-16T10:48:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Anumati defines proof of adherence via versioned PolicyDocument, ConsentRecord, and AdherenceEvent primitives as a non-breaking extension to A2A and MCP protocols.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.04035","ref_index":21,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Causality Laundering: Denial-Feedback Leakage in Tool-Calling LLM Agents","primary_cat":"cs.CR","submitted_at":"2026-04-05T09:28:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"The paper defines causality laundering as an attack leaking information from denial outcomes in LLM tool calls and proposes the Agentic Reference Monitor to block it using denial-aware provenance graphs.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Recent work has made real progress on runtime enforcement for tool-calling agents. Information-flow-control systems such as FIDES [6] enforce label-based restrictions over observed flows. Secure-by-design architectures such as CaMeL [8] extract trusted control and data flow from the user query and enforce capability-style restrictions. Graph-based approaches such as PCAS [21] construct dependency graphs over tool-call events and evaluate declarative policies against them. 
Causal attribution defenses [16, 31] use counterfactual re-execution to determine whether a proposed action is driven by user intent or by injected content. These systems represent genuine progress; collectively, they show that principled runtime enforcement for agents is both feasible"},{"citing_arxiv_id":"2603.09002","ref_index":12,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Security Considerations for Multi-agent Systems","primary_cat":"cs.CR","submitted_at":"2026-03-09T22:46:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}