pith. sign in

hub Mixed citations

Defeating Prompt Injections by Design

Mixed citation behavior. Most common role is background (60%).

74 Pith papers citing it
Background 60% of classified citations
abstract

Large Language Models (LLMs) are increasingly deployed in agentic systems that interact with an untrusted environment. However, LLM agents are vulnerable to prompt injection attacks when handling untrusted data. In this paper we propose CaMeL, a robust defense that creates a protective system layer around the LLM, securing it even when underlying models are susceptible to attacks. To operate, CaMeL explicitly extracts the control and data flows from the (trusted) query; therefore, the untrusted data retrieved by the LLM can never impact the program flow. To further improve security, CaMeL uses a notion of a capability to prevent the exfiltration of private data over unauthorized data flows by enforcing security policies when tools are called. We demonstrate effectiveness of CaMeL by solving $77\%$ of tasks with provable security (compared to $84\%$ with an undefended system) in AgentDojo. We release CaMeL at https://github.com/google-research/camel-prompt-injection.

hub tools

citation-role summary

background 10 baseline 2 method 2 dataset 1

citation-polarity summary

years

2026 70 2025 4

representative citing papers

Whose Side Is Your Agent On? Multi-Party Principal Loyalty in LLM Agents

cs.AI · 2026-06-29 · unverdicted · novelty 7.0

PrincipalBench exposes a sharp split in frontier LLMs between selective and over-refusing behavior on multi-party loyalty, with prompt scaffolding and KL distillation reducing harm rates but only along an existing leak/over-refusal trade-off.

Data Flow Control: Data Safety Policies for AI Agents

cs.DB · 2026-06-04 · unverdicted · novelty 7.0

Data Flow Control formalizes data safety as aggregate predicates over provenance monomials and implements enforcement via the Passant query rewriting layer achieving near-zero overhead across five DBMS engines.

What You Approve Is What Executes: Consent Integrity for Black-Box LLM Agents

cs.CR · 2026-06-01 · unverdicted · novelty 7.0

The paper introduces Consent Integrity as the property that actions shown for approval must be rendered by a trusted mediator from the real boundary action over an unspoofable path and bound to execution, with uninspectable actions surfaced rather than silently approved.

Boiling the Frog: A Multi-Turn Benchmark for Agentic Safety

cs.CL · 2026-05-21 · unverdicted · novelty 7.0 · 2 refs

Boiling the Frog is a new stateful multi-turn benchmark that finds an aggregate 44.4% strict attack success rate for incremental safety violations across nine AI models, with rates ranging from 20.5% to 92.9%.

AgenTEE: Confidential LLM Agent Execution on Edge Devices

cs.CR · 2026-04-20 · unverdicted · novelty 7.0

AgenTEE isolates LLM agent runtime, inference, and apps in independently attested cVMs on Arm-based edge devices, achieving under 5.15% overhead versus commodity OS deployments.

citing papers explorer

Showing 50 of 74 citing papers.