pith. sign in

arxiv: 2601.22569 · v2 · pith:YHGQMGJQnew · submitted 2026-01-30 · 💻 cs.CR · cs.AI

Whispers of Wealth: Red-Teaming Google's Agent Payments Protocol via Prompt Injection

Pith reviewed 2026-05-21 15:13 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords prompt injectionagent payments protocolAP2red-teamingLLM agentsfinancial securityadversarial promptsGoogle ADK
0
0 comments X

The pith

Simple adversarial prompts can hijack Google's Agent Payments Protocol to alter product rankings and extract user data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates the security of the Agent Payments Protocol (AP2), which uses cryptographic mandates to protect AI-driven purchases. By constructing a working shopping agent with Gemini-2.5-Flash and the Google ADK framework, the authors show that two prompt-injection techniques reliably override the agent's choices. A sympathetic reader would care because these attacks demonstrate that contextual reasoning in payment agents can be exploited even when cryptographic protections are present. The results indicate that current designs leave open paths for manipulation of recommendations and theft of private information.

Core claim

Using a functional AP2 based shopping agent built with Gemini-2.5-Flash and the Google ADK framework, simple adversarial prompts can reliably subvert agent behavior through the Branded Whisper Attack and the Vault Whisper Attack which manipulate product ranking and extract sensitive user data.

What carries the argument

The Branded Whisper Attack and Vault Whisper Attack: prompt injection methods that use indirect or direct text inputs to alter the agent's product selection and data handling inside the AP2 mandate system.

If this is right

  • Agentic payment systems remain exposed to prompt-driven manipulation despite cryptographic mandates.
  • Stronger isolation between user context and agent decision logic is required.
  • Defensive safeguards against indirect and direct injection must be added to LLM-mediated financial workflows.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar injection patterns could affect other LLM agents handling financial or personal decisions outside AP2.
  • Input filtering or context separation layers might block these whisper-style attacks in future designs.
  • Production testing of AP2 should specifically check resistance to ranking manipulation and data exfiltration.

Load-bearing premise

The functional prototype agent built with Gemini-2.5-Flash and the Google ADK framework accurately represents the security properties of the actual deployed Agent Payments Protocol.

What would settle it

Running the Branded Whisper and Vault Whisper attacks against the production deployment of Google's Agent Payments Protocol and finding that they no longer succeed.

Figures

Figures reproduced from arXiv: 2601.22569 by Pranjol Sen Gupta, Tanusree Debi, Wentian Zhu.

Figure 1
Figure 1. Figure 1: Agent Payment Protocol (AP2) [2] 2 Background 2.1 Agent Payments Protocol (AP2) The Agent Payments Protocol (AP2) is an open protocol developed through collaboration with major payment providers and technology companies to support secure agent-led payment transactions across platforms. Recently released (September 16, 2025) AP2 primarily addresses emerging challenges introduced by large language model (LLM… view at source ↗
Figure 2
Figure 2. Figure 2: Key Security Questions Addressed by the Agent Payments Protocol (AP2) [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: AP2 workflow 2.3 Agent-to-Agent Communication as AP2’s Foundation Agent-to-Agent (A2A) communication serves as the foundational communication layer upon which the Agent Payments Protocol (AP2) is built. A2A is an open interoperability standard that enables AI agents to communicate, collaborate, and coordinate tasks across heterogeneous platforms, providers, 3 [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: A2A Protocol AP2 extends A2A by introducing a payment-specific semantic layer and explicit security guarantees. Whereas A2A focuses on general-purpose agent communication, AP2 defines a specialized vocabulary and protocol rules for handling financial intent, authorization, and accountability. This includes the use of cryptographically signed mandates and verifiable credentials to bind agent actions to expl… view at source ↗
Figure 5
Figure 5. Figure 5: Direct & Indirect Prompt Injection In the context of AP2, prompt injection presents a particularly serious risk. While mandates provide cryptographic guarantees over what actions are permitted, the agent’s interpretation of conversational context determines when and how those mandates are invoked. An attacker who can influence this context, especially through indirect prompt injection may be able to manipu… view at source ↗
Figure 6
Figure 6. Figure 6: Branded Whisper Attack 6 [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Vault Whisper Attack 4.2 Vault Whisper Attack The second attack we introduce is the Vault Whisper Attack. In this threat model, the adversary is a malicious user who interacts with the system through the same interface available to legitimate users. Because users are free to input arbitrary text when communicating with the Shopping Agent, the attacker can craft malicious prompts aimed at influencing the ag… view at source ↗
Figure 8
Figure 8. Figure 8: Normal Product Selection [PITH_FULL_IMAGE:figures/full_fig_p008_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Adversarial Selection 5.1 Branded Whisper Attack Before launching the Branded Whisper Attack, we first examined the Merchant Agent’s behavior under normal operating conditions. As depicted in [PITH_FULL_IMAGE:figures/full_fig_p008_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Privacy Leakage As depicted in [PITH_FULL_IMAGE:figures/full_fig_p009_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Product Selection 11 [PITH_FULL_IMAGE:figures/full_fig_p011_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Information Gathering [PITH_FULL_IMAGE:figures/full_fig_p012_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Payment Processing 12 [PITH_FULL_IMAGE:figures/full_fig_p012_13.png] view at source ↗
read the original abstract

Large language model (LLM) based agents are increasingly used to automate financial transactions, yet their reliance on contextual reasoning exposes payment systems to prompt-driven manipulation. The Agent Payments Protocol (AP2) aims to secure agent-led purchases through cryptographically verifiable mandates, but its practical robustness remains underexplored. In this work, we perform an AI red-teaming evaluation of AP2 and identify vulnerabilities arising from indirect and direct prompt injection. We introduce two attack techniques, the Branded Whisper Attack and the Vault Whisper Attack which manipulate product ranking and extract sensitive user data. Using a functional AP2 based shopping agent built with Gemini-2.5-Flash and the Google ADK framework, we experimentally validate that simple adversarial prompts can reliably subvert agent behavior. Our findings reveal critical weaknesses in current agentic payment architectures and highlight the need for stronger isolation and defensive safeguards in LLM-mediated financial systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper performs an AI red-teaming evaluation of Google's Agent Payments Protocol (AP2), which uses cryptographically verifiable mandates to secure LLM-based agent purchases. It introduces two prompt-injection techniques—the Branded Whisper Attack and the Vault Whisper Attack—that are claimed to manipulate product rankings and extract sensitive user data. Using a functional AP2-based shopping agent implemented with Gemini-2.5-Flash and the Google ADK framework, the authors assert that simple adversarial prompts can reliably subvert agent behavior, revealing critical weaknesses in current agentic payment architectures.

Significance. If the attacks are shown to succeed against a faithful implementation of AP2's cryptographic mandates, the work would usefully highlight practical risks in LLM-mediated financial systems and motivate stronger isolation mechanisms. The empirical focus on a deployed protocol is a strength, but the absence of quantitative results and unclear mapping from the proxy implementation to AP2's actual security primitives substantially reduces the current impact.

major comments (2)
  1. [Abstract] Abstract: the claim of 'experimentally validate that simple adversarial prompts can reliably subvert agent behavior' is unsupported by any quantitative results, success rates, trial counts, controls, or methodology details in the abstract or experimental description.
  2. [Experimental setup / functional agent description] Description of the functional AP2-based shopping agent (built with Gemini-2.5-Flash and Google ADK): no evidence is provided that mandate generation, signing, verification, or enforcement steps from the AP2 specification are present and active in the tested implementation. Consequently, success of the Branded Whisper Attack and Vault Whisper Attack does not establish that these attacks bypass AP2's cryptographic protections rather than simply operating in their absence.
minor comments (2)
  1. [Introduction] The two new attack names are introduced without concise, self-contained definitions early in the manuscript; a short dedicated subsection would improve readability.
  2. [Related work] The manuscript would benefit from explicit comparison to prior prompt-injection work on agentic systems to clarify the incremental contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We have addressed the major comments point by point below and revised the paper to improve its clarity and empirical rigor.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim of 'experimentally validate that simple adversarial prompts can reliably subvert agent behavior' is unsupported by any quantitative results, success rates, trial counts, controls, or methodology details in the abstract or experimental description.

    Authors: We acknowledge the referee's observation regarding the abstract. To address this, we have revised the abstract to provide a more accurate summary of our experimental findings without overstating the results. We have also expanded the experimental description in the main text to include specific details on the number of trials, success rates observed for each attack, control conditions, and the overall methodology. These additions ensure that the claim is now supported by the reported evidence. revision: yes

  2. Referee: [Experimental setup / functional agent description] Description of the functional AP2-based shopping agent (built with Gemini-2.5-Flash and Google ADK): no evidence is provided that mandate generation, signing, verification, or enforcement steps from the AP2 specification are present and active in the tested implementation. Consequently, success of the Branded Whisper Attack and Vault Whisper Attack does not establish that these attacks bypass AP2's cryptographic protections rather than simply operating in their absence.

    Authors: We thank the referee for highlighting this important clarification. Our implementation is intended to be a functional representation of the AP2 protocol, and we have now added detailed descriptions and a mapping table in the revised manuscript that explicitly outlines how mandate generation, signing, verification, and enforcement are implemented and active during the experiments. This demonstrates that the attacks succeed in subverting the agent even when these cryptographic steps are enforced, rather than in their absence. We believe this addresses the concern about the proxy implementation. revision: yes

Circularity Check

0 steps flagged

Empirical red-teaming study contains no derivation chain or self-referential constructions

full rationale

The manuscript is an empirical red-teaming evaluation that constructs a functional shopping agent using Gemini-2.5-Flash and the Google ADK framework, then tests two prompt-injection techniques (Branded Whisper Attack and Vault Whisper Attack) for their ability to alter product ranking or extract data. No equations, fitted parameters, uniqueness theorems, or ansatzes appear in the provided text; the central claims rest on direct experimental outcomes rather than any reduction to prior self-citations or definitional equivalence. The methodological choice to treat the constructed agent as a proxy for AP2 is an external assumption subject to independent verification, not a circular step that equates outputs to inputs by construction. This is a standard empirical security study whose validity can be assessed against external benchmarks without internal logical collapse.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

Ledger is based solely on abstract content; no free parameters are described, and the two named attacks are presented as new techniques rather than independently evidenced entities.

axioms (1)
  • domain assumption The Agent Payments Protocol aims to secure agent-led purchases through cryptographically verifiable mandates.
    This premise is stated directly in the abstract as the core security mechanism of AP2.
invented entities (2)
  • Branded Whisper Attack no independent evidence
    purpose: Manipulate product ranking via indirect prompt injection in the agent context.
    Presented as a novel attack technique introduced by the authors.
  • Vault Whisper Attack no independent evidence
    purpose: Extract sensitive user data via prompt injection in the agent context.
    Presented as a novel attack technique introduced by the authors.

pith-pipeline@v0.9.0 · 5687 in / 1409 out tokens · 92111 ms · 2026-05-21T15:13:39.968701+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SoK: Blockchain Agent-to-Agent Payments

    q-fin.GN 2026-04 unverdicted novelty 7.0

    The first systematization of blockchain-based agent-to-agent payments organizes designs into discovery, authorization, execution, and accounting stages while identifying trust and security gaps.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · cited by 1 Pith paper · 3 internal anchors

  1. [1]

    Announcing agent payments protocol (ap2)

    Google Cloud. Announcing agent payments protocol (ap2). https://cloud.google.com/blog/ products/ai-machine-learning/announcing-agents-to-payments-ap2-protocol , 2025. Ac- cessed: 2025-12-10

  2. [2]

    Google agentic commerce

    Google Cloud. Google agentic commerce. https://github.com/google-agentic-commerce/AP2,

  3. [3]

    Accessed: 2025-12-10

  4. [4]

    Fundamentals of building autonomous llm agents.arXiv e-prints, pages arXiv–2510, 2025

    Victor de Lamo Castrillo, Habtom Kahsay Gidey, Alexander Lenz, and Alois Knoll. Fundamentals of building autonomous llm agents.arXiv e-prints, pages arXiv–2510, 2025

  5. [5]

    A2a — a new era of agent interoperability

    Google Inc. A2a — a new era of agent interoperability. https://developers.googleblog.com/en/ a2a-a-new-era-of-agent-interoperability/, 2025. Accessed: 2025-12-10

  6. [6]

    Piguard: Prompt injection guardrail via mitigating overdefense for free

    Hao Li, Xiaogeng Liu, Ning Zhang, and Chaowei Xiao. Piguard: Prompt injection guardrail via mitigating overdefense for free. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 30420–30437, 2025

  7. [7]

    Prompt Injection attack against LLM-integrated Applications

    Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Zihao Wang, Xiaofeng Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, et al. Prompt injection attack against llm-integrated applications.arXiv preprint arXiv:2306.05499, 2023

  8. [8]

    Datasentinel: A game-theoretic detection of prompt injection attacks

    Yupei Liu, Yuqi Jia, Jinyuan Jia, Dawn Song, and Neil Zhenqiang Gong. Datasentinel: A game-theoretic detection of prompt injection attacks. In2025 IEEE Symposium on Security and Privacy (SP), pages 2190–2208. IEEE, 2025

  9. [9]

    Agrail: A lifelong agent guardrail with effective and adaptive safety detection.arXiv preprint arXiv:2502.11448, 2025

    Weidi Luo, Shenghong Dai, Xiaogeng Liu, Suman Banerjee, Huan Sun, Muhao Chen, and Chaowei Xiao. Agrail: A lifelong agent guardrail with effective and adaptive safety detection.arXiv preprint arXiv:2502.11448, 2025

  10. [10]

    Red teaming the mind of the machine: A systematic evaluation of prompt injection and jailbreak vulnerabilities in llms.arXiv preprint arXiv:2505.04806, 2025

    Chetan Pathade. Red teaming the mind of the machine: A systematic evaluation of prompt injection and jailbreak vulnerabilities in llms.arXiv preprint arXiv:2505.04806, 2025

  11. [11]

    Red Teaming Language Models with Language Models

    Ethan Perez, Saffron Huang, Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese, and Geoffrey Irving. Red teaming language models with language models.arXiv preprint arXiv:2202.03286, 2022

  12. [12]

    Promptarmor: Simple yet effective prompt injection defenses.arXiv preprint arXiv:2507.15219, 2025

    Tianneng Shi, Kaijie Zhu, Zhun Wang, Yuqi Jia, Will Cai, Weida Liang, Haonan Wang, Hend Alzahrani, Joshua Lu, Kenji Kawaguchi, et al. Promptarmor: Simple yet effective prompt injection defenses.arXiv preprint arXiv:2507.15219, 2025

  13. [13]

    Unveiling privacy risks in llm agent memory

    Bo Wang, Weiyi He, Shenglai Zeng, Zhen Xiang, Yue Xing, Jiliang Tang, and Pengfei He. Unveiling privacy risks in llm agent memory. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 25241–25260, 2025. 10

  14. [14]

    GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning

    Zhen Xiang, Linzhi Zheng, Yanjie Li, Junyuan Hong, Qinbin Li, Han Xie, Jiawei Zhang, Zidi Xiong, Chulin Xie, Carl Yang, et al. Guardagent: Safeguard llm agents by a guard agent via knowledge-enabled reasoning.arXiv preprint arXiv:2406.09187, 2024

  15. [15]

    React: Synergizing reasoning and acting in language models

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. InThe eleventh international conference on learning representations, 2022

  16. [16]

    Melon: Provable defense against indirect prompt injection attacks in ai agents.arXiv preprint arXiv:2502.05174, 2025

    Kaijie Zhu, Xianjun Yang, Jindong Wang, Wenbo Guo, and William Yang Wang. Melon: Provable defense against indirect prompt injection attacks in ai agents.arXiv preprint arXiv:2502.05174, 2025. A Diagrams of AP2 Workflow Figure 11: Product Selection 11 Figure 12: Information Gathering Figure 13: Payment Processing 12