Free-Riding the Agentic Web: A Systematic Security Analysis of x402 Payments
Pith reviewed 2026-06-28 22:12 UTC · model grok-4.3
The pith
x402 payments harbor four flaw classes that enable up to 100% resource leakage, and output-only pay-per-token pricing faces a structural limit in the form of a √(1+Θ) manipulation gap.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Invariant-based analysis shows that the x402 stack contains four flaw classes (cross-resource substitution, duplicate-settlement race, allowance overdraft, and denial of settlement) that produce resource leakage of up to 100% in official SDKs and a production deployment. For pay-per-token schemes the paper proves a structural limit: no output-only pricing can be both fair to honest users and bounded against inflation of hidden thinking tokens, with the price of fairness being a √(1+Θ) manipulation gap. The proposed per-flaw mitigations, together with a defense triple that carries provable guarantees, cut per-call reasoning cost by 47% and invert attacker leverage from 8.7× to 0.9× at 2.8% overhead.
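To make the first flaw class more concrete, here is a minimal sketch of what binding a payment to the resource it was issued for could look like. It is an illustration under assumptions, not the paper's mitigation and not the API of any x402 SDK: the `PaymentProof` type, its fields, and the `accepts_payment` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentProof:
    """Hypothetical, simplified view of an already-verified payment."""
    resource: str  # resource the payer authorized payment for
    amount: int    # amount paid, in the smallest currency unit
    nonce: str     # unique identifier of this payment


def accepts_payment(proof: PaymentProof, requested_resource: str, price: int) -> bool:
    """Accept only if the proof names the requested resource and covers its price."""
    return proof.resource == requested_resource and proof.amount >= price


cheap = PaymentProof(resource="/api/cheap", amount=1, nonce="n-1")
print(accepts_payment(cheap, "/api/cheap", 1))        # True
print(accepts_payment(cheap, "/api/expensive", 100))  # False: wrong resource
```

Under these assumptions, a proof obtained for a cheap endpoint is rejected when replayed against an expensive one, because the check compares both the resource and the amount rather than only confirming that some valid payment exists.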
What carries the argument
Five invariants grounded in protocol specifications, literature, and vendor expectations that organize the analysis and map every violation to its responsible layer.
If this is right
- Official SDKs and a production deployment reach resource-leakage ratios of up to 100% under the four identified flaw classes.
- A defense triple with provable guarantees reduces per-call reasoning cost by 47% and reverses attacker leverage from 8.7× to 0.9× at 2.8% overhead.
- Per-flaw mitigations address cross-resource substitution, duplicate-settlement race, allowance overdraft, and denial of settlement individually.
- Pay-per-token pricing carries an unavoidable √(1+Θ) manipulation gap when restricted to output-only schemes.
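For a sense of scale, the sketch below simply evaluates the stated gap √(1+Θ) for a few values of Θ. This page does not define Θ, so the sample values are arbitrary illustrations, the non-negativity check is an assumption, and `manipulation_gap` is a hypothetical helper name rather than anything from the paper.

```python
import math


def manipulation_gap(theta: float) -> float:
    """Evaluate sqrt(1 + theta), the gap stated in the paper's pricing result."""
    if theta < 0:
        # Assumption of this sketch: theta is non-negative.
        raise ValueError("theta is assumed to be non-negative in this sketch")
    return math.sqrt(1.0 + theta)


# Arbitrary sample values, chosen only because they give round results.
for theta in (0.0, 3.0, 8.0, 99.0):
    print(f"theta={theta:g} gap={manipulation_gap(theta):.2f}")
```

The gap grows with the square root of Θ: Θ = 3 gives a gap of 2, Θ = 8 gives 3, and Θ = 99 gives 10.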
Where Pith is reading between the lines
- State-synchronization mismatches between synchronous web requests and asynchronous blockchain settlement may appear in other payment protocols that combine the two.
- The quantitative √(1+Θ) bound offers a concrete metric that could be applied when evaluating pricing designs in additional token-based services.
- The disclosed mitigations could be tested for transferability to related systems that bridge HTTP semantics with on-chain finality.
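The state-synchronization concern can be illustrated with a small sketch of a duplicate-settlement guard. It assumes, purely for illustration, that each payment carries a unique nonce and that the server can atomically record a nonce as used before serving. `SettlementGuard` and `serve_concurrently` are hypothetical names, and the in-memory set stands in for whatever shared store a real deployment would need.

```python
import threading


class SettlementGuard:
    """Hypothetical in-memory guard: each payment nonce can be claimed once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._claimed: set[str] = set()

    def claim(self, nonce: str) -> bool:
        """Atomically mark the nonce as used; True only for the first caller."""
        with self._lock:
            if nonce in self._claimed:
                return False
            self._claimed.add(nonce)
            return True


def serve_concurrently(guard: SettlementGuard, nonce: str, workers: int) -> int:
    """Replay the same nonce from several threads; count how many get served."""
    results: list[bool] = []
    results_lock = threading.Lock()

    def handler() -> None:
        ok = guard.claim(nonce)
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=handler) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


print(serve_concurrently(SettlementGuard(), "nonce-42", workers=8))  # 1
```

The design point is that the check and the insert happen under one lock, so concurrent requests presenting the same payment cannot all pass a "not yet settled" check before any of them records the settlement. Across multiple processes or machines, the same property would have to come from an atomic operation in a shared store, which this sketch does not attempt.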
Load-bearing premise
The five invariants used to organize the analysis are correctly and completely grounded in the protocol specifications, literature, and vendor expectations, allowing every violation to be resolved to the responsible layer without unexamined interactions.
What would settle it
Observing zero leakage across all four flaw classes in a production x402 deployment that follows official SDKs, or exhibiting an output-only pricing scheme that achieves both user fairness and bounded inflation without a √(1+Θ) gap.
Original abstract
The x402 protocol has crossed from prototype to infrastructure for the agentic web, driving 130 million all-time transactions and embedded in Google Cloud, Cloudflare, and Stripe. Yet bridging synchronous HTTP requests with asynchronous blockchain finality creates state-synchronization challenges, and x402's security has so far been examined only in piecemeal vendor disclosures. It is moreover not one artefact but a stack of an HTTP semantic, per-chain schemes, and a long tail of SDK and deployment choices whose required guarantees prior work has not established. We perform a systematic security analysis organized around five invariants grounded in specifications, literature, and vendor expectations, resolving every violation to the responsible layer. We identify four flaw classes: cross-resource substitution, duplicate-settlement race (independently corroborated by subsequent third-party reports), allowance overdraft, and denial of settlement. Against official SDKs and a production deployment, these reach resource-leakage ratios up to 100%. For pay-per-token schemes we prove a structural limit: no output-only pricing can be both fair to honest users and bounded against inflation of the hidden "thinking" tokens, the price of fairness being a √(1+Θ) manipulation gap. We propose per-flaw mitigations and a defense triple with provable guarantees, cutting per-call reasoning cost by 47% and inverting attacker leverage from 8.7× to 0.9× at only 2.8% overhead. All findings have been disclosed.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript performs a systematic security analysis of the x402 protocol, which bridges HTTP requests with blockchain settlements and has seen 130 million transactions with adoption in Google Cloud, Cloudflare, and Stripe. The analysis is organized around five invariants drawn from protocol specifications, literature, and vendor expectations. It identifies four flaw classes (cross-resource substitution, duplicate-settlement race, allowance overdraft, denial of settlement) that produce resource-leakage ratios up to 100% when tested against official SDKs and a production deployment. For pay-per-token pricing it supplies a structural proof that no output-only scheme can simultaneously be fair to honest users and bounded against hidden-token inflation, with the fairness price being a √(1+Θ) manipulation gap. Mitigations and a defense triple are proposed that reduce per-call reasoning cost by 47% and invert attacker leverage from 8.7× to 0.9× at 2.8% overhead.
Significance. If the invariants prove exhaustive and the empirical and proof results hold, the work is significant for securing high-volume agentic payment infrastructure. Strengths include the empirical leakage measurements on real SDKs and deployments, the structural proof for the pricing limit, and the quantified defense triple with provable guarantees. These elements supply both diagnostic coverage and concrete, low-overhead countermeasures for a protocol already embedded in production systems.
major comments (1)
- [§4 (Invariants)] The five invariants are presented as complete and sufficient to resolve every violation to its responsible layer, thereby establishing that the four flaw classes are exhaustive and that the reported 100% leakage ratios plus the √(1+Θ) gap fully characterize the attack surface. No explicit enumeration or formal argument is supplied showing that all interactions among HTTP semantics, per-chain settlement races, and SDK-specific state are covered; an omitted cross-layer interaction would render the partition incomplete and undermine the central claims.
minor comments (1)
- [Abstract] The parenthetical note that the duplicate-settlement race was 'independently corroborated by subsequent third-party reports' should include the specific citations so readers can locate the corroboration.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the completeness argument for our invariants. We address the major comment below and will revise the manuscript to strengthen the presentation of coverage.
Point-by-point responses
- Referee: [§4 (Invariants)] The five invariants are presented as complete and sufficient to resolve every violation to its responsible layer, thereby establishing that the four flaw classes are exhaustive and that the reported 100% leakage ratios plus the √(1+Θ) gap fully characterize the attack surface. No explicit enumeration or formal argument is supplied showing that all interactions among HTTP semantics, per-chain settlement races, and SDK-specific state are covered; an omitted cross-layer interaction would render the partition incomplete and undermine the central claims.
Authors: We acknowledge that §4 grounds the invariants in the x402 specification, HTTP and blockchain security literature, and vendor expectations but does not supply an explicit enumeration or formal completeness argument for every possible cross-layer interaction. The analysis instead demonstrates coverage by deriving each invariant from the protocol's core state partitions (HTTP request semantics, asynchronous settlement finality, and SDK-managed allowances/nonces) and validating the resulting flaw classes through concrete attacks on official SDKs and a production deployment. To address the concern directly, the revised manuscript will expand §4 with a table that enumerates the principal interaction classes (HTTP header vs. on-chain nonce races, allowance state vs. duplicate settlement, cross-resource substitution across SDK state machines) and provides a short argument that any unlisted interaction reduces to one of the four identified flaw classes. This addition clarifies the partition without altering the empirical leakage measurements or the structural pricing proof.
Revision: yes
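The allowance state mentioned in the response can be pictured with a small sketch. It assumes one plausible reading of "allowance overdraft", namely that several in-flight requests together commit more than the payer has authorized, and it shows a reservation step taken before serving. That reading, the `AllowanceLedger` class, and its methods are assumptions made for illustration and are not taken from the paper or from any SDK.

```python
class AllowanceLedger:
    """Hypothetical tracker of how much of a payer's allowance is committed."""

    def __init__(self, allowance: int) -> None:
        self.allowance = allowance  # total the payer has authorized
        self.reserved = 0           # committed to requests not yet settled

    def reserve(self, amount: int) -> bool:
        """Commit `amount` if it still fits under the allowance."""
        if amount <= 0 or self.reserved + amount > self.allowance:
            return False
        self.reserved += amount
        return True

    def release(self, amount: int) -> None:
        """Return a reservation, for example after a failed settlement."""
        self.reserved = max(0, self.reserved - amount)


ledger = AllowanceLedger(allowance=10)
print(ledger.reserve(6))  # True
print(ledger.reserve(6))  # False: 6 + 6 would exceed the allowance of 10
ledger.release(6)
print(ledger.reserve(6))  # True
```

This version is single-threaded on purpose. A concurrent server would need to make `reserve` atomic, since the overdraft scenario assumed here arises precisely when several requests check the remaining allowance at the same time.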
Circularity Check
No circularity; analysis grounded externally
The paper's derivation organizes the security analysis around five invariants that are stated to be grounded in external protocol specifications, literature, and vendor expectations rather than derived from the paper's own findings. Flaw classes are validated against official SDKs and a production deployment, and the pay-per-token structural limit is presented as an independent mathematical proof with no reduction to fitted inputs or self-citations. No self-definitional steps, fitted predictions renamed as results, or load-bearing self-citation chains appear in the text. The chain is self-contained against external benchmarks.
Forward citations
Cited by 1 Pith paper
- Can Trustless Agents Be Trusted? An Empirical Study of the ERC-8004 Decentralized AI Agent Ecosystem
First empirical study of ERC-8004 finds identity registries mostly inactive and reputation system manipulable with 59-90% of reviewers showing coordinated Sybil behavior, leaving most agents without valid feedback aft...