arxiv: 2605.26542 · v4 · pith:2DBH3L6Wnew · submitted 2026-05-26 · 💻 cs.CR · cs.AI

ChainCaps: Composition-Safe Tool-Using Agents via Monotonic Capability Attenuation

Pith reviewed 2026-07-02 23:03 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords tool-using agents · permission laundering · capability budgets · composition safety · monotonic attenuation · explicit flows · MCP proxy

The pith

ChainCaps prevents permission laundering by attaching sink-specific capability budgets that only shrink through tool composition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Tool-using agents can satisfy every individual tool permission yet still produce unsafe end-to-end results, such as reading private data, summarizing it, and sending the summary outward. ChainCaps closes this gap by giving every value a sink-specific capability budget that composition propagates only by intersection, so authority can stay the same or decrease but never increase. The mechanism runs as a transparent proxy that needs no changes to agents or tool servers. Across 82 tasks and five frontier models the approach lowers attack success from 25-68 percent to 0-4.8 percent while keeping benign completion between 96 and 100 percent. Results hold only when manifests are trusted and all data movement is visible to the proxy.

Core claim

ChainCaps addresses permission laundering with a runtime rule: every value carries a sink-specific capability budget, and tool composition propagates budgets by intersection. A value can preserve or lose authority as it moves through a tool chain, but it cannot gain new authority through composition.
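The intersection rule can be illustrated with a minimal sketch. It assumes a budget can be modeled as a set of sink names that a value may still reach; the sink names (`display`, `http_send`, `file_write`) and the `compose` helper are hypothetical and are not taken from the paper.

```python
from functools import reduce


def compose(*budgets: frozenset) -> frozenset:
    """Budget of a value derived from several inputs: the intersection of their budgets."""
    if not budgets:
        raise ValueError("compose needs at least one input budget")
    return reduce(frozenset.intersection, budgets)


# Hypothetical budgets: salary data may only be displayed; public news may go anywhere.
salary = frozenset({"display"})
news = frozenset({"display", "http_send", "file_write"})

# A summary built from both inputs keeps only the sinks that every input allows.
summary = compose(salary, news)

# The derived budget is a subset of each input budget, so authority never grows.
assert summary <= salary and summary <= news
```

Because set intersection can only remove elements, no sequence of `compose` calls can return a sink that some input had already lost, which is the "preserve or lose, never gain" property stated above.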

What carries the argument

A sink-specific capability budget attached to every value and propagated by intersection, so that authority cannot increase during tool composition.

If this is right

  • Attack success rate falls from 25-68% to 0-4.8% on the 82 tasks.
  • Benign task completion remains between 96% and 100%.
  • ChainCaps outperforms scalar-IFC and per-function-isolation baselines in replay experiments.
  • Expert manifests achieve 100% attack blocking while naive manifests achieve only 27.3%.
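The Figure 3 caption restates the drop in attack success rate as a relative reduction. A small sketch of that arithmetic follows, assuming the usual definition (before minus after, divided by before). The input pairs in the usage lines are made up for illustration, because per-model before/after pairs are not listed on this page.

```python
def relative_reduction(asr_before: float, asr_after: float) -> float:
    """Relative reduction in attack success rate, in percent."""
    if asr_before <= 0:
        raise ValueError("asr_before must be positive")
    return 100.0 * (asr_before - asr_after) / asr_before


# Hypothetical pairs, not the paper's per-model numbers.
full_block = relative_reduction(50.0, 0.0)   # any model driven to 0% ASR
partial = relative_reduction(40.0, 4.0)      # a model with residual ASR
```

Any model whose ASR drops to 0% yields a 100% relative reduction regardless of its undefended ASR, which is why the upper end of a relative-reduction range says little about how hard the baseline was.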

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same intersection rule on budgets could be applied to limit effects in other agent composition settings beyond the tested explicit flows.
  • Automated or improved manifest generation would directly raise the fraction of attacks blocked in practical deployments.

Load-bearing premise

The approach depends on trusted manifests that correctly describe tool effects and on the proxy being able to observe all data movement.

What would settle it

A replay of the 82-task suite in which an attack succeeds under ChainCaps despite accurate manifests and fully visible flows, or in which benign completion falls below 96 percent.

Figures

Figures reproduced from arXiv: 2605.26542 by Haoran Yu, Lifei Liu, Shiqi Yang, Xiaochong Jiang, Yichen Liu, Ziwei Li.

Figure 1
Figure 1: Budget propagation example. A summary combining salary data (display-only) and public news inherits the most restrictive budget via intersection. Because the resulting budget permits display but not HTTP sending, the outbound call is blocked while user display is allowed. This monotonic attenuation is the core runtime property of ChainCaps. where op names an effectful operation such as http send, file wri…
Figure 2
Figure 2: ChainCaps proxy architecture. The proxy intercepts every tools/call between the LLM agent and tool server. Steps 1–2 resolve argument dependencies and compute $B_{\mathrm{agg}} = B_{\mathrm{ctx}} \cap \bigcap_{x \in D} B(x)$. Step 3 checks whether $\mathrm{Req}(t, a) \in B_{\mathrm{agg}}$; if not, it verifies a lineage-bound declassification token before either forwarding (step 4) or blocking (step 4’, dashed). On the response path, step 5 propagates B(y) = Pass(t) ∩ Bag…
Figure 3
Figure 3: Attack success rate across five frontier models. Without defense, ASR ranges from 25% to 68%. With ChainCaps, all tested models fall to ≤5% ASR (Qwen 3.5 reaches 0%), corresponding to an 86–100% relative reduction on this stress-test suite.
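The proxy steps in the Figure 2 caption can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the required sink is simplified to depend on the tool only (the caption's Req also takes the arguments), declassification tokens are not modeled, the response-path rule is truncated in the caption and is assumed here to intersect Pass(t) with the aggregate budget, and all tool and sink names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Proxy:
    context_budget: frozenset                     # B_ctx
    passes: dict                                  # tool -> Pass(t), from a trusted manifest
    requires: dict                                # tool -> sink the call needs (simplified Req)
    budgets: dict = field(default_factory=dict)   # value id -> B(x)

    def aggregate(self, deps):
        """Steps 1-2: intersect the context budget with every dependency's budget."""
        b_agg = self.context_budget
        for x in deps:
            b_agg = b_agg & self.budgets[x]
        return b_agg

    def call(self, tool, deps, out_id):
        """Steps 3-5: check the required sink, then label the result."""
        b_agg = self.aggregate(deps)
        req = self.requires.get(tool)             # None means the tool has no sink effect
        if req is not None and req not in b_agg:
            return False                          # blocked (no declassification in this sketch)
        # Assumed response rule: B(y) = Pass(t) intersected with the aggregate budget.
        self.budgets[out_id] = self.passes[tool] & b_agg
        return True


ALL = frozenset({"display", "http_send"})
proxy = Proxy(
    context_budget=ALL,
    passes={"read_salary": frozenset({"display"}), "fetch_news": ALL,
            "summarize": ALL, "http_post": ALL, "show": ALL},
    requires={"http_post": "http_send", "show": "display"},
)
proxy.call("read_salary", [], "salary")
proxy.call("fetch_news", [], "news")
proxy.call("summarize", ["salary", "news"], "summary")
sent = proxy.call("http_post", ["summary"], "response")   # outbound send of the summary
shown = proxy.call("show", ["summary"], "rendered")       # display to the user
```

The run mirrors the Figure 1 scenario: the summary inherits the display-only budget of the salary data, so the outbound send is refused while display goes through.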
Original abstract

Tool-using agents increasingly operate in open-ended deployment environments, where they compose file systems, web APIs, code interpreters, and enterprise services at runtime. This creates a safety gap in tool composition: an agent can satisfy every per-tool permission check and still produce an unsafe end-to-end effect, such as reading a confidential document, summarizing it, and sending the summary to an external endpoint. We call this failure mode permission laundering. ChainCaps addresses it with a runtime rule: every value carries a sink-specific capability budget, and tool composition propagates budgets by intersection. A value can preserve or lose authority as it moves through a tool chain, but it cannot gain new authority through composition. We implement ChainCaps as a transparent MCP proxy that requires no changes to the agent or tool servers. On 82 tasks across five frontier models from three providers, ChainCaps reduces attack success rate from 25-68% to 0-4.8% while preserving 96-100% benign completion. In replay experiments, it also outperforms scalar-IFC and per-function-isolation baselines. Manifest quality is the dominant deployment bottleneck: expert manifests reach 100% attack blocking, while naive manifests fall to 27.3%. Our claims are limited to explicit-flow composition safety under trusted manifests and proxy-visible data movement, a practical gap in deployed tool-using agents today.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper claims that ChainCaps prevents permission laundering in tool-using agents by attaching sink-specific capability budgets to values and propagating them monotonically via intersection during composition. Implemented as a transparent MCP proxy with no changes to agents or tool servers, it is evaluated on 82 tasks across five frontier models from three providers, reducing attack success rate from 25-68% to 0-4.8% while preserving 96-100% benign completion and outperforming scalar-IFC and per-function-isolation baselines in replay experiments. Manifest quality is noted as the dominant bottleneck (expert manifests achieve 100% blocking; naive ones 27.3%), with all claims scoped to explicit-flow composition safety under trusted manifests and proxy-visible data movement.

Significance. If reproducible, the work addresses a genuine and timely gap in end-to-end safety for composed tool use in open agent deployments. The proxy-based, non-intrusive design is a practical strength. The multi-model, multi-provider evaluation is a positive feature. The explicit scoping of claims to trusted manifests is appropriately cautious and avoids overclaiming.

major comments (3)
  1. [Abstract] Abstract: The headline empirical result (ASR reduced to 0-4.8%) is obtained under expert manifests that achieve 100% blocking, yet no data, method, cost estimate, or sensitivity analysis is supplied for producing reliable expert manifests at scale or for the effect of missing sinks; this precondition is load-bearing for any practical transfer of the reported numbers.
  2. [Abstract] Abstract: The 82-task evaluation reports concrete ASR and benign-completion figures but supplies no task definitions, attack scenarios, error bars, statistical tests, or manifest examples, rendering the central empirical claim impossible to verify or reproduce from the given information.
  3. [Abstract] Abstract: The statement that ChainCaps 'outperforms scalar-IFC and per-function-isolation baselines' in replay experiments provides no quantitative deltas, conditions, or per-baseline numbers, so the magnitude and robustness of the improvement cannot be assessed.
minor comments (1)
  1. The abstract could more precisely define 'attack success rate' and 'benign completion' to avoid ambiguity in the reported percentages.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on reproducibility and practical applicability. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core claims.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The headline empirical result (ASR reduced to 0-4.8%) is obtained under expert manifests that achieve 100% blocking, yet no data, method, cost estimate, or sensitivity analysis is supplied for producing reliable expert manifests at scale or for the effect of missing sinks; this precondition is load-bearing for any practical transfer of the reported numbers.

    Authors: We agree this is a substantive gap. The manuscript already identifies manifest quality as the dominant bottleneck and reports the expert vs. naive contrast, but provides no scaling methodology or sensitivity data. We will add a new subsection (likely in Section 3 or 6) detailing the manifest authoring process, estimated human effort per sink, a sensitivity analysis for omitted sinks, and conditions under which expert-level manifests can be produced at scale. revision: yes

  2. Referee: [Abstract] Abstract: The 82-task evaluation reports concrete ASR and benign-completion figures but supplies no task definitions, attack scenarios, error bars, statistical tests, or manifest examples, rendering the central empirical claim impossible to verify or reproduce from the given information.

    Authors: The abstract is intentionally concise, but the referee is correct that the provided information is insufficient for verification. The full manuscript contains task categories and attack descriptions in Section 4 plus manifest examples in the appendix; however, these are not prominent enough. We will expand the evaluation section with explicit task summaries, representative attack prompts, error bars on all reported percentages, and basic statistical comparisons. Manifest examples will be moved into the main body or a dedicated figure. revision: yes

  3. Referee: [Abstract] Abstract: The statement that ChainCaps 'outperforms scalar-IFC and per-function-isolation baselines' in replay experiments provides no quantitative deltas, conditions, or per-baseline numbers, so the magnitude and robustness of the improvement cannot be assessed.

    Authors: We accept the criticism. The replay experiments are described in Section 5, but the abstract and main text lack the per-baseline numbers and deltas. We will add a comparison table (or expanded results paragraph) reporting ASR and benign-completion rates for each baseline under identical replay conditions, together with the absolute and relative improvements achieved by ChainCaps. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper introduces a runtime mechanism (capability budgets propagated by intersection) whose monotonicity property holds by the explicit definition of the propagation rule rather than by any derived prediction or fitted parameter. All reported outcomes are empirical measurements on 82 tasks under stated conditions (trusted manifests, proxy-visible flows); no equations, self-citations, or ansatzes are invoked to justify the central safety claim. Manifest quality is acknowledged as an external precondition, not smuggled into the result. The derivation chain is therefore self-contained as an engineering design plus experimental validation.
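The "holds by definition" step can be written out. The derivation below uses the symbols from the Figure 2 caption and assumes that the truncated response-path rule intersects $\mathrm{Pass}(t)$ with the aggregate budget; it also sets aside declassification tokens, which are an explicit exception to attenuation.

```latex
\[
B_{\mathrm{agg}} \;=\; B_{\mathrm{ctx}} \cap \bigcap_{x \in D} B(x)
\quad\Longrightarrow\quad
B_{\mathrm{agg}} \subseteq B(x') \ \text{ for every } x' \in D,
\]
\[
B(y) \;=\; \mathrm{Pass}(t) \cap B_{\mathrm{agg}}
\;\subseteq\; B_{\mathrm{agg}}
\;\subseteq\; B(x') \ \text{ for every } x' \in D.
\]
```

By induction on the length of the tool chain, the budget of any derived value is a subset of the budget of each value it depends on, so the safety property is a consequence of the propagation rule itself and not of any fitted quantity.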

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review yields no visible free parameters or invented entities beyond the core budget concept itself; the trusted-manifest assumption is the main unverified premise.

axioms (1)
  • domain assumption Manifests are trusted and accurately describe tool capabilities
    Paper explicitly limits claims to trusted manifests.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. AgentFlow: Building Agent Dependency Graphs for Static Analysis of Agent Programs

    cs.SE 2026-07 unverdicted novelty 7.0

    AgentFlow builds a framework-agnostic Agent Dependency Graph from agent program source code to support static analyses such as BOM generation and prompt-to-tool risk detection, evaluated on 5,399 real programs across ...
