pith.

arxiv: 2606.09084 · v1 · pith:XRMC25L3new · submitted 2026-06-08 · 💻 cs.CR · cs.AI

Context-Fractured Decomposition Attacks on Tool-Using LLM Agents: Exploiting Artifact Provenance Gaps

Pith reviewed 2026-06-27 16:34 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords Context-Fractured Decomposition · provenance gap · tool-using LLM agents · jailbreak attacks · artifact composition · multi-step attacks · agent pipelines

The pith

Tool-using LLM agents face Context-Fractured Decomposition attacks that hide harmful intent inside benign artifacts composed across separate steps and contexts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that jailbreak defenses for LLM agents must reason about how tool actions compose through persistent artifacts rather than about isolated messages. It introduces the provenance gap, where enforcement is split across tools and time, allowing early benign-looking outputs to trigger later harm without any single step looking suspicious. Context-Fractured Decomposition exploits this gap by preserving harmless intermediate files or logs that later combine into disallowed behavior, possibly in a different agent instance. A sympathetic reader would care because current multi-turn attacks and defenses still assume one visible conversation, an assumption that fails in real agent pipelines. The work reports that CFD raises success rates by up to 28.3 percentage points over state-of-the-art baselines, even against strong single-turn judges.

Core claim

Context-Fractured Decomposition (CFD) is a family of cross-context multi-step jailbreaks that preserve benign-looking intermediate artifacts from an early interaction and elicit harmful behavior much later, potentially in a different agent instance or workflow stage, via individually innocuous tool actions whose risk emerges only under delayed artifact-mediated composition. The paper operationalizes the provenance gap as a deployment failure mode in tool-using agents and reports that CFD raises jailbreak success rates by up to 28.3 percentage points over state-of-the-art baselines, including against strong single-turn judges.

What carries the argument

The provenance gap, the absence of tracked lineage for artifacts produced by tool actions across fragmented enforcement points, which CFD exploits through delayed composition of innocuous steps.
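To make "tracked lineage" concrete, here is a minimal Python sketch of an artifact provenance log. It assumes a hypothetical agent runtime that reports every tool write together with a session identifier and the artifacts that step had read; the names `ArtifactRecord`, `ProvenanceLog`, `record_write`, and `lineage_sessions` are illustrative and do not come from the paper. Walking parent links recovers every session that contributed to an artifact, which is the information a defender lacks under a provenance gap.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass(frozen=True)
class ArtifactRecord:
    """One write to a persistent artifact (e.g., a workspace file or log)."""
    path: str
    session_id: str                # agent instance or session that performed the write
    parents: tuple[str, ...] = ()  # artifacts the writing step had read


@dataclass
class ProvenanceLog:
    """Hypothetical lineage store keeping the latest write record per artifact path."""
    records: dict[str, ArtifactRecord] = field(default_factory=dict)

    def record_write(self, path: str, session_id: str, parents: Iterable[str] = ()) -> None:
        self.records[path] = ArtifactRecord(path, session_id, tuple(parents))

    def lineage_sessions(self, path: str) -> set[str]:
        """Return every session that contributed, directly or transitively, to `path`."""
        sessions: set[str] = set()
        visited: set[str] = set()
        stack = [path]
        while stack:
            current = stack.pop()
            # Skip artifacts already expanded (guards against cycles) or never written.
            if current in visited or current not in self.records:
                continue
            visited.add(current)
            record = self.records[current]
            sessions.add(record.session_id)
            stack.extend(record.parents)
        return sessions
```

Keeping only the latest write per path is a simplification; a real store would version writes and persist them outside any single agent instance so that later stages can query them.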

If this is right

  • CFD raises jailbreak success by up to 28.3 percentage points on agent-system benchmarks.
  • The attack succeeds even when each individual tool action is judged benign by strong single-turn filters.
  • Real agent pipelines fragment enforcement across tools, modules, and separate instances, breaking the contiguous-conversation assumption.
  • Provenance lineage tagging is proposed as one verifiable mitigation direction.
  • Multi-turn methods such as Crescendo and Tree of Attacks still rely on the same single-context visibility that CFD bypasses.
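One way to read the mitigation bullet is as a check placed in front of each tool call. This sketch assumes a hypothetical `judge` callable returning a risk score in [0, 1] and an arbitrary default threshold of 0.5; neither is taken from the paper. It illustrates the difference between scoring the current action alone and scoring the action composed with the contents of the artifacts it consumes.

```python
from typing import Callable, Mapping, Sequence

# Hypothetical judge interface: maps text to a risk score in [0, 1].
Judge = Callable[[str], float]


def gate_tool_action(
    action_text: str,
    input_artifacts: Sequence[str],
    artifact_contents: Mapping[str, str],
    judge: Judge,
    threshold: float = 0.5,
) -> bool:
    """Return True if the tool action may run, False if it should be blocked."""
    # A session-local monitor would stop here and score only the current action.
    local_ok = judge(action_text) < threshold
    # Lineage-aware check: score the action together with every artifact it consumes.
    composed = "\n".join(
        [artifact_contents.get(name, "") for name in input_artifacts] + [action_text]
    )
    composed_ok = judge(composed) < threshold
    return local_ok and composed_ok
```

In practice the list of consumed artifacts would come from a provenance store that follows lineage transitively, and concatenating raw contents is only one crude way to present the composition to a judge.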

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Agent frameworks without persistent artifact tracking will remain exposed to composition attacks that span multiple user sessions or sub-agents.
  • Safety evaluation suites limited to single-turn or single-context tests will systematically underestimate risk for tool-using systems.
  • Mitigations could be tested by injecting synthetic provenance gaps into existing agent traces and measuring whether lineage checks block delayed composition.
  • The same gap may affect non-security properties such as reproducibility and auditability whenever artifacts carry implicit state across workflow stages.
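The testing idea in the third bullet can be prototyped on synthetic traces. In this sketch each step carries a toy risk score, and composed risk is approximated by summing scores, a simplifying assumption of ours rather than the paper's model. The hypothetical `fracture` helper injects a synthetic provenance gap by spreading steps over separate sessions; a session-local monitor then sees only per-session totals, while a lineage-aware monitor sees every step that feeds the final action.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    session: int
    risk: float  # toy per-step risk score, standing in for a judge's output


def fracture(risks: list[float], num_sessions: int) -> list[Step]:
    """Inject a synthetic provenance gap: assign consecutive steps to sessions round-robin."""
    if num_sessions < 1:
        raise ValueError("num_sessions must be at least 1")
    return [Step(session=i % num_sessions, risk=r) for i, r in enumerate(risks)]


def session_local_alarm(trace: list[Step], threshold: float) -> bool:
    """Monitor that sees one session at a time; composed risk is summed within a session."""
    totals: dict[int, float] = {}
    for step in trace:
        totals[step.session] = totals.get(step.session, 0.0) + step.risk
    return any(total >= threshold for total in totals.values())


def lineage_alarm(trace: list[Step], threshold: float) -> bool:
    """Monitor that follows artifact lineage across sessions, so it sees every step."""
    return sum(step.risk for step in trace) >= threshold
```

With four steps scored 0.2 each and a threshold of 0.7, a single contiguous session trips the local monitor, while fracturing the same steps across four sessions leaves it silent and only the lineage-aware monitor still fires.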

Load-bearing premise

Existing attacks and defenses assume enforcement occurs over a single contiguous conversation visible to the defender.

What would settle it

A controlled run on an agent system that logs full artifact provenance across every tool call and workflow stage, showing CFD success rates no higher than the strongest single-turn baseline.
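This settling experiment reduces to comparing two attack success rates (ASR), and the headline result is stated in percentage points, that is, an absolute difference between rates rather than a relative lift. A minimal sketch, assuming independent trials and using a Wald normal-approximation 95% interval (our choice, not the paper's stated method); `asr_difference_pp` is a hypothetical helper name:

```python
import math


def asr_difference_pp(
    cfd_successes: int, cfd_trials: int,
    base_successes: int, base_trials: int,
    z: float = 1.96,
) -> tuple[float, float, float]:
    """Return (difference, lower, upper) in percentage points for CFD ASR minus baseline ASR."""
    if cfd_trials <= 0 or base_trials <= 0:
        raise ValueError("trial counts must be positive")
    p_cfd = cfd_successes / cfd_trials
    p_base = base_successes / base_trials
    diff = p_cfd - p_base
    # Wald standard error for a difference of two independent proportions.
    se = math.sqrt(p_cfd * (1 - p_cfd) / cfd_trials + p_base * (1 - p_base) / base_trials)
    return 100 * diff, 100 * (diff - z * se), 100 * (diff + z * se)
```

An interval lying entirely at or below zero would be consistent with the outcome described above. For rates near 0 or 1 the Wald interval is unreliable, and a Newcombe or Wilson-based interval is a safer choice.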

Figures

Figures reproduced from arXiv: 2606.09084 by Charles Fleming, Daniel Guo, Guang Cheng, Sahil Arun Nale, Xiaofeng Lin, Yukai Yang.

Figure 1. The provenance gap in agent security. (a) Many safety monitors implicitly assume a contiguous interaction trace is visible, enabling intent detection from the full history. (b) In real pipelines, context is fractured across sessions, tools, and instances while persistent artifacts carry state; benign-looking writes can later be read and composed into harmful actions when provenance is not tracked.
Figure 2. The Context-Fractured Decomposition (CFD) Workflow. (Top) Phase 1: Semantic Decomposition. An attacker agent recursively breaks a prohibited objective (e.g., “Compile and exfiltrate sensitive data”) into a dependency tree of semantically disjoint primitives. (Bottom) Phase 2: Fractured Execution. These primitives are scheduled across independent sessions (t1, t2, ...) separated by time or context resets…
Figure 3. Harmfulness distribution of (left) textual jailbreak prompts vs. (right) agentic jailbreak prompts, rated by the same offline evaluator (E).
Figure 4. Refusal rises with single-turn harmfulness; CFD leaf queries concentrate in low-harm regions, explaining why local monitors fail.

Appendix F. Artifact and Code Release. F.1. Artifact Licenses and Terms of Use. We list below each external model and dataset used in our experiments, the governing license or terms of service, and how our use complies with those terms. • Meta Llama 3 8B Licensed under the “Llama …
Original abstract

Tool-using LLM agents interact with the world through actions that persist state in artifacts (e.g., workspace files or logs). Consequently, jailbreak defenses must reason about cross-step composition rather than isolated text. Yet most existing attacks and defenses, including “multi-turn” jailbreaks such as Crescendo and Tree of Attacks, still assume a single contiguous conversation visible to the defender. This assumption breaks down in real agent pipelines, where enforcement is fragmented across tools, modules, and time, and where artifact provenance is often not tracked. We operationalize a deployment failure mode for tool-using LLM agents, the provenance gap, and study reproducible triggers for it: Context-Fractured Decomposition (CFD), a family of cross-context multi-step jailbreaks that preserve benign-looking intermediate artifacts from an early interaction and elicit harmful behavior much later, potentially in a different agent instance or workflow stage, via individually innocuous tool actions whose risk emerges only under delayed artifact-mediated composition. We instrument the failure mode with trace-level diagnostics and outline a verifiable mitigation direction (provenance lineage tagging). Across agent-system jailbreak benchmarks, CFD improves success rates by up to 28.3 percentage points over state-of-the-art baselines, even against strong single-turn judges. Disclaimer: This paper contains examples of harmful or offensive language.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that tool-using LLM agents suffer from a 'provenance gap': enforcement is fragmented across tools, modules, and time, and artifact lineage is not tracked, which breaks the single-contiguous-conversation assumption underlying prior multi-turn jailbreaks (Crescendo, Tree of Attacks). It introduces Context-Fractured Decomposition (CFD) attacks, which preserve benign intermediate artifacts early and elicit harm later via innocuous tool actions; reports up to 28.3 pp higher success rates on agent-system jailbreak benchmarks even against strong single-turn judges; supplies trace-level diagnostics; and outlines provenance lineage tagging as a mitigation.

Significance. If the reported gains hold under benchmarks that genuinely instantiate split enforcement and delayed artifact composition, the work would usefully identify a deployment-relevant failure mode for agent pipelines and motivate provenance-aware defenses. The explicit framing of a verifiable mitigation direction is a positive feature.

major comments (2)
  1. [Abstract] The central claim of a 28.3 pp improvement over baselines is stated without any description of the agent frameworks, concrete tools/workspaces, how benign artifacts are persisted and later composed, the exact baselines, judge prompting, success metric, number of trials, or statistical controls; these details are load-bearing for determining whether the evaluated benchmarks actually instantiate the provenance-gap scenario.
  2. [Introduction / threat model] The weakest-assumption paragraph and experimental claims rest on the assertion that 'most existing attacks and defenses still assume a single contiguous conversation visible to the defender,' yet no concrete evidence or counter-example from the cited baselines (Crescendo, Tree of Attacks) is supplied showing that those methods were evaluated under fragmented artifact pipelines rather than single-threaded conversations.
minor comments (1)
  1. [Abstract] The disclaimer about harmful language is appropriate but could be expanded to state whether, and under what terms, attack code and prompts are released.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive comments. We address each major comment below, clarifying the experimental details and strengthening the threat model discussion. We propose targeted revisions to improve clarity.

read point-by-point responses
  1. Referee: [Abstract] The central claim of a 28.3 pp improvement over baselines is stated without any description of the agent frameworks, concrete tools/workspaces, how benign artifacts are persisted and later composed, the exact baselines, judge prompting, success metric, number of trials, or statistical controls; these details are load-bearing for determining whether the evaluated benchmarks actually instantiate the provenance-gap scenario.

    Authors: The abstract is intentionally concise to highlight the core contribution. Detailed descriptions of the agent frameworks (LangChain and AutoGPT-style agents), concrete tools (file system, code execution workspaces), persistence of benign artifacts (via intermediate file writes that are later read), exact baselines (Crescendo, Tree of Attacks, and their adaptations), judge prompting (detailed in Appendix), success metric (harmful intent detection by LLM judge), number of trials (n=50-100 per setting with 3 seeds), and statistical controls are provided in Sections 3, 4, and 5. These setups explicitly use fragmented pipelines with delayed composition to instantiate the provenance gap. To address the concern, we will revise the abstract to briefly reference the evaluation on agent benchmarks with split enforcement. revision: yes

  2. Referee: [Introduction / threat model] The weakest-assumption paragraph and experimental claims rest on the assertion that 'most existing attacks and defenses still assume a single contiguous conversation visible to the defender,' yet no concrete evidence or counter-example from the cited baselines (Crescendo, Tree of Attacks) is supplied showing that those methods were evaluated under fragmented artifact pipelines rather than single-threaded conversations.

    Authors: We agree that providing explicit references to the evaluation settings in the cited works would strengthen the claim. Crescendo and Tree of Attacks are presented in their papers as operating on single conversation threads with full history visibility. We will add a clarifying sentence or short paragraph in the introduction, citing specific aspects of those works (e.g., their use of progressive context building in one session) to demonstrate the single-contiguous-conversation assumption, thereby supplying the requested evidence. revision: yes
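The first response cites 50 to 100 trials per setting across 3 seeds. If each seed yields a success count over the same number of trials, a per-setting summary might be computed as below; reporting the mean and sample standard deviation across seeds is an illustrative assumption, and `summarize_seeds` is a hypothetical helper, not the paper's documented procedure.

```python
import statistics


def summarize_seeds(successes_per_seed: list[int], trials_per_seed: int) -> tuple[float, float]:
    """Return (mean ASR in %, sample standard deviation in percentage points) across seeds."""
    if trials_per_seed <= 0:
        raise ValueError("trials_per_seed must be positive")
    if len(successes_per_seed) < 2:
        raise ValueError("need at least two seeds for a sample standard deviation")
    # Convert each seed's success count to a percentage ASR.
    rates = [100 * s / trials_per_seed for s in successes_per_seed]
    return statistics.mean(rates), statistics.stdev(rates)
```

With only three seeds the standard deviation is itself a noisy estimate, so per-trial counts (as in a difference-of-proportions interval) are usually more informative than the seed-level spread alone.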

Circularity Check

0 steps flagged

No significant circularity; empirical performance claim stands on direct benchmark comparison

full rationale

The paper presents an empirical attack method (Context-Fractured Decomposition) and reports measured success-rate gains on agent-system jailbreak benchmarks. No equations, fitted parameters, derivations, or self-citation chains appear in the abstract or described content. The central claim reduces to a straightforward experimental delta rather than any self-definitional, fitted-input, or uniqueness-imported step. The evaluation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no visible free parameters, axioms, or invented entities; the provenance gap is presented as an operationalized deployment failure mode rather than a new postulated entity.

pith-pipeline@v0.9.1-grok · 5784 in / 1099 out tokens · 30521 ms · 2026-06-27T16:34:21.107095+00:00 · methodology

