
arxiv: 2606.09692 · v1 · pith:4JL42VNZnew · submitted 2026-06-08 · 💻 cs.CR · cs.AI

Observability for Delegated Execution in Agentic AI Systems

Pith reviewed 2026-06-27 16:18 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords observability, delegation, agentic AI, audit logs, LLM agents, forensic reconstruction, execution traces

The pith

Standard audit logs and execution traces cannot distinguish delegation scope in agentic AI systems because the same traces can arise from incompatible delegation assignments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that delegation-scoped execution in LLM-based agentic systems is structurally underdetermined from existing observables. Agents dynamically choose tools, reorder actions, and spawn sub-agents, which fragments and interleaves traces so that multiple delegation assignments produce identical logs. Existing audit, tracing, and security schemas therefore lack the semantics needed for reliable reconstruction of what actions occurred under a given delegation across heterogeneous tools. The authors introduce an observability substrate that binds delegation context at execution time through a lightweight gateway and common information model. This binding makes cross-tool delegation-scoped reconstruction possible via direct forensic queries rather than heuristic correlation.
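The underdetermination claim can be illustrated with a toy sketch. Everything here is hypothetical (the `LogRecord` fields, the delegation labels, the resource names) and is not taken from the paper; it only shows how two different delegation assignments can collapse to the same log once the delegation label is not recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """What a conventional audit log keeps (assumed fields, for illustration)."""
    principal: str  # service identity the agent runs as
    action: str
    resource: str


def project(executions):
    """Drop the delegation label, keeping only what the assumed log records."""
    return [LogRecord(p, a, r) for (_delegation, p, a, r) in executions]


# Two hypothetical runs: same agent identity, same tool calls, same order.
# Only the delegation under which the second action ran differs.
run_a = [
    ("deleg-alice", "agent-svc", "read", "doc/q3-report"),
    ("deleg-alice", "agent-svc", "share", "doc/q3-report"),
]
run_b = [
    ("deleg-alice", "agent-svc", "read", "doc/q3-report"),
    ("deleg-bob", "agent-svc", "share", "doc/q3-report"),
]

same_logs = project(run_a) == project(run_b)
same_assignment = [e[0] for e in run_a] == [e[0] for e in run_b]
print(same_logs, same_assignment)  # True False
```

The logs compare equal while the assignments do not, so no query over these records alone can recover which delegation the share ran under.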

Core claim

Delegation-scoped execution is not identifiable from standard observables because audit logs and execution traces can be identical under multiple incompatible delegation assignments; an agent-aware observability substrate consisting of a lightweight gateway and common information model binds delegation context at execution time and thereby enables reliable cross-tool reconstruction without heuristic time-window correlation.

What carries the argument

Agent-aware observability substrate (lightweight gateway plus common information model that binds delegation context at execution time).
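As a minimal sketch of what binding delegation context at execution time might look like, the following routes every tool call through a gateway object that stamps the event with the active delegation. The class names, field names, and the `spawn` helper are assumptions made for illustration; the abstract does not specify the paper's information model or gateway interface.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass(frozen=True)
class DelegationContext:
    """Hypothetical core context fields; not the paper's schema."""
    delegation_id: str
    delegator: str
    delegate: str
    parent_id: Optional[str] = None


@dataclass
class Gateway:
    """Hypothetical gateway: records the delegation on every tool call."""
    events: List[Dict[str, Any]] = field(default_factory=list)

    def call(self, ctx: DelegationContext, tool_name: str,
             tool: Callable[..., Any], **kwargs: Any) -> Any:
        # The event is written at call time, with the context attached,
        # rather than being correlated with a delegation afterwards.
        self.events.append({
            "event_id": str(uuid.uuid4()),
            "ts": time.time(),
            "tool": tool_name,
            "args": dict(kwargs),
            "delegation_id": ctx.delegation_id,
            "delegator": ctx.delegator,
            "delegate": ctx.delegate,
            "parent_id": ctx.parent_id,
        })
        return tool(**kwargs)

    def spawn(self, ctx: DelegationContext, sub_agent: str) -> DelegationContext:
        # A sub-agent gets a child context that points back to its parent.
        return DelegationContext(
            delegation_id=f"{ctx.delegation_id}/{sub_agent}",
            delegator=ctx.delegate,
            delegate=sub_agent,
            parent_id=ctx.delegation_id,
        )


gateway = Gateway()
root = DelegationContext("deleg-1", delegator="alice", delegate="planner")
child = gateway.spawn(root, "researcher")
gateway.call(root, "drive.read", lambda path: f"contents of {path}", path="doc/a")
gateway.call(child, "mail.send", lambda to: f"sent to {to}", to="bob")
```

In this sketch the tools themselves stay unaware of delegation; only the gateway needs to know the context, which is one plausible reading of "lightweight".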

If this is right

  • Direct forensic queries become possible on delegation-scoped footprints instead of relying on post-hoc correlation.
  • Reconstruction works across heterogeneous tools and systems once the common information model is adopted.
  • Individual actions remain authorized and logged while the delegation assignment itself becomes attributable.
  • The approach targets attribution and footprint reconstruction rather than intent or reasoning inference.
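The contrast between a direct forensic query and time-window correlation can be sketched as follows. The event records, tool names, and timestamps are made up for illustration, and `delegation_id` is an assumed field name rather than one taken from the paper.

```python
# Hypothetical events from three tools, already tagged with a delegation id.
events = [
    {"ts": 100, "tool": "drive", "action": "read", "resource": "doc/a", "delegation_id": "d1"},
    {"ts": 101, "tool": "mail", "action": "send", "resource": "msg/7", "delegation_id": "d2"},
    {"ts": 102, "tool": "drive", "action": "share", "resource": "doc/a", "delegation_id": "d1"},
    {"ts": 250, "tool": "crm", "action": "read", "resource": "acct/9", "delegation_id": "d1"},
]


def footprint(events, delegation_id):
    """Direct query: every event bound to the given delegation, across tools."""
    return [e for e in events if e["delegation_id"] == delegation_id]


def time_window_guess(events, start, end):
    """Heuristic baseline: assume everything inside the window belongs together."""
    return [e for e in events if start <= e["ts"] <= end]


direct = footprint(events, "d1")
heuristic = time_window_guess(events, 100, 110)

print([e["ts"] for e in direct])     # [100, 102, 250]
print([e["ts"] for e in heuristic])  # [100, 101, 102]
```

The window both pulls in an interleaved event from another delegation (ts 101) and misses a late event from the target delegation (ts 250), which is the failure mode the direct query avoids.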

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Security monitoring pipelines would need to adopt the common information model at the gateway layer to gain the reconstruction capability.
  • The same substrate could support compliance reporting that distinguishes actions taken under different user or agent delegations.
  • Performance overhead of the gateway becomes a practical limit on adoption in high-throughput agent deployments.

Load-bearing premise

Binding delegation context at execution time will enable reliable reconstruction without introducing new fragmentation or performance problems that defeat the purpose.

What would settle it

A set of identical execution traces generated under two different delegation assignments that the proposed gateway and model still cannot separate into unique delegation scopes.
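One way to phrase this test operationally is sketched below. It assumes each run is available as a list of pairs of ground-truth delegation and emitted record; the function name and record layout are hypothetical.

```python
def is_counterexample(run_a, run_b):
    """Return True when two runs emit identical records under different
    ground-truth delegation assignments.

    Each run is a list of (true_delegation, emitted_record) pairs.
    """
    emitted_a = [record for _truth, record in run_a]
    emitted_b = [record for _truth, record in run_b]
    truth_a = [truth for truth, _record in run_a]
    truth_b = [truth for truth, _record in run_b]
    return emitted_a == emitted_b and truth_a != truth_b


# Records that carry the delegation id: the two runs stay distinguishable.
bound_a = [("d1", {"action": "share", "resource": "doc/a", "delegation_id": "d1"})]
bound_b = [("d2", {"action": "share", "resource": "doc/a", "delegation_id": "d2"})]

# Records without it: the same pair of runs becomes a counterexample.
bare_a = [("d1", {"action": "share", "resource": "doc/a"})]
bare_b = [("d2", {"action": "share", "resource": "doc/a"})]

print(is_counterexample(bound_a, bound_b))  # False
print(is_counterexample(bare_a, bare_b))    # True
```

A pair of real traces for which this check returns True on the substrate's own output would be the kind of evidence described above.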

Original abstract

Delegation-scoped execution is not identifiable from standard observables: audit logs and execution traces can be identical under multiple incompatible delegation assignments. This gap is especially acute in LLM-based agentic systems, where agents dynamically select tools, vary execution sequences across runs for the same instruction, and spawn cooperating sub-agents. These dynamics fragment and interleave traces, making delegation-scoped reconstruction from causal structure alone structurally underdetermined. Although individual actions are authorized and logged, existing audit, tracing, and security schemas lack the semantics to reconstruct what actions occurred under a given delegation across heterogeneous systems. We focus on delegation-scoped attribution and access/share footprint reconstruction, not intent inference or reasoning reconstruction. We present an agent-aware observability substrate consisting of a lightweight gateway and a common information model that binds delegation context at execution time. This enables reliable cross-tool delegation-scoped reconstruction and direct forensic queries without heuristic time-window correlation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript claims that delegation-scoped execution in LLM-based agentic systems cannot be reconstructed from standard audit logs and execution traces because these observables can be identical under multiple incompatible delegation assignments; dynamic tool selection, variable execution sequences, and sub-agent spawning fragment and interleave traces, rendering reconstruction from causal structure alone structurally underdetermined. Existing schemas lack the necessary semantics for delegation-scoped attribution and access/share footprint reconstruction. The authors propose an agent-aware observability substrate consisting of a lightweight gateway and common information model that binds delegation context at execution time to enable reliable cross-tool reconstruction and direct forensic queries.

Significance. If the proposed substrate can be shown to bind context without introducing fragmentation or performance overhead, the work would address a genuine gap in attribution for delegated execution in heterogeneous agentic systems, moving beyond heuristic correlation to direct, semantics-aware reconstruction. The paper receives credit for clearly scoping the problem to attribution rather than intent inference and for framing the issue in terms of structural underdetermination rather than implementation details.

major comments (2)
  1. [Abstract] The central claim that 'audit logs and execution traces can be identical under multiple incompatible delegation assignments' and that reconstruction is 'structurally underdetermined' is asserted without any concrete example, formal definition of the observable signature, or argument showing why additional context (process trees, resource handles, timing) cannot disambiguate; this absence makes the premise that existing schemas 'lack the semantics' an untested assertion rather than a demonstrated gap.
  2. [Abstract] The proposed lightweight gateway and common information model are described only at the architectural level with no specification of the information model, binding mechanism, or query interface, leaving open whether the approach avoids the very fragmentation and performance issues it aims to solve.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. We address each major comment below and indicate planned revisions to the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The central claim that 'audit logs and execution traces can be identical under multiple incompatible delegation assignments' and that reconstruction is 'structurally underdetermined' is asserted without any concrete example, formal definition of the observable signature, or argument showing why additional context (process trees, resource handles, timing) cannot disambiguate; this absence makes the premise that existing schemas 'lack the semantics' an untested assertion rather than a demonstrated gap.

    Authors: We agree that the abstract would benefit from a concrete example to illustrate the central claim. In revision we will add a brief illustrative scenario (e.g., two delegation assignments producing identical tool-call and log sequences due to dynamic sub-agent spawning). The full manuscript already contains a formal argument for structural underdetermination (including why process trees, resource handles, and timing fail to resolve ambiguity under variable execution sequences), but we will ensure the abstract explicitly summarizes this argument rather than merely asserting the gap. revision: yes

  2. Referee: [Abstract] The proposed lightweight gateway and common information model are described only at the architectural level with no specification of the information model, binding mechanism, or query interface, leaving open whether the approach avoids the very fragmentation and performance issues it aims to solve.

    Authors: The manuscript presents the substrate at an architectural level to focus on the novel delegation-scoped semantics. We acknowledge that additional specification would address the referee's concern. We will revise to include a concise specification of the information model (core context fields), the binding mechanism (gateway-mediated context injection at execution time), and example query patterns. A short discussion of overhead will be added to argue that the design avoids fragmentation by construction, as context is bound directly rather than reconstructed post hoc. revision: yes

Circularity Check

0 steps flagged

No circularity: conceptual proposal without derivations or self-referential reductions

full rationale

The manuscript is a high-level conceptual proposal for an observability substrate. It states the central claim directly (delegation-scoped execution is not identifiable from standard observables) but supplies no equations, fitted parameters, predictions, uniqueness theorems, or ansatzes. No load-bearing step reduces by construction to its own inputs, and no self-citations are invoked to justify the premise. The text therefore contains no instances of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Only the abstract is available, so the ledger reflects the high-level description without access to any detailed assumptions or parameters in the full manuscript.

invented entities (1)
  • agent-aware observability substrate (no independent evidence)
    purpose: Binds delegation context at execution time to enable reconstruction.
    Introduced as the core solution mechanism without reference to prior independent evidence or validation.

pith-pipeline@v0.9.1-grok · 5676 in / 1082 out tokens · 26997 ms · 2026-06-27T16:18:47.536468+00:00 · methodology

