
arxiv: 2604.05589 · v1 · submitted 2026-04-07 · 💻 cs.CR · cs.AI

Foundations for Agentic AI Investigations from the Forensic Analysis of OpenClaw

Pith reviewed 2026-05-10 19:07 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords agentic AI · digital forensics · OpenClaw · LLM agents · artifact taxonomy · nondeterminism · forensic traces · agent interaction loop

The pith

Agentic AI systems like OpenClaw add abstraction and LLM-driven nondeterminism that complicate recovery of forensic traces compared to traditional software.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper performs static code analysis on OpenClaw and applies differential forensic methods to trace what can be recovered at each stage of the agent interaction loop. It then classifies those traces by investigative value and organizes recurring patterns into an agent artifact taxonomy. This matters because agentic AI assistants are becoming common, yet their behavior depends on language model outputs, execution environments, and shifting contexts in ways that produce variable traces unlike fixed rule-based programs. The work supplies an initial systematic approach for reconstructing actions in such systems and flags consequences for digital investigation practice.

Core claim

By examining OpenClaw, the authors establish that agent-mediated execution introduces an extra layer of abstraction together with substantial nondeterminism in trace generation: the large language model, the execution environment, and the evolving context jointly shape tool selection and state transitions in ways largely absent from rule-based software. Differential analysis across interaction stages nevertheless yields recoverable traces that support a taxonomy of agent artifacts for forensic use.

What carries the argument

Differential forensic analysis applied across the stages of the agent interaction loop to classify and correlate recoverable traces.
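
As a rough illustration of the differential step, the sketch below compares two file manifests (path mapped to content hash) taken before and after a single agent action and reports what was created, modified, or deleted. The manifest layout, the helper name `diff_snapshots`, and the example paths are assumptions made for illustration; they are not taken from the paper or from OpenClaw.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest used as a file's content fingerprint."""
    return hashlib.sha256(data).hexdigest()


def diff_snapshots(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {path: content_hash} manifests from consecutive states."""
    created = sorted(p for p in after if p not in before)
    deleted = sorted(p for p in before if p not in after)
    modified = sorted(p for p in after if p in before and after[p] != before[p])
    return {"created": created, "modified": modified, "deleted": deleted}


# Hypothetical manifests: a baseline state and the state after one action.
q0 = {
    "agent/config.json": digest(b"{}"),
    "agent/tmp.lock": digest(b"lock"),
}
q1 = {
    "agent/config.json": digest(b'{"model": "example"}'),
    "agent/sessions/0001.jsonl": digest(b"user: hello\n"),
}

delta_1 = diff_snapshots(q0, q1)
print(delta_1)
```

In a real investigation the manifests would be derived from disk images of each state rather than from in-memory dictionaries.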

If this is right

  • Recoverable traces exist at multiple points in the agent interaction loop even with added nondeterminism.
  • An agent artifact taxonomy can organize recurring patterns for consistent investigative use.
  • Digital forensic methods must account for LLM and context influences on tool choice and state changes.
  • Systematic approaches developed for single-agent systems provide a starting point for broader agentic AI investigations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Automated filtering tools could be built to separate LLM-induced trace variations from signals that indicate user intent or policy violations.
  • The same taxonomy might need extension for multi-agent or tool-augmented systems where interactions compound nondeterminism.
  • Real-world case studies applying the taxonomy would reveal whether the identified patterns hold when agents operate with live user data and external APIs.

Load-bearing premise

That the nondeterminism patterns and artifact taxonomy observed in OpenClaw will generalize to other agentic AI systems.

What would settle it

A test on a different agentic system in which differential analysis recovers no distinguishable traces once LLM-induced variations are present, or in which all state transitions remain fully deterministic despite context changes.
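
One way to operationalize the second condition is to repeat the same interaction several times and check whether the observed tool-call sequence ever varies. The sketch below does this over hypothetical run logs; the tool names and the helpers `sequence_counts` and `is_deterministic` are illustrative assumptions, not output or APIs of any real agent.

```python
from collections import Counter


def sequence_counts(runs: list[list[str]]) -> Counter:
    """Count how often each distinct tool-call sequence occurs across runs."""
    return Counter(tuple(run) for run in runs)


def is_deterministic(runs: list[list[str]]) -> bool:
    """True if every run produced the same tool-call sequence."""
    return len(sequence_counts(runs)) <= 1


# Hypothetical tool-call sequences from three repetitions of one prompt.
runs = [
    ["read_file", "write_file"],
    ["read_file", "write_file"],
    ["list_dir", "read_file", "write_file"],
]

counts = sequence_counts(runs)
print(is_deterministic(runs))
print(counts.most_common(1))
```

A system for which `is_deterministic` holds across many runs and varied contexts would count as evidence against the nondeterminism claim; a single divergent run is enough to show that variation exists.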

Figures

Figures reproduced from arXiv: 2604.05589 by Jan Gruber, Jan-Niclas Hilgert.

Figure 1. Data generation workflow. Actions σ_i transition the system states q_i, enabling differential analysis δ_i to isolate artifacts. Each action transitions the system to a distinct state (q_i), which is captured as a disk image, starting from a baseline q_0.
Figure 2. Agent Artifact Taxonomy for forensic analysis of agentic AI systems. The taxonomy organizes agent-related evidence into five planes, each corresponding to a distinct aspect of the agent's architecture. Session transcripts are cross-cutting and contribute direct or indirect evidence to multiple planes. Representative OpenClaw artifacts are listed as examples.

Original abstract

Agentic AI systems are increasingly deployed as personal assistants and are likely to become a common object of digital investigations. However, little is known about how their internal state and actions can be reconstructed during forensic analysis. Despite growing popularity, systematic forensic approaches for such systems remain largely unexplored. This paper presents an empirical study of OpenClaw, a widely used single-agent assistant. We examine OpenClaw's technical design via static code analysis and apply differential forensic analysis to identify recoverable traces across stages of the agent interaction loop. We classify and correlate these traces to assess their investigative value in a systematic way. Based on these observations, we propose an agent artifact taxonomy that captures recurring investigative patterns. Finally, we highlight a foundational challenge for agentic AI forensics: agent-mediated execution introduces an additional layer of abstraction and substantial nondeterminism in trace generation. The large language model (LLM), the execution environment, and the evolving context can influence tool choice and state transitions in ways that are largely absent from rule-based software. Overall, our results provide an initial foundation for the systematic investigation of agentic AI and outline implications for digital forensic practice and future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents an empirical study of OpenClaw, a single-agent AI assistant. It performs static code analysis of the system's design and differential forensic analysis of interaction traces across the agent loop to identify, classify, and correlate recoverable artifacts. From these observations the authors propose an agent artifact taxonomy and argue that agent-mediated execution introduces an additional abstraction layer plus substantial nondeterminism (driven by the LLM, execution environment, and evolving context) that is largely absent from rule-based software, thereby establishing initial foundations for agentic AI forensics.

Significance. If the taxonomy can be shown to be stable and generalizable, the work would supply a useful starting point for digital-forensic practice in an emerging domain. The identification of LLM-induced nondeterminism as a core investigative challenge is conceptually sound and timely. However, the absence of quantitative controls, variance data, or cross-system validation means the current contribution remains largely descriptive rather than demonstrative.

major comments (2)
  1. [empirical study / differential analysis] The differential forensic analysis (described in the abstract and empirical-study sections) reports no multi-run variance measurements, no ablation holding the LLM fixed while varying context, and no quantitative metric for artifact persistence across stochastic executions. Without these controls the claim that the taxonomy reliably separates LLM-induced nondeterminism from stable investigative signals cannot be evaluated and is load-bearing for the central contribution.
  2. [conclusions / foundational challenge] The assertion that nondeterminism is 'substantial' and 'largely absent from rule-based software' is presented as a foundational challenge, yet the manuscript supplies no comparative data or baseline measurements against conventional rule-based agents. This weakens the contrast that underpins the proposed taxonomy's novelty.
minor comments (2)
  1. [abstract] The abstract states that traces are 'classified and correlated' but does not indicate the classification criteria, inter-rater reliability, or any error-handling procedures used during trace extraction.
  2. [results] No tables or figures summarizing the recovered artifact types, their frequency, or investigative value are referenced in the provided description; adding such a summary would improve clarity.
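
The persistence metric requested in major comment 1 could take a simple form: for each artifact, the fraction of repeated runs in which it appears. The following is a minimal sketch under that assumption; the function name `artifact_persistence` and the artifact paths are hypothetical, and the manuscript itself reports no such measurement.

```python
def artifact_persistence(runs: list[set[str]]) -> dict[str, float]:
    """Map each artifact to the fraction of runs in which it was observed."""
    if not runs:
        return {}
    all_artifacts = set().union(*runs)
    return {a: sum(a in run for run in runs) / len(runs) for a in sorted(all_artifacts)}


# Hypothetical artifact sets recovered from four repetitions of one action.
runs = [
    {"sessions/transcript.jsonl", "workspace/notes.md"},
    {"sessions/transcript.jsonl"},
    {"sessions/transcript.jsonl", "workspace/notes.md"},
    {"sessions/transcript.jsonl", "cache/tool_output.bin"},
]

persistence = artifact_persistence(runs)
# Artifacts seen in every run are candidates for stable investigative signals.
stable = [a for a, p in persistence.items() if p == 1.0]
print(persistence)
print(stable)
```

Artifacts with persistence well below 1.0 would be the ones most affected by stochastic execution, which is the separation the referee says cannot currently be evaluated.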

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address the major comments point by point below and outline the revisions we plan to make to improve the clarity and scope of our claims.

  1. Referee: [empirical study / differential analysis] The differential forensic analysis (described in the abstract and empirical-study sections) reports no multi-run variance measurements, no ablation holding the LLM fixed while varying context, and no quantitative metric for artifact persistence across stochastic executions. Without these controls the claim that the taxonomy reliably separates LLM-induced nondeterminism from stable investigative signals cannot be evaluated and is load-bearing for the central contribution.

    Authors: We agree that the absence of multi-run variance measurements and quantitative metrics for artifact persistence limits the ability to statistically evaluate the taxonomy's robustness against nondeterminism. Our differential analysis was conducted on individual interaction traces to identify and classify artifacts at different stages of the agent loop, revealing patterns that informed the taxonomy. The work is intended as an initial empirical foundation rather than a comprehensive validation study. In the revised manuscript, we will expand the discussion of limitations to explicitly note the lack of these controls and emphasize that the taxonomy represents observed investigative patterns from our case study of OpenClaw. We will also propose directions for future work involving repeated executions and ablations to quantify persistence. revision: partial

  2. Referee: [conclusions / foundational challenge] The assertion that nondeterminism is 'substantial' and 'largely absent from rule-based software' is presented as a foundational challenge, yet the manuscript supplies no comparative data or baseline measurements against conventional rule-based agents. This weakens the contrast that underpins the proposed taxonomy's novelty.

    Authors: The description of nondeterminism as substantial stems from our observations of variable tool invocations and state transitions in OpenClaw, driven by the LLM's probabilistic outputs and context evolution, which contrast with the fixed logic in rule-based systems. However, we acknowledge that no direct comparative experiments were performed. We will revise the relevant sections to frame this as an observed characteristic in LLM-based agents that introduces challenges not typically present in deterministic rule-based software, supported by references to agent architectures, while toning down the language to avoid implying a quantified comparison. This will better position the taxonomy as a starting point for agentic AI forensics. revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical classification of observed traces


The paper performs static code analysis and differential forensic analysis directly on OpenClaw interactions, classifies recoverable traces from the agent interaction loop, and proposes an artifact taxonomy based on those observations. No equations, fitted parameters, self-referential definitions, or load-bearing self-citations appear in the derivation chain. Central claims about nondeterminism and investigative value rest on direct examination of the system rather than reducing to prior inputs or author-defined constructs by construction. The work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The study rests on the domain assumption that OpenClaw is a representative single-agent system and that standard forensic techniques can be extended to LLM-driven loops without new formal models.

axioms (1)
  • domain assumption: OpenClaw is representative of single-agent assistants.
    The study uses one concrete system to derive a general taxonomy and set of challenges.

pith-pipeline@v0.9.0 · 5498 in / 1162 out tokens · 50219 ms · 2026-05-10T19:07:51.459341+00:00 · methodology

