Foundations for Agentic AI Investigations from the Forensic Analysis of OpenClaw
Pith reviewed 2026-05-10 19:07 UTC · model grok-4.3
The pith
Agentic AI systems like OpenClaw add abstraction and LLM-driven nondeterminism that complicate recovery of forensic traces compared to traditional software.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By examining OpenClaw, the authors establish that agent-mediated execution introduces an extra layer of abstraction together with substantial nondeterminism in trace generation: the large language model, the execution environment, and the evolving context jointly shape tool selection and state transitions in ways largely absent from rule-based software. Differential analysis across interaction stages nevertheless yields recoverable traces that support a taxonomy of agent artifacts for forensic use.
What carries the argument
Differential forensic analysis applied across the stages of the agent interaction loop to classify and correlate recoverable traces.
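As a rough illustration of the differential idea, the sketch below hashes every file under a directory before and after a simulated agent step and reports what was created, deleted, or modified. It is a minimal sketch, not the authors' tooling: the function names (`snapshot`, `diff_snapshots`, `demo`) and the file names in the demo are hypothetical and do not correspond to real OpenClaw paths.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    state = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            state[path.relative_to(root).as_posix()] = digest
    return state


def diff_snapshots(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as created, deleted, or modified between two snapshots."""
    created = sorted(after.keys() - before.keys())
    deleted = sorted(before.keys() - after.keys())
    modified = sorted(
        p for p in before.keys() & after.keys() if before[p] != after[p]
    )
    return {"created": created, "deleted": deleted, "modified": modified}


def demo() -> dict[str, list[str]]:
    """Simulate one agent step in a temporary directory (hypothetical files)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "config.json").write_text('{"model": "a"}')
        (root / "cache.tmp").write_text("scratch")
        before = snapshot(root)

        # Stand-in for one stage of the agent interaction loop.
        (root / "config.json").write_text('{"model": "b"}')
        (root / "cache.tmp").unlink()
        (root / "sessions").mkdir()
        (root / "sessions" / "run1.log").write_text("tool call")
        after = snapshot(root)

        return diff_snapshots(before, after)


if __name__ == "__main__":
    print(demo())
```

Taking one snapshot per stage of the interaction loop, rather than only at the start and end, would let each change be attributed to a specific stage, which is the property the paper's stage-wise analysis relies on.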
If this is right
- Recoverable traces exist at multiple points in the agent interaction loop even with added nondeterminism.
- An agent artifact taxonomy can organize recurring patterns for consistent investigative use.
- Digital forensic methods must account for LLM and context influences on tool choice and state changes.
- Systematic approaches developed for single-agent systems provide a starting point for broader agentic AI investigations.
Where Pith is reading between the lines
- Automated filtering tools could be built to separate LLM-induced trace variations from signals that indicate user intent or policy violations.
- The same taxonomy might need extension for multi-agent or tool-augmented systems where interactions compound nondeterminism.
- Real-world case studies applying the taxonomy would reveal whether the identified patterns hold when agents operate with live user data and external APIs.
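One possible first step toward the filtering idea in the first bullet above is to repeat the same interaction several times and measure how often each artifact appears. The sketch below is an assumption about how such a filter might start, not something described in the paper: the artifact labels are invented for illustration, and a high persistence rate says only that an artifact is stable across runs, not that it reflects user intent.

```python
from collections import Counter


def artifact_persistence(runs: list[set[str]]) -> dict[str, float]:
    """Return, per artifact, the fraction of runs in which it was observed."""
    if not runs:
        raise ValueError("at least one run is required")
    counts = Counter(artifact for run in runs for artifact in run)
    return {artifact: count / len(runs) for artifact, count in counts.items()}


def split_by_persistence(
    runs: list[set[str]], threshold: float = 1.0
) -> tuple[list[str], list[str]]:
    """Split artifacts into stable (rate >= threshold) and variable (rate < threshold)."""
    rates = artifact_persistence(runs)
    stable = sorted(a for a, rate in rates.items() if rate >= threshold)
    variable = sorted(a for a, rate in rates.items() if rate < threshold)
    return stable, variable


if __name__ == "__main__":
    # Hypothetical artifact labels from three repetitions of the same prompt.
    runs = [
        {"session_log", "tool:web_search", "memory_write"},
        {"session_log", "tool:file_read", "memory_write"},
        {"session_log", "tool:web_search"},
    ]
    stable, variable = split_by_persistence(runs)
    print("stable:", stable)
    print("variable:", variable)
```

The threshold is a design choice: 1.0 keeps only artifacts seen in every run, while a lower value tolerates occasional omissions at the cost of admitting more LLM-induced variation.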
Load-bearing premise
That the nondeterminism patterns and artifact taxonomy observed in OpenClaw will generalize to other agentic AI systems.
What would settle it
A test on a different agentic system in which differential analysis recovers no distinguishable traces once LLM-induced variations are present, or in which all state transitions remain fully deterministic despite context changes.
Original abstract
Agentic AI systems are increasingly deployed as personal assistants and are likely to become a common object of digital investigations. However, little is known about how their internal state and actions can be reconstructed during forensic analysis. Despite growing popularity, systematic forensic approaches for such systems remain largely unexplored. This paper presents an empirical study of OpenClaw, a widely used single-agent assistant. We examine OpenClaw's technical design via static code analysis and apply differential forensic analysis to identify recoverable traces across stages of the agent interaction loop. We classify and correlate these traces to assess their investigative value in a systematic way. Based on these observations, we propose an agent artifact taxonomy that captures recurring investigative patterns. Finally, we highlight a foundational challenge for agentic AI forensics: agent-mediated execution introduces an additional layer of abstraction and substantial nondeterminism in trace generation. The large language model (LLM), the execution environment, and the evolving context can influence tool choice and state transitions in ways that are largely absent from rule-based software. Overall, our results provide an initial foundation for the systematic investigation of agentic AI and outline implications for digital forensic practice and future research.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an empirical study of OpenClaw, a single-agent AI assistant. It performs static code analysis of the system's design and differential forensic analysis of interaction traces across the agent loop to identify, classify, and correlate recoverable artifacts. From these observations the authors propose an agent artifact taxonomy and argue that agent-mediated execution introduces an additional abstraction layer plus substantial nondeterminism (driven by the LLM, execution environment, and evolving context) that is largely absent from rule-based software, thereby establishing initial foundations for agentic AI forensics.
Significance. If the taxonomy can be shown to be stable and generalizable, the work would supply a useful starting point for digital-forensic practice in an emerging domain. The identification of LLM-induced nondeterminism as a core investigative challenge is conceptually sound and timely. However, the absence of quantitative controls, variance data, or cross-system validation means the current contribution remains largely descriptive rather than demonstrative.
major comments (2)
- [empirical study / differential analysis] The differential forensic analysis (described in the abstract and empirical-study sections) reports no multi-run variance measurements, no ablation holding the LLM fixed while varying context, and no quantitative metric for artifact persistence across stochastic executions. Without these controls, the claim that the taxonomy reliably separates LLM-induced nondeterminism from stable investigative signals cannot be evaluated, even though it is load-bearing for the central contribution.
- [conclusions / foundational challenge] The assertion that nondeterminism is 'substantial' and 'largely absent from rule-based software' is presented as a foundational challenge, yet the manuscript supplies no comparative data or baseline measurements against conventional rule-based agents. This weakens the contrast that underpins the proposed taxonomy's novelty.
minor comments (2)
- [abstract] The abstract states that traces are 'classified and correlated' but does not indicate the classification criteria, inter-rater reliability, or any error-handling procedures used during trace extraction.
- [results] No tables or figures summarizing the recovered artifact types, their frequency, or investigative value are referenced in the provided description; adding such a summary would improve clarity.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address the major comments point by point below and outline the revisions we plan to make to improve the clarity and scope of our claims.
-
Referee: [empirical study / differential analysis] The differential forensic analysis (described in the abstract and empirical-study sections) reports no multi-run variance measurements, no ablation holding the LLM fixed while varying context, and no quantitative metric for artifact persistence across stochastic executions. Without these controls, the claim that the taxonomy reliably separates LLM-induced nondeterminism from stable investigative signals cannot be evaluated, even though it is load-bearing for the central contribution.
Authors: We agree that the absence of multi-run variance measurements and quantitative metrics for artifact persistence limits the ability to statistically evaluate the taxonomy's robustness against nondeterminism. Our differential analysis was conducted on individual interaction traces to identify and classify artifacts at different stages of the agent loop, revealing patterns that informed the taxonomy. The work is intended as an initial empirical foundation rather than a comprehensive validation study. In the revised manuscript, we will expand the discussion of limitations to explicitly note the lack of these controls and emphasize that the taxonomy represents observed investigative patterns from our case study of OpenClaw. We will also propose directions for future work involving repeated executions and ablations to quantify persistence. revision: partial
-
Referee: [conclusions / foundational challenge] The assertion that nondeterminism is 'substantial' and 'largely absent from rule-based software' is presented as a foundational challenge, yet the manuscript supplies no comparative data or baseline measurements against conventional rule-based agents. This weakens the contrast that underpins the proposed taxonomy's novelty.
Authors: The description of nondeterminism as substantial stems from our observations of variable tool invocations and state transitions in OpenClaw, driven by the LLM's probabilistic outputs and context evolution, which contrast with the fixed logic in rule-based systems. However, we acknowledge that no direct comparative experiments were performed. We will revise the relevant sections to frame this as an observed characteristic in LLM-based agents that introduces challenges not typically present in deterministic rule-based software, supported by references to agent architectures, while toning down the language to avoid implying a quantified comparison. This will better position the taxonomy as a starting point for agentic AI forensics. revision: partial
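The repeated-execution measurements that the referee requests, and that the authors defer to future work, could take a form like the sketch below. It computes the pairwise Jaccard similarity between the tool sets invoked in repeated runs of the same prompt and summarizes it by mean and population standard deviation, so that an agent can be compared against a deterministic baseline. This is an illustrative assumption rather than a method from the paper; the tool names and run data are invented, and set overlap ignores the order of invocations.

```python
from itertools import combinations
from statistics import mean, pstdev


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def run_similarity(runs: list[set[str]]) -> tuple[float, float]:
    """Mean and population standard deviation of pairwise Jaccard similarity."""
    if len(runs) < 2:
        raise ValueError("at least two runs are required")
    scores = [jaccard(a, b) for a, b in combinations(runs, 2)]
    return mean(scores), pstdev(scores)


if __name__ == "__main__":
    # Hypothetical data: tools invoked in three repetitions of one prompt.
    agent_runs = [{"search", "read"}, {"search", "write"}, {"search", "read"}]
    # A rule-based baseline invokes the same tools every time.
    baseline_runs = [{"search", "read"}] * 3

    print("agent:", run_similarity(agent_runs))
    print("baseline:", run_similarity(baseline_runs))
```

A mean near 1.0 with near-zero spread would indicate effectively deterministic tool choice. Holding the model fixed while varying only the context, as the referee suggests, would amount to computing the same statistic over runs grouped by context.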
Circularity Check
No circularity: purely empirical classification of observed traces
The paper performs static code analysis and differential forensic analysis directly on OpenClaw interactions, classifies recoverable traces from the agent interaction loop, and proposes an artifact taxonomy based on those observations. No equations, fitted parameters, self-referential definitions, or load-bearing self-citations appear in the derivation chain. Central claims about nondeterminism and investigative value rest on direct examination of the system rather than reducing to prior inputs or author-defined constructs by construction. The work is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: OpenClaw is representative of single-agent assistants.