arxiv: 2606.20669 · v1 · pith:L3FXK7XEnew · submitted 2026-06-12 · 💻 cs.AI · cs.SE

Agent Behavior Mining: Generative AI Agent Governance in Business Processes

Pith reviewed 2026-06-27 05:12 UTC · model grok-4.3

classification 💻 cs.AI cs.SE
keywords Agent Behavior Mining · generative AI agents · process mining · business process management · AI governance · event data model · reasoning traces

The pith

Generative AI agents in business processes become auditable when their reasoning traces are recorded as standard event logs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Companies face a control problem when they let generative AI agents run business processes because the agents make unpredictable choices that clash with the standardization goals of business process management. The paper proposes Agent Behavior Mining to solve this by defining an event data model that converts each agent's reasoning steps, tool calls, and token usage into ordinary process logs. These logs can then be analyzed with existing process mining methods to spot policy violations and measure how much the agents vary in their actions. The approach is demonstrated in a working multi-agent order-to-cash system and assessed through interviews with 18 practitioners, who report that the ability to inspect agent reasoning is viewed as necessary for trust.

Core claim

An event data model that translates granular agent activities including reasoning traces, tool usage, and token costs into standardized process logs enables the direct application of process mining techniques, thereby making generative AI agent decision-making observable and traceable and addressing the invisible autonomy risk in AI-driven business processes.

What carries the argument

The event data model for agent activities that records reasoning traces, tool usage, and token costs as process events suitable for mining.
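
As a rough illustration of what such an event record might look like, the Python sketch below flattens one agent step into a dictionary keyed by standard XES attribute names (`concept:name`, `time:timestamp`, `org:resource`), with the case identifier stored under `case:concept:name` as in flattened logs used by tools such as pm4py. The `AgentEvent` class, the `ai:*` attribute names, and all sample values are assumptions made for illustration; the paper's actual schema may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import json


@dataclass
class AgentEvent:
    """One agent step (hypothetical structure, not the paper's schema)."""
    case_id: str
    activity: str
    agent: str
    timestamp: datetime
    reasoning: str = ""
    tool_args: dict = field(default_factory=dict)
    input_tokens: int = 0
    output_tokens: int = 0

    def to_log_row(self) -> dict:
        """Flatten the event into a row with XES-style attribute keys."""
        return {
            "case:concept:name": self.case_id,
            "concept:name": self.activity,
            "org:resource": self.agent,
            "time:timestamp": self.timestamp.isoformat(),
            # The ai:* keys below are illustrative names only.
            "ai:reasoning": self.reasoning,
            "ai:tool_args": json.dumps(self.tool_args, sort_keys=True),
            "ai:input_tokens": self.input_tokens,
            "ai:output_tokens": self.output_tokens,
        }


event = AgentEvent(
    case_id="order-1001",
    activity="execute_tool",
    agent="Inventory Agent",
    timestamp=datetime(2026, 1, 5, 9, 30, tzinfo=timezone.utc),
    reasoning="Stock is below the reorder threshold, so restock.",
    tool_args={"sku": "A-17", "quantity": 20},
    input_tokens=412,
    output_tokens=58,
)
row = event.to_log_row()
```

Serializing the tool arguments as a JSON string keeps each row flat, which is one way to stay compatible with tabular log formats; nested attributes would be another option.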

If this is right

  • Managers can detect when generative AI agents deviate from company policies using standard process mining tools.
  • The amount of operational variability introduced by the agents can be measured and compared across runs.
  • Behavioral transparency through log inspection is treated as a necessary condition for establishing trust in the agents.
  • The capacity to examine agent reasoning steps is positioned as a required governance feature for future AI-driven processes.
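
The first two points can be sketched in a few lines of Python. The example below groups log rows into traces, flags activities that are skipped or inserted relative to an expected sequence, and counts distinct execution variants. It is a simplified set-based check, not the alignment-based conformance checking that process mining tools provide, and the `EXPECTED` sequence, case identifiers, and helper names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical expected activity sequence; not taken from the paper.
EXPECTED = ["take_order", "calculate_total", "estimate_prep_time", "prepare_order"]


def group_traces(rows):
    """Group activity names by case, preserving the given event order."""
    traces = defaultdict(list)
    for row in rows:
        traces[row["case:concept:name"]].append(row["concept:name"])
    return dict(traces)


def find_deviations(trace, expected):
    """Return (skipped, inserted) activities relative to the expected set."""
    skipped = [a for a in expected if a not in trace]
    inserted = [a for a in trace if a not in expected]
    return skipped, inserted


def variant_counts(traces):
    """Count how many cases follow each distinct activity sequence."""
    return Counter(tuple(t) for t in traces.values())


def make_rows(case_id, activities):
    return [{"case:concept:name": case_id, "concept:name": a} for a in activities]


# Rows are assumed to be sorted by timestamp already.
rows = (
    make_rows("c1", EXPECTED)
    + make_rows("c2", ["take_order", "calculate_total", "remake_order_item", "prepare_order"])
    + make_rows("c3", EXPECTED)
)

traces = group_traces(rows)
skipped, inserted = find_deviations(traces["c2"], EXPECTED)
variants = variant_counts(traces)
```

In this toy log, case `c2` skips `estimate_prep_time` and inserts `remake_order_item`, and the three cases fall into two variants. The share of cases outside the most frequent variant is one simple, assumed proxy for operational variability.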

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same logging structure could be adapted to monitor autonomous agents in domains outside business processes, such as customer service or software development pipelines.
  • Linking the logs to cost and outcome data might let organizations tune agent prompts or tools for better compliance without separate oversight systems.
  • Auditors or regulators could one day treat the presence of such reasoning traces as evidence of due diligence in AI deployments.
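
To make the cost-linking idea concrete, the sketch below sums token usage per agent from flattened log rows. The `ai:input_tokens` and `ai:output_tokens` keys, the agent names in the sample data, and the token counts are assumptions for illustration, not values or attribute names confirmed by the paper.

```python
from collections import defaultdict


def tokens_per_agent(rows):
    """Sum input and output tokens for each agent (org:resource)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["org:resource"]] += row["ai:input_tokens"] + row["ai:output_tokens"]
    return dict(totals)


# Illustrative rows only; the ai:* keys are hypothetical attribute names.
sample_rows = [
    {"org:resource": "Inventory Agent", "ai:input_tokens": 412, "ai:output_tokens": 58},
    {"org:resource": "Inventory Agent", "ai:input_tokens": 100, "ai:output_tokens": 30},
    {"org:resource": "Customer Service Agent", "ai:input_tokens": 200, "ai:output_tokens": 50},
]

totals = tokens_per_agent(sample_rows)
```

The same grouping could be keyed by case or by execution variant instead of by agent, which would let token cost be compared between compliant and deviating runs.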

Load-bearing premise

The event data model that records reasoning traces, tool usage, and token costs can be implemented inside real multi-agent business processes without losing essential decision information or creating excessive costs.

What would settle it

A live multi-agent business process deployment in which the captured logs omit key decision factors or require so much extra effort that teams stop using the model.

Figures

Figures reproduced from arXiv: 2606.20669 by Adrian Rebmann, Gabriel Kevorkian, Gregor Berg, Hoang Vu, Maximilian Körner, Michael Perscheid, Timotheus Kampik.

Figure 1: Order-to-Cash multi-agent system with four GenAI agents.
    … into systemic risk: unnecessary restocking by the Inventory Agent becomes an unmonitored financial leak, and adversarial prompts can silently induce the Customer Service Agent to grant undeserved refunds, both undetectable without access to reasoning chains. Existing observability tools do not close this gap: agent frameworks produce verbose logs opti…

Figure 2: Agent Event Concept Hierarchy.
    … the event data model. Crucially, most ai:* attributes align with OpenTelemetry GenAI semantic conventions (R6), ensuring broad compatibility and semantic coverage. Although not formally defined as an XES extension, the model builds upon the XES standard [12] and corresponding event logs are readily usable by process mining tools supporting the standard (R7). By satisfying thes…

Figure 3: Agent event log fragment.
    … execute_tool logs the action payload via ai_tool_args ({"order":[...],...}) to enable decision reconstruction. Together, these attributes allow organizations to distinguish cost sources, trace reasoning back to user intent, and analyze operational efficiency across execution variants. [Section 5: Agent Behavior Mining in Practice] This section demonstrates how ABM can address governance chal…

Figure 4: Process Discovery: Visibility of Execution Patterns.

Figure 5: Conformance Checking: Specification Alignment Assessment.
    … calculate_total; Barista Agent skips estimate_prep_time and then executes remake_order_item (insertion violation). Model alignment with tracing standards (R6) and tool interoperability (R7) enables compliance rate assessment and violation categorization. This quantifies gaps between intended versus actual execution, supporting quality assurance for…

Figure 6

Figure 7: Variant Analysis: Observable Execution Differences.
    …tions to standardize agent behavior and ensure consistent service quality despite probabilistic variance. [Section 5.3: Practical Utility Assessment] We complemented the quantitative log analysis with practitioner feedback to assess the artifact’s perceived value for decision-making (…

Figure 8: Practitioner evaluation results (n=18): perceived value, transparency improvement, insight helpfulness, and adoption barriers.
Original abstract

As organizations increasingly deploy generative AI agents to automate business processes, they face a governance dilemma: although these agents can increase operational flexibility, their non-deterministic nature challenges the control and standardization that Business Process Management seeks to enforce. This paper addresses this invisible autonomy risk by introducing Agent Behavior Mining, a governance capability that enables the application of process mining techniques to render generative AI agent decision-making observable and traceable. We (1) improve the understanding of generative AI agent behavior through an event data model that translates granular agent activities, including reasoning traces, tool usage, and token costs, into standardized process logs; (2) instantiate the data model in a multi-agent order-to-cash implementation, demonstrating how process managers can leverage agent logs to detect policy deviations and quantify operational variability; and (3) evaluate the perceived practical utility of the approach in an exploratory study with 18 industry practitioners. The results indicate that practitioners view behavioral transparency as a prerequisite for trust and consider the ability to examine agent reasoning as an important governance requirement for the next generation of AI-driven business processes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces Agent Behavior Mining as a governance approach for generative AI agents in business processes. It defines an event data model that maps agent reasoning traces, tool usage, and token costs to standardized process logs; instantiates the model in a multi-agent order-to-cash system to enable deviation detection and variability quantification; and reports results from an exploratory study with 18 industry practitioners indicating that behavioral transparency is viewed as essential for trust and governance.

Significance. If the central claims hold, the work would offer a concrete mechanism for applying established process mining techniques to non-deterministic AI agents, addressing a practical gap in BPM governance. The practitioner evaluation, if methodologically sound, would provide initial evidence that transparency features align with industry needs. Strengths include the explicit linkage of agent internals to process logs and the focus on a real business process instantiation.

major comments (2)
  1. [Abstract] Abstract, points (1) and (2): the claim that the event data model enables detection of policy deviations and quantification of operational variability rests on the assumption that the mapping from raw reasoning traces and tool calls to logs preserves all decision-relevant information without truncation or prohibitive overhead. No fidelity metrics, information-loss analysis, or cost measurements are provided to substantiate this; if the model summarizes or filters traces, downstream mining may miss the very deviations the approach claims to surface.
  2. [Abstract] Abstract, point (3) and practitioner study description: the reported results on practitioner views of transparency as a prerequisite for trust are presented without any methods details, survey instrument, sampling procedure, response analysis, or quantitative breakdown. This absence prevents verification of the exploratory study's soundness and weakens its role as supporting evidence for the governance utility claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract claims and the practitioner evaluation. We address each major comment below and will revise the manuscript accordingly to improve substantiation and transparency.

  1. Referee: [Abstract] Abstract, points (1) and (2): the claim that the event data model enables detection of policy deviations and quantification of operational variability rests on the assumption that the mapping from raw reasoning traces and tool calls to logs preserves all decision-relevant information without truncation or prohibitive overhead. No fidelity metrics, information-loss analysis, or cost measurements are provided to substantiate this; if the model summarizes or filters traces, downstream mining may miss the very deviations the approach claims to surface.

    Authors: We agree that the abstract does not include explicit fidelity metrics or overhead measurements. The full manuscript (Section 3) defines a direct mapping that retains all reasoning traces, tool calls, and costs without summarization or filtering, and Section 4 demonstrates deviation detection on the order-to-cash process. To strengthen the claim, we will add a dedicated analysis subsection quantifying information preservation (e.g., trace completeness) and token overhead in the revised version. revision: yes

  2. Referee: [Abstract] Abstract, point (3) and practitioner study description: the reported results on practitioner views of transparency as a prerequisite for trust are presented without any methods details, survey instrument, sampling procedure, response analysis, or quantitative breakdown. This absence prevents verification of the exploratory study's soundness and weakens its role as supporting evidence for the governance utility claim.

    Authors: We acknowledge that the abstract and study description lack sufficient methodological detail. The exploratory study with 18 practitioners is reported in Section 5, but we agree more transparency is needed. In revision we will expand both the abstract and Section 5 to include the survey instrument, sampling approach, response rate, and quantitative breakdown of findings. revision: yes
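
The rebuttal mentions trace completeness as a possible information-preservation measure without defining it. One plausible reading, offered purely as an assumption, is the fraction of raw agent steps that have a matching logged event. The function name, the step identifiers, and the definition itself are hypothetical and are not taken from the paper.

```python
def trace_completeness(raw_step_ids, logged_step_ids):
    """Fraction of raw agent steps that appear in the event log.

    Assumed definition for illustration; returns 1.0 for an empty raw trace.
    """
    raw = set(raw_step_ids)
    if not raw:
        return 1.0
    return len(raw & set(logged_step_ids)) / len(raw)


# Four raw steps, one of which (s3) never reached the log.
completeness = trace_completeness(["s1", "s2", "s3", "s4"], ["s1", "s2", "s4"])
```

A count-based ratio like this would only catch dropped events. It would not detect truncated reasoning text inside an event that was logged, which is the kind of loss the referee's first comment is concerned with.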

Circularity Check

0 steps flagged

No circularity: empirical application of process mining to AI agents with no derivations or self-referential reductions

The paper introduces Agent Behavior Mining as the application of existing process mining techniques to generative AI agent logs via an event data model. No equations, fitted parameters, predictions derived from inputs, or load-bearing self-citations appear in the abstract or described contributions. The three steps—defining the data model, instantiating it in an order-to-cash system, and running a practitioner study—are presented as independent empirical work without any reduction of outputs to inputs by construction. This matches the default expectation of a non-circular paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities are specified in the provided text.

Reference graph

Works this paper leans on

30 extracted references

  1. [1] van der Aalst, W.M.P.: Process Mining - Data Science in Action. Springer, 2 edn. (2016)
  2. [2] van der Aalst, W.M.P., Bichler, M., Heinzl, A.: Robotic process automation. Business & information systems engineering 60(4), 269–272 (2018)
  3. [3] van der Aalst, W.M.P., Netjes, M., Reijers, H.A.: Supporting the full bpm life-cycle using process mining and intelligent redesign. In: Contemporary issues in database design and information systems development, pp. 100–132. IGI Global Scientific Publishing (2007)
  4. [4] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qualitative research in psychology 3(2), 77–101 (2006)
  5. [5] Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS quarterly 13(3), 319–340 (1989)
  6. [6] Dong, L., Lu, Q., Zhu, L.: Agentops: Enabling observability of llm agents. arXiv preprint arXiv:2411.05285 (2024)
  7. [7] Dumas, M., Fournier, F., Limonad, L., Marrella, A., Montali, M., Rehse, J.R., Accorsi, R., Calvanese, D., De Giacomo, G., Fahland, D., et al.: AI-augmented business process management systems: a research manifesto. ACM Trans. on Management Information Systems 14(1), 1–19 (2023)
  8. [8] Dumas, M., Rosa, L.M., Mendling, J., Reijers, A.H.: Fundamentals of business process management. Springer (2018)
  9. [9] Fahland, D., Montali, M., Lebherz, J., van der Aalst, W.M.P., van Asseldonk, M., Blank, P., Bosmans, L., Brenscheidt, M., Di Ciccio, C., Delgado, A., et al.: Towards a simple and extensible standard for object-centric event data (oced)–core model, design space, and lessons learned. arXiv preprint arXiv:2410.14495 (2024)
  10. [10] Fowler Jr, F.J.: Survey research methods. Sage publications (2013)
  11. [11] Glikson, E., Woolley, A.W.: Human trust in artificial intelligence: Review of empirical research. Academy of management annals 14(2), 627–660 (2020)
  12. [12] Gunther, C., Verbeek, H.: XES - standard definition. BPM reports, BPMcenter.org (2014)
  13. [13] Hevner, A., Chatterjee, S.: Design science research in information systems. In: Design research in information systems: theory and practice, pp. 9–22. Springer (2010)
  14. [14] Jennings, N.R., Norman, T.J., Faratin, P., O’Brien, P.D., Odgers, B.: Autonomous agents for business process management. Appl. Artif. Intell. 14(2), 145–189 (2000)
  15. [15] Kellogg, K.C., Valentine, M.A., Christin, A.: Algorithms at work: The new contested terrain of control. Academy of management annals 14(1), 366–410 (2020)
  16. [16] Mohlmann, M., Zalmanson, L.: Hands on the wheel: Navigating algorithmic management and uber drivers’ autonomy. In: Proceedings of the International Conference on Information Systems (ICIS) 2017. AIS (2017), https://aisel.aisnet.org/icis2017/DigitalPlatforms/Presentations/3, paper 3
  17. [17] Moshkovich, D., Mulian, H., Zeltyn, S., Eder, N., Skarbovsky, I., Abitbol, R.: Beyond black-box benchmarking: Observability, analytics, and optimization of agentic systems. arXiv preprint arXiv:2503.06745 (2025)
  18. [18] OpenTelemetry Authors: Semantic conventions for generative ai systems. https://opentelemetry.io/docs/specs/semconv/gen-ai/ (2024), version 1.36.0
  19. [19] Podsakoff, P.M., MacKenzie, S.B., Lee, J.Y., Podsakoff, N.P.: Common method biases in behavioral research: a critical review of the literature and recommended remedies. Journal of applied psychology 88(5), 879 (2003)
  20. [20] Raji, I.D., Smart, A., White, R.N., Mitchell, M., Gebru, T., Hutchinson, B., SmithLoud, J., Theron, D., Barnes, P.: Closing the ai accountability gap: Defining an end-to-end framework for internal algorithmic auditing. In: Proceedings of the 2020 conference on fairness, accountability, and transparency. pp. 33–44 (2020)
  21. [21] Reijers, H.A.: Business process management: The evolution of a discipline. Computers in Industry 126, 103404 (2021)
  22. [22] Rosemann, M., vom Brocke, J.: The six core elements of business process management. In: Handbook on business process management 1: introduction, methods, and information systems, pp. 105–122. Springer (2014)
  23. [23] Schmidt, D.C., Spencer-Smith, J., Fu, Q., White, J.: Towards a catalog of prompt patterns to enhance the discipline of prompt engineering. ACM SIGAda Ada Letters 43(2), 43–51 (2024)
  24. [24] Shen, Q., Polyvyanyy, A., Lipovetzky, N., Kampik, T.: Agent system event data: Concepts, dimensions, applications. In: Conceptual Modeling. pp. 56–72 (2024)
  25. [25] Tour, A., Polyvyanyy, A., Kalenkova, A.A.: Agent system mining: Vision, benefits, and challenges. IEEE Access 9, 99480–99494 (2021)
  26. [26] Tour, A., Polyvyanyy, A., Kalenkova, A.A., Senderovich, A.: Agent miner: An algorithm for discovering agent systems from event data. In: Business Process Management. pp. 284–302 (2023)
  27. [27] Weill, P., Ross, J.W.: IT governance: How top performers manage IT decision rights for superior results. Harvard Business Press (2004)
  28. [28] Xi, Z., Chen, W., Guo, X., He, W., Ding, Y., Hong, B., Zhang, M., Wang, J., Jin, S., Zhou, E., et al.: The rise and potential of large language model based agents: A survey. Science China Information Sciences 68(2), 121101 (2025)
  29. [29] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K.R., Cao, Y.: React: Synergizing reasoning and acting in language models. In: ICLR. OpenReview.net (2023), https://openreview.net/forum?id=WE_vluYUL-X
  30. [30] Yin, R.K.: Case study research: Design and methods, vol. 5. Sage (2009)