pith. machine review for the scientific record. sign in

arxiv: 2604.21378 · v1 · submitted 2026-04-23 · 💻 cs.FL

Recognition: unknown

Active Inference of Extended Finite State Machine Models with Registers and Guards

Authors on Pith no claims yet

Pith reviewed 2026-05-08 13:07 UTC · model grok-4.3

classification 💻 cs.FL
keywords active learningextended finite state machinesmodel inferenceblack-box learningregistersguardssoftware reverse engineeringstate machines
0
0 comments X

The pith

A black-box active learning algorithm infers extended finite state machines that include internal registers and guard conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents an algorithm for actively learning models of stateful systems that include internal data storage and conditional transitions. Unlike earlier methods, it does not require the ability to reset the system to a known state or assume that control decisions ignore stored data. The approach works solely through observations and input queries, allowing inference even when internal variables affect behavior in complex ways. If successful, such models enable better understanding and testing of software systems that maintain state across interactions.

Core claim

The authors introduce a black-box active learning algorithm that constructs extended finite state machine models incorporating registers for data storage and guards for data-dependent control flow, under significantly relaxed assumptions compared to prior techniques that necessitated resets or data-independent control.

What carries the argument

The active learning algorithm that uses queries to distinguish register values and guard conditions without requiring system resets.

If this is right

  • Models of systems with data-dependent behavior can be inferred from black-box interactions alone.
  • Reverse engineering becomes feasible for systems that cannot be reset during the learning process.
  • Applications in software engineering include automated model-based testing and verification without prior knowledge of data dependencies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach could extend to learning other formal models that combine control flow with data, such as those with more complex memory structures.
  • It opens the possibility of applying active inference to live, non-resettable systems in operational environments.
  • Integration with existing testing tools might allow automated generation of models for legacy software without source access.

Load-bearing premise

The system provides enough observable behavior through queries to uniquely identify the values stored in registers and the logical conditions of the guards.

What would settle it

A system where multiple possible register assignments or guard conditions produce identical external behavior, causing the algorithm to output an ambiguous or incorrect model.

Figures

Figures reproduced from arXiv: 2604.21378 by Adenilso Simao (ICMC-USP), Catherine Oriat (LIG), German Eduardo Vega Baez (LIG), Michael Foster, Neil Walkinshaw, Roland Groz (LIG).

Figure 1
Figure 1. Figure 1: Our vending machine EFSM and an example trace. view at source ↗
Figure 2
Figure 2. Figure 2: Example of an execution trace for the running example. view at source ↗
Figure 3
Figure 3. Figure 3: Control NFSM of the vending machine. When we exercise a system, we learn occurrences of transitions with concrete values for input and output parameters. For each transition of a control machine, we may collect several sample values. We call a sampled FSM the quintuple (Q, I, O, ∆, Λ) where Λ : Q × I × O → 2 R×I×O×R associates transitions from the control machine to a table of observed concrete inputs and … view at source ↗
Figure 4
Figure 4. Figure 4: Sampled FSM with trace from Fig. 2 view at source ↗
Figure 5
Figure 5. Figure 5: Learning initial daisy, and first sample view at source ↗
Figure 6
Figure 6. Figure 6: The sampled FSM after 47 steps view at source ↗
Figure 7
Figure 7. Figure 7: The generalised inferred EFSM At this point, the sampled FSM is complete, so the ehW -inference global algorithm calls the generalise procedure that uses genetic programming to generalise from the collected samples to get the EFSM, as was done in [5]. The algorithm stops with the final conjectured EFSM shown in view at source ↗
read the original abstract

Extended finite state machines (EFSMs) model stateful systems with internal data variables and have numerous applications in software engineering. A major advantage of this type of model lies in its ability to model both the data flow and the data-dependent control behaviour. In the absence of such models, it is desirable to reverse-engineer them by observing the system's behaviour. However, existing approaches generally require the ability to reset the system during inference, or can only handle situations where the control flow depends exclusively on the input parameters, and not on the values of the stored data. In this work, we present a black-box active learning algorithm that infers EFSMs with guards and registers, and which significantly relaxes the assumptions that have to be made about the system in comparison to previous attempts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a black-box active learning algorithm for inferring Extended Finite State Machines (EFSMs) that incorporate registers and guards. It claims this approach significantly relaxes prior assumptions, such as the need for system resets during inference or the restriction that control flow depends exclusively on input parameters rather than stored data values.

Significance. If the algorithm is correct and applicable under the stated relaxed assumptions, it would advance automata learning techniques for data-dependent systems, broadening their use in software engineering for reverse-engineering stateful behaviors without requiring resets.

major comments (2)
  1. Abstract: the central claim that the algorithm 'significantly relaxes the assumptions' is unsupported by any statement of the precise new assumptions, comparison to prior work, or evidence of correctness, which is load-bearing for the contribution.
  2. The manuscript: no formal description of the algorithm, pseudocode, proof of termination/correctness, or empirical validation is provided to substantiate that query observations alone suffice to resolve register values and data-dependent guards.
minor comments (1)
  1. Abstract: the description of the weakest assumption (sufficient black-box observations to distinguish registers/guards) could be clarified with a brief example of a data-dependent transition.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive comments. We address the major comments point by point below, indicating where revisions will be made to improve clarity and substantiation of our claims.

read point-by-point responses
  1. Referee: Abstract: the central claim that the algorithm 'significantly relaxes the assumptions' is unsupported by any statement of the precise new assumptions, comparison to prior work, or evidence of correctness, which is load-bearing for the contribution.

    Authors: We agree that the abstract would benefit from greater precision. In the revised manuscript we will expand the abstract to explicitly state the relaxed assumptions (no resets required during inference; guards may depend on register values rather than input parameters alone) and include a concise comparison to prior EFSM learning approaches. We will also add a brief pointer to the correctness argument given in the body of the paper. revision: yes

  2. Referee: The manuscript: no formal description of the algorithm, pseudocode, proof of termination/correctness, or empirical validation is provided to substantiate that query observations alone suffice to resolve register values and data-dependent guards.

    Authors: The manuscript contains a textual description of the active-learning procedure and how observations are used to infer states, registers and guards. We acknowledge, however, that the presentation can be strengthened. We will add pseudocode for the core procedures and expand the empirical examples to more clearly illustrate resolution of register values and data-dependent guards. A complete formal proof of termination and correctness is not present in the current version. revision: partial

standing simulated objections not resolved
  • A complete formal proof of termination and correctness

Circularity Check

0 steps flagged

No significant circularity: algorithmic presentation without derivations or fitted parameters

full rationale

The paper presents a black-box active learning algorithm for inferring EFSMs with guards and registers. No equations, parameters, or derivations are present that could reduce to inputs by construction. The contribution is the description of an algorithm that relaxes prior assumptions on resets and data dependencies, which stands as an independent algorithmic advance rather than a logical or statistical reduction to self-defined or fitted elements. The central claim relies on the empirical applicability of the algorithm under stated black-box query assumptions, with no self-citation load-bearing or ansatz smuggling identifiable in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no specific free parameters, axioms, or invented entities can be extracted. The approach implicitly assumes queryable black-box behavior sufficient to learn guards and registers.

pith-pipeline@v0.9.0 · 5457 in / 1005 out tokens · 31665 ms · 2026-05-08T13:07:34.241532+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

13 extracted references

  1. [1]

    Machine learn ing 2(4) (1988)

    Angluin, D.: Queries and concept learning. Machine learn ing 2(4) (1988)

  2. [2]

    Acm Sigplan Notices 48(10) (2013)

    Choi, W., Necula, G., Sen, K.: Guided gui testing of androi d apps with minimal restart and approximate learning. Acm Sigplan Notices 48(10) (2013)

  3. [3]

    In: Internationa l Conference on Integrated Formal Methods

    Damasceno, C.D.N., Mousavi, M.R., da Silva Simao, A.: Lea rning to reuse: Adap- tive model learning for evolving systems. In: Internationa l Conference on Integrated Formal Methods. Springer (2019)

  4. [4]

    In: IFIP International Conference on Testin g Software and Systems

    Foster, M., Derrick, J., Walkinshaw, N.: Reverse-engine ering EFSMs with data dependencies. In: IFIP International Conference on Testin g Software and Systems. Springer (2022)

  5. [5]

    In: International Conference on Formal Engineering Methods

    Foster, M., Groz, R., Oriat, C., da Silva Sim˜ ao, A., Vega, G., Walkinshaw, N.: Active inference of EFSMs without reset. In: International Conference on Formal Engineering Methods. Lecture Notes in Computer Science, vo l. 14308, pp. 29–46. Springer (2023)

  6. [6]

    In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems

    Groce, A., Peled, D., Yannakakis, M.: Adaptive model chec king. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer (2002)

  7. [7]

    Journal of Sys tems and Software 159 (2020)

    Groz, R., Bremond, N., Simao, A., Oriat, C.: hW -inference: A heuristic approach to retrieve models through black box testing. Journal of Sys tems and Software 159 (2020)

  8. [8]

    In: International Workshop on Verification, Model C hecking, and Abstract Interpretation

    Howar, F., Steffen, B., Jonsson, B., Cassel, S.: Inferring canonical register au- tomata. In: International Workshop on Verification, Model C hecking, and Abstract Interpretation. Springer (2012)

  9. [9]

    Machine Learning 96(1) (2014)

    Isberner, M., Howar, F., Steffen, B.: Learning register au tomata: from languages to program structures. Machine Learning 96(1) (2014)

  10. [10]

    Proceedings of the IEEE 84(8) (1996)

    Lee, D., Yannakakis, M.: Principles and methods of testi ng finite state machines-a survey. Proceedings of the IEEE 84(8) (1996)

  11. [11]

    lulu.com (2008), http://www.gp-field-guide.org.uk/

    Poli, R., Langdon, W.B., McPhee, N.F.: A Field Guide to Ge netic Programming. lulu.com (2008), http://www.gp-field-guide.org.uk/

  12. [12]

    In: Proceedings of the twenty-first annual ACM symposium on T heory of comput- ing (1989)

    Rivest, R.L., Schapire, R.E.: Inference of finite automa ta using homing sequences. In: Proceedings of the twenty-first annual ACM symposium on T heory of comput- ing (1989)

  13. [13]

    state under construction

    Walkinshaw, N., Hall, M.: Inferring computational stat e machine models from pro- gram executions. In: International Conference on Software Maintenance and Evo- lution. IEEE (2016) Appendix A Algorithms 1 Input: I1⊂ Is⊂I , W⊂ I ∗ 1 , h∈ I ∗ 1 ⊲ h and W can be empty, or hints could be provided 2 Initializing: T← ǫ ⊲ T is the learning trace 3 repeat 4 Q, ∆,...