arXiv: 2605.07541 · v1 · submitted 2026-05-08 · eess.SY · cs.SY


Learning Neural Hybrid Surrogates for Gradient-Based Falsification


Pith reviewed 2026-05-11 02:57 UTC · model grok-4.3


keywords: hybrid dynamical systems, falsification, neural hybrid automata, surrogate modeling, gradient-based optimization, safety verification, data-driven model learning


The pith

Learning a neural hybrid automaton from trajectory data allows gradient-based optimization on the surrogate to discover valid counterexamples in hybrid systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a falsification method for hybrid dynamical systems, which combine continuous flows with discrete mode switches. It learns a neural hybrid automaton surrogate consisting of a latent mode encoder, mode-conditioned vector fields, and guards inferred from trajectory data. Because this surrogate is differentiable, gradient-based optimal control can minimize a smooth approximation of a safety specification and produce candidate violating inputs. Those inputs are then executed on the original system to confirm any violation, which preserves soundness. The approach matters because direct search on hybrid models is hindered by non-differentiable transitions. The paper reports counterexamples on a majority of the evaluated benchmark specifications, with competitive or improved sample efficiency and a reduced simulation budget on those cases.

Core claim

A neural hybrid automaton is learned from data to approximate both the latent modes and their vector fields, after which transition guards are extracted from existing trajectories. Gradient-based optimal control is then formulated on this differentiable surrogate to minimize a smooth safety metric. The resulting candidate counterexamples are re-simulated on the original hybrid system, and only those that violate the specification there are reported.
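The search-then-validate loop described above can be sketched on a toy problem. Everything in the sketch is an illustrative assumption rather than the paper's implementation: the two-mode scalar system, the hand-written sigmoid-blended surrogate (standing in for a learned neural hybrid automaton), the log-sum-exp soft minimum, and the finite-difference gradient used in place of automatic differentiation to keep the example dependency-free.

```python
import math

DT, STEPS = 0.01, 200          # 2 s horizon, explicit Euler (toy choice)
GUARD, LIMIT = 1.0, 2.0        # hypothetical guard x >= 1, safety spec x < 2


def simulate_original(u):
    """Ground-truth hybrid system: hard switch from mode 0 to mode 1."""
    x, mode, xs = 0.0, 0, []
    for _ in range(STEPS):
        if mode == 0 and x >= GUARD:
            mode = 1                        # discrete, non-differentiable jump
        x += DT * (u if mode == 0 else 2.0 * u)
        xs.append(x)
    return xs


def simulate_surrogate(u, sharpness=20.0):
    """Differentiable stand-in: modes blended by a sigmoid of the guard."""
    x, xs = 0.0, []
    for _ in range(STEPS):
        w = 1.0 / (1.0 + math.exp(-sharpness * (x - GUARD)))  # soft mode weight
        x += DT * ((1.0 - w) * u + w * 2.0 * u)
        xs.append(x)
    return xs


def robustness(xs):
    """Exact robustness of 'always x < LIMIT'; negative means violated."""
    return min(LIMIT - x for x in xs)


def smooth_robustness(xs, beta=10.0):
    """Log-sum-exp soft minimum, a smooth lower bound on robustness()."""
    zs = [-beta * (LIMIT - x) for x in xs]
    m = max(zs)                              # shift for numerical stability
    return -(m + math.log(sum(math.exp(z - m) for z in zs))) / beta


def falsify(u=0.2, lr=0.1, iters=50, eps=1e-4, u_min=-1.0, u_max=1.0):
    """Descend the smooth objective on the surrogate, then validate."""
    for _ in range(iters):
        grad = (smooth_robustness(simulate_surrogate(u + eps))
                - smooth_robustness(simulate_surrogate(u - eps))) / (2 * eps)
        u = min(u_max, max(u_min, u - lr * grad))   # projected gradient step
    # Soundness step: only the original system decides what counts.
    return u, robustness(simulate_original(u))


candidate, rho = falsify()
print(f"candidate u = {candidate:.3f}, original robustness = {rho:.3f}")
```

The optimizer only ever touches `simulate_surrogate`; the single call to `simulate_original` at the end is the validation step, so a surrogate error can waste a simulation but cannot produce a false counterexample.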

What carries the argument

The neural hybrid automaton surrogate, which learns a latent mode encoder together with mode-conditioned vector fields and infers guards from data, serving as the differentiable model for gradient-based search.
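How the guards are inferred is not spelled out in this summary. One simple possibility, shown purely as an assumption, is to label each trajectory sample with the encoder's most likely mode and place a threshold guard between the last state seen before a switch and the first state seen after it. The function name, the scalar state, and the midpoint rule are all hypothetical.

```python
def infer_threshold_guard(trajectories, src, dst):
    """Estimate a scalar guard for src -> dst from mode-labelled samples.

    trajectories: list of runs, each a list of (state, mode) pairs in time order.
    Returns the mean midpoint between the state just before and just after
    each observed src -> dst switch, or None if the switch never occurs.
    """
    midpoints = []
    for run in trajectories:
        for (x_prev, m_prev), (x_next, m_next) in zip(run, run[1:]):
            if m_prev == src and m_next == dst:
                midpoints.append(0.5 * (x_prev + x_next))
    if not midpoints:
        return None
    return sum(midpoints) / len(midpoints)


# Hypothetical labelled data: the mode flips from 0 to 1 near x = 1.
runs = [
    [(0.6, 0), (0.8, 0), (1.0, 1), (1.4, 1)],
    [(0.5, 0), (0.9, 0), (1.3, 1)],
]
print(infer_threshold_guard(runs, src=0, dst=1))
```

A guard estimated this way is only as fine as the sampling interval around each switch, which is one concrete way guard error could enter the downstream optimization.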

If this is right

The method uncovers counterexamples on a majority of evaluated benchmark specifications.
It achieves competitive or improved sample efficiency relative to existing tools.
It operates with a reduced simulation budget during the search process.
It extends prior surrogate-based falsification techniques from purely continuous dynamics to full hybrid systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The learned surrogate could be reused for multiple safety specifications on the same system, spreading the cost of the initial data-driven learning step.
Similar differentiable hybrid models might support other tasks such as reachability analysis or controller synthesis that also require gradient information.

Load-bearing premise

The learned neural hybrid automaton must be accurate enough that inputs found to violate the specification on the surrogate still violate it when executed on the original hybrid system.

What would settle it

A benchmark where an input optimized on the surrogate produces a trajectory that satisfies the safety specification when re-simulated on the original system would demonstrate that surrogate error prevents reliable falsification.

Figures

Figures reproduced from arXiv: 2605.07541 by Knut Åkesson, Lasse Kötz.

**Figure 3.** The learned NHA vector fields used to successfully …

Original abstract

Falsification of hybrid dynamical systems remains challenging due to mode-dependent dynamics and discrete transitions. In this work, we propose a surrogate-based falsification approach that enables hybrid systems by learning a differentiable hybrid automaton model from data. This extends previous surrogate-based falsification methods, which were limited to purely continuous dynamics. Specifically, we employ neural hybrid automata to learn both a latent mode encoder and the corresponding mode-conditioned vector fields. Once the surrogate has paired each mode with an associated vector field, the transition guards are inferred using existing trajectory data. The learned surrogate is subsequently subjected to a gradient-based optimal control formulation, which minimizes a smooth approximation of the safety specification to find safety violations. In the last step, an experiment with the optimal control solution is carried out on the original system to ensure soundness. The proposed method consistently uncovers counterexamples on a majority of evaluated benchmark specifications; on these cases, it achieves competitive or improved sample efficiency than other tools while using a reduced simulation budget.
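The abstract does not say which smooth approximation is used. As a hedged illustration, for a specification of the form "always x(t) < c" sampled at N time points, a common choice is the log-sum-exp soft minimum. The symbols c, β, N and r_i are introduced here for illustration and are not taken from the paper.

```latex
\rho(x) = \min_{1 \le i \le N} r_i, \qquad r_i = c - x(t_i),
\qquad
\tilde{\rho}_\beta(x) = -\frac{1}{\beta} \log \sum_{i=1}^{N} e^{-\beta r_i},
\qquad
\rho(x) - \frac{\log N}{\beta} \;\le\; \tilde{\rho}_\beta(x) \;\le\; \rho(x).
```

A larger β tightens the bound but makes the gradient concentrate on fewer time points. In either case, a negative ρ on the original system, not a negative soft minimum on the surrogate, is what certifies a violation.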

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The neural hybrid surrogate extends falsification to mode-switching systems in a clean way, but the abstract gives no numbers on approximation error or transfer success, so the performance claims are hard to assess.


Hi, the main point is that they learn a neural hybrid automaton from data to create a differentiable stand-in for the real hybrid system, run gradient-based search on it to find candidate violations, and then re-simulate those candidates on the original model. This is new compared with earlier surrogate falsification work that stayed in continuous dynamics only. The mode encoder plus mode-specific vector fields plus data-driven guard inference is a reasonable way to capture the discrete structure without hand-labeling modes. Keeping the final check on the real system is the right safeguard against the surrogate leading you astray. That part of the workflow is solid in principle.

The soft spot is exactly what the stress-test note flags: we get no quantitative sense of how close the surrogate stays to the real dynamics near the points the optimizer cares about. The abstract says the method finds counterexamples on a majority of benchmarks with competitive or better sample efficiency and lower simulation cost, but there are no error metrics, no ablation on training coverage, no count of how often the surrogate optimum fails the re-simulation step, and no statistical detail. Without those, it is difficult to know whether the gradient search is actually reliable or whether the reported wins depend on lucky benchmark choices.

This is for people working on scalable verification of hybrid models in robotics or automotive control. It is worth sending to peer review because the core construction is new and the soundness step is properly placed, even though the experiments will need tightening on surrogate fidelity before the performance claims can be taken at face value.

Referee Report

3 major / 2 minor

Summary. The paper proposes a surrogate-based falsification method for hybrid dynamical systems. It learns a neural hybrid automaton from trajectory data, consisting of a latent mode encoder and mode-conditioned vector fields, with transition guards inferred from existing data. Gradient-based optimal control is then applied to the differentiable surrogate to minimize a smooth approximation of the safety specification. Candidate solutions are validated by simulation on the original system. The method is reported to uncover counterexamples on a majority of benchmark specifications, achieving competitive or improved sample efficiency with a reduced simulation budget.

Significance. If the empirical results hold under closer scrutiny, the work extends surrogate-based falsification from continuous to hybrid systems by providing a differentiable model that supports gradient search while preserving soundness through final validation on the original dynamics. This avoids circularity in the performance claims. The combination of learned mode encoding with vector fields and guard inference represents a technical step forward for handling discrete transitions in learned surrogates.

major comments (3)

[Abstract] Abstract: The central performance claim that the method 'consistently uncovers counterexamples on a majority of evaluated benchmark specifications' with 'competitive or improved sample efficiency' is not supported by any quantitative details on the number of benchmarks, success rates per specification, statistical significance, surrogate approximation error, or failure cases.
[Method] Method section (neural hybrid automaton and guard inference): No quantitative bounds on the approximation error of the learned surrogate (e.g., trajectory prediction accuracy) or analysis of how errors in inferred guards propagate into the gradient-based optimal control problem are provided; this is load-bearing because the workflow relies on the surrogate optimum transferring to valid counterexamples on the original system.
[Experiments] Experiments section: No ablations on training-trajectory coverage or metrics evaluating surrogate fidelity near the discovered optima are reported, leaving open the possibility that the optimizer exploits surrogate inaccuracies that do not survive re-simulation on the original hybrid system.

minor comments (2)

[Method] The notation for the smooth safety approximation objective could be introduced with an explicit equation to improve readability of the optimization step.
[Related Work] Ensure the related-work discussion explicitly contrasts the proposed neural hybrid automaton with prior continuous-only surrogate methods to highlight the extension.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and outline the revisions we will make to the manuscript. These changes will improve the quantitative support for our claims and the analysis of the surrogate model while maintaining the soundness of the approach through final validation on the original system.


Referee: [Abstract] Abstract: The central performance claim that the method 'consistently uncovers counterexamples on a majority of evaluated benchmark specifications' with 'competitive or improved sample efficiency' is not supported by any quantitative details on the number of benchmarks, success rates per specification, statistical significance, surrogate approximation error, or failure cases.

Authors: We agree that the abstract would be strengthened by incorporating specific quantitative details from the experimental evaluation. In the revised version, we will update the abstract to report the number of benchmark specifications and hybrid systems tested, the overall success rate in uncovering counterexamples, and references to the tables and figures that detail sample-efficiency comparisons, statistical results, and any observed failure cases. This will provide a more precise summary of the performance claims without altering the manuscript's core narrative.

revision: yes

Referee: [Method] Method section (neural hybrid automaton and guard inference): No quantitative bounds on the approximation error of the learned surrogate (e.g., trajectory prediction accuracy) or analysis of how errors in inferred guards propagate into the gradient-based optimal control problem are provided; this is load-bearing because the workflow relies on the surrogate optimum transferring to valid counterexamples on the original system.

Authors: We acknowledge the value of quantitative surrogate-error analysis. The current manuscript ensures soundness exclusively via re-simulation of all candidate solutions on the original hybrid dynamics rather than relying on a priori error bounds. In the revision, we will add a new subsection reporting empirical trajectory-prediction accuracy (e.g., mean-squared error on held-out trajectories) and a discussion of how guard-inference errors affect the optimal-control search. We will also present empirical transfer rates from surrogate optima to original-system counterexamples across the benchmarks, clarifying that the gradient-based search serves only to improve efficiency while the final validation step preserves correctness.

revision: partial

Referee: [Experiments] Experiments section: No ablations on training-trajectory coverage or metrics evaluating surrogate fidelity near the discovered optima are reported, leaving open the possibility that the optimizer exploits surrogate inaccuracies that do not survive re-simulation on the original hybrid system.

Authors: We agree that targeted ablations and local fidelity metrics would further address concerns about surrogate exploitation. In the revised experiments section, we will include ablations that vary the number and coverage of training trajectories and report their impact on falsification success. We will also add metrics that evaluate surrogate prediction error specifically in neighborhoods of the discovered optima (e.g., by comparing surrogate and original trajectories at the optimized control inputs). These additions will provide direct evidence that the reported counterexamples are not artifacts of surrogate inaccuracies.

revision: yes
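The quantities promised in the responses above could be computed along the following lines. This is a sketch under assumptions: trajectories are equal-length lists of scalar states, robustness values are negative when the specification is violated, and the function names and numbers are hypothetical.

```python
def trajectory_mse(predicted, reference):
    """Mean-squared error between two equal-length scalar trajectories."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("trajectories must be non-empty and equal in length")
    return sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)


def transfer_rate(surrogate_rho, original_rho):
    """Fraction of surrogate violations that remain violations on re-simulation.

    Both arguments are per-candidate robustness values (negative = violated).
    Returns None when the surrogate flagged no violations at all.
    """
    flagged = [(s, o) for s, o in zip(surrogate_rho, original_rho) if s < 0]
    if not flagged:
        return None
    return sum(1 for _, o in flagged if o < 0) / len(flagged)


# Hypothetical numbers, for illustration only.
print(trajectory_mse([0.0, 1.0, 2.0], [0.0, 1.5, 2.0]))
print(transfer_rate([-0.4, -0.1, 0.3, -0.2], [-0.3, 0.2, 0.5, -0.1]))
```

Reporting the transfer rate alongside held-out error separates two failure modes: a surrogate that is inaccurate everywhere, and one that is accurate on average but wrong precisely where the optimizer ends up.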

Circularity Check

0 steps flagged

No circularity: surrogate learning, gradient optimization, and original-system validation form an externally checked pipeline


The paper's workflow learns a neural hybrid automaton (latent mode encoder, mode-conditioned vector fields, and data-inferred guards) from trajectory data, performs gradient-based minimization of a smooth safety approximation on the learned surrogate, and then re-simulates candidate solutions on the original hybrid system to confirm soundness. This final validation step supplies an independent check that prevents any discovered counterexample from reducing to a fitted quantity by construction. No load-bearing step equates a prediction to its training inputs, no self-citation is invoked to justify uniqueness or an ansatz, and the reported performance is measured against external benchmarks rather than internal consistency alone. The derivation chain therefore remains self-contained.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the assumption that trajectory data suffice to learn an accurate differentiable surrogate whose optimization yields transferable counterexamples; the neural network weights constitute the primary fitted elements.

free parameters (1)

neural network weights for mode encoder and vector fields
Learned from data to approximate mode-dependent dynamics and transitions.

axioms (1)

domain assumption: Trajectory data contain sufficient information to infer both latent modes and transition guards accurately enough for downstream optimization.
Invoked when the surrogate is trained and guards are extracted from existing runs.

invented entities (1)

neural hybrid automaton (no independent evidence)
purpose: Differentiable surrogate that encodes modes, vector fields, and guards for gradient-based search.
New modeling construct introduced to handle hybrid dynamics; no independent falsifiable prediction outside the method's benchmark performance is supplied.


