pith. machine review for the scientific record.

arxiv: 2605.05395 · v1 · submitted 2026-05-06 · 💻 cs.LG · cs.MS

Recognition: unknown

Differentiable Parameter Optimization for DAEs with State-Dependent Events

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 17:30 UTC · model grok-4.3

classification 💻 cs.LG cs.MS
keywords differentiable simulation · differential-algebraic equations · state-dependent events · gradient computation · hybrid dynamical systems · automatic differentiation · discrete adjoint

The pith

Two gradient methods enable parameter optimization for DAEs interrupted by state-dependent events.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tackles gradient-based learning for differential-algebraic equations whose continuous evolution is interrupted by events such as mode switches, impacts, or reinitializations. It casts the task as a constrained least-squares problem that includes the DAE dynamics, algebraic constraints, guard conditions, and reset maps. Two strategies are presented: an automatic-differentiation approach that solves algebraic variables implicitly and integrates segment by segment, and a discrete-adjoint approach that treats the entire simulation as an event-split residual system solved via Lagrange multipliers. Both return gradients that match the event sequence chosen during the forward pass, provided ordering stays fixed and crossings remain transversal.
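Both strategies hinge on differentiating the event time itself. A minimal sketch of that step, using an invented scalar guard rather than anything from the paper: the crossing time t* satisfies g(t*, θ) = 0, so the implicit function theorem gives dt*/dθ = −(∂g/∂θ)/(∂g/∂t), with the transversality condition appearing as the nonzero denominator.

```python
import math

# Illustrative scalar guard, not the paper's system: x(t; theta) = theta - t^2,
# event when x hits zero. The crossing time solves g(t*, theta) = 0.
def event_time(theta):
    return math.sqrt(theta)  # closed-form root of theta - t^2 = 0

theta = 2.0
t_star = event_time(theta)

# Implicit function theorem on the guard:
#   dt*/dtheta = -(dg/dtheta) / (dg/dt) = -(1) / (-2 t*)
# The denominator dg/dt is exactly the transversality condition: it must be nonzero.
dt_dtheta = 1.0 / (2.0 * t_star)

# Finite-difference check of the sensitivity
fd = (event_time(theta + 1e-6) - event_time(theta - 1e-6)) / 2e-6
print(round(dt_dtheta, 6), round(fd, 6))  # both ~0.353553
```

When dg/dt → 0 (a grazing crossing) the quotient blows up, which is why both methods restrict validity to transversal events.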

Core claim

The paper shows that gradients for the selected event path in semi-explicit DAE simulations can be obtained either by differentiating through implicit algebraic solves and segmented integration or by solving Lagrange multipliers for the residuals of smooth segments and events, clarifying that the adjoint treats residuals as equality constraints rather than penalties.

What carries the argument

The constrained least-squares formulation that incorporates DAE dynamics, guard equations, and reset maps, together with the two gradient strategies: automatic-differentiation-through-simulation using the implicit function theorem and explicit discrete-adjoint on an event-split residual system.
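The discrete-adjoint side can be illustrated on a toy residual chain (implicit Euler for a scalar linear ODE, a stand-in for the paper's event-split system, not its actual formulation): each step is an equality-constraint residual, the multipliers are solved backward, and the gradient assembles from residual sensitivities.

```python
# Residuals R_k = u_k (1 + dt*theta) - u_{k-1} = 0 discretize du/dt = -theta*u;
# loss L = u_N, gradient via Lagrange multipliers lam solving (dR/du)^T lam = dL/du.
theta, dt, N, u0 = 0.7, 0.1, 20, 1.0

# forward pass: solve each residual for u_k
u = [u0]
for _ in range(N):
    u.append(u[-1] / (1.0 + dt * theta))

# backward pass: the bidiagonal transposed system decouples step by step
lam = [0.0] * (N + 1)
lam[N] = 1.0 / (1.0 + dt * theta)          # row N: (1+dt*theta) lam_N = dL/du_N = 1
for k in range(N - 1, 0, -1):
    lam[k] = lam[k + 1] / (1.0 + dt * theta)

# gradient: dL/dtheta = -sum_k lam_k * dR_k/dtheta, with dR_k/dtheta = dt * u_k
grad = -sum(lam[k] * dt * u[k] for k in range(1, N + 1))

# finite-difference check
def loss(th):
    v = u0
    for _ in range(N):
        v /= 1.0 + dt * th
    return v
fd = (loss(theta + 1e-6) - loss(theta - 1e-6)) / 2e-6
print(grad, fd)  # both ~ -0.483
```

The event residuals in the paper would appear as extra rows of this system; here the point is only that the residuals are equality constraints, not penalty terms.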

If this is right

  • Gradients are supplied only for the exact event path realized in the forward simulation.
  • Algebraic variables are differentiated via the implicit function theorem inside the vector field.
  • The adjoint method represents the simulation as an explicit event-split residual system.
  • Implementation complexity and local validity differ between the two methods.
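The second bullet is the implicit-function-theorem step inside the vector field. A self-contained sketch, with an invented scalar constraint g(x, z) = z³ + z − x = 0 standing in for the DAE's algebraic equations: dz/dx = −(∂g/∂z)⁻¹ ∂g/∂x, evaluated at the converged algebraic solve.

```python
# Hypothetical algebraic constraint: g(x, z) = z^3 + z - x = 0.
def solve_z(x, z=0.0):
    for _ in range(50):               # Newton iterations for the algebraic solve
        z -= (z**3 + z - x) / (3*z**2 + 1)
    return z

x = 2.0
z = solve_z(x)                        # root is z = 1 for x = 2

# Implicit function theorem: dz/dx = -(dg/dz)^-1 * (dg/dx) = 1 / (3 z^2 + 1)
ift = 1.0 / (3*z**2 + 1)

# Finite-difference check through the full Newton solve
fd = (solve_z(x + 1e-6) - solve_z(x - 1e-6)) / 2e-6
print(round(ift, 6), round(fd, 6))    # both ~0.25
```

Differentiating the converged solution this way avoids backpropagating through the Newton iterations themselves.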

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The methods could support training of hybrid models that combine neural networks with physical DAE constraints.
  • If event ordering varies with parameters, multiple path evaluations or combinatorial search may become necessary.
  • Embedding these gradient routines in existing DAE solvers would broaden their use in control and robotics.

Load-bearing premise

Event ordering remains fixed and guard crossings stay transversal when parameters change.

What would settle it

A small parameter perturbation that causes the forward simulation to select a different event sequence or a non-transversal crossing, after which the computed gradients no longer match the actual change in the loss.
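A concrete version of such a perturbation, with an illustrative one-bounce ball rather than the paper's examples: near the drop height where the bounce time reaches the horizon, the fixed-path gradient stays at 1 while a finite difference straddling the path switch blends the two branches.

```python
import math

T, e = 1.0, 0.8  # horizon and restitution coefficient (illustrative values)

def height_at_T(h):
    """Closed-form height at time T of a ball dropped from h (unit gravity, <=1 bounce)."""
    t_star = math.sqrt(2.0 * h)          # guard crossing: x(t*) = 0
    if t_star >= T:                       # event path A: no bounce before T
        return h - T * T / 2.0
    v = e * math.sqrt(2.0 * h)            # reset map: velocity flips, scaled by e
    dt = T - t_star
    return v * dt - dt * dt / 2.0         # event path B: one bounce

h_star = T * T / 2.0                      # threshold where the event path switches

# Fixed-path gradient on path A: d/dh (h - T^2/2) = 1
fixed_path_grad = 1.0

# A central finite difference straddling the threshold mixes the two paths
eps = 1e-4
fd = (height_at_T(h_star + eps) - height_at_T(h_star - eps)) / (2.0 * eps)
print(fixed_path_grad, round(fd, 3))      # 1.0 vs ~(1 - e)/2 = 0.1
```

The loss is continuous at h_star, but its one-sided slopes differ, so any gradient computed for a single event path stops matching the actual change in the loss exactly at the switch.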

Figures

Figures reproduced from arXiv: 2605.05395 by Anthony Wong, Ion Matei, Maksym Zhenirovskyy.

Figure 1. Cauer ladder: optimized vs. true trajectories at the best-loss snapshot.
Figure 2. Cauer ladder: Adam loss histories on log scale.
Figure 3. Bouncing balls: prediction error vs. wall-clock time.
read the original abstract

Differential-algebraic equations (DAEs) with state-dependent events arise in systems whose continuous dynamics are constrained by algebraic equations and interrupted by mode changes, switching logic, impacts, or state reinitializations. Gradient-based parameter learning for such systems is challenging because algebraic variables are implicitly defined, event times depend on the parameters, and reset maps introduce discontinuities. This paper studies differentiable parameter optimization for semi-explicit DAEs with events. We formulate the learning problem as a constrained least-squares problem with DAE dynamics, algebraic constraints, guard equations, and reset maps. We then develop two complementary gradient-computation strategies. The first is an automatic-differentiation-through-simulation method that solves algebraic variables inside the vector field, differentiates the algebraic solve using the implicit function theorem, and handles events through segmented differentiable integration. The second is an explicit discrete-adjoint method that represents the forward simulation as an event-split residual system and computes gradients by solving for the Lagrange multipliers of smooth-segment and event residuals. The formulation clarifies that residual terms in the adjoint method are equality constraints, not heuristic penalties. We compare the two approaches in terms of gradient interpretation, event-time handling, implementation complexity, and local validity. Both methods provide gradients for the event path selected by the forward simulation and are valid under fixed event ordering and transversal guard crossings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops two gradient-computation methods for parameter optimization in semi-explicit DAEs with state-dependent events: (1) automatic differentiation through segmented simulation that solves algebraic variables via the implicit function theorem and (2) a discrete-adjoint method that treats the event-split forward simulation as an equality-constrained residual system. Both methods are stated to yield gradients for the forward-selected event path under the assumptions of fixed event ordering and transversal guard crossings.

Significance. If the stated validity conditions can be maintained or monitored during optimization, the work supplies practical tools for end-to-end differentiable learning in hybrid DAE systems that appear in robotics, circuit simulation, and mechanical contact problems. The explicit residual-constraint formulation and the side-by-side comparison of AD-through-simulation versus adjoint approaches are useful contributions.

major comments (2)
  1. [Abstract and §4] The central claim that both methods 'provide gradients for the event path selected by the forward simulation and are valid under fixed event ordering and transversal guard crossings' is load-bearing, yet the manuscript supplies no mechanism to detect, enforce, or quantify the measure of the parameter set on which these assumptions remain true. Because parameters are the decision variables in the learning problem, typical optimization trajectories can reorder events or produce grazing crossings, rendering the computed gradients locally invalid precisely where they are needed.
  2. [§3.2] The adjoint system is derived under the assumption that the event sequence is fixed; when an event time crosses another or a guard becomes tangent, the residual system itself changes discontinuously. No sensitivity analysis or continuation strategy is provided to handle these structural changes.
minor comments (2)
  1. [§2 and §3] Notation for the reset map and guard function should be introduced once and used consistently; several symbols are redefined between the continuous and discrete-adjoint sections.
  2. [§5] The numerical examples would benefit from an explicit check (e.g., a plot of event times versus parameter) confirming that the fixed-ordering assumption held throughout the reported optimization runs.
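The check suggested in the second minor comment can be as simple as sweeping the parameter and recording the event order; the event-time functions below are invented placeholders, not the paper's examples.

```python
# Sweep the parameter, record which event fires first, and flag any reordering.
def event_order(theta):
    events = {"A": 1.0 / theta, "B": 0.8}   # stand-in event-time functions
    return tuple(sorted(events, key=events.get))

safe_sweep = [event_order(th) for th in (1.0, 1.05, 1.1, 1.15, 1.2)]
wide_sweep = [event_order(th) for th in (1.0, 1.1, 1.2, 1.3, 1.4)]

print(len(set(safe_sweep)) == 1)   # True: fixed ordering holds on this range
print(len(set(wide_sweep)) == 1)   # False: ordering flips once 1/theta < 0.8
```

A single flagged flip during an optimization run is enough to invalidate the fixed-ordering assumption for the reported gradients.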

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed review. The comments highlight important considerations regarding the validity of the proposed gradient methods under the stated assumptions. We address each major comment below.

read point-by-point responses
  1. Referee: [Abstract and §4] The central claim that both methods 'provide gradients for the event path selected by the forward simulation and are valid under fixed event ordering and transversal guard crossings' is load-bearing, yet the manuscript supplies no mechanism to detect, enforce, or quantify the measure of the parameter set on which these assumptions remain true. Because parameters are the decision variables in the learning problem, typical optimization trajectories can reorder events or produce grazing crossings, rendering the computed gradients locally invalid precisely where they are needed.

    Authors: We agree with the referee that maintaining the assumptions of fixed event ordering and transversal guard crossings is essential for the gradients to be valid, and that optimization trajectories may violate them. The manuscript presents the methods as providing gradients for the forward-selected event path under these conditions, as stated in the abstract and Section 4. To strengthen the presentation, we will revise Section 4 to include guidance on monitoring these conditions during optimization. Specifically, we can suggest post-simulation checks: verifying that the computed event times maintain the original ordering and that the guard function's time derivative is nonzero at each crossing point. While a general mechanism to enforce or quantify the measure of the valid parameter set during learning is not provided (as it would require problem-specific constraints or robust optimization techniques beyond the scope of this work), such monitoring can alert users to potential invalidity. We believe this addition addresses the practical concern without altering the core contribution. revision: partial

  2. Referee: [§3.2] The adjoint system is derived under the assumption that the event sequence is fixed; when an event time crosses another or a guard becomes tangent, the residual system itself changes discontinuously. No sensitivity analysis or continuation strategy is provided to handle these structural changes.

    Authors: The discrete-adjoint formulation in Section 3.2 explicitly assumes a fixed event sequence, as the residual system is constructed by splitting the simulation into segments based on the events detected in the forward pass. When event times cross or a grazing condition occurs, the number or ordering of residuals changes, making the system discontinuous. Our derivation provides the adjoint for the fixed-path case, which is consistent with the forward simulation's selected path. We will add a clarifying paragraph in Section 3.2 noting this limitation and stating that the method does not include sensitivity analysis for structural changes in the event sequence. Handling such cases would necessitate additional strategies like event smoothing or hybrid system differentiation techniques, which we identify as directions for future work. This clarification will better delineate the scope of the current adjoint method. revision: partial
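The monitoring the rebuttal commits to (original event ordering preserved; nonzero guard time derivative at each crossing) reduces to a few lines. The falling-ball guard here is purely illustrative: with g = position, dg/dt at the crossing is just the impact velocity.

```python
import math

def impact_speed(h, v0):
    """dg/dt at the event for x(t) = h + v0*t - t^2/2 hitting zero (unit gravity)."""
    t_star = v0 + math.sqrt(v0**2 + 2.0*h)   # positive root of the guard
    return v0 - t_star                        # velocity at the crossing

TOL = 1e-3  # illustrative transversality tolerance
print(abs(impact_speed(1.0, 0.0)) > TOL)     # True: clean transversal crossing
print(abs(impact_speed(1e-9, 0.0)) > TOL)    # False: near-grazing, check fails
```

Such a post-simulation check cannot enlarge the valid parameter set, but it does tell the user when a reported gradient should not be trusted.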

Circularity Check

0 steps flagged

No circularity: derivation applies standard IFT and adjoint methods to explicitly formulated constrained residuals.

full rationale

The paper states the learning task as a constrained least-squares problem whose residuals are the DAE dynamics, algebraic constraints, guard equations and reset maps. Gradients are obtained by (1) solving algebraic variables inside the vector field and differentiating via the implicit function theorem, and (2) casting the event-split trajectory as an explicit residual system whose Lagrange multipliers are solved in the discrete adjoint. Both steps are direct applications of well-known, externally verifiable techniques to the stated residuals; no quantity is defined in terms of itself, no fitted parameter is relabeled as a prediction, and no load-bearing premise rests on a self-citation chain. The stated validity conditions (fixed event ordering, transversal crossings) are explicit assumptions rather than derived results, so the derivation chain remains self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The methods rely on standard mathematical tools from optimization and differential equations, plus domain-specific assumptions about event behavior in the systems being modeled.

axioms (2)
  • standard math The implicit function theorem can be applied to differentiate the algebraic variable solves within the DAE vector field.
    Used in the automatic-differentiation-through-simulation method.
  • domain assumption Event ordering is fixed and guard crossings are transversal.
    Required for the validity of the gradients provided by both methods.

pith-pipeline@v0.9.0 · 5535 in / 1361 out tokens · 50986 ms · 2026-05-08T17:30:28.491665+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

26 extracted references

  1. [1] Uri M. Ascher and Linda R. Petzold. Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations. Society for Industrial and Applied Mathematics, Philadelphia, PA, 1998.
  2. [2] Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B. Shah. Julia: A fresh approach to numerical computing. SIAM Review, 59(1):65–98, 2017.
  3. [3] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018.
  4. [4] Yang Cao, Shengtai Li, Linda Petzold, and Radu Serban. Adjoint sensitivity analysis for differential-algebraic equations: The adjoint DAE system and its numerical solution. SIAM Journal on Scientific Computing, 24(3):1076–1089, 2003.
  5. [5] Francesco Casella. Simulation of large-scale models in Modelica: State of the art and future perspectives. In Proceedings of the 11th International Modelica Conference, pages 459–468, Versailles, France, 2015.
  6. [6] Ricky T. Q. Chen. torchdiffeq: Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation. https://github.com/rtqichen/torchdiffeq, 2018. GitHub repository.
  7. [7] Ricky T. Q. Chen, Brandon Amos, and Maximilian Nickel. Learning neural event functions for ordinary differential equations. In International Conference on Learning Representations, 2021.
  8. [8] Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David K. Duvenaud. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, volume 31, pages 6571–6583, 2018.
  9. [9] Hilding Elmqvist and Martin Otter. Methods for tearing systems of equations in object-oriented modeling. In Proceedings of the European Simulation Multiconference, pages 326–332, Barcelona, Spain, 1994. Society for Computer Simulation.
  10. [10] Ahmed Elsheikh, Francesco Casella, Dirk Zimmer, and Wladimir Schamai. An equation-based algorithmic differentiation technique for differential algebraic equations. Journal of Computational and Applied Mathematics, 281:135–151, 2015.
  11. [11] Peter Fritzson. Principles of Object-Oriented Modeling and Simulation with Modelica 3.3: A Cyber-Physical Approach. John Wiley & Sons, 2014.
  12. [12] Santos Galán, William F. Feehery, and Paul I. Barton. Parametric sensitivity functions for hybrid discrete/continuous systems. Applied Numerical Mathematics, 31(1):17–47, 1999.
  13. [13] Ernst Hairer and Gerhard Wanner. Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems, volume 14 of Springer Series in Computational Mathematics. Springer, Berlin, Heidelberg, 2nd edition, 1996.
  14. [14] Alan C. Hindmarsh, Peter N. Brown, Keith E. Grant, Steven L. Lee, Radu Serban, Dan E. Shumaker, and Carol S. Woodward. SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers. ACM Transactions on Mathematical Software, 31(3):363–396, 2005.
  15. [15] Ian A. Hiskens and M. A. Pai. Trajectory sensitivity analysis of hybrid systems. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 47(2):204–220, 2000.
  16. [16] Patrick Kidger. Diffrax: Numerical differential equation solvers in JAX. https://github.com/patrick-kidger/diffrax, 2021. Software library.
  17. [17] Nathan J. Kong, J. Joe Payne, James Zhu, and Aaron M. Johnson. Saltation matrices: The essential tool for linearizing hybrid dynamical systems. Proceedings of the IEEE, 2024.
  18. [18] Ilya Kuleshov, Galina Boeva, Vladislav Zhuzhel, Evgenia Romanenkova, Evgeni Vorsin, and Alexey Zaytsev. COTODE: COntinuous Trajectory neural Ordinary Differential Equations for modelling event sequences, 2024.
  19. [19] Modelica Association. Modelica Language Specification: Appendix B, Modelica DAE Representation. Modelica Association, 2025. Accessed 2026-05-04.
  20. [20] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An imperative style, high-performance deep learning library.
  21. [21] Linda R. Petzold. Description of DASSL: A differential/algebraic system solver. Technical report, Sandia National Laboratories, September 1982.
  22. [22] Christopher Rackauckas and Qing Nie. DifferentialEquations.jl – a performant and feature-rich ecosystem for solving differential equations in Julia. Journal of Open Research Software, 5(1):15, 2017.
  23. [23] Alessandro Saccon, Nathan van de Wouw, and Henk Nijmeijer. Sensitivity analysis of hybrid systems with state jumps with application to trajectory tracking. In Proceedings of the 53rd IEEE Conference on Decision and Control, pages 3065–3070, 2014.
  24. [24] Patrick Täuber, Lennart Ochel, Willi Braun, and Bernhard Bachmann. Practical realization and adaptation of Cellier's tearing method. In Proceedings of the 6th International Workshop on Equation-Based Object-Oriented Modeling Languages and Tools, pages 11–19. ACM Press, 2014.
  25. [25] Hong Zhang, Shrirang Abhyankar, Emil M. Constantinescu, and Mihai Anitescu. Discrete adjoint sensitivity analysis of hybrid dynamical systems with switching. IEEE Transactions on Circuits and Systems I: Regular Papers, 64(5):1247–1259, 2017.
  26. [26] Dirk Zimmer. Module-preserving compilation of Modelica models. In Proceedings of the 7th International Modelica Conference, pages 556–565, Como, Italy, 2009.