Differentiable Parameter Optimization for DAEs with State-Dependent Events
Pith reviewed 2026-05-08 17:30 UTC · model grok-4.3
The pith
Two gradient methods enable parameter optimization for DAEs interrupted by state-dependent events.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper shows that gradients along the event path selected by a semi-explicit DAE simulation can be obtained in two ways: by differentiating through the implicit algebraic solves and the segmented integration, or by solving for the Lagrange multipliers attached to the residuals of the smooth segments and events. It also clarifies that the adjoint method treats these residuals as equality constraints rather than penalties.
What carries the argument
The constrained least-squares formulation, which incorporates DAE dynamics, guard equations, and reset maps, together with the two gradient strategies: automatic differentiation through simulation using the implicit function theorem, and an explicit discrete adjoint on an event-split residual system.
If this is right
- Gradients are supplied only for the exact event path realized in the forward simulation.
- The algebraic solve inside the vector field is differentiated via the implicit function theorem.
- The adjoint method represents the simulation as an explicit event-split residual system.
- Implementation complexity and local validity differ between the two methods.
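The implicit-function-theorem step named in the list above can be illustrated on a scalar toy constraint. This is a minimal sketch, not the paper's implementation: the constraint g(x, z, p) = z^3 + z - p*x and the function names are assumptions chosen so that the derivative can be checked by hand. For g(x, z, p) = 0 with g_z nonzero, the theorem gives dz/dp = -g_p / g_z without differentiating through the Newton iterations.

```python
# Toy algebraic constraint (hypothetical, for illustration only):
#   g(x, z, p) = z**3 + z - p*x = 0
# g_z = 3*z**2 + 1 is always positive, so z is locally a smooth function of (x, p).

def solve_algebraic(x, p, z0=0.0, tol=1e-13, max_iter=50):
    """Newton iteration for the algebraic variable z at fixed x and p."""
    z = z0
    for _ in range(max_iter):
        g = z**3 + z - p * x
        if abs(g) < tol:
            break
        z -= g / (3.0 * z**2 + 1.0)
    return z

def dz_dp_ift(x, p):
    """Sensitivity of z to p from the implicit function theorem."""
    z = solve_algebraic(x, p)
    g_z = 3.0 * z**2 + 1.0   # partial derivative of g with respect to z
    g_p = -x                 # partial derivative of g with respect to p
    return -g_p / g_z

def dz_dp_fd(x, p, h=1e-6):
    """Central finite difference through the solver, used only as a check."""
    return (solve_algebraic(x, p + h) - solve_algebraic(x, p - h)) / (2.0 * h)

if __name__ == "__main__":
    print(dz_dp_ift(2.0, 1.0), dz_dp_fd(2.0, 1.0))
```

At x = 2, p = 1 the constraint has the root z = 1, so the formula gives 2 / 4 = 0.5, which the finite-difference check reproduces to within its truncation error.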
Where Pith is reading between the lines
- The methods could support training of hybrid models that combine neural networks with physical DAE constraints.
- If event ordering varies with parameters, multiple path evaluations or combinatorial search may become necessary.
- Embedding these gradient routines in existing DAE solvers would broaden their use in control and robotics.
Load-bearing premise
Event ordering remains fixed and guard crossings stay transversal when parameters change.
What would settle it
A small parameter perturbation that causes the forward simulation to select a different event sequence or a non-transversal crossing, after which the computed gradients no longer match the actual change in the loss.
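A test of this kind can be sketched on a closed-form toy hybrid system. Everything here is an assumption made for illustration (the dynamics x' = p, the guard level, the reset to zero, and the function names); it is not an example from the paper. The path gradient is computed for whichever event path the forward pass selects, then compared with a finite difference that either stays on one path or straddles the parameter value where the event appears.

```python
T_END = 2.0        # simulation horizon (hypothetical)
GUARD_LEVEL = 1.0  # event fires when x reaches this level (hypothetical)

def simulate(p):
    """x' = p from x(0) = 0; on reaching GUARD_LEVEL before T_END, reset x to 0.

    Returns (loss, path_gradient, n_events) with loss = x(T_END).
    Handles at most one event, so it is only meant for 0 < p < 1.
    """
    t_event = GUARD_LEVEL / p
    if t_event < T_END:
        # One event: after the reset the state grows from 0 for T_END - t_event.
        loss = p * (T_END - t_event)
        dt_event_dp = -GUARD_LEVEL / p**2          # event-time sensitivity
        grad = (T_END - t_event) - p * dt_event_dp  # product rule on the line above
        return loss, grad, 1
    # No event: x(T_END) = p * T_END.
    return p * T_END, T_END, 0

def central_fd(p, h):
    """Finite-difference estimate of d(loss)/dp, blind to the event path."""
    return (simulate(p + h)[0] - simulate(p - h)[0]) / (2.0 * h)

if __name__ == "__main__":
    for p, h in [(0.3, 1e-6), (0.7, 1e-6), (0.5, 1e-3)]:
        loss, grad, n_events = simulate(p)
        print(p, n_events, grad, central_fd(p, h))
```

Away from p = 0.5 both estimates agree. At p = 0.5 the perturbation changes the number of events, the loss jumps, and the finite difference no longer resembles the path gradient, which is the failure mode described above.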
Original abstract
Differential-algebraic equations (DAEs) with state-dependent events arise in systems whose continuous dynamics are constrained by algebraic equations and interrupted by mode changes, switching logic, impacts, or state reinitializations. Gradient-based parameter learning for such systems is challenging because algebraic variables are implicitly defined, event times depend on the parameters, and reset maps introduce discontinuities. This paper studies differentiable parameter optimization for semi-explicit DAEs with events. We formulate the learning problem as a constrained least-squares problem with DAE dynamics, algebraic constraints, guard equations, and reset maps. We then develop two complementary gradient-computation strategies. The first is an automatic-differentiation-through-simulation method that solves algebraic variables inside the vector field, differentiates the algebraic solve using the implicit function theorem, and handles events through segmented differentiable integration. The second is an explicit discrete-adjoint method that represents the forward simulation as an event-split residual system and computes gradients by solving for the Lagrange multipliers of smooth-segment and event residuals. The formulation clarifies that residual terms in the adjoint method are equality constraints, not heuristic penalties. We compare the two approaches in terms of gradient interpretation, event-time handling, implementation complexity, and local validity. Both methods provide gradients for the event path selected by the forward simulation and are valid under fixed event ordering and transversal guard crossings.
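The abstract describes the learning problem in words only. One way to write it down, as a sketch under assumed notation, is given below: the symbols $f$, $g$, $h$, $r$, $\theta$, the output $y$, the data $\hat{y}_k$, and the restriction to a single guard are not taken from the paper and serve only to make the structure concrete.

```latex
\begin{aligned}
\min_{\theta}\quad & \sum_{k} \bigl\lVert y(t_k;\theta) - \hat{y}_k \bigr\rVert^2 \\
\text{s.t.}\quad & \dot{x} = f(x, z, \theta), \qquad 0 = g(x, z, \theta)
  && \text{(smooth segments)} \\
& h\bigl(x(\tau^-), \theta\bigr) = 0
  && \text{(guard defining the event time } \tau\text{)} \\
& x(\tau^+) = r\bigl(x(\tau^-), \theta\bigr)
  && \text{(reset map)}
\end{aligned}
```

Under this notation the two standard ingredients behind the stated validity conditions read as follows. The implicit function theorem gives the sensitivity of the algebraic variables when $g_z$ is nonsingular, and differentiating the guard equation gives the sensitivity of the event time when the crossing is transversal, that is, when the denominator is nonzero:

```latex
\frac{\partial z}{\partial \theta} = -\,g_z^{-1}\, g_\theta ,
\qquad
\frac{d\tau}{d\theta}
  = -\,\frac{h_x\, x_\theta(\tau^-) + h_\theta}{h_x\, \dot{x}(\tau^-)} .
```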
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops two gradient-computation methods for parameter optimization in semi-explicit DAEs with state-dependent events: (1) automatic differentiation through segmented simulation, which solves for the algebraic variables inside the vector field and differentiates that solve via the implicit function theorem, and (2) a discrete-adjoint method that treats the event-split forward simulation as an equality-constrained residual system. Both methods are stated to yield gradients for the forward-selected event path under the assumptions of fixed event ordering and transversal guard crossings.
Significance. If the stated validity conditions can be maintained or monitored during optimization, the work supplies practical tools for end-to-end differentiable learning in hybrid DAE systems that appear in robotics, circuit simulation, and mechanical contact problems. The explicit residual-constraint formulation and the side-by-side comparison of AD-through-simulation versus adjoint approaches are useful contributions.
major comments (2)
- [Abstract and §4] (validity discussion): the central claim that both methods 'provide gradients for the event path selected by the forward simulation and are valid under fixed event ordering and transversal guard crossings' is load-bearing, yet the manuscript supplies no mechanism to detect, enforce, or quantify the measure of the parameter set on which these assumptions remain true. Because parameters are the decision variables in the learning problem, typical optimization trajectories can reorder events or produce grazing crossings, rendering the computed gradients locally invalid precisely where they are needed.
- [§3.2] (discrete-adjoint formulation): the adjoint system is derived under the assumption that the event sequence is fixed; when an event time crosses another or a guard becomes tangent, the residual system itself changes discontinuously. No sensitivity analysis or continuation strategy is provided to handle these structural changes.
minor comments (2)
- [§2 and §3] Notation for the reset map and guard function should be introduced once and used consistently; several symbols are redefined between the continuous and discrete-adjoint sections.
- [§5] The numerical examples would benefit from an explicit check (e.g., a plot of event times versus parameter) confirming that the fixed-ordering assumption held throughout the reported optimization runs.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed review. The comments highlight important considerations regarding the validity of the proposed gradient methods under the stated assumptions. We address each major comment below.
- Referee: [Abstract and §4] (validity discussion): the central claim that both methods 'provide gradients for the event path selected by the forward simulation and are valid under fixed event ordering and transversal guard crossings' is load-bearing, yet the manuscript supplies no mechanism to detect, enforce, or quantify the measure of the parameter set on which these assumptions remain true. Because parameters are the decision variables in the learning problem, typical optimization trajectories can reorder events or produce grazing crossings, rendering the computed gradients locally invalid precisely where they are needed.
  Authors: We agree with the referee that maintaining the assumptions of fixed event ordering and transversal guard crossings is essential for the gradients to be valid, and that optimization trajectories may violate them. The manuscript presents the methods as providing gradients for the forward-selected event path under these conditions, as stated in the abstract and Section 4. To strengthen the presentation, we will revise Section 4 to include guidance on monitoring these conditions during optimization. Specifically, we can suggest post-simulation checks: verifying that the computed event times maintain the original ordering and that the guard function's time derivative is nonzero at each crossing point. While a general mechanism to enforce or quantify the measure of the valid parameter set during learning is not provided (as it would require problem-specific constraints or robust optimization techniques beyond the scope of this work), such monitoring can alert users to potential invalidity. We believe this addition addresses the practical concern without altering the core contribution.
  Revision: partial
- Referee: [§3.2] (discrete-adjoint formulation): the adjoint system is derived under the assumption that the event sequence is fixed; when an event time crosses another or a guard becomes tangent, the residual system itself changes discontinuously. No sensitivity analysis or continuation strategy is provided to handle these structural changes.
  Authors: The discrete-adjoint formulation in Section 3.2 explicitly assumes a fixed event sequence, as the residual system is constructed by splitting the simulation into segments based on the events detected in the forward pass. When event times cross or a grazing condition occurs, the number or ordering of residuals changes, making the system discontinuous. Our derivation provides the adjoint for the fixed-path case, which is consistent with the forward simulation's selected path. We will add a clarifying paragraph in Section 3.2 noting this limitation and stating that the method does not include sensitivity analysis for structural changes in the event sequence. Handling such cases would necessitate additional strategies like event smoothing or hybrid system differentiation techniques, which we identify as directions for future work. This clarification will better delineate the scope of the current adjoint method.
  Revision: partial
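The post-simulation checks proposed in the first response (unchanged event ordering and a nonzero guard time derivative at each crossing) could look roughly like the sketch below. The function name, its arguments, and the tolerance are hypothetical; how event labels, event times, and guard rates are extracted depends on the solver in use and is assumed to happen elsewhere.

```python
def check_event_path(ref_sequence, new_sequence, event_times, guard_rates,
                     rate_tol=1e-6):
    """Return a list of warnings; an empty list means all checks passed.

    ref_sequence : event labels from the reference forward pass
    new_sequence : event labels from the current forward pass
    event_times  : event times of the current pass, in detection order
    guard_rates  : time derivative of each guard at its crossing
    rate_tol     : hypothetical threshold below which a crossing counts as grazing
    """
    warnings = []
    # Check 1: the same events must occur in the same order.
    if list(new_sequence) != list(ref_sequence):
        warnings.append(
            f"event sequence changed: {list(ref_sequence)} -> {list(new_sequence)}")
    # Check 2: event times must be strictly increasing (no coinciding events).
    for earlier, later in zip(event_times, event_times[1:]):
        if not later > earlier:
            warnings.append(f"event times not strictly ordered: {earlier}, {later}")
    # Check 3: each crossing must be transversal (guard rate away from zero).
    for label, rate in zip(new_sequence, guard_rates):
        if abs(rate) <= rate_tol:
            warnings.append(f"near-grazing crossing at event '{label}': rate={rate}")
    return warnings

if __name__ == "__main__":
    print(check_event_path(["impact", "switch"], ["impact", "switch"],
                           [0.4, 1.1], [-3.0, 0.8]))
```

Such a check only flags a suspect gradient after the fact. It does not repair the gradient or say anything about how large the valid parameter region is, which matches the scope the authors describe.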
Circularity Check
No circularity: derivation applies standard IFT and adjoint methods to explicitly formulated constrained residuals.
The paper states the learning task as a constrained least-squares problem whose residuals are the DAE dynamics, algebraic constraints, guard equations and reset maps. Gradients are obtained by (1) solving algebraic variables inside the vector field and differentiating via the implicit function theorem, and (2) casting the event-split trajectory as an explicit residual system whose Lagrange multipliers are solved in the discrete adjoint. Both steps are direct applications of well-known, externally verifiable techniques to the stated residuals; no quantity is defined in terms of itself, no fitted parameter is relabeled as a prediction, and no load-bearing premise rests on a self-citation chain. The stated validity conditions (fixed event ordering, transversal crossings) are explicit assumptions rather than derived results, so the derivation chain remains self-contained.
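The Lagrange-multiplier step can be made concrete on a deliberately small event-split residual system. This is a sketch under strong assumptions and not the paper's discretization: a scalar ODE x' = -p*x with no algebraic variable, one explicit Euler step per segment, one guard x - c = 0, and a linear reset x+ = r*x-. The unknowns are u = (x1, tau, x2, x3) and the residuals are R1 = x1 - x0 - tau*f(x0), R2 = x1 - c, R3 = x2 - r*x1, R4 = x3 - x2 - (T - tau)*f(x2), all imposed as equalities.

```python
def forward(p, x0=2.0, c=1.0, r=0.5, T=1.0):
    """Forward pass: one Euler step per segment, guard x - c = 0 fixes tau."""
    f0 = -p * x0                      # vector field at the initial state
    tau = (c - x0) / f0               # event time from R1 and R2
    x1 = x0 + tau * f0                # pre-event state (equals c)
    x2 = r * x1                       # reset map
    x3 = x2 + (T - tau) * (-p * x2)   # Euler step over the second segment
    return tau, x1, x2, x3

def loss(p, target=0.0):
    return (forward(p)[3] - target) ** 2

def adjoint_gradient(p, x0=2.0, c=1.0, r=0.5, T=1.0, target=0.0):
    """Solve (dR/du)^T lam = -(dL/du)^T, then return (lam^T dR/dp, lam)."""
    tau, x1, x2, x3 = forward(p, x0, c, r, T)
    f0, f0_p = -p * x0, -x0              # f and df/dp at x0
    f2, f2_x, f2_p = -p * x2, -p, -x2    # f, df/dx, df/dp at x2
    # Back-substitution through the transposed residual Jacobian.
    lam4 = -2.0 * (x3 - target)             # row for x3
    lam3 = (1.0 + (T - tau) * f2_x) * lam4  # row for x2
    lam1 = f2 * lam4 / f0                   # row for tau: divides by the guard rate f0
    lam2 = r * lam3 - lam1                  # row for x1 (multiplier of the guard residual)
    # Only R1 and R4 depend on p in this toy problem.
    grad = lam1 * (-tau * f0_p) + lam4 * (-(T - tau) * f2_p)
    return grad, (lam1, lam2, lam3, lam4)

if __name__ == "__main__":
    h = 1e-6
    print(adjoint_gradient(1.0)[0], (loss(1.0 + h) - loss(1.0 - h)) / (2.0 * h))
```

The division by `f0` in the event-time row is where transversality enters: `f0` is the time derivative of the guard along the first segment, so a grazing crossing makes the adjoint system singular. No penalty weight appears anywhere, consistent with reading the residuals as equality constraints.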
Axiom & Free-Parameter Ledger
axioms (2)
- standard math: The implicit function theorem can be applied to differentiate the algebraic variable solves within the DAE vector field.
- domain assumption: Event ordering is fixed and guard crossings are transversal.