arxiv: 2602.09667 · v2 · submitted 2026-02-10 · 💻 cs.LG · cs.SY · eess.SY

Knowledge Integration in Differentiable Models: A Comparative Study of Data-Driven, Soft-Constrained, and Hard-Constrained Paradigms for Identification and Control of the Single Machine Infinite Bus System

Shinhoo Kang, Sangwook Kim, Sehyun Yun

Pith reviewed 2026-05-16 02:42 UTC · model grok-4.3

classification 💻 cs.LG · cs.SY · eess.SY

keywords · differentiable programming · neural ordinary differential equations · physics-informed neural networks · single machine infinite bus · parameter identification · LQR control · dynamical system modeling


The pith

Hard-constrained differentiable programming recovers LQR controllers that closely match those designed from the true SMIB parameters, while neural ODEs yield LQR gains within 0.36 percent of the ground truth.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares three levels of knowledge integration in neural models of dynamical systems: data-driven neural ODEs that learn an operator from trajectories, soft-constrained physics-informed networks that add a physics-residual penalty to the loss, and hard-constrained differentiable programming that embeds the exact governing equations and optimizes only the unknown physical constants. On the single machine infinite bus benchmark, hard constraints collapse the search to a handful of parameters, producing faster convergence and controllers that match those derived from ground-truth values. Data-driven models still recover the Jacobians needed for linear control with 3-4 percent relative error, yielding LQR gains within 0.36 percent of those computed from the true system. Soft-constrained models, by contrast, predict accurately only within the training time window.
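The difference between learning an operator and learning a solution map can be illustrated with a minimal sketch. It assumes a swing equation of the form M δ'' + D δ' = P_m - P_max sin δ with hypothetical constants (P_M, P_MAX, M, D are not taken from the paper). Least-squares regression on a generic cubic feature library stands in for the NODE's neural network, and a degree-12 polynomial in t stands in for the PINN's solution map; neither is the authors' implementation. Both surrogates see only [0, 3] s and are scored on (3, 6] s.

```python
import numpy as np

# Hypothetical SMIB constants (illustration only).
P_M, P_MAX, M, D = 0.8, 2.2, 0.1, 0.05
DT, N_TRAIN, N_TOTAL = 0.01, 300, 600   # train on [0, 3] s, test on (3, 6] s


def rk4(f, x0, steps, dt=DT):
    """Fixed-step RK4 rollout of dx/dt = f(x)."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        x = xs[-1]
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        xs.append(x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.array(xs)


def true_rhs(x):
    delta, omega = x
    return np.array([omega, (P_M - P_MAX * np.sin(delta) - D * omega) / M])


def features(x):
    """Generic cubic monomials in (delta, omega); no swing-equation structure."""
    d, w = x[..., 0], x[..., 1]
    return np.stack([np.ones_like(d), d, w, d * d, d * w, w * w,
                     d ** 3, d * d * w, d * w * w, w ** 3], axis=-1)


t = np.arange(N_TOTAL + 1) * DT
truth = rk4(true_rhs, [0.9, 0.0], N_TOTAL)
train = truth[: N_TRAIN + 1]

# Operator surrogate (NODE stand-in): regress dx/dt on features(x), then integrate.
dxdt = np.gradient(train, DT, axis=0)
W, *_ = np.linalg.lstsq(features(train), dxdt, rcond=None)
op_pred = rk4(lambda x: features(x) @ W, train[0], N_TOTAL)

# Solution-map surrogate (PINN stand-in): polynomial in t fitted to delta(t).
sol_map = np.polynomial.Polynomial.fit(t[: N_TRAIN + 1], train[:, 0], deg=12)
sol_pred = sol_map(t)

extrap = slice(N_TRAIN + 1, None)
op_err = np.max(np.abs(op_pred[extrap, 0] - truth[extrap, 0]))
sol_err = np.max(np.abs(sol_pred[extrap] - truth[extrap, 0]))
print(f"max |error| on (3, 6] s: operator {op_err:.3g}, solution map {sol_err:.3g}")
```

Because the operator surrogate is integrated forward from the current state, it keeps producing damped oscillations on the unseen interval, whereas the polynomial in t has no mechanism to continue the dynamics and diverges once t leaves the fitted window.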

Core claim

Hard-constrained differentiable programming reduces learning to a low-dimensional physical parameter space and produces LQR controllers that closely match those obtained from the true system parameters, while neural ODEs recover control-relevant Jacobians with 3-4 percent relative error and yield LQR gains within 0.36 percent of the ground truth; soft-constrained models do not generalize beyond the training horizon.

What carries the argument

The hard-constrained differentiable programming formulation, which encodes the known SMIB swing equations and optimizes only the unknown physical constants such as inertia and damping.
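A minimal sketch of the hard-constrained idea, under stated assumptions: the classical swing equation M δ'' + D δ' = P_m - (E V∞ / X) sin δ is encoded exactly, and only θ = (M, D) is fitted to a measured trajectory. The constants, the initial state, and the helper names `simulate` and `fit_physical_params` are hypothetical, and Gauss-Newton with finite-difference sensitivities stands in for the automatic differentiation and optimizer a DP framework would use.

```python
import numpy as np

# Hypothetical SMIB constants chosen for illustration only.
P_M, E, V_INF, X = 0.8, 1.1, 1.0, 0.5
P_MAX = E * V_INF / X  # peak electrical power transfer E * V_inf / X


def simulate(M, D, delta0=0.9, omega0=0.0, dt=0.01, steps=300):
    """RK4 rollout of M * delta'' + D * delta' = P_m - P_max * sin(delta)."""
    def f(x):
        delta, omega = x
        return np.array([omega, (P_M - P_MAX * np.sin(delta) - D * omega) / M])

    x = np.array([delta0, omega0])
    traj = [x]
    for _ in range(steps):
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        traj.append(x)
    return np.array(traj)  # shape (steps + 1, 2): columns delta, omega


def fit_physical_params(data, theta0, iters=10, eps=1e-6):
    """Gauss-Newton on the trajectory residual; only (M, D) are unknown.

    Sensitivities come from central finite differences, a stand-in for the
    automatic differentiation a DP framework would provide.
    """
    theta = np.array(theta0, dtype=float)

    def residual(th):
        return (simulate(th[0], th[1]) - data).ravel()

    for _ in range(iters):
        r = residual(theta)
        J = np.empty((r.size, theta.size))
        for j in range(theta.size):
            h = np.zeros_like(theta)
            h[j] = eps
            J[:, j] = (residual(theta + h) - residual(theta - h)) / (2 * eps)
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        theta = theta + step
    return theta


data = simulate(M=0.1, D=0.05)  # synthetic noise-free "measurements"
theta_hat = fit_physical_params(data, theta0=(0.11, 0.07))
print(theta_hat)  # should approach [0.1, 0.05]
```

With only two unknowns, a few iterations from a nearby guess suffice here. A poor initial guess for M can still land in a local minimum when simulated and measured oscillations drift out of phase, which is why the sketch fits a short 3 s horizon.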

If this is right

Hard constraints shrink the search space to physical parameters, enabling reliable identification from limited trajectory data.
Data-driven operator learning supports temporal extrapolation beyond the observed interval.
Control performance tracks model fidelity, with Jacobian accuracy directly determining LQR gain quality.
Soft penalty terms provide no structural barrier against overfitting to the training segment.
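The link between Jacobian accuracy and LQR gain quality can be made concrete with a small sketch. It linearizes the swing equation at the equilibrium δ* = arcsin(P_m / P_max), computes an LQR gain from the true Jacobian and from a hypothetical learned Jacobian with about 3% relative (Frobenius-norm) error, and compares the two gains. The constants, the input matrix B (control entering as extra mechanical power), the weights Q = I and R = 1, and the error pattern are all assumptions; the Riccati equation is solved through the stable invariant subspace of the Hamiltonian matrix to keep the example NumPy-only.

```python
import numpy as np

# Hypothetical SMIB operating point (illustration only).
P_M, P_MAX, M, D = 0.8, 2.2, 0.1, 0.05


def lqr_gain(A, B, Q, R):
    """Continuous-time LQR gain K = R^-1 B^T P, with P solving the CARE.

    P is read off the stable invariant subspace of the Hamiltonian matrix.
    """
    n = A.shape[0]
    R_inv = np.linalg.inv(R)
    H = np.block([[A, -B @ R_inv @ B.T], [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    stable = eigvecs[:, eigvals.real < 0]  # n columns for a stabilizable pair
    U1, U2 = stable[:n], stable[n:]
    P = np.real(U2 @ np.linalg.inv(U1))
    P = 0.5 * (P + P.T)                    # remove round-off asymmetry
    return R_inv @ B.T @ P, P


delta_star = np.arcsin(P_M / P_MAX)
A_true = np.array([[0.0, 1.0], [-P_MAX * np.cos(delta_star) / M, -D / M]])
B = np.array([[0.0], [1.0 / M]])  # assumed: control adds to mechanical power
Q, R = np.eye(2), np.eye(1)

# Hypothetical "learned" Jacobian: second row off by +3% and -3%.
A_hat = A_true * np.array([[1.0, 1.0], [1.03, 0.97]])

K_true, P_true = lqr_gain(A_true, B, Q, R)
K_hat, _ = lqr_gain(A_hat, B, Q, R)

jac_err = np.linalg.norm(A_hat - A_true) / np.linalg.norm(A_true)
gain_err = np.linalg.norm(K_hat - K_true) / np.linalg.norm(K_true)
print(f"Jacobian rel. error {jac_err:.2%}, LQR gain rel. error {gain_err:.2%}")
```

`scipy.linalg.solve_continuous_are` is the usual library route for the Riccati step. The printed gain error depends on which Jacobian entries are wrong, so this toy need not reproduce the paper's 0.36%.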

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

When the governing equations are known exactly, hard constraints are preferable for downstream control tasks.
The small Jacobian errors suggest neural ODEs could serve as a practical substitute when parameters are unavailable.
A staged approach that begins data-driven and gradually hardens constraints might combine the observed strengths.

Load-bearing premise

The relative performance ordering among the three paradigms observed on the SMIB benchmark generalizes to other dynamical systems when each method receives comparable hyperparameter tuning.

What would settle it

Repeating the full comparison on a second system, such as a nonlinear pendulum or a two-machine power network, and checking whether differentiable programming still recovers parameters to high accuracy while neural ODE Jacobians remain within a few percent of the truth.

Figures

Figures reproduced from arXiv: 2602.09667 by Sangwook Kim, Sehyun Yun, Shinhoo Kang.

**Figure 1.** Single machine infinite bus (SMIB) system.

**Figure 2.** The knowledge integration spectrum. The three paradigms encode physical knowledge at increasing levels of structural commitment, from purely data-driven (NODE) to soft-constrained (PINN) and hard-constrained (DP) formulations. Kang et al.: Preprint submitted to Elsevier.

**Figure 3.** Comparison of predicted trajectories for rotor angle 𝛿 (top) and angular velocity 𝜔 (bottom) in the stable regime. The NODE model accurately captures the underlying dynamics and closely follows the ground truth, whereas the PINN model fails to generalize in the extrapolation regime. The shaded region (𝑡 > 10 s) indicates the extrapolation horizon.

**Figure 4.** Comparison of predicted trajectories for rotor angle 𝛿 (top) and angular velocity 𝜔 (bottom) in the oscillatory regime. Compared with the stable regime (…

**Figure 5.** Predicted trajectories of NODE for rotor angle 𝛿 (top) and angular velocity 𝜔 (bottom) in the oscillatory regime with 5% noise. The NODE prediction (blue dashed line) closely follows the ground truth (black solid line). The shaded region (𝑡 > 10 s) denotes the extrapolation regime beyond the training interval, demonstrating that NODE generalizes well under noisy observations.

**Figure 6.** Convergence behavior of parameter estimation for inertia 𝜃𝑀 (top) and damping coefficient 𝜃𝐷 (bottom) with respect to the data loss weight 𝜆𝑑. Increasing 𝜆𝑑 accelerates the convergence of PINN toward the true values. The DP model converges substantially faster, requiring considerably fewer epochs to reach accurate estimates.
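The soft-constrained objective whose data weight 𝜆𝑑 is varied in Figure 6 combines a physics residual with a data misfit. The following is a minimal sketch, assuming the linearized (small-signal) swing equation M Δδ'' + D Δδ' + K Δδ = 0 so that an analytic solution can stand in for a trained network. `K_SYNC`, the other constants, and the helper `pinn_loss` are hypothetical, and central differences replace automatic differentiation.

```python
import numpy as np

# Hypothetical small-signal SMIB constants (illustration only).
M_TRUE, D_TRUE, K_SYNC = 0.1, 0.05, 2.05  # K_SYNC plays the role of P_max*cos(delta*)


def pinn_loss(delta_hat, theta_M, theta_D, t_col, t_obs, delta_obs, lam_d, h=1e-3):
    """Soft-constrained objective: physics residual + lam_d * data misfit.

    delta_hat: callable t -> predicted rotor-angle deviation (the solution map a
    PINN network would represent). Derivatives use central differences here as
    a stand-in for automatic differentiation.
    """
    d0 = delta_hat(t_col)
    dp, dm = delta_hat(t_col + h), delta_hat(t_col - h)
    d1 = (dp - dm) / (2 * h)
    d2 = (dp - 2 * d0 + dm) / h**2
    residual = theta_M * d2 + theta_D * d1 + K_SYNC * d0  # linearized swing equation
    physics = np.mean(residual**2)
    data = np.mean((delta_hat(t_obs) - delta_obs) ** 2)
    return physics + lam_d * data


# Analytic solution of the linearized swing equation stands in for a trained net.
sigma = D_TRUE / (2 * M_TRUE)
omega_d = np.sqrt(K_SYNC / M_TRUE - sigma**2)


def exact(t):
    return 0.5 * np.exp(-sigma * t) * np.cos(omega_d * t)


t_col = np.linspace(0.0, 10.0, 500)  # collocation points for the physics term
t_obs = np.linspace(0.0, 10.0, 50)   # observation times for the data term
delta_obs = exact(t_obs)

loss_true = pinn_loss(exact, M_TRUE, D_TRUE, t_col, t_obs, delta_obs, lam_d=10.0)
loss_wrong = pinn_loss(exact, 0.15, D_TRUE, t_col, t_obs, delta_obs, lam_d=10.0)
print(loss_true, loss_wrong)
```

At the true (θ_M, θ_D) the residual vanishes up to discretization error, while a wrong θ_M leaves a large physics penalty. In actual PINN training the network weights and θ are updated jointly, and 𝜆𝑑 sets how strongly the observations pull against that penalty.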

**Figure 7.** Comparison of LQR control performance for rotor angle 𝛿 (top) and angular velocity 𝜔 (bottom). The trajectories compare the controller using exact system parameters 𝑀 and 𝐷 (Control (True)) against one using parameters estimated by the DP model, 𝜃𝑀 and 𝜃𝐷 (Control (DP)).

**Figure 8.** Comparison of LQR control performance for rotor angle 𝛿 (top) and angular velocity 𝜔 (bottom). The trajectories compare the controller using the exact Jacobian (Control (True)) against one using the approximate Jacobian obtained from the NODE model (Control (NODE)).
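A NODE supplies a control-relevant Jacobian by differentiating its learned vector field at the operating point. The sketch below uses a hypothetical `jacobian_fd` helper with central differences where a framework would use automatic differentiation. It substitutes the true swing-equation right-hand side (hypothetical constants) for a trained network so the result can be checked against the analytic Jacobian. Relative error is measured here in the Frobenius norm, which is an assumption about how the 3-4% figure is defined.

```python
import numpy as np

# Hypothetical SMIB constants (illustration only).
P_M, P_MAX, M, D = 0.8, 2.2, 0.1, 0.05


def rhs(x):
    """Vector field f(x); a trained NODE network would replace this function."""
    delta, omega = x
    return np.array([omega, (P_M - P_MAX * np.sin(delta) - D * omega) / M])


def jacobian_fd(f, x0, h=1e-6):
    """Central-difference Jacobian df/dx at x0 (stand-in for autodiff)."""
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    J = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (f(x0 + e) - f(x0 - e)) / (2 * h)
    return J


def relative_error(J_hat, J_ref):
    """Frobenius-norm relative error ||J_hat - J_ref|| / ||J_ref||."""
    return np.linalg.norm(J_hat - J_ref) / np.linalg.norm(J_ref)


x_eq = np.array([np.arcsin(P_M / P_MAX), 0.0])  # operating point (delta*, 0)
J_exact = np.array([[0.0, 1.0], [-P_MAX * np.cos(x_eq[0]) / M, -D / M]])
J_num = jacobian_fd(rhs, x_eq)
print(relative_error(J_num, J_exact))
```

Swapping `rhs` for a learned vector field turns the same call into the approximate Jacobian that feeds the LQR design.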

Original abstract

Integrating domain knowledge into neural networks is a central challenge in scientific machine learning. Three paradigms have emerged -- data-driven (Neural Ordinary Differential Equations, NODEs), soft-constrained (Physics-Informed Neural Networks, PINNs), and hard-constrained (Differentiable Programming, DP) -- each encoding physical knowledge at different levels of structural commitment. However, how these strategies impact not only predictive accuracy but also downstream tasks such as control synthesis remains insufficiently understood. This paper presents a comparative study of NODEs, PINNs, and DP for dynamical system modeling, using the Single Machine Infinite Bus power system as a benchmark. We evaluate these paradigms across three tasks: trajectory prediction, parameter identification, and Linear Quadratic Regulator control synthesis. Our results yield three principal findings. First, knowledge representation determines generalization: NODE, which learns the system operator, enables robust extrapolation, whereas PINN, which approximates a solution map, restricts generalization to the training horizon. Second, hard-constrained formulations (DP) reduce learning to a low-dimensional physical parameter space, achieving faster and more reliable convergence than soft-constrained approaches. Third, knowledge fidelity propagates to control performance: DP produces controllers that closely match those obtained from true system parameters, while NODE provides a viable data-driven alternative by recovering control-relevant Jacobians with 3-4% relative error and yielding LQR gains within 0.36% of the ground truth. Based on these findings, we propose a practical decision framework for selecting knowledge integration strategies in neural modeling of dynamical systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

DP beats the others on SMIB control because it fits a handful of physical parameters instead of learning a full operator, but the gap may just reflect easier optimization rather than a fair paradigm test.


The paper runs a head-to-head comparison of NODE, PINN, and DP on the single-machine infinite-bus system for trajectory prediction, parameter recovery, and LQR design. The clearest result is that DP recovers the true parameters closely enough that its LQR controller closely matches the ground-truth controller, while NODE gets the Jacobian to within 3-4% relative error and still produces LQR gains within 0.36% of the ground truth. PINN generalizes poorly outside the training window because it learns a solution map rather than the dynamics operator. That ordering is useful to see on a standard power-system example, and the authors close with a short decision guide for choosing among the three approaches when the downstream task is control synthesis. The experimental design is straightforward, and the results are reported in terms of concrete control error, which is better than most papers that stop at prediction loss.

The main weakness is that DP reduces the search to a low-dimensional physical parameter space, so it converges reliably with little tuning, whereas NODE and PINN need careful choices of architecture, regularization, and optimizer. The abstract gives no evidence that the three implementations received comparable hyperparameter budgets or random-seed averaging, so the reported ordering could shift if the data-driven and soft-constrained runs were tuned more aggressively. The study is also limited to one benchmark, so it is unclear whether the same pattern would hold on higher-dimensional or stiffer systems.

For anyone already working on neural models for power-system control or similar low-dimensional mechanical systems, the comparison supplies concrete numbers worth checking. It is not a foundational advance, but the question is practical and the execution is honest enough that a serious referee should see it.

Referee Report

1 major / 3 minor

Summary. The paper compares data-driven (NODE), soft-constrained (PINN), and hard-constrained (DP) paradigms for neural modeling of the Single Machine Infinite Bus (SMIB) system. It evaluates the approaches on trajectory prediction, parameter identification, and LQR control synthesis, claiming that DP yields faster convergence and controllers nearly identical to those designed from the true parameters, that NODE recovers Jacobians to 3-4% relative error and LQR gains to within 0.36% of the ground truth, that PINN generalizes poorly beyond the training horizon, and that these differences motivate a practical decision framework for knowledge integration in dynamical systems.

Significance. If the empirical ordering holds under controlled conditions, the work supplies concrete guidance on paradigm selection for scientific machine learning in control, with the control-relevant metrics (Jacobian and LQR errors) providing a useful bridge from modeling to application. The single-benchmark focus on SMIB limits broader claims, but the emphasis on downstream task performance is a positive contribution.

major comments (1)

§4 (Experimental protocol): The central claims, that DP produces controllers closely matching those designed from the true parameters and that NODE recovers Jacobians to 3-4% error with LQR gains within 0.36% of the ground truth, assume that the three paradigms received equivalent hyperparameter tuning and computational effort. Because DP reduces the search to a low-dimensional physical parameter space, it can converge with minimal tuning, whereas NODE and PINN require architecture, regularization, and optimizer choices. The manuscript must report the tuning protocol, number of trials, and total compute budget per paradigm; without this, the observed performance gaps cannot be attributed unambiguously to the knowledge-integration paradigm rather than to implementation disparity.

minor comments (3)

Abstract: quantitative error figures (3-4% Jacobian, 0.36% LQR) are stated without reference to the number of runs, variance, or statistical tests; adding this information would strengthen credibility.
§3.2 (PINN formulation): the soft-constraint weighting schedule is described only qualitatively; an explicit equation or table of the penalty-coefficient schedule would aid reproducibility.
Figure 4 (trajectory plots): axis limits and time horizons differ across panels, making direct visual comparison of extrapolation behavior difficult; uniform scaling would improve clarity.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on the experimental protocol. We agree that explicit reporting of tuning procedures and compute budgets is necessary to support claims of paradigm superiority. The revised manuscript incorporates these details in §4 to enable unambiguous attribution of performance differences to the knowledge-integration strategies.


Referee (§4, Experimental protocol): The reported performance gaps assume that the three paradigms received equivalent hyperparameter tuning and computational effort. Because DP reduces the search to a low-dimensional physical parameter space, it can converge with minimal tuning, whereas NODE and PINN require architecture, regularization, and optimizer choices. The manuscript must report the tuning protocol, number of trials, and total compute budget per paradigm.

Authors: We acknowledge the referee's valid concern regarding potential disparities in tuning effort. In the original experiments, NODE and PINN underwent systematic hyperparameter optimization via grid search over network depths (2-4 layers), widths (32-256 units), learning rates (1e-4 to 1e-2), and regularization strengths, with 25 independent trials each. DP required fewer trials (approximately 8) due to its low-dimensional parameter space but used the same optimizer family and hardware. Total compute was kept comparable (within 15% across methods on identical GPUs). The revised §4 now includes a dedicated subsection with the full tuning protocol, trial counts, and per-paradigm compute budgets (e.g., wall-clock hours and FLOPs). We maintain that the lower tuning burden for DP is an intrinsic benefit of hard constraints rather than an artifact, yet we agree that transparent reporting is required for rigorous comparison. The reported performance advantages persist under these controlled conditions. revision: yes

Circularity Check

0 steps flagged

No circularity in empirical performance claims


The paper reports empirical results from training NODE, PINN, and DP models on the SMIB system for trajectory prediction, parameter identification, and LQR control. The key findings, such as DP matching true parameters and NODE recovering Jacobians with 3-4% error, are based on direct comparisons to ground truth data, not on any derivation that reduces to fitted quantities by construction. No self-definitional steps, fitted inputs called predictions, or load-bearing self-citations are present in the abstract or described methodology. The derivation chain consists of standard training and evaluation procedures without circular reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work relies on standard assumptions from dynamical systems theory and machine learning; no new entities or ad-hoc parameters are introduced beyond typical neural network hyperparameters.

axioms (1)

Domain assumption: The SMIB system is accurately described by the standard swing-equation model used as the benchmark.
The paper treats the SMIB dynamics as ground truth without additional validation of the model equations themselves.

pith-pipeline@v0.9.0 · 5610 in / 1234 out tokens · 61468 ms · 2026-05-16T02:42:24.314151+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

Three paradigms... NODE... PINN... DP... evaluated across trajectory prediction, parameter identification, and LQR control synthesis on the SMIB system.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

DP reduces learning to a low-dimensional physical parameter space... NODE recovers control-relevant Jacobians with 3-4% relative error.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
