pith. machine review for the scientific record.

arxiv: 2604.23548 · v2 · submitted 2026-04-26 · 💻 cs.CE · math.OC

Recognition: 2 theorem links · Lean Theorem

Unsupervised Learning for AC Optimal Power Flow with Fast Physics-Aware Layer

Haoyu Wang, Haoyu Yan, Hongwen Yu, Jiebao Zhang, Shuang Ye, Ye Shi, Zhichao Sheng, Zhifang Yang

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:56 UTC · model grok-4.3

classification 💻 cs.CE math.OC
keywords AC optimal power flow · unsupervised learning · physics-aware layer · implicit differentiation · power flow solver · gradient surrogate · constraint satisfaction

The pith

Embedding a fast power flow solver in a neural network, with only its final iterations in the automatic-differentiation graph, yields efficient unsupervised AC-OPF training with a provably faithful gradient surrogate.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FPL-OPF, a framework that places a fast physics-aware layer inside neural networks for unsupervised solution of the AC optimal power flow problem. By feeding only the last few or final iterations of an embedded power flow solver into the automatic differentiation graph, the method avoids manual Jacobian construction and full linear-system solves that normally make implicit differentiation expensive. The authors prove this partial inclusion produces a high-fidelity surrogate of the true implicit gradient under mild conditions, and experiments confirm the resulting models train faster than prior unsupervised approaches while producing solutions with near-zero constraint violations and competitive optimality. A sympathetic reader cares because real-time, physically valid AC-OPF solutions are essential for stable power-grid operation, and neural methods become practical only when both speed and feasibility are achieved together.

Core claim

FPL-OPF embeds a fast PF iterative solver within the NN and takes solely the last few or even the final iterations into the AD graph. This design ensures high computational efficiency for both the forward and backward passes, circumventing complex custom backward implementations. Theoretically, the gradient from this design serves as a high-fidelity surrogate of the true implicit gradient under mild conditions. Extensive experiments demonstrate that FPL-OPF achieves significant speedups over state-of-the-art unsupervised learning approaches, while maintaining near-zero constraint violations and competitive optimality.

What carries the argument

Fast Physics-aware Layer (FPL) that embeds a fast power-flow iterative solver and routes only its final iterations into the automatic-differentiation computation graph to approximate the implicit gradient without full solver differentiation.
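The mechanism can be made concrete on a toy scalar fixed-point problem (a hypothetical stand-in contraction, not the paper's power-flow solver): run the solver to convergence outside any differentiation graph, then apply the chain rule only through the last K iterations, and compare against the gradient given by the implicit function theorem.

```python
import math

def g(z, x):
    # hypothetical scalar contraction standing in for one power-flow iteration
    return 0.5 * math.tanh(z) + x

def solve(x, iters=50):
    # forward pass: iterate to convergence, outside any AD graph
    z = 0.0
    for _ in range(iters):
        z = g(z, x)
    return z

def surrogate_grad(x, k):
    # FPL-style surrogate: chain rule through only the last k iterations,
    # treating the iterate that enters them as a constant (detached) input
    z = solve(x)
    d = 0.0  # dz/dx of the detached input is zero
    for _ in range(k):
        gz = 0.5 * (1.0 - math.tanh(z) ** 2)  # dg/dz at the converged iterate
        d = gz * d + 1.0                      # dg/dx = 1 for this toy g
    return d

def implicit_grad(x):
    # implicit function theorem: dz*/dx = (dg/dx) / (1 - dg/dz) at the fixed point
    z = solve(x)
    gz = 0.5 * (1.0 - math.tanh(z) ** 2)
    return 1.0 / (1.0 - gz)
```

With k = 1 the surrogate recovers only dg/dx; each extra retained iteration adds one more term of the Neumann series for (1 − dg/dz)⁻¹ dg/dx, so the error shrinks geometrically in k, which is the qualitative behavior the paper's fidelity claim asserts under its mild conditions.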

Load-bearing premise

The final iterations of the fast PF iterative solver must produce a sufficiently accurate linearization for the implicit function theorem to guarantee that the surrogate gradient remains faithful to the true implicit gradient.
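For intuition, a generic truncated-backprop bound for contractive fixed-point iterations (a standard argument, not the paper's own proof): if the solver iterates $z_{k+1} = g(z_k, x)$ with $\|\partial g/\partial z\| \le \rho < 1$ near the fixed point $z^{*}$, then keeping the last $K$ iterations in the AD graph gives

```latex
% surrogate gradient from the last K iterations (earlier iterates detached)
\frac{\partial z}{\partial x} \;\approx\; \sum_{j=0}^{K-1}\Big(\frac{\partial g}{\partial z}\Big)^{j}\frac{\partial g}{\partial x},
\qquad
% implicit-function-theorem gradient at the fixed point z^*
\frac{\partial z^{*}}{\partial x} \;=\; \Big(I-\frac{\partial g}{\partial z}\Big)^{-1}\frac{\partial g}{\partial x}
\;=\; \sum_{j=0}^{\infty}\Big(\frac{\partial g}{\partial z}\Big)^{j}\frac{\partial g}{\partial x}.
```

The surrogate error is thus the Neumann-series tail, bounded by $\rho^{K}(1-\rho)^{-1}\,\|\partial g/\partial x\|$: it tightens geometrically in $K$ but degrades as $\rho \to 1$, which is exactly the ill-conditioned regime where the premise is at risk.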

What would settle it

Computing the true implicit gradient via the full Jacobian on a test AC-OPF instance, then showing that the surrogate gradient from the final iterations yields materially different trained weights or markedly higher constraint violations, would falsify the high-fidelity claim.

Figures

Figures reproduced from arXiv: 2604.23548 by Haoyu Wang, Haoyu Yan, Hongwen Yu, Jiebao Zhang, Shuang Ye, Ye Shi, Zhichao Sheng, Zhifang Yang.

Figure 1. The overall architecture of the Fast Differentiable
Figure 2. Optimality comparison on (a) IEEE 57-bus, (b) PEGASE 89-bus, (c) IEEE 118-bus, and (d) NESTA 189-bus systems.
Figure 3. Time-efficiency comparison on (a) IEEE 57-bus, (b) PEGASE 89-bus, (c) IEEE 118-bus, and (d) NESTA 189-bus systems.
Figure 4. Ablation study on the PEGASE 89-bus system about
Figure 5. Empirical estimation in Theorem 4.1 on the PEGASE 89-bus system. The solid curves represent the mean value, and the shaded region spans the standard deviation obtained over 50 samples.
read the original abstract

Learning to solve the Alternating Current Optimal Power Flow (AC-OPF) problem by neural networks (NNs) is a promising approach in real-time applications. Existing methods to ensure the physical feasibility of NN outputs embed a power flow (PF) solver within networks. However, the gradient through the PF solver, namely, implicit differentiation, needs manual Jacobian derivation and the solution of linear systems, which is computationally prohibitive and hinders integration with modern automatic differentiation (AD) frameworks. To address these challenges, we propose FPL-OPF, a novel unsupervised learning framework that incorporates a Fast Physics-aware Layer for AC-OPF problems. FPL-OPF embeds a fast PF iterative solver within the NN and takes solely the last few or even the final iterations into the AD graph. This design ensures high computational efficiency for both the forward and backward passes, circumventing complex custom backward implementations. Theoretically, we rigorously prove that the gradient from this design serves as a high-fidelity surrogate of the true implicit gradient under mild conditions. Extensive experiments demonstrate that FPL-OPF achieves significant speedups over state-of-the-art unsupervised learning approaches, while maintaining near-zero constraint violations and competitive optimality. Our code is available at https://github.com/wowotou1998/fpl-opf

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes FPL-OPF, an unsupervised learning framework for AC Optimal Power Flow that embeds a fast physics-aware power flow iterative solver inside a neural network. Only the final few (or single final) iterations of the solver are included in the automatic differentiation graph, avoiding manual Jacobian derivation and custom backward passes. The authors claim a rigorous proof that the resulting gradient is a high-fidelity surrogate for the true implicit gradient obtained via the implicit function theorem, under mild conditions. Experiments are reported to show substantial speedups over prior unsupervised methods while achieving near-zero constraint violations and competitive optimality; code is released publicly.

Significance. If the gradient-surrogate claim holds with the stated conditions and the speed/accuracy results generalize, the approach could meaningfully accelerate real-time neural AC-OPF solvers by removing the computational bottleneck of full implicit differentiation while remaining compatible with standard AD frameworks. The public code release is a clear strength supporting reproducibility.

major comments (1)
  1. [Theoretical analysis of the gradient surrogate] The central theoretical claim (that back-propagation through only the final iterations of the embedded fast PF solver yields a high-fidelity surrogate for the implicit gradient) is load-bearing for the unsupervised training guarantee. The proof invokes mild conditions on solver convergence but supplies neither explicit error bounds on the linearization error, sensitivity analysis with respect to iteration count, nor numerical quantification of the gradient approximation error on the reported test cases. Without these, it is not possible to verify robustness when the residual is not yet small or when the power-flow Jacobian is ill-conditioned.
minor comments (2)
  1. [Abstract] The abstract states that the authors 'rigorously prove' the surrogate property 'under mild conditions' yet does not indicate what those conditions are; a one-sentence clarification would improve accessibility without lengthening the abstract.
  2. [Method description] Notation for the number of retained iterations (e.g., 'last few or even the final') is used inconsistently between the abstract and method description; a single symbol or explicit parameter would remove ambiguity.

Simulated Author's Rebuttal

1 responses · 1 unresolved

We thank the referee for the constructive and detailed review. We address the single major comment below and have revised the manuscript to strengthen the presentation of the gradient surrogate analysis.

read point-by-point responses
  1. Referee: The central theoretical claim (that back-propagation through only the final iterations of the embedded fast PF solver yields a high-fidelity surrogate for the implicit gradient) is load-bearing for the unsupervised training guarantee. The proof invokes mild conditions on solver convergence but supplies neither explicit error bounds on the linearization error, sensitivity analysis with respect to iteration count, nor numerical quantification of the gradient approximation error on the reported test cases. Without these, it is not possible to verify robustness when the residual is not yet small or when the power-flow Jacobian is ill-conditioned.

    Authors: We appreciate the referee highlighting the need for additional support of the gradient-surrogate claim. Our proof establishes that, under the mild conditions of solver convergence (residual norm approaching zero), the back-propagated gradient through the final iterations converges to the implicit gradient from the implicit function theorem. We acknowledge that the original manuscript did not include explicit a priori error bounds. To address the request for sensitivity analysis and numerical quantification, we have added a new subsection (Section 5.3) containing (i) plots of gradient approximation error versus number of final iterations (1, 2, 5) for all test systems, using finite-difference reference gradients, and (ii) additional experiments on cases with higher Jacobian condition numbers. These results show relative gradient errors below 1% with a single final iteration once the residual reaches 10^{-6}, with monotonic improvement as iterations increase, and maintained accuracy under moderate ill-conditioning. We have also added a brief discussion of the approximation quality in the theoretical section. These revisions provide the requested empirical verification of robustness while preserving the original proof's scope. revision: yes

standing simulated objections not resolved
  • Explicit a priori error bounds on the linearization error, which would require stronger assumptions (e.g., explicit contraction rates) beyond the mild convergence conditions used in the existing proof.
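The verification protocol the rebuttal describes (finite-difference reference gradients compared against the truncated surrogate at several iteration counts) can be sketched on a toy contraction; the function g below is a hypothetical stand-in, not the paper's solver or released code.

```python
import math

def g(z, x):
    # hypothetical contraction standing in for one power-flow iteration
    return 0.5 * math.tanh(z) + x

def solve(x, iters=60):
    # run the solver to a tight residual
    z = 0.0
    for _ in range(iters):
        z = g(z, x)
    return z

def finite_diff_grad(x, h=1e-6):
    # central-difference reference gradient of the converged solver output
    return (solve(x + h) - solve(x - h)) / (2.0 * h)

def surrogate_grad(x, k):
    # back-propagate by hand through only the last k iterations
    z = solve(x)
    d = 0.0
    for _ in range(k):
        gz = 0.5 * (1.0 - math.tanh(z) ** 2)  # dg/dz at the converged iterate
        d = gz * d + 1.0                      # dg/dx = 1 for this toy g
    return d

# gradient approximation error as a function of retained iterations
errors = [abs(surrogate_grad(0.3, k) - finite_diff_grad(0.3)) for k in (1, 2, 5, 10)]
```

On this toy problem the errors decay monotonically with k, the qualitative pattern the rebuttal reports; a real study would run the same protocol with the network loss gradient on cases of varying power-flow Jacobian conditioning.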

Circularity Check

0 steps flagged

No circularity; architectural change and gradient surrogate proof are independent

full rationale

The paper introduces an explicit architectural modification (embedding only the final iterations of a fast PF iterative solver into the AD graph) and presents a separate theoretical argument claiming to prove that the resulting gradient is a high-fidelity surrogate for the true implicit gradient under mild conditions. No step reduces by construction to a fitted parameter, self-definition, or self-citation chain; the performance claims rest on the design choice and the independent proof rather than renaming or forcing equivalence to inputs. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the assumption that the chosen fast PF iterative solver converges reliably and that the mild conditions for the gradient proof are satisfied in practice; no new free parameters or invented entities are introduced beyond standard NN weights and the existing power flow model.

axioms (1)
  • domain assumption The fast PF iterative solver converges to a solution under the operating conditions considered.
    Required for the embedding to produce valid power flow states and for the final-iteration approximation to be meaningful.

pith-pipeline@v0.9.0 · 5553 in / 1230 out tokens · 45167 ms · 2026-05-12T01:56:08.903088+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.
