pith. sign in

arxiv: 2307.16830 · v2 · submitted 2023-07-31 · 🧮 math.OC · cs.DC

Accelerating Optimal Power Flow with GPUs: SIMD Abstraction of Nonlinear Programs and Condensed-Space Interior-Point Methods

Pith reviewed 2026-05-24 07:44 UTC · model grok-4.3

classification 🧮 math.OC cs.DC
keywords ACOPFGPU accelerationInterior-point methodsNonlinear programmingOptimal power flowSIMD abstractionCondensed-space methodsKKT system
0
0 comments X

The pith

A SIMD abstraction of nonlinear programs plus condensed-space interior-point methods with inequality relaxation lets GPUs solve ACOPF an order of magnitude faster than CPU tools.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that two changes together move the bulk of ACOPF computation onto GPUs while keeping all data in device memory. A single-instruction multiple-data view of the nonlinear program preserves the structure needed for parallel automatic differentiation. A condensed-space interior-point method relaxes inequalities so the KKT system becomes positive definite and can be factored without pivoting. When both are used, the resulting codes run roughly ten times faster on NVIDIA GPUs than current CPU solvers on the same problems.

Core claim

By representing nonlinear programs in SIMD form and applying a condensed-space interior-point method that relaxes inequalities, the KKT system reduces to a positive definite matrix that factors without numerical pivoting; the majority of operations then execute on the GPU with data never leaving device memory, producing an order-of-magnitude speedup over state-of-the-art CPU implementations for ACOPF.

What carries the argument

SIMD abstraction of nonlinear programs together with condensed-space interior-point method that relaxes inequalities and condenses the KKT system into a positive definite form factorizable without pivoting.

If this is right

  • Parallel automatic differentiation becomes straightforward because the SIMD abstraction keeps the parallelizable structure of the model equations.
  • The KKT matrix can be factorized without pivoting, removing a major barrier to GPU parallelization of interior-point methods.
  • All major operations stay on the GPU with data resident in device memory, eliminating host-device transfer overhead.
  • The same framework applies to the implementations MadNLP.jl and ExaModels.jl and delivers the reported order-of-magnitude speedup versus contemporary CPU tools.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may extend to other large-scale nonlinear programs whose sparsity patterns allow the same condensation step.
  • Real-time or look-ahead optimal power flow on large grids could become practical if the observed speedups hold at transmission-scale sizes.
  • Similar SIMD-plus-condensation patterns might accelerate other optimization problems that currently rely on sparse direct solvers.

Load-bearing premise

The condensed-space formulation with inequality relaxation produces solutions and convergence behavior equivalent to the original problem across the ACOPF instances that arise in practice.

What would settle it

A side-by-side run on standard ACOPF test cases in which the GPU solver returns objective values or constraint violations that differ by more than solver tolerance from a trusted CPU solver, or fails to converge while the CPU solver succeeds.

Figures

Figures reproduced from arXiv: 2307.16830 by Fran\c{c}ois Pacaud, Mihai Anitescu, Sungho Shin.

Figure 1
Figure 1. Figure 1: Speedup achieved by using GPUs. external packages are called from Julia, through thin wrapper packages, such as Ipopt.jl and AmplNLWriters.jl. A tolerance of 10−4 is set for MadNLP.jl and Ipopt solvers, with other solver options adjusted to ensure a fair comparison across different solvers. The results can be reproduced with the script available at https://github.com/sshin23/opf-on-gpu. B. Results The nume… view at source ↗
read the original abstract

This paper introduces a framework for solving alternating current optimal power flow (ACOPF) problems using graphics processing units (GPUs). While GPUs have demonstrated remarkable performance in various computing domains, their application in ACOPF has been limited due to challenges associated with porting sparse automatic differentiation (AD) and sparse linear solver routines to GPUs. We address these issues with two key strategies. First, we utilize a single-instruction, multiple-data abstraction of nonlinear programs. This approach enables the specification of model equations while preserving their parallelizable structure and, in turn, facilitates the parallel AD implementation. Second, we employ a condensed-space interior-point method (IPM) with an inequality relaxation. This technique involves condensing the Karush--Kuhn--Tucker (KKT) system into a positive definite system. This strategy offers the key advantage of being able to factorize the KKT matrix without numerical pivoting, which has hampered the parallelization of the IPM algorithm. By combining these strategies, we can perform the majority of operations on GPUs while keeping the data residing in the device memory only. Comprehensive numerical benchmark results showcase the advantage of our approach. Remarkably, our implementations -- MadNLP.jl and ExaModels.jl -- running on NVIDIA GPUs achieve an order of magnitude speedup compared with state-of-the-art tools running on contemporary CPUs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces a SIMD abstraction for nonlinear programs and a condensed-space interior-point method with inequality relaxation to enable GPU-accelerated solution of ACOPF problems. Implementations MadNLP.jl and ExaModels.jl are reported to deliver roughly 10x speedups on NVIDIA GPUs versus contemporary CPU-based solvers on benchmark instances.

Significance. If the condensed formulation is shown to produce solutions and convergence behavior equivalent (within tolerance) to standard IPMs on non-convex ACOPF, the work would provide a practical route to real-time large-scale power-system optimization on commodity GPUs. The open-source Julia packages and focus on avoiding pivoting are concrete strengths.

major comments (1)
  1. [condensed-space IPM derivation] The section describing the condensed-space IPM (around the derivation of the positive-definite KKT system after inequality relaxation): no statement, proof, or exhaustive numerical check is given that the relaxation preserves the original KKT conditions, active-set behavior, and optimal solutions for non-convex ACOPF instances in which inequalities bind at optimality. Without such verification on PGLib cases, the reported GPU speedups cannot be directly compared to CPU baselines that solve the unrelaxed problem.
minor comments (2)
  1. [numerical results] Abstract and results section: benchmark tables lack error bars, explicit test-case selection criteria, and verification that relaxed and unrelaxed solutions match within solver tolerance.
  2. [SIMD abstraction] Notation for the SIMD abstraction could be clarified with a small worked example showing how a single model equation maps to the parallel AD kernel.

Simulated Author's Rebuttal

1 responses · 0 unresolved

Thank you for the detailed review. We appreciate the referee's focus on the equivalence of the relaxed formulation. We address the major comment below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [condensed-space IPM derivation] The section describing the condensed-space IPM (around the derivation of the positive-definite KKT system after inequality relaxation): no statement, proof, or exhaustive numerical check is given that the relaxation preserves the original KKT conditions, active-set behavior, and optimal solutions for non-convex ACOPF instances in which inequalities bind at optimality. Without such verification on PGLib cases, the reported GPU speedups cannot be directly compared to CPU baselines that solve the unrelaxed problem.

    Authors: We acknowledge that the manuscript does not include a formal proof or exhaustive numerical verification of equivalence for the inequality relaxation in non-convex cases. The relaxation is introduced to obtain a positive-definite KKT system that avoids pivoting, but we agree that demonstrating preservation of KKT conditions and solutions is important for validating the speedups against unrelaxed baselines. In the revised manuscript, we will add a dedicated subsection providing both a brief theoretical argument for why the relaxation preserves the optimal solutions when inequalities bind (by showing that the relaxed problem is equivalent at optimality) and numerical comparisons on selected PGLib cases, confirming that the condensed IPM yields solutions and convergence behavior equivalent within solver tolerance to standard IPM implementations. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on external benchmarks

full rationale

The paper presents an engineering framework combining SIMD abstraction of NLPs and condensed-space IPM with inequality relaxation to enable GPU execution of ACOPF. The central result is an empirical order-of-magnitude speedup versus external CPU solvers on numerical benchmarks. No derivation step reduces by construction to a fitted parameter renamed as prediction, a self-citation chain that bears the load of the main claim, or a self-definitional equivalence. The implementations (MadNLP.jl, ExaModels.jl) are presented as the vehicle for the reported timings rather than as the source of the performance numbers themselves. The method is self-contained against external benchmarks and therefore receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the central claim rests on the unstated assumption that the condensed formulation is numerically equivalent to the original KKT system for the tested instances.

pith-pipeline@v0.9.0 · 5785 in / 1128 out tokens · 45693 ms · 2026-05-24T07:44:21.847388+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. GATO: GPU-Accelerated and Batched Trajectory Optimization for Scalable Edge Model Predictive Control

    cs.RO 2025-10 unverdicted novelty 6.0

    GATO is a new batched GPU trajectory optimization solver that achieves real-time MPC throughput with 18-21x speedups over CPU baselines for tens to low-hundreds of simultaneous solves.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · cited by 1 Pith paper

  1. [1]

    Targeting Exascale with Julia on GPUs for multiperiod optimization with scenario constraints,

    M. Anitescu, K. Kim, Y . Kim, A. Maldonado, F. Pacaud, V . Rao, M. Schanen, S. Shin, and A. Subramanian, “Targeting Exascale with Julia on GPUs for multiperiod optimization with scenario constraints,” SIAG/OPT Views and News , 2021

  2. [2]

    ExaModels.jl

    S. Shin, “ExaModels.jl.” https://github.com/sshin23/ExaModels.jl

  3. [3]

    MadNLP.jl

    S. Shin and F. Pacaud, “MadNLP.jl.” https://github.com/MadNLP/ MadNLP.jl

  4. [4]

    JuMP: A modeling language for mathematical optimization,

    I. Dunning, J. Huchette, and M. Lubin, “JuMP: A modeling language for mathematical optimization,” SIAM review, vol. 59, no. 2, pp. 295–320, 2017

  5. [5]

    A modeling language for mathematical programming,

    R. Fourer, D. M. Gay, and B. W. Kernighan, “A modeling language for mathematical programming,” Management Science , vol. 36, no. 5, pp. 519–554, 1990

  6. [6]

    Linear solvers for power grid optimization problems: a review of GPU-accelerated linear solvers,

    K. ´Swirydowicz, E. Darve, W. Jones, J. Maack, S. Regev, M. A. Saunders, S. J. Thomas, and S. Pele ˇs, “Linear solvers for power grid optimization problems: a review of GPU-accelerated linear solvers,” Parallel Computing, vol. 111, p. 102870, 2022

  7. [7]

    Nocedal and S

    J. Nocedal and S. J. Wright, Numerical optimization . Springer, 2006

  8. [8]

    The power grid library for benchmarking ac optimal power flow algorithms,

    S. Babaeinejadsarookolaee, A. Birchfield, R. D. Christie, C. Coffrin, C. DeMarco, R. Diao, M. Ferris, S. Fliscounakis, S. Greene, R. Huang, et al., “The power grid library for benchmarking AC optimal power flow algorithms,” arXiv preprint arXiv:1908.02788 , 2019

  9. [9]

    An augmented Lagrangian interior- point approach for large-scale NLP problems on graphics processing units,

    Y . Cao, A. Seth, and C. D. Laird, “An augmented Lagrangian interior- point approach for large-scale NLP problems on graphics processing units,” Computers & Chemical Engineering , vol. 85, pp. 76–83, 2016

  10. [10]

    Parallel interior-point solver for block-structured nonlinear programs on SIMD/GPU architectures,

    F. Pacaud, M. Schanen, S. Shin, D. A. Maldonado, and M. Anitescu, “Parallel interior-point solver for block-structured nonlinear programs on SIMD/GPU architectures,” arXiv preprint arXiv:2301.04869 , 2023

  11. [11]

    A feasible reduced space method for real-time optimal power flow,

    F. Pacaud, D. A. Maldonado, S. Shin, M. Schanen, and M. Anitescu, “A feasible reduced space method for real-time optimal power flow,” Electric Power Systems Research , vol. 212, p. 108268, 2022

  12. [12]

    Accelerating condensed interior-point methods on SIMD/GPU archi- tectures,

    F. Pacaud, S. Shin, M. Schanen, D. A. Maldonado, and M. Anitescu, “Accelerating condensed interior-point methods on SIMD/GPU archi- tectures,” Journal of Optimization Theory and Applications , pp. 1–20, 2023

  13. [13]

    Newton’s method for large bound-constrained optimization problems,

    C.-J. Lin and J. J. Mor ´e, “Newton’s method for large bound-constrained optimization problems,” SIAM Journal on Optimization , vol. 9, no. 4, pp. 1100–1127, 1999

  14. [14]

    Accelerated computation and tracking of AC optimal power flow solutions using GPUs,

    Y . Kim and K. Kim, “Accelerated computation and tracking of AC optimal power flow solutions using GPUs,” in Workshop Proceedings of the 51st International Conference on Parallel Processing , pp. 1–8, 2022

  15. [15]

    Leveraging GPU batching for scalable nonlinear programming through massive lagrangian decomposition,

    Y . Kim, F. Pacaud, K. Kim, and M. Anitescu, “Leveraging GPU batching for scalable nonlinear programming through massive lagrangian decomposition,” arXiv preprint arXiv:2106.14995 , 2021

  16. [16]

    HiOp – User Guide,

    C. G. Petra, N. Chiang, and J. Wang, “HiOp – User Guide,” Tech. Rep. LLNL-SM-743591, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, 2018

  17. [17]

    HyKKT: a hybrid direct-iterative method for solving KKT linear systems,

    S. Regev, N.-Y . Chiang, E. Darve, C. G. Petra, M. A. Saunders, K. ´Swirydowicz, and S. Peleˇs, “HyKKT: a hybrid direct-iterative method for solving KKT linear systems,” Optimization Methods and Software , vol. 38, no. 2, pp. 332–355, 2023

  18. [18]

    Gravity: A mathematical modeling language for optimization and machine learning,

    H. Hijazi, G. Wang, and C. Coffrin, “Gravity: A mathematical modeling language for optimization and machine learning,” Machine Learning Open Source Software Workshop at NeurIPS 2018 , 2018. Available at www.gravityopt.com

  19. [19]

    A sparse and condensed QP formulation for predictive control of LTI systems,

    J. L. Jerez, E. C. Kerrigan, and G. A. Constantinides, “A sparse and condensed QP formulation for predictive control of LTI systems,” Automatica, vol. 48, no. 5, pp. 999–1002, 2012

  20. [20]

    Exploiting GPU/SIMD architectures for solving linear-quadratic MPC problems,

    D. Cole, S. Shin, F. Pacaud, V . M. Zavala, and M. Anitescu, “Exploiting GPU/SIMD architectures for solving linear-quadratic MPC problems,” in 2023 American Control Conference (ACC) , pp. 3995–4000, IEEE, 2023

  21. [21]

    Structured nonconvex optimization of large-scale energy systems using PIPS-NLP,

    N. Chiang, C. G. Petra, and V . M. Zavala, “Structured nonconvex optimization of large-scale energy systems using PIPS-NLP,” in 2014 Power Systems Computation Conference , pp. 1–7, IEEE, 2014

  22. [22]

    Scalable parallel nonlinear optimization with PyNumero and Parapint,

    J. S. Rodriguez, R. B. Parker, C. D. Laird, B. L. Nicholson, J. D. Siirola, and M. L. Bynum, “Scalable parallel nonlinear optimization with PyNumero and Parapint,” INFORMS Journal on Computing , vol. 35, no. 2, pp. 509–517, 2023

  23. [23]

    Graph-based model- ing and decomposition of energy infrastructures,

    S. Shin, C. Coffrin, K. Sundar, and V . M. Zavala, “Graph-based model- ing and decomposition of energy infrastructures,” IF AC-PapersOnLine, vol. 54, no. 3, pp. 693–698, 2021

  24. [24]

    On the implementation of an interior- point filter line-search algorithm for large-scale nonlinear programming,

    A. W ¨achter and L. T. Biegler, “On the implementation of an interior- point filter line-search algorithm for large-scale nonlinear programming,” Mathematical Programming, vol. 106, pp. 25–57, 2006

  25. [25]

    Julia: A fresh approach to numerical computing,

    J. Bezanson, A. Edelman, S. Karpinski, and V . B. Shah, “Julia: A fresh approach to numerical computing,” SIAM Review, vol. 59, no. 1, pp. 65– 98, 2017

  26. [26]

    An Approximate Minimum Degree Ordering Algorithm,

    P. R. Amestoy, T. A. Davis, and I. S. Duff, “An Approximate Minimum Degree Ordering Algorithm,” SIAM Journal on Matrix Analysis and Applications, vol. 17, pp. 886–905, Oct. 1996

  27. [27]

    AMD.jl: A Julia interface to the AMD library of Amestoy, Davis and Duff,

    A. Montoison, D. Orban, A. S. Siqueira, and contributors, “AMD.jl: A Julia interface to the AMD library of Amestoy, Davis and Duff,” May 2020. 23rd Power Systems Computation Conference PSCC 2024 Paris, France — June 4 – 7, 2024

  28. [28]

    rosetta-opf

    “rosetta-opf.” https://github.com/lanl-ansi/rosetta-opf

  29. [29]

    PowerModels.jl: An open-source framework for exploring power flow formulations,

    C. Coffrin, R. Bent, K. Sundar, Y . Ng, and M. Lubin, “PowerModels.jl: An open-source framework for exploring power flow formulations,” in 2018 Power Systems Computation Conference (PSCC) , pp. 1–8, June 2018

  30. [30]

    Algorithm 887: CHOLMOD, supernodal sparse Cholesky factorization and up- date/downdate,

    Y . Chen, T. A. Davis, W. W. Hager, and S. Rajamanickam, “Algorithm 887: CHOLMOD, supernodal sparse Cholesky factorization and up- date/downdate,” ACM Transactions on Mathematical Software (TOMS) , vol. 35, no. 3, pp. 1–14, 2008

  31. [31]

    Theseus: A library for differentiable nonlinear optimization,

    L. Pineda, T. Fan, M. Monge, S. Venkataraman, P. Sodhi, R. T. Chen, J. Ortiz, D. DeTone, A. Wang, S. Anderson, et al. , “Theseus: A library for differentiable nonlinear optimization,” Advances in Neural Information Processing Systems , vol. 35, pp. 3801–3818, 2022. Government License: The submitted manuscript has been cre- ated by UChicago Argonne, LLC, O...