Reinforcement Learning for Public Safety Power Shutoffs Under Decision-Dependent Uncertainty and Nonlinear Wildfire Ignition Models
Pith reviewed 2026-05-07 14:54 UTC · model grok-4.3
The pith
Reinforcement learning optimizes public safety power shutoffs by training directly on flexible wildfire ignition simulators.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a Proximal Policy Optimization (PPO) reinforcement learning framework can learn distribution system topology adjustments by interacting directly with a simulator of decision-dependent wildfire ignition, without the restrictive structural assumptions that mixed-integer programming (MIP) formulations require, and that this yields lower operational costs on 54-bus and 138-bus test systems with only marginal growth in compute time as network size increases.
What carries the argument
A Proximal Policy Optimization agent that repeatedly selects topology changes and receives rewards based on simulated wildfire ignition outcomes and community costs, trained against any chosen nonlinear line-failure probability model.
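The interaction loop described above can be sketched as a one-step environment in which the action is an energization mask over lines. This is a minimal illustration under stated assumptions, not the paper's simulator. The names `ToyPSPSEnv` and `logistic_ignition`, the cost constants, the uniform wind distribution, the logistic coefficients, and the rule that only energized lines can ignite are all hypothetical. The sketch does show the property the argument relies on: the failure model is an arbitrary callable, so a nonlinear model plugs in without changing the loop.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical failure-model signature: (wind speed in m/s, line length in km) -> ignition probability.
FailureModel = Callable[[float, float], float]


def logistic_ignition(wind: float, length_km: float) -> float:
    """Illustrative nonlinear ignition model; coefficients are assumptions, not fitted values."""
    z = -6.0 + 0.25 * wind + 0.1 * length_km
    return 1.0 / (1.0 + math.exp(-z))


class ToyPSPSEnv:
    """Hypothetical one-step PSPS environment; the action is a 0/1 energization mask over lines."""

    def __init__(self, loads: Sequence[float], lengths: Sequence[float],
                 model: FailureModel, shed_cost: float = 10.0,
                 fire_cost: float = 1000.0, seed: int = 0) -> None:
        self.loads = list(loads)
        self.lengths = list(lengths)
        self.model = model  # any callable works, linear or nonlinear
        self.shed_cost = shed_cost
        self.fire_cost = fire_cost
        self.rng = random.Random(seed)
        self.wind: List[float] = []

    def reset(self) -> List[float]:
        # Observation: wind speed at each line, drawn from an assumed uniform distribution.
        self.wind = [self.rng.uniform(0.0, 30.0) for _ in self.loads]
        return list(self.wind)

    def step(self, energized: Sequence[int]) -> float:
        """Sample one outcome and return the reward (negative total cost)."""
        cost = 0.0
        for i, on in enumerate(energized):
            if not on:
                # De-energizing a line sheds the load it serves (network topology is ignored here).
                cost += self.shed_cost * self.loads[i]
            elif self.rng.random() < self.model(self.wind[i], self.lengths[i]):
                # Decision dependence: only energized lines can ignite.
                cost += self.fire_cost
        return -cost
```

A PPO agent would call `reset()`, map the observation to a mask, and learn from the returned reward; averaging many `step` calls for a fixed mask estimates that mask's expected cost under whichever model was supplied.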
If this is right
- Utilities could adopt more realistic nonlinear ignition models when planning shutoffs instead of being limited to tractable simplifications.
- Operational costs from de-energizing lines can be lowered while still controlling ignition risk on the tested network sizes.
- Computation time remains manageable as network size increases, supporting use on larger real-world distribution systems.
- Policies can be trained offline in simulation before any live deployment.
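The scaling point depends partly on how the action space is represented. With one open/closed decision per switchable line, the joint action space has 2^n elements, which makes a flat softmax over topologies impractical even for modest n. One common workaround, sketched below, is a factored policy head that outputs an independent Bernoulli per line, so sampling and log-probabilities cost O(n). The reference list includes work on large discrete action spaces [23], but whether the paper uses this particular parameterization is an assumption, and `sample_factored_action` is a hypothetical name.

```python
import math
import random
from typing import List, Sequence, Tuple


def log_sigmoid(z: float) -> float:
    """log(1 / (1 + exp(-z))), computed without overflow for large |z|."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def sample_factored_action(logits: Sequence[float],
                           rng: random.Random) -> Tuple[List[int], float]:
    """Hypothetical factored policy head: one independent Bernoulli per line.

    Returns the 0/1 energization mask and its joint log-probability, which is the sum
    of per-line terms, so the cost is linear in the number of lines.
    """
    action: List[int] = []
    logp = 0.0
    for z in logits:
        # Numerically stable sigmoid for the probability of keeping the line energized.
        p_on = 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
        on = 1 if rng.random() < p_on else 0
        action.append(on)
        # log P(on) = log_sigmoid(z); log P(off) = log_sigmoid(-z)
        logp += log_sigmoid(z) if on else log_sigmoid(-z)
    return action, logp


def joint_action_count(n_lines: int) -> int:
    """Size of the unfactored action space: each line can be open or closed."""
    return 2 ** n_lines
```

As an illustrative size, `joint_action_count(60)` is about 1.15e18, while the factored head still needs only 60 logits.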
Where Pith is reading between the lines
- Better wildfire simulators could become a shared resource for training such agents across utilities.
- The same reinforcement learning setup might extend to other grid decisions where failures depend on prior actions, such as maintenance scheduling.
- Real deployment would still need separate validation that simulator-trained policies remain safe under unmodeled conditions like weather extremes.
Load-bearing premise
The simulator must faithfully reproduce how real-world decisions change the probabilities of power lines igniting wildfires.
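This premise cannot be verified inside the simulator, but a cheap first probe is the sensitivity analysis the authors promise in their rebuttal: hold a plan fixed and recompute its expected cost while perturbing the ignition model. The sketch below assumes a logistic model in wind speed with a swept intercept. The model form, parameter values, cost constants, and function names are hypothetical, and expected costs are computed in closed form (shed load plus fire cost times ignition probability, assuming independent lines) rather than by sampling.

```python
import math
from typing import Callable, Dict, Sequence


def logistic_model(bias: float, wind_coef: float) -> Callable[[float], float]:
    """Hypothetical ignition model p(wind) = sigmoid(bias + wind_coef * wind)."""
    def p(wind: float) -> float:
        return 1.0 / (1.0 + math.exp(-(bias + wind_coef * wind)))
    return p


def expected_cost(plan: Sequence[int], loads: Sequence[float], winds: Sequence[float],
                  p: Callable[[float], float], shed_cost: float = 10.0,
                  fire_cost: float = 1000.0) -> float:
    """Shed-load cost for open lines plus expected ignition damage for energized lines."""
    cost = 0.0
    for on, load, wind in zip(plan, loads, winds):
        cost += fire_cost * p(wind) if on else shed_cost * load
    return cost


def sensitivity(plan: Sequence[int], loads: Sequence[float], winds: Sequence[float],
                biases: Sequence[float], wind_coef: float = 0.25) -> Dict[float, float]:
    """Expected cost of one fixed plan under each perturbed model intercept."""
    return {b: expected_cost(plan, loads, winds, logistic_model(b, wind_coef)) for b in biases}
```

A plan whose cost ranking against alternatives flips under small perturbations is a warning sign that the learned policy depends on simulator details that real data may not support.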
What would settle it
Deploying the learned policies on an actual distribution system and observing either substantially higher wildfire ignition rates or higher total costs than the simulator predicted would disprove the practical value of the approach.
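One way to make "substantially higher ignition rates than predicted" operational is a one-sided count test. Summing the simulator's per-line, per-period ignition probabilities for energized lines gives an expected ignition count λ. If ignitions are independent and individually rare, the observed count is approximately Poisson(λ), and a small upper-tail probability for the observed count would indicate that the simulator underpredicts. The independence assumption, the Poisson approximation, and the function names are illustrative assumptions.

```python
import math
from typing import Sequence


def predicted_ignitions(probs: Sequence[float]) -> float:
    """Expected ignition count: sum of per-line, per-period ignition probabilities."""
    return float(sum(probs))


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected), summing the lower tail term by term."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for j in range(1, observed):
        term *= expected / j    # P(X = j) from P(X = j - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)
```

For example, if the simulator predicts λ = 1 ignition over a season and 3 are observed, the upper-tail probability is about 0.08, which is weak evidence on its own; the test only becomes informative as exposure accumulates.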
Original abstract
Power grid infrastructure is an increasingly significant source of wildfire ignitions and poses severe risks to communities in fire-prone regions. Public Safety Power Shutoffs (PSPS) have emerged as a critical operational tool for utilities to mitigate this risk by proactively de-energizing portions of the grid under high-threat conditions. These shutoffs, however, impose costs on affected communities, and it is therefore essential that PSPS decisions be informed by realistic models of wildfire ignition risk. Current Mixed Integer Programming-based methods require restrictive structural assumptions about the probability models for line failures caused by power line ignitions. While these simplifications yield tractable solutions, the resulting models may differ significantly from the true underlying dynamics. In this paper, we propose a reinforcement learning framework based on Proximal Policy Optimization that learns to adjust the topology of a distribution system by interacting directly with a simulator that accommodates any line failure probability model without imposing such restrictions. We test our methodology on 54-bus and 138-bus distribution systems and demonstrate its ability to lower operational costs compared to existing methods while allowing only marginally increased compute times as network size grows.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a Proximal Policy Optimization (PPO) reinforcement learning framework for Public Safety Power Shutoffs (PSPS) that interacts directly with a black-box simulator to optimize distribution system topology under arbitrary (including nonlinear) decision-dependent line failure probability models for wildfire ignition. Unlike Mixed Integer Programming (MIP) approaches, which require restrictive structural assumptions on the probability models, the RL method places no such restrictions on the failure model. It is tested on 54-bus and 138-bus systems and is claimed to achieve lower operational costs with only marginal growth in compute time as network size increases.
Significance. If the experimental claims hold under rigorous validation, the work would be significant for enabling PSPS decisions based on more realistic nonlinear ignition models that MIP methods cannot handle tractably. The core strength is the simulator-interaction approach, which avoids restrictive structural assumptions on the failure-probability model and therefore admits general failure models. However, the current lack of experimental details, real-data calibration, and robustness checks substantially reduces the assessed significance.
major comments (3)
- [Abstract] Abstract: the central performance claims of cost reductions on the 54-bus and 138-bus systems are presented without any description of the experimental setup, baselines employed, number of runs, error bars, or statistical significance testing; this information is load-bearing for evaluating whether the RL policy actually outperforms MIP under the claimed general (nonlinear) models.
- [Results] Results section: comparisons to existing MIP methods are reported, but it is not specified whether these baselines were run under the same nonlinear wildfire ignition models or only under the restrictive (linear/convex) models solvable by MIP; without this distinction the superiority claim under arbitrary models cannot be assessed.
- [Simulator and experimental design] Simulator and experimental design: the framework's ability to accommodate any failure model rests on direct simulator interaction, yet no calibration to historical ignition data, sensitivity analysis to model misspecification, or out-of-distribution generalization tests for the learned PPO policy are provided; these omissions directly undermine the practical transferability asserted in the abstract.
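The first comment asks for run counts, error bars, and significance testing. A minimal sketch of a summary that would answer it: per-seed evaluation costs reduced to a mean, a sample standard deviation, and a confidence-interval half-width, plus the same summary applied to paired per-seed differences against a baseline evaluated on the same scenarios. The function names are hypothetical, and the normal quantile is an assumption; with only a handful of seeds a Student-t quantile would give a wider, more appropriate interval.

```python
import math
import statistics
from typing import Sequence, Tuple


def seed_summary(costs: Sequence[float], confidence: float = 0.95) -> Tuple[float, float, float]:
    """Mean, sample standard deviation, and normal-approximation CI half-width across seeds."""
    if len(costs) < 2:
        raise ValueError("need at least two seeds to estimate variability")
    mean = statistics.fmean(costs)
    sd = statistics.stdev(costs)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean, sd, z * sd / math.sqrt(len(costs))


def paired_difference(rl_costs: Sequence[float], baseline_costs: Sequence[float],
                      confidence: float = 0.95) -> Tuple[float, float, float]:
    """Summarize per-seed (RL - baseline) cost differences on matched evaluation scenarios.

    An interval lying entirely below zero suggests the RL policy is cheaper on these scenarios.
    """
    if len(rl_costs) != len(baseline_costs):
        raise ValueError("costs must be paired seed by seed")
    diffs = [r - b for r, b in zip(rl_costs, baseline_costs)]
    return seed_summary(diffs, confidence)
```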
minor comments (1)
- [Abstract] Abstract: the phrase 'marginally increased compute times as network size grows' is stated without quantitative scaling data or explicit comparison tables, reducing clarity on the computational advantage.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments identify important opportunities to strengthen the presentation of experimental results and clarify the scope of our claims. We address each major comment below, indicating where revisions will be made to the manuscript.
- Referee: [Abstract] Abstract: the central performance claims of cost reductions on the 54-bus and 138-bus systems are presented without any description of the experimental setup, baselines employed, number of runs, error bars, or statistical significance testing; this information is load-bearing for evaluating whether the RL policy actually outperforms MIP under the claimed general (nonlinear) models.
Authors: We agree that the abstract would benefit from additional context on the experimental protocol. In the revised version we will expand the abstract to note that MIP baselines are applied only under model assumptions they can accommodate, that results are averaged over multiple random seeds with reported variability, and that consistent cost reductions are observed. Full details on the 54-bus and 138-bus test cases, number of runs, and statistical procedures remain in Section 4. revision: yes
- Referee: [Results] Results section: comparisons to existing MIP methods are reported, but it is not specified whether these baselines were run under the same nonlinear wildfire ignition models or only under the restrictive (linear/convex) models solvable by MIP; without this distinction the superiority claim under arbitrary models cannot be assessed.
Authors: The referee correctly notes the distinction. MIP formulations are tractable only for linear or convex ignition probability models; all reported MIP comparisons therefore use instances satisfying those assumptions. For general nonlinear models, MIP is intractable by construction, which is precisely the setting our simulator-based RL approach targets. We will revise the Results section to state this separation explicitly, add a clarifying paragraph, and include a supplementary table that reports RL performance on both linear and nonlinear instances while noting MIP intractability on the latter. revision: yes
- Referee: [Simulator and experimental design] Simulator and experimental design: the framework's ability to accommodate any failure model rests on direct simulator interaction, yet no calibration to historical ignition data, sensitivity analysis to model misspecification, or out-of-distribution generalization tests for the learned PPO policy are provided; these omissions directly undermine the practical transferability asserted in the abstract.
Authors: We acknowledge that the current experiments employ synthetic nonlinear ignition models chosen to demonstrate generality rather than real-data calibration. In revision we will add a sensitivity analysis varying ignition probability parameters and will include a discussion of how the black-box simulator can be calibrated with historical records (e.g., from CAL FIRE). Comprehensive out-of-distribution tests on held-out real ignition events would require additional curated datasets beyond the scope of this methodological study and are noted as future work; we will make this limitation explicit. revision: partial
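The linear-versus-nonlinear distinction drawn in this exchange can be made concrete on a toy radial feeder where a line's ignition probability depends on the power flow it carries, and therefore on which lines are left energized. The sketch below evaluates expected cost under an affine model (clipped to [0, 1] only as a safeguard), which a MIP can represent once products of binary switch variables are linearized, and under a logistic model that a MIP could only approximate. It then finds the exact optimum for each by enumerating all 2^n plans. The feeder structure, both model forms, the cost constants, and all names are hypothetical; enumeration is only feasible for a handful of lines, which is where a learned policy can be checked against a true optimum.

```python
import itertools
import math
from typing import Callable, Sequence, Tuple

# Hypothetical model signature: (wind speed, line flow) -> ignition probability.
Model = Callable[[float, float], float]


def linear_model(a: float, b: float, c: float) -> Model:
    """Affine in wind and flow (clipped to [0, 1]): the kind of form MIP methods can handle."""
    return lambda wind, flow: min(1.0, max(0.0, a + b * wind + c * flow))


def logistic_model(bias: float, cw: float, cf: float) -> Model:
    """Logistic in wind and flow: nonlinear, so a MIP could only approximate it."""
    return lambda wind, flow: 1.0 / (1.0 + math.exp(-(bias + cw * wind + cf * flow)))


def expected_cost(plan: Sequence[int], loads: Sequence[float], winds: Sequence[float],
                  p: Model, shed_cost: float = 10.0, fire_cost: float = 1000.0) -> float:
    """Single radial feeder: line i feeds bus i+1 and is live only if lines 0..i are all closed."""
    n = len(plan)
    live = [all(plan[: i + 1]) for i in range(n)]
    cost = 0.0
    for i in range(n):
        if not live[i]:
            cost += shed_cost * loads[i]  # bus i+1 is unserved
            continue
        flow = sum(loads[j] for j in range(i, n) if live[j])  # depends on the switching decisions
        cost += fire_cost * p(winds[i], flow)  # expected ignition damage on a live line
    return cost


def best_plan(loads: Sequence[float], winds: Sequence[float], p: Model,
              **costs: float) -> Tuple[Tuple[int, ...], float]:
    """Exact optimum by enumerating all 2**n open/closed plans (small feeders only)."""
    plans = itertools.product((0, 1), repeat=len(loads))
    return min(((pl, expected_cost(pl, loads, winds, p, **costs)) for pl in plans),
               key=lambda t: t[1])
```

Running `best_plan` with `linear_model(...)` and then `logistic_model(...)` on the same feeder shows whether the two model classes lead to different shutoff plans, which is the kind of evidence a linear-versus-nonlinear comparison table would need.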
Circularity Check
No significant circularity; derivation relies on standard RL interaction with external simulator
full rationale
The paper's core claim is a PPO-based RL framework that learns topology adjustments via direct interaction with a black-box simulator accommodating arbitrary (including nonlinear) failure probability models. This structure is self-contained: the policy is trained against an independent simulator whose dynamics are not defined by the method itself, and performance is evaluated empirically on 54-bus and 138-bus test cases against MIP baselines. No equations or steps reduce by construction to fitted parameters, self-citations, or ansatzes imported from the authors' prior work. The derivation chain (state-action formulation, PPO updates, simulator rollouts) follows standard RL practice without renaming known results or smuggling in assumptions via self-reference. No self-citation risk is evident in the provided text.
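For reference, the "PPO updates" step in this chain is usually the clipped surrogate objective of Schulman et al. [18], with advantages often estimated by generalized advantage estimation (GAE) [26], which also appears in the reference list. The sketch below implements both on plain Python lists. It is the standard textbook form, not code from the paper; the value-function loss and entropy bonus are omitted, and the defaults γ = 0.99, λ = 0.95, and ε = 0.2 are common choices rather than the authors' reported settings.

```python
import math
from typing import List, Sequence


def gae(rewards: Sequence[float], values: Sequence[float],
        gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized advantage estimates for one trajectory.

    `values` holds V(s_0..s_T): one more entry than `rewards`, the last being the
    bootstrap value of the final state (use 0.0 if the episode terminated).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")
    adv = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        adv[t] = running
    return adv


def ppo_clip_loss(new_logp: Sequence[float], old_logp: Sequence[float],
                  advantages: Sequence[float], eps: float = 0.2) -> float:
    """Negative clipped surrogate objective (a loss to minimize)."""
    total = 0.0
    for nl, ol, a in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * a, clipped * a)  # pessimistic of the two surrogates
    return -total / len(advantages)
```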
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption A simulator exists that can accurately model any line failure probability without structural restrictions.
Reference graph
Works this paper leans on
[1] California Public Utilities Commission, “Wildfire and wildfire safety,” https://www.cpuc.ca.gov/industries-and-topics/wildfires, 2026, accessed: 2026-04-01.
[2] Fire and Public Safety Department, “Lahaina origin and cause report (FI23-0012446),” Maui County, Tech. Rep., April 2024. [Online]. Available: https://www.mauicounty.gov/DocumentCenter/View/149693/FI23-0012446-Lahaina-Origin-and-Cause-Report Plus-Appendix-A-B-C-Redacted
[3] Investigative Committee on the Panhandle Wildfires, “2024 report: A report to the House of Representatives, 89th Texas Legislature,” Texas House of Representatives, Tech. Rep., May 1, 2024. [Online]. Available: https://www.house.texas.gov/pdfs/committees/reports/interim/88interim/House-Interim-Committee-on-The-Panhandle-Wildfires-Report.pdf
[4] Butte County District Attorney, “The Camp Fire public report: A summary of the Camp Fire investigation,” Office of the District Attorney, Butte County, Public Report, June 16, 2020. [Online]. Available: https://www.buttecounty.net/DocumentCenter/View/1881/Camp-Fire-Public-Report---Summary-of-the-Camp-Fire-Investigation-PDF
[5] Y. Yang, T. Nishikawa, and A. E. Motter, “Small vulnerable sets determine large network cascades in power grids,” Science, vol. 358, no. 6365, p. eaan3184, 2017.
[6] California Public Utilities Commission, “Public safety power shutoffs (PSPS),” https://www.cpuc.ca.gov/psps/, 2026, accessed: 2026-04-01.
[7] M. Sotolongo, C. Bolon, and S. H. Baker, “California power shutoffs: Deficiencies in data and reporting,” Initiative for Energy Justice, 2020.
[8] D. N. Trakas and N. D. Hatziargyriou, “Optimal distribution system operation for enhancing resilience against wildfires,” IEEE Transactions on Power Systems, vol. 33, no. 2, pp. 2260–2271, 2017.
[9] W. Yang, S. N. Sparrow, M. Ashtine, D. C. Wallom, and T. Morstyn, “Resilient by design: Preventing wildfires and blackouts with microgrids,” Applied Energy, vol. 313, p. 118793, 2022.
[10] N. Rhodes, L. Ntaimo, and L. Roald, “Balancing wildfire risk and power outages through optimized power shut-offs,” IEEE Transactions on Power Systems, vol. 36, no. 4, pp. 3118–3128, 2020.
[11] J. Su, S. Mehrani, P. Dehghanian, and M. A. Lejeune, “Quasi second-order stochastic dominance model for balancing wildfire risks and power outages due to proactive public safety de-energizations,” IEEE Transactions on Power Systems, vol. 39, no. 2, pp. 2528–2542, 2023.
[12] F. Piancó, A. Moreira, B. Fanzeres, R. Jiang, C. Zhao, and M. Heleno, “Decision-dependent uncertainty-aware distribution system planning under wildfire risk,” IEEE Transactions on Power Systems, 2025.
[13] C. Yang, W. Zhang, R. Tang, and X. Xiao, “Tree-related high-impedance fault in distribution systems: Modeling, detection, and ignition risk assessment,” Energies, vol. 18, no. 3, p. 548, 2025.
[14] A. Moreira, F. Piancó, B. Fanzeres, A. Street, R. Jiang, C. Zhao, and M. Heleno, “Distribution system operation amidst wildfire-prone climate conditions under decision-dependent line availability uncertainty,” IEEE Transactions on Power Systems, vol. 39, no. 5, pp. 6522–6538, 2024.
[15] S. Zhang, M. Lejeune, and P. Dehghanian, “Power distribution systems under wildfire risks: Chance-constrained model with decision-dependent probabilities,” Available at SSRN 5508618, 2025.
[16] J. W. Muhs, M. Parvania, H. T. Nguyen, and J. A. Palmer, “Characterizing probability of wildfire ignition caused by power distribution lines,” IEEE Transactions on Power Delivery, vol. 36, no. 6, pp. 3681–3688, 2021.
[17] M. Van Der Linde, “Fire risk mitigation in the overhead electricity distribution network,” in 2019 29th Australasian Universities Power Engineering Conference (AUPEC). IEEE, 2019, pp. 1–6.
[18] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” 2017. [Online]. Available: https://arxiv.org/abs/1707.06347
[19] D. Cao, Y. Liu, Y. Wang, Q. Zhang, and W. Hu, “Deep reinforcement learning in power systems resilience: A review,” IEEE Transactions on Reliability, 2025.
[20] N. L. Dehghani, A. B. Jeddi, and A. Shafieezadeh, “Intelligent hurricane resilience enhancement of power distribution systems via deep reinforcement learning,” Applied Energy, vol. 285, p. 116355, 2021.
[21] M. M. Hosseini and M. Parvania, “Resilient operation of distribution grids using deep reinforcement learning,” IEEE Transactions on Industrial Informatics, vol. 18, no. 3, pp. 2100–2109, 2021.
[22] S. U. Kadir, S. Majumder, A. K. Srivastava, A. D. Chhokra, H. Neema, A. Dubey, and A. Laszka, “Reinforcement-learning-based proactive control for enabling power grid resilience to wildfire,” IEEE Transactions on Industrial Informatics, vol. 20, no. 1, pp. 795–805, 2024.
[23] G. Dulac-Arnold, R. Evans, H. van Hasselt, P. Sunehag, T. Lillicrap, J. Hunt, T. Mann, T. Weber, T. Degris, and B. Coppin, “Deep reinforcement learning in large discrete action spaces,” arXiv preprint arXiv:1512.07679, 2015.
[24] H. van Hasselt, A. Guez, M. Hessel, V. Mnih, and D. Silver, “Learning values across many orders of magnitude,” in Proceedings of the 30th International Conference on Neural Information Processing Systems, ser. NIPS’16. Red Hook, NY, USA: Curran Associates Inc., 2016, pp. 4294–4302.
[25] S. Mashayekh, M. Stadler, G. Cardoso, M. Heleno, S. Chalil Madathil, H. Nagarajan, R. Bent, M. Mueller-Stoffels, X. Lu, and J. Wang, “Security-constrained design of isolated multi-energy microgrids,” IEEE Transactions on Power Systems, vol. PP, pp. 1–1, 08 2017.
[26] J. Schulman, P. Moritz, S. Levine, M. I. Jordan, and P. Abbeel, “High-dimensional continuous control using generalized advantage estimation,” CoRR, vol. abs/1506.02438, 2015. [Online]. Available: https://api.semanticscholar.org/CorpusID:3075448
[27] A. Moreira, F. Piancó, B. F. dos Santos, A. Street, R. Jiang, C. Zhao, and M. Heleno, “Distribution system operation amidst wildfire-prone climate conditions under decision-dependent line availability uncertainty - dataset,” 2023. [Online]. Available: https://dx.doi.org/10.21227/318q-5k50