
arxiv: 2606.20950 · v1 · pith:NSOWVBLZnew · submitted 2026-06-18 · 💻 cs.AI · cs.SY · eess.SY

Power Systems Agent Benchmark: Executable Evaluation of AI Agents in Electric Power Engineering

Pith reviewed 2026-06-26 16:56 UTC · model grok-4.3

classification 💻 cs.AI · cs.SY · eess.SY
keywords power systems · AI agents · executable benchmark · deterministic evaluation · power engineering · task families · feasibility checking · constraint validation

The pith

The paper introduces an executable benchmark where AI agents receive structured power engineering tasks and return solutions that are checked by deterministic code for feasibility and violations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the Power Systems Agent Benchmark as a way to evaluate AI agents in electric power engineering through executable checks instead of text grading. An agent is given a structured task and must return a structured solution; a program then recomputes engineering quantities, verifies operational constraints, and outputs a feasibility flag, normalized score, and list of violations. The benchmark covers 41 task families across eight areas including power flow, protection, stability, microgrids, reliability, power quality, and forecasting, with each task drawn from citable sources or standards. Tasks are generated on demand from private seeds to prevent contamination while remaining inspectable. A reference evaluation with command-line agents shows performance differences and also serves as a check for defects in the tasks or evaluators themselves.

Core claim

The central claim is that an executable benchmark of 41 task families with deterministic evaluators can assess power-engineering agents by validating their structured outputs against engineering constraints, returning an explicit feasibility flag, a normalized score, and a list of violations, while allowing future upgrades to simulator-backed checks without altering the task interface.

What carries the argument

The Power Systems Agent Benchmark, which pairs structured tasks with deterministic evaluators that recompute quantities and check constraints to produce feasibility flags, scores, and violations.
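The contract described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy dispatch task, the field names (`generators`, `dispatch_mw`, `reference_cost`), and the scoring rule are hypothetical and are not taken from the paper's evaluators. The sketch only shows the general shape: recompute quantities from the submitted solution, check constraints, and return a feasibility flag, a normalized score, and explicit violations.

```python
from dataclasses import dataclass, field


@dataclass
class EvalResult:
    feasible: bool
    score: float  # normalized to [0, 1]
    violations: list[str] = field(default_factory=list)


def evaluate_dispatch(task: dict, solution: dict, tol: float = 1e-6) -> EvalResult:
    """Deterministically check a toy dispatch solution (hypothetical schema)."""
    violations: list[str] = []
    dispatch = solution.get("dispatch_mw", {})
    total_mw = 0.0
    cost = 0.0
    for name, gen in task["generators"].items():
        # A generator missing from the solution is treated as dispatched at 0 MW.
        p = float(dispatch.get(name, 0.0))
        if p < gen["p_min_mw"] - tol or p > gen["p_max_mw"] + tol:
            violations.append(
                f"{name}: {p} MW outside [{gen['p_min_mw']}, {gen['p_max_mw']}]"
            )
        total_mw += p
        cost += gen["cost_per_mwh"] * p
    if abs(total_mw - task["demand_mw"]) > tol:
        violations.append(
            f"power balance: dispatched {total_mw} MW, demand {task['demand_mw']} MW"
        )
    feasible = not violations
    # Assumed scoring rule: reference cost over achieved cost, zero if infeasible.
    score = min(1.0, task["reference_cost"] / cost) if feasible and cost > 0 else 0.0
    return EvalResult(feasible, score, violations)


# Hypothetical task instance: two generators serving 150 MW.
example_task = {
    "generators": {
        "g1": {"p_min_mw": 0.0, "p_max_mw": 100.0, "cost_per_mwh": 20.0},
        "g2": {"p_min_mw": 0.0, "p_max_mw": 100.0, "cost_per_mwh": 40.0},
    },
    "demand_mw": 150.0,
    "reference_cost": 4000.0,  # cost of the cheapest feasible dispatch
}

print(evaluate_dispatch(example_task, {"dispatch_mw": {"g1": 50.0, "g2": 100.0}}))
```

Because the evaluator only reads the structured solution, an explanation attached to it has no effect on the result, which is the property the benchmark relies on.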

If this is right

  • Agents receive concrete scores based on whether their solutions satisfy engineering constraints rather than on the quality of their explanations.
  • The same task format can later support evaluator upgrades to full simulators without changing how agents are instructed or how solutions are submitted.
  • Unanimous failures across multiple agents can flag defects in individual tasks or evaluators for correction.
  • Held-out instances generated from private seeds allow measurement of generalization separate from public-split performance.
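One way to picture on-demand generation from private seeds is the sketch below. It is not the paper's generator: the family name `toy_dispatch`, the parameter ranges, and the HMAC-based seed derivation are assumptions chosen for illustration. The point is that the generator code can be public and inspectable while the instances stay private, because they depend on a secret seed.

```python
import hashlib
import hmac
import random


def derive_seed(private_seed: bytes, family: str, index: int) -> int:
    """Derive a per-instance integer seed from a secret key (assumed scheme)."""
    msg = f"{family}:{index}".encode()
    digest = hmac.new(private_seed, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def generate_dispatch_case(private_seed: bytes, index: int) -> dict:
    """Build one instance of a hypothetical 'toy_dispatch' family."""
    rng = random.Random(derive_seed(private_seed, "toy_dispatch", index))
    generators = {
        f"g{i}": {
            "p_min_mw": 0.0,
            "p_max_mw": float(rng.randint(50, 150)),
            "cost_per_mwh": float(rng.randint(15, 60)),
        }
        for i in range(1, 4)
    }
    capacity = sum(g["p_max_mw"] for g in generators.values())
    # Keep demand between 40% and 90% of capacity so the case is solvable.
    demand = round(rng.uniform(0.4, 0.9) * capacity, 1)
    return {"family": "toy_dispatch", "generators": generators, "demand_mw": demand}


case = generate_dispatch_case(b"keep-this-secret", 0)
print(case["demand_mw"], sorted(case["generators"]))
```

The same seed and index always reproduce the same instance, so a held-out run can be repeated exactly by whoever holds the seed, while anyone without it can only inspect the generator.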

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be adapted to create similar executable benchmarks in other engineering domains that rely on quantitative checks.
  • If the benchmark correlates with real-world performance, it could guide development of agents that integrate with domain-specific engineering software.
  • Consistency between public-split and held-out results suggests the generation method resists contamination while remaining reproducible.

Load-bearing premise

The 41 task families and their deterministic evaluators are representative enough of real power engineering problems to act as valid proxies for feasibility.

What would settle it

An experiment in which agents that score highly on the benchmark perform poorly when applied to actual power system operations or when the same tasks are solved using full power-system simulators.

Figures

Figures reproduced from arXiv: 2606.20950 by Sergei Trashchenkov.

Figure 1: The benchmark contract. A task is posed to any command-line agent, which writes a ...
Original abstract

Executable evaluation -- checking the consequences of an agent's actions with a program rather than grading its prose -- has become a prominent way to assess tool-using AI agents in software settings. Electric power engineering has not yet had an analogous benchmark: language-model use is still dominated by retrieval and text question answering, while agents acting on power-system artifacts remain mostly academic prototypes. We introduce the Power Systems Agent Benchmark, an executable benchmark for power-engineering agents. An agent receives a structured task and returns a structured solution; a deterministic evaluator recomputes the engineering quantities, checks operational constraints, and returns a feasibility flag, a normalized score, and explicit violations. The benchmark contains 41 task families across eight areas of power engineering, from power flow and protection to stability, microgrids, reliability, power quality, and forecasting. Each task is grounded in a citable source, standard, or documented engineering formulation. To resist contamination, held-out cases are synthesized on demand by per-family generators from private seeds: the construction is inspectable, but the instances remain private. In a reference evaluation with three command-line agents, the strongest score near the compact tier's ceiling, a smaller open model trails, and public and held-out performance are broadly consistent; a separate public-split grid with OpenCode and Aider probes harness effects. The reference evaluation doubles as quality control: unanimous failures flag candidate task or evaluator defects, and it exposed a latent evaluator bug missed by self-consistency checks. The evaluators are compact deterministic surrogates, but the task contract allows their internals to be upgraded to simulator-backed checks without changing how tasks are posed or solved.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces the Power Systems Agent Benchmark, an executable benchmark for AI agents in electric power engineering. Agents receive structured tasks across 41 families in eight areas (power flow, protection, stability, etc.) and return structured solutions; deterministic evaluators recompute quantities, check constraints, and output feasibility flags, normalized scores, and violations. Tasks are grounded in citable sources/standards, with on-demand synthesis from private seeds for held-out instances to resist contamination. A reference evaluation with three command-line agents is reported, along with quality control via unanimous agent failures that exposed an evaluator bug; evaluators are described as compact deterministic surrogates that can later be upgraded to simulator-backed checks without altering the task interface.

Significance. If the benchmark's evaluators prove reliable, this work supplies a much-needed executable evaluation framework for a domain where AI use remains largely limited to retrieval and text QA. Credit is due for the contamination-resistant design (on-demand synthesis from private seeds), the explicit grounding in external standards, the quality-control mechanism that detected a latent bug, and the forward-compatible task contract that permits evaluator upgrades. These features position the artifact as a reusable, inspectable starting point rather than a one-off leaderboard.
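The quality-control idea credited here, that a case failed by every agent is a candidate task or evaluator defect, can be sketched in a few lines. The result layout (`results[agent][case_id]` holding a feasibility flag) and the agent and case names are hypothetical; the paper's actual tooling is not shown in this summary.

```python
def flag_unanimous_failures(results: dict[str, dict[str, bool]]) -> list[str]:
    """Return case ids that every agent attempted and every agent failed."""
    agents = list(results)
    if not agents:
        return []
    # Only consider cases that all agents were run on.
    shared = set.intersection(*(set(results[a]) for a in agents))
    return sorted(c for c in shared if not any(results[a][c] for a in agents))


# Hypothetical feasibility flags for three agents on three cases.
example_results = {
    "agent_a": {"case_1": True, "case_2": False, "case_3": False},
    "agent_b": {"case_1": True, "case_2": False, "case_3": True},
    "agent_c": {"case_1": False, "case_2": False, "case_3": True},
}

print(flag_unanimous_failures(example_results))
```

A flagged case is only a candidate for review: it may be a defective task or evaluator, or simply a hard problem, so a human still has to inspect it.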

major comments (1)
  1. [Abstract and Reference Evaluation description] The central claim that the benchmark supplies valid executable evaluation rests on the 41 task families' deterministic evaluators correctly recomputing quantities and enforcing constraints. However, the manuscript reports no direct comparison, error-bound analysis, or validation of these compact surrogates against established full simulators (e.g., pandapower or PSSE). Quality control via unanimous failures only flags gross bugs, not systematic approximation errors in the engineering formulations. This leaves the proxy-validity assumption untested and directly affects whether benchmark outputs constitute reliable engineering assessment.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment of the benchmark's design features, including contamination resistance, grounding in standards, and the quality-control mechanism. We address the single major comment below.

Point-by-point responses
  1. Referee: [Abstract and Reference Evaluation description] The central claim that the benchmark supplies valid executable evaluation rests on the 41 task families' deterministic evaluators correctly recomputing quantities and enforcing constraints. However, the manuscript reports no direct comparison, error-bound analysis, or validation of these compact surrogates against established full simulators (e.g., pandapower or PSSE). Quality control via unanimous failures only flags gross bugs, not systematic approximation errors in the engineering formulations. This leaves the proxy-validity assumption untested and directly affects whether benchmark outputs constitute reliable engineering assessment.

    Authors: We agree that the manuscript provides no direct numerical comparison or error-bound analysis of the compact evaluators against full simulators such as pandapower or PSSE. Each evaluator implements a deterministic version of the engineering calculation drawn from the citable source or standard listed for its task family; the formulations are therefore transparent and inspectable rather than black-box approximations. Nevertheless, the absence of an empirical validation study against established simulators leaves the magnitude of any systematic discrepancy unquantified, which is a genuine limitation for claims of proxy validity. In the revised manuscript we will add an explicit Limitations subsection that states this gap, reiterates the forward-compatible task contract that permits future replacement by simulator-backed evaluators, and outlines a concrete plan for such validation on a representative subset of task families.

    Revision: yes
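The kind of error-bound analysis the referee asks for can be illustrated on a textbook case. The sketch below is not one of the benchmark's evaluators and does not call pandapower or any other simulator. It assumes a lossless two-bus line, treats the linear DC approximation P ≈ δ/X as a stand-in "compact surrogate", and treats the exact expression P = (V1·V2/X)·sin δ as the "reference". All function names are hypothetical.

```python
import math


def p_exact(delta_rad: float, x_pu: float, v1: float = 1.0, v2: float = 1.0) -> float:
    """Active power over a lossless line, per unit (reference formulation)."""
    return v1 * v2 / x_pu * math.sin(delta_rad)


def p_dc(delta_rad: float, x_pu: float) -> float:
    """DC power-flow approximation: sin(delta) ~ delta, voltages at 1.0 p.u."""
    return delta_rad / x_pu


def max_relative_error(max_angle_deg: float, x_pu: float = 0.1, steps: int = 100) -> float:
    """Worst relative error of the surrogate over angles in (0, max_angle_deg]."""
    worst = 0.0
    for k in range(1, steps + 1):
        delta = math.radians(max_angle_deg) * k / steps
        ref = p_exact(delta, x_pu)
        worst = max(worst, abs(p_dc(delta, x_pu) - ref) / abs(ref))
    return worst


for limit in (10, 30):
    print(f"angles up to {limit} deg: max relative error {max_relative_error(limit):.4f}")
```

In this toy case the reactance cancels out of the relative error, so the bound depends only on the angle range. A real validation study would sweep the surrogate and a full simulator over each task family's parameter ranges and report the same kind of worst-case figure.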

Circularity Check

0 steps flagged

No circularity: benchmark is a new artifact grounded in external sources


The paper introduces the Power Systems Agent Benchmark as a new executable evaluation artifact. Tasks are explicitly grounded in citable external standards or documented engineering formulations, with held-out cases generated on demand from private seeds. Deterministic evaluators are described as compact surrogates that can be upgraded without changing task contracts. No self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided text. The central construction does not reduce to its own inputs by definition or by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The benchmark rests on standard engineering formulations from external sources and the assumption that deterministic evaluators can serve as valid surrogates; no free parameters, ad-hoc axioms, or invented entities are introduced.

axioms (1)
  • Domain assumption: each task is grounded in a citable source, standard, or documented engineering formulation.
    Stated directly in the abstract as the grounding for all 41 task families.



Appendix A. Task Catalog

The 41 families are listed below by domain area, each with its primary source or governing standard and the confide...

Short circuit and protection (family · source / standard · confidence)

  • three_phase_short_circuit · IEC 60909 · high
  • earth_fault_calculation · Anderson, Analysis of Faulted Power Systems · medium
  • breaker_relay_short_circuit · IEC 60909; Glover, Overbye & Sarma · high
  • distance_protection_settings · Horowitz & Phadke, Power System Relaying · me...

Stability, grid code, and inverter-based resources (family · source / standard · confidence)

  • critical_clearing_time · Equal-area criterion (Kundur) · high
  • transient_stability_prediction · Chen et al., transient-stability prediction · medium
  • frt_compliance · IEEE 2800 / ENTSO-E RfG (FRT) · medium
  • ibr_short_circuit_frt · IBR modeling for short circuit / FRT · medium
  • min_synchronous_...

Distributed resources, PV, EV, and storage (family · source / standard · confidence)

  • pv_volt_var · Turitsyn et al., local Volt-VAR control · medium
  • ev_v2g_outage_schedule · EVs for power quality & security · medium
  • ev_v2g_voltage_support · EVs for power quality & security · medium
  • bess_ancillary_response · Gonzalez-Longatt & Rueda Torres · medium
  • commercial_pv_lcoe_uncertainty · PV s...

Microgrids and dispatch (family · source / standard · confidence)

  • microgrid_economic_dispatch · España et al., microgrid dispatch · medium
  • rolling_microgrid_dispatch · España et al., microgrid dispatch · medium
  • islanded_microgrid_pq_dispatch · España et al., microgrid dispatch · medium
  • dispatch_uncertainty · Chung, advanced prediction for smart grids · low
  • hydro_thermal_storage_uc...

Reliability and restoration (family · source / standard · confidence)

  • flisr_restoration · Two-stage distribution service restoration · medium
  • fault_section_localization · Brown, faulted-circuit indicators · medium
  • fci_placement · Brown, faulted-circuit indicators · medium
  • fci_saidi_caidi · Brown, FCIs; IEEE 1366 indices · medium
  • operator_breaker_load_actions · Glover, Overbye & Sarm...

Power quality, standards, assets, and cybersecurity (family · source / standard · confidence)

  • en50160_voltage_compliance · EN 50160 · high
  • power_quality_event_classification · IEC 61000-4-30 · high
  • harmonic_ieee519_compliance · IEEE 519 · high
  • transformer_thermal_loading · IEC 60076-7 · high
  • fdi_state_estimation · Abur & Exposito, state estimation · high
  • protected_meter_placement · Graphi...

Forecasting under uncertainty (family · source / standard · confidence)

  • wind_power_forecast · Safari et al., short-term wind forecasting · medium
  • wind_prediction_interval · Khorramdel et al., fuzzy prediction intervals · medium

Appendix B. Experiment Artifacts

Run configuration. Each agent was invoked once per case through its own command-line interface under a 600-second...