The Wisdom of the Crowd: High-Fidelity Classification of Cyber-Attacks and Faults in Power Systems Using Ensemble and Machine Learning
Pith reviewed 2026-05-21 20:09 UTC · model grok-4.3
The pith
Offline accuracy is an unreliable indicator of field readiness for machine learning classification of cyber-attacks and faults in power systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that offline accuracy alone is an unreliable indicator of field readiness for ML-based classification of cyber-attacks and faults in power systems. Twelve models, including ensemble algorithms and a multi-layer perceptron, were trained on labeled time-domain measurements from electromagnetic transient simulations with digital substation emulation at 4.8 kHz. They were then evaluated in a real-time streaming environment designed for sub-cycle responsiveness, in which a cycle-length smoothing filter and a confidence threshold stabilize decisions. While several models achieved up to 99.9 percent offline accuracy, only the multi-layer perceptron sustained 98 to 99 percent coverage under streaming conditions.
What carries the argument
The real-time streaming evaluation pipeline that applies a cycle-length smoothing filter and confidence threshold to stabilize classification decisions on high-frequency time-domain measurements.
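A minimal sketch of such a decision stabilizer, under stated assumptions. The summary does not say which filter the paper uses, so a plain moving average over one cycle of per-class probabilities stands in for it. A 60 Hz nominal frequency is assumed, which gives 80 samples per cycle at 4.8 kHz. The class name `StreamingStabilizer`, the `ABSTAIN` sentinel, and any threshold value are hypothetical.

```python
from collections import deque

ABSTAIN = None  # hypothetical sentinel meaning "no decision emitted"

# Assumption: 60 Hz nominal frequency, so one cycle at 4.8 kHz is 80 samples.
SAMPLES_PER_CYCLE = 4800 // 60


class StreamingStabilizer:
    """Average per-class probabilities over a sliding window, then threshold.

    The moving average is an assumed stand-in for the paper's smoothing filter.
    """

    def __init__(self, window, threshold):
        self.window = window
        self.threshold = threshold
        self.buffer = deque(maxlen=window)  # keeps only the last `window` vectors

    def update(self, probs):
        """Consume one probability vector; return a class index or ABSTAIN."""
        self.buffer.append(list(probs))
        if len(self.buffer) < self.window:
            return ABSTAIN  # not enough history for a full window yet
        n_classes = len(probs)
        mean = [
            sum(p[k] for p in self.buffer) / self.window
            for k in range(n_classes)
        ]
        best = max(range(n_classes), key=lambda k: mean[k])
        # Emit the top class only if its smoothed confidence clears the threshold.
        return best if mean[best] >= self.threshold else ABSTAIN
```

With `window=SAMPLES_PER_CYCLE`, this sketch emits nothing for the first 79 samples and afterwards emits a class only when the cycle-averaged top probability reaches the threshold. Every other sample counts as an abstention.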
If this is right
- Only the multi-layer perceptron maintains robust coverage under streaming conditions, while the ensembles abstain from many decisions.
- High offline accuracy does not guarantee dependable performance when data arrives continuously in live operation.
- Realistic streaming test pipelines with decision stabilization are required to assess true readiness for deployment.
- Reliable classification supports safer monitoring in networks with high shares of inverter-based resources.
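The two streaming metrics behind these points can be sketched as follows. The summary does not give the paper's exact definitions, so this assumes that coverage is the fraction of samples on which a decision is emitted and that anomaly precision is the fraction of emitted non-normal decisions whose true label is also non-normal. The function name and the `normal_label` parameter are hypothetical.

```python
def coverage_and_anomaly_precision(decisions, labels, normal_label=0):
    """Return (coverage, anomaly_precision) for a stream of decisions.

    decisions: predicted class per sample, or None where the model abstained.
    labels:    true class per sample.
    Anomaly precision is None when no anomaly was flagged at all.
    """
    if len(decisions) != len(labels):
        raise ValueError("decisions and labels must have the same length")
    emitted = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    coverage = len(emitted) / len(decisions) if decisions else 0.0
    flagged = [(d, y) for d, y in emitted if d != normal_label]
    if not flagged:
        return coverage, None
    true_anomalies = sum(1 for _, y in flagged if y != normal_label)
    return coverage, true_anomalies / len(flagged)


# Made-up stream for illustration only: a model that is right whenever it
# speaks but stays silent on most samples.
decisions = [1, None, None, 2, 0, None, None, None, 1, None]
labels = [1, 1, 2, 2, 0, 0, 1, 2, 1, 0]
print(coverage_and_anomaly_precision(decisions, labels))
```

The made-up stream shows how a model can combine perfect anomaly precision with low coverage: it never flags a false anomaly, yet it decides on only four of ten samples.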
Where Pith is reading between the lines
- Applying the same streaming evaluation to data collected from operating substations would test whether simulation results hold in practice.
- The framework could be adapted to evaluate machine learning classifiers in other critical infrastructure domains that require low-latency decisions.
- Tuning the confidence threshold and filter length might shift the observed trade-off between precision and coverage across different grid conditions.
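The last point can be explored with a simple threshold sweep. The sketch below assumes the top smoothed confidence per sample is already available. The function names and both confidence lists are hypothetical and made up for illustration; they are not data from the paper.

```python
def coverage_at(max_confidences, threshold):
    """Fraction of samples whose top smoothed confidence clears the threshold."""
    if not max_confidences:
        return 0.0
    cleared = sum(1 for c in max_confidences if c >= threshold)
    return cleared / len(max_confidences)


def sweep(max_confidences, thresholds):
    """Map each candidate threshold to the coverage it would produce."""
    return {t: coverage_at(max_confidences, t) for t in thresholds}


# Made-up confidences for two hypothetical models (not results from the paper).
model_a = [0.99, 0.98, 0.97, 0.99, 0.96]  # consistently peaked confidences
model_b = [0.55, 0.72, 0.91, 0.64, 0.88]  # more diffuse confidences

for name, confs in (("model_a", model_a), ("model_b", model_b)):
    print(name, sweep(confs, [0.5, 0.9]))
```

The same loop could be repeated for several filter lengths by recomputing the smoothed confidences for each window before sweeping the threshold.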
Load-bearing premise
The electromagnetic transient simulations with digital substation emulation at 4.8 kHz, combined with the cycle-length smoothing filter and confidence threshold, accurately represent real-world conditions and decision-making requirements in IBR-rich networks.
What would settle it
Direct comparison of the same models on actual recorded field data from a power system experiencing documented cyber-attacks and faults to check whether streaming coverage and abstention rates match the simulation outcomes.
Original abstract
This paper presents a high-fidelity evaluation framework for machine learning (ML)-based classification of cyber-attacks and physical faults using electromagnetic transient simulations with digital substation emulation at 4.8 kHz. Twelve ML models, including ensemble algorithms and a multi-layer perceptron (MLP), were trained on labeled time-domain measurements and evaluated in a real-time streaming environment designed for sub-cycle responsiveness. The architecture incorporates a cycle-length smoothing filter and confidence threshold to stabilize decisions. Results show that while several models achieved near-perfect offline accuracies (up to 99.9%), only the MLP sustained robust coverage (98-99%) under streaming, whereas ensembles preserved perfect anomaly precision but abstained frequently (10-49% coverage). These findings demonstrate that offline accuracy alone is an unreliable indicator of field readiness and underscore the need for realistic testing and inference pipelines to ensure dependable classification in inverter-based resources (IBR)-rich networks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a high-fidelity evaluation framework for ML-based classification of cyber-attacks and physical faults in power systems. It uses electromagnetic transient simulations with digital substation emulation at 4.8 kHz to train and test twelve models (ensembles and MLP) on labeled time-domain measurements. A cycle-length smoothing filter and confidence threshold are incorporated for streaming decisions. While offline accuracies reach 99.9%, streaming results show MLP coverage at 98-99% versus 10-49% for ensembles (with perfect anomaly precision but frequent abstention), leading to the claim that offline accuracy alone is an unreliable indicator of field readiness for IBR-rich networks.
Significance. If the streaming evaluation serves as a credible proxy for operational conditions, the work provides concrete evidence that offline metrics can overstate readiness for real-time protection tasks in power systems with high inverter-based resource penetration. The empirical comparison across model families and the emphasis on sub-cycle responsiveness address a practical gap in deploying ML for cyber-physical security. The simulation-based approach with explicit filtering and thresholding is a strength, though its generalizability depends on unstated validation steps.
major comments (2)
- [Abstract / Evaluation Framework] Abstract and evaluation framework description: The central claim that offline accuracy is unreliable for field readiness rests on the streaming results (near-perfect offline vs. 10-49% ensemble coverage and 98-99% MLP coverage). This divergence only supports the conclusion if the 4.8 kHz EM transient simulations with digital substation emulation, cycle-length smoothing filter, and confidence threshold accurately represent real-world IBR-rich network dynamics, noise, and timing. No cross-validation against field recordings or hardware-in-the-loop tests is reported, leaving the observed abstention behavior potentially as a simulation artifact rather than a general property of the classifiers.
- [Results] Results section (streaming evaluation): The reported coverage and precision figures for ensembles versus MLP are presented without accompanying statistical analysis (e.g., confidence intervals on the 10-49% range or sensitivity to the confidence threshold hyperparameter). This weakens the load-bearing claim that the gap demonstrates unreliability of offline metrics, as the differences could be sensitive to the specific threshold or filter parameters listed as free in the axiom ledger.
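One way to attach an interval to a reported coverage figure is the Wilson score interval for a binomial proportion, sketched below. This is an illustration of the kind of analysis requested, not the authors' method. The function name is hypothetical, and the interval treats samples as independent, which consecutive streaming samples are not, so it is likely too narrow for real streams.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%).

    Assumes independent samples; autocorrelated streaming decisions would
    need a block-based or per-event correction.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(
        p * (1 - p) / trials + z * z / (4 * trials * trials)
    ) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical count: a decision emitted on 49 of 100 samples.
low, high = wilson_interval(49, 100)
print(round(low, 3), round(high, 3))
```

Because the interval width shrinks with the number of samples, reporting the sample count alongside each coverage figure would let readers judge how firm the 10-49% range is.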
minor comments (2)
- [Abstract] The abstract mentions 'twelve ML models' but does not list them explicitly; a table enumerating the ensemble variants and MLP architecture would improve reproducibility.
- [Methods] Notation for the smoothing filter and confidence threshold should be defined with equations in the methods section to clarify how decisions are stabilized in the streaming pipeline.
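As an illustration of the notation requested in the last comment, one possible formalization follows. It is a sketch, not the paper's definition: it assumes the filter is a moving average over one cycle, and the symbols $p_k[n]$ (probability of class $k$ at sample $n$), $f_s$ (sampling rate), $f_0$ (nominal system frequency), $\tau$ (confidence threshold), $T$ (number of streamed samples), and $C$ (coverage) are all introduced here.

```latex
\begin{align}
  \bar{p}_k[n] &= \frac{1}{N} \sum_{i=0}^{N-1} p_k[n-i],
  \qquad N = \frac{f_s}{f_0}, \\
  \hat{y}[n] &=
  \begin{cases}
    \arg\max_k \bar{p}_k[n], & \text{if } \max_k \bar{p}_k[n] \ge \tau, \\
    \varnothing \ (\text{abstain}), & \text{otherwise},
  \end{cases} \\
  C &= \frac{1}{T} \sum_{n=1}^{T} \mathbf{1}\{\hat{y}[n] \ne \varnothing\}.
\end{align}
```

With $f_s = 4.8$ kHz and an assumed $f_0 = 60$ Hz, the window length is $N = 80$ samples.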
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address each of the major comments in detail below, indicating the revisions we plan to make.
- Referee: [Abstract / Evaluation Framework] Abstract and evaluation framework description: The central claim that offline accuracy is unreliable for field readiness rests on the streaming results (near-perfect offline vs. 10-49% ensemble coverage and 98-99% MLP coverage). This divergence only supports the conclusion if the 4.8 kHz EM transient simulations with digital substation emulation, cycle-length smoothing filter, and confidence threshold accurately represent real-world IBR-rich network dynamics, noise, and timing. No cross-validation against field recordings or hardware-in-the-loop tests is reported, leaving the observed abstention behavior potentially as a simulation artifact rather than a general property of the classifiers.
  Authors: We agree that the absence of direct validation against field recordings or hardware-in-the-loop experiments is a limitation of the current study. Our work employs high-fidelity electromagnetic transient simulations with digital substation emulation at 4.8 kHz to provide a controlled and reproducible environment that captures essential transient behaviors in IBR-rich networks. This approach allows us to isolate the effects of streaming inference and thresholding. In the revised manuscript, we will add an explicit discussion of this limitation in the evaluation framework section and include a subsection on future work that outlines steps toward real-world validation. We maintain that the simulation framework offers meaningful insights into the discrepancy between offline and streaming performance, but we will clarify that these results should be interpreted within the context of the simulated environment.
  Revision: partial
- Referee: [Results] Results section (streaming evaluation): The reported coverage and precision figures for ensembles versus MLP are presented without accompanying statistical analysis (e.g., confidence intervals on the 10-49% range or sensitivity to the confidence threshold hyperparameter). This weakens the load-bearing claim that the gap demonstrates unreliability of offline metrics, as the differences could be sensitive to the specific threshold or filter parameters listed as free in the axiom ledger.
  Authors: We appreciate this observation. To strengthen the statistical rigor of our results, we will incorporate confidence intervals for the reported coverage and precision metrics in the streaming evaluation. Additionally, we will conduct and report a sensitivity analysis with respect to the confidence threshold and the cycle-length smoothing filter parameters. This will demonstrate that the observed performance gap between the MLP and ensemble models is robust across reasonable variations in these hyperparameters. The revised results section will include these analyses to better support our conclusions.
  Revision: yes
Circularity Check
No circularity: purely empirical ML evaluation on simulated data
The manuscript reports training twelve ML models (ensembles and MLP) on labeled time-domain measurements from 4.8 kHz EM transient simulations with digital substation emulation, then compares offline accuracy (up to 99.9%) against streaming performance under a cycle-length smoothing filter and confidence threshold. No equations, derivations, uniqueness theorems, or parameter-fitting steps are described. All claims rest on observed coverage and precision numbers from the simulation runs. No self-citations appear in the provided text, and no result is obtained by renaming a fitted quantity or by construction from the inputs. The evaluation is therefore self-contained against its own simulation benchmark; the reader's noted score of 1.0 is consistent with a minor or zero circularity finding.
Axiom & Free-Parameter Ledger
free parameters (2)
- confidence threshold
- ML model hyperparameters
axioms (1)
- domain assumption: The 4.8 kHz electromagnetic transient simulation with digital substation emulation faithfully reproduces real power system dynamics and attack/fault signatures.