pith. machine review for the scientific record.

arxiv: 2605.10822 · v1 · submitted 2026-05-11 · 💻 cs.LG · eess.SP

Recognition: no theorem link

Benchmarking Sensor-Fault Robustness in Forecasting

Alexander Windmann, Gianluca Manca, Jens U. Brandt, Marcel Dix, Oliver Niggemann, Philipp Wittenberg

Pith reviewed 2026-05-12 05:21 UTC · model grok-4.3

classification 💻 cs.LG eess.SP
keywords sensor fault robustness · time series forecasting · benchmarking · cyber-physical systems · model evaluation · fault tolerance · CPS forecasting

The pith

Forecasting models chosen by clean MSE often degrade most under sensor faults, reversing rankings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard forecasting evaluation selects models by accuracy on clean sensor data, yet real cyber-physical systems regularly encounter noisy, biased, missing, or misaligned readings. This paper introduces SensorFault-Bench, a standardized protocol with eight fault scenarios, a severity model, and a disjoint fault-transfer split to measure both clean MSE and worst-scenario fault-time error across four real-world datasets. It establishes that architectures favored by clean MSE can degrade sharply under faults and that clean-MSE rankings frequently disagree with fault-time rankings. The protocol also shows that certain robustness methods, such as adversarial training, reduce degradation selectively depending on whether value or availability faults dominate. A sympathetic reader cares because deploying models that perform well only on perfect data risks costly failures once sensors deviate from nominal behavior.
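The protocol's headline metrics reduce to a small computation: clean MSE, worst-scenario fault-time MSE, and their ratio as the degradation score. A minimal sketch of that split between relative robustness and absolute error, with illustrative function names rather than the benchmark's actual API:

```python
def mse(y_true, y_pred):
    """Mean squared error over aligned forecast points."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def worst_scenario_degradation(clean_mse, fault_mse_by_scenario):
    """Separate relative robustness from absolute error.

    fault_mse_by_scenario maps scenario name -> fault-time MSE.
    Returns the worst scenario, its absolute fault-time MSE, and the
    degradation ratio (worst fault-time MSE / clean MSE).
    """
    worst = max(fault_mse_by_scenario, key=fault_mse_by_scenario.get)
    worst_mse = fault_mse_by_scenario[worst]
    return worst, worst_mse, worst_mse / clean_mse
```

A model can post the lowest clean MSE yet the highest degradation ratio, which is exactly the ranking reversal the paper reports.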

Core claim

The paper claims that forecasting architectures favored by clean mean squared error can degrade sharply under faults, and clean-MSE rankings can disagree with worst-scenario fault-time error rankings. It introduces SensorFault-Bench as a shared CPS-grounded sensor-fault stress-test protocol that reports worst-scenario degradation, clean MSE, and fault-time MSE using a standardized severity model and a disjoint fault-transfer split for explicit fault-training methods. Empirical evaluation on four datasets and eight scenarios shows selective degradation reductions from methods such as projected gradient descent adversarial training and fault augmentation, while the zero-shot foundation model (Chronos-2) matches or trails the last-value naive forecaster in clean MSE on the two single-target datasets and shows the largest worst-scenario degradation on ETTh1 and Traffic.

What carries the argument

SensorFault-Bench, the CPS-grounded sensor-fault stress-test protocol that uses a standardized severity model and disjoint fault-transfer split to separate relative robustness from absolute error.

Load-bearing premise

The chosen fault models, severity levels, and eight scenarios accurately represent the distribution of real sensor faults in operational CPS deployments, and the four datasets suffice to generalize the ranking disagreements.

What would settle it

Re-running the full protocol on a new industrial CPS dataset containing naturally recorded sensor faults and observing whether the disagreement between clean-MSE rankings and worst-scenario fault-time rankings disappears.

Figures

Figures reproduced from arXiv: 2605.10822 by Alexander Windmann, Gianluca Manca, Jens U. Brandt, Marcel Dix, Oliver Niggemann, Philipp Wittenberg.

Figure 1. Benchmark overview for SensorFault-Bench. The benchmark perturbs observed input …
Figure 2. Per-dataset architecture degradation by benchmark scenario.
Figure 3. Sample-level sensor-fault examples for Beijing Air Tiantan and Penmanshiel WT08 under …
Figure 4. Sample-level sensor-fault examples for ETTh1 and Traffic under the timing and availability …
Figure 5. Additional architecture trade-off views for the baseline setting. The top row plots clean MSE …
Figure 6. PGD adversarial training trajectories across the four datasets. Each panel plots clean MSE …
Original abstract

Cyber-physical system (CPS) forecasting models depend on sensor streams with noisy, biased, missing, or temporally misaligned readings, yet standard forecasting evaluation often selects models by nominal error without showing whether they remain robust under such faults. We introduce SensorFault-Bench, a shared CPS-grounded sensor-fault stress-test protocol for evaluating forecasting architectures and robustness-improvement methods, and an operational taxonomy organizing the method comparison. Across four real-world datasets and eight scored scenarios governed by a standardized severity model, it reports worst-scenario degradation, clean mean squared error (MSE), and worst-scenario fault-time MSE, separating relative robustness from absolute error. A disjoint fault-transfer split lets explicit fault-training methods train on adjacent fault families while evaluation uses separate benchmark scenarios. Empirically, forecasting architectures favored by clean MSE can degrade sharply under faults, and clean-MSE rankings can disagree with worst-scenario fault-time error rankings. Chronos-2, the evaluated zero-shot foundation-model representative, matches or trails the last-value naive forecaster in clean MSE on the two single-target datasets and has the largest worst-scenario degradation on ETTh1 and Traffic, where all channels are forecast targets. For the evaluated robustness-improvement method set, paired deltas show selective degradation reductions: projected gradient descent adversarial training and randomized training lead where value faults dominate observed degradation, while fault augmentation leads where availability faults dominate. SensorFault-Bench provides open-source code, documented data access, and reproduction and extension guides, so new datasets, architectures, and robustness-improvement methods can be evaluated under the same CPS sensor-fault robustness protocol.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper introduces SensorFault-Bench, a standardized benchmarking protocol for evaluating time-series forecasting architectures and robustness-improvement methods under sensor faults in cyber-physical systems. Using four real-world datasets, eight fault scenarios with a standardized severity model, and a disjoint fault-transfer split, it reports clean MSE, worst-scenario fault-time MSE, and degradation metrics. The central empirical finding is that architectures favored by clean MSE can degrade sharply under faults and that clean-MSE rankings disagree with worst-scenario fault-time rankings; specific results are given for models including Chronos-2 and for methods such as adversarial training and fault augmentation.

Significance. If the benchmark protocol and reported deltas hold, the work provides a valuable, reproducible, open-source tool for assessing fault robustness in CPS forecasting, where standard clean-MSE evaluation is shown to be insufficient. The explicit separation of relative robustness from absolute error, the taxonomy of methods, and the provision of code/data/reproduction guides are strengths that enable community extension. The findings directly challenge reliance on nominal error for model selection in operational settings.

major comments (2)
  1. [Results / Abstract] The abstract and results claim specific degradation patterns for Chronos-2 (largest worst-scenario degradation on ETTh1 and Traffic) and ranking disagreements; the results section must include the exact per-dataset, per-scenario MSE values, rank tables, and statistical tests supporting these reversals, as they are load-bearing for the central claim that clean rankings are unreliable.
  2. [Methods] The standardized severity model and the eight scored scenarios are central to the protocol; the methods section should provide explicit equations or pseudocode defining how severity is applied to value, availability, and temporal faults, because without this the reported deltas cannot be independently verified or extended.
minor comments (3)
  1. [Introduction] Clarify in the introduction or methods whether the four datasets were chosen to cover diverse CPS domains or simply for availability, and state any limitations on generalizability.
  2. [Methods] The taxonomy organizing robustness-improvement methods should be presented as a table or figure for quick reference, rather than only in text.
  3. [Results] Ensure all figures showing ranking disagreements include error bars or confidence intervals from multiple runs.
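On the last minor point, error bars over multiple runs need no special machinery; a percentile-bootstrap sketch over per-seed MSE values, assuming NumPy and with illustrative parameter names:

```python
import numpy as np

def bootstrap_ci(per_run_mse, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean MSE across independent runs.

    per_run_mse holds one MSE value per training run/seed.
    """
    rng = np.random.default_rng(seed)
    vals = np.asarray(per_run_mse, dtype=float)
    # Resample runs with replacement and record each resample's mean.
    boot_means = np.array([
        rng.choice(vals, size=vals.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

Plotting `[lo, hi]` per (model, scenario) cell would make the claimed ranking disagreements legible at a glance.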

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and constructive suggestions. We address each major comment below and have revised the manuscript to improve transparency and reproducibility.

Point-by-point responses
  1. Referee: [Results / Abstract] The abstract and results claim specific degradation patterns for Chronos-2 (largest worst-scenario degradation on ETTh1 and Traffic) and ranking disagreements; the results section must include the exact per-dataset, per-scenario MSE values, rank tables, and statistical tests supporting these reversals, as they are load-bearing for the central claim that clean rankings are unreliable.

    Authors: We agree that explicit per-dataset and per-scenario values, together with rank tables and statistical tests, are necessary to substantiate the central claim. In the revised manuscript we have expanded the Results section with full tables reporting exact MSE values for every model, dataset, and fault scenario. We have added explicit ranking tables that juxtapose clean-MSE orderings against worst-scenario fault-time orderings, and we include Wilcoxon signed-rank tests (with p-values) confirming the statistical significance of the observed ranking reversals, including the pronounced degradation of Chronos-2 on ETTh1 and Traffic. These additions make the evidence load-bearing and fully verifiable. revision: yes
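The ranking-reversal check behind those tables reduces to comparing two orderings of the same model set; a minimal sketch with hypothetical names, not the paper's code:

```python
def mse_rankings(clean_mse, fault_mse):
    """Order models under clean MSE and worst-scenario fault-time MSE.

    Both arguments map model name -> MSE (lower is better). Returns the
    two orderings and whether they disagree anywhere.
    """
    clean_order = sorted(clean_mse, key=clean_mse.get)
    fault_order = sorted(fault_mse, key=fault_mse.get)
    return clean_order, fault_order, clean_order != fault_order
```

The Wilcoxon signed-rank tests the authors add then ask whether the paired deltas behind such reversals are statistically significant rather than run-to-run noise.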

  2. Referee: [Methods] The standardized severity model and the eight scored scenarios are central to the protocol; the methods section should provide explicit equations or pseudocode defining how severity is applied to value, availability, and temporal faults, because without this the reported deltas cannot be independently verified or extended.

    Authors: We appreciate the emphasis on reproducibility. The revised Methods section now contains explicit equations and pseudocode for the severity model. Value faults are formalized as additive Gaussian noise with severity-controlled variance; availability faults as independent Bernoulli dropout with severity parameter p; temporal faults as bounded random shifts with severity-controlled offset. The eight scored scenarios are enumerated with their exact severity tuples, enabling direct replication and community extension of the benchmark. revision: yes
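Those three formalizations can be written down directly. The sketch below follows the rebuttal's description (Gaussian noise, Bernoulli dropout, bounded shift); the function name, defaults, and severity scaling are illustrative assumptions, not the benchmark's actual interface, and it assumes NumPy:

```python
import numpy as np

def inject_fault(x, kind, severity, rng=None):
    """Apply one sensor-fault family to a 1-D signal at a given severity."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x, dtype=float)
    if kind == "value":
        # Additive Gaussian noise with severity-controlled std.
        return x + rng.normal(0.0, severity * x.std(), size=x.shape)
    if kind == "availability":
        # Independent Bernoulli dropout with rate = severity.
        out = x.copy()
        out[rng.random(x.shape) < severity] = np.nan
        return out
    if kind == "temporal":
        # Bounded random shift; severity (an int here) bounds the offset.
        shift = int(rng.integers(-severity, severity + 1))
        return np.roll(x, shift)
    raise ValueError(f"unknown fault kind: {kind}")
```

Enumerating the eight scenarios as (kind, severity) tuples, as the revised Methods section does, then makes the whole stress test replayable from a single table.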

Circularity Check

0 steps flagged

No significant circularity

Full rationale

The paper is an empirical benchmarking study that introduces SensorFault-Bench, a protocol with explicitly defined fault scenarios, severity models, datasets, and standard MSE-based metrics. It reports observed degradations and ranking disagreements without any mathematical derivation chain, fitted parameters presented as predictions, or load-bearing self-citations that reduce claims to inputs by construction. All central results are directly checkable via open code and data, rendering the work self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper contributes an empirical evaluation framework rather than a derivation; it relies on standard statistical metrics and introduces new benchmark artifacts without additional free parameters or ungrounded entities.

axioms (1)
  • standard math Standard mean squared error is an appropriate base metric for forecasting evaluation
    Used for clean MSE, worst-scenario fault-time MSE, and degradation calculations.
invented entities (1)
  • SensorFault-Bench no independent evidence
    purpose: Shared CPS-grounded sensor-fault stress-test protocol and taxonomy
    Newly defined benchmark including severity model and fault-transfer split.

pith-pipeline@v0.9.0 · 5600 in / 1183 out tokens · 44664 ms · 2026-05-12T05:21:11.820351+00:00 · methodology


Reference graph

Works this paper leans on

125 extracted references · 125 canonical work pages · 1 internal anchor

  1. [1]

    Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence , shorttitle =

    Sajid Ali, Tamer Abuhmed, Shaker El-Sappagh, Khan Muhammad, Jose M. Alonso-Moral, Roberto Confalonieri, Riccardo Guidotti, Javier Del Ser, Natalia Díaz-Rodríguez, and Francisco Herrera. Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence.Information Fusion, 99:101805, 2023. doi:10.1016/j.i...

  2. [2]

    Andrade, Cecília Rocha, Ricardo Silva, João P

    José R. Andrade, Cecília Rocha, Ricardo Silva, João P. Viana, Ricardo J. Bessa, Clara Gouveia, B. Almeida, R. J. Santos, Miguel Louro, P. M. Santos, and A. F. Ribeiro. Data-Driven Anomaly Detection and Event Log Profiling of SCADA Alarms.IEEE Access, 10:73758–73773, 2022. doi:10.1109/ACCESS.2022.3190398

  3. [3]

    Chronos-2: From univariate to universal forecasting.arXiv preprint arXiv:2510.15821, 2025

    Abdul Fatir Ansari, Oleksandr Shchur, Jaris Küken, Andreas Auer, Boran Han, Pedro Mercado, Syama Sundar Rangapuram, Huibin Shen, Lorenzo Stella, Xiyuan Zhang, Mononito Goswami, Shubham Kapoor, Danielle C. Maddix, Pablo Guerron, Tony Hu, Junming Yin, Nick Erickson, Prateek Mutalik Desai, Hao Wang, Huzefa Rangwala, George Karypis, Yuyang Wang, and Michael B...

  4. [4]

    Multiple adaptive mechanisms for data-driven soft sensors.Computers & Chemical Engineering, 96:42–54, 2017

    Rashid Bakirov, Bogdan Gabrys, and Damien Fay. Multiple adaptive mechanisms for data-driven soft sensors.Computers & Chemical Engineering, 96:42–54, 2017. doi:10.1016/j.compchemeng.2016.08.017

  5. [5]

    Goebel, and Simon Curran

    Edward Balaban, Abhinav Saxena, Prasun Bansal, Kai F. Goebel, and Simon Curran. Modeling, Detection, and Disambiguation of Sensor Faults for Aerospace Applications.IEEE Sensors Journal, 9(12):1907–1917, 2009. doi:10.1109/JSEN.2009.2030284

  6. [6]

    Systematic Gener- alization in Neural Networks-based Multivariate Time Series Forecasting Models

    Hritik Bansal, Gantavya Bhatt, Pankaj Malhotra, and Prathosh AP. Systematic Gener- alization in Neural Networks-based Multivariate Time Series Forecasting Models. In 2021 International Joint Conference on Neural Networks (IJCNN), pages 1–8, 2021. doi:10.1109/IJCNN52387.2021.9534469

  7. [7]

    Jonathan T. Barron. A General and Adaptive Robust Loss Function. In2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4326–4334, 2019. doi:10.1109/CVPR.2019.00446

  8. [8]

    Barrow, Sven F

    Devon K. Barrow, Sven F. Crone, and Nikolaos Kourentzes. An evaluation of neural net- work ensembles and model selection for time series prediction. InThe 2010 International Joint Conference on Neural Networks (IJCNN), pages 1–8, Barcelona, Spain, 2010. IEEE. doi:10.1109/IJCNN.2010.5596686

  9. [9]

    Random Search for Hyper-Parameter Optimization

    James Bergstra and Yoshua Bengio. Random Search for Hyper-Parameter Optimization. Journal of Machine Learning Research, 13(10):281–305, 2012

  10. [10]

    Probably approximately global robustness certification

    Peter Blohm, Patrick Indri, Thomas Gärtner, and Sagar Malhotra. Probably approximately global robustness certification. InProceedings of the 42nd International Conference on Machine Learning, volume 267 ofProceedings of Machine Learning Research, pages 4570–

  11. [11]

    Resilient Neural Forecasting Systems

    Michael Bohlke-Schneider, Shubham Kapoor, and Tim Januschowski. Resilient Neural Forecasting Systems. InProceedings of the Fourth International Workshop on Data Man- agement for End-to-End Machine Learning, pages 1–5, Portland OR USA, 2020. ACM. doi:10.1145/3399579.3399869

  12. [12]

    Accounting for variance in machine learning benchmarks

    Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Nazanin Mohammadi Sepahvand, Edward Raff, Kanika Madan, Vikram V oleti, Samira Ebrahimi Kahou, Vincent Michalski, Tal Arbel, Chris Pal, Gael Varoquaux, and Pascal Vincent. Accounting for variance in machine learning benchmarks. In A. Smola, A. Dimakis, and...

  13. [13]

    Brandt, Noah C

    Jens U. Brandt, Noah C. Pütz, Marcus Greiff, Thomas Jonathan Lew, John Subosits, Marc Hilbert, and Thomas Bartz-Beielstein. From Faults to Features: Pretraining to Learn Robust Representations against Sensor Failures. InNeurIPS, 2025

  14. [14]

    Guiding the comparison of neural network local robustness: An empirical study

    Hao Bu and Meng Sun. Guiding the comparison of neural network local robustness: An empirical study. InArtificial Neural Networks and Machine Learning – ICANN 2023, vol- ume 14258 ofLecture Notes in Computer Science, pages 312–323, Cham, 2023. Springer. doi:10.1007/978-3-031-44192-9_25

  15. [15]

    Multi-Variate Time Series Forecasting on Variable Subsets

    Jatin Chauhan, Aravindan Raghuveer, Rishi Saket, Jay Nandy, and Balaraman Ravindran. Multi-Variate Time Series Forecasting on Variable Subsets. InProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’22, pages 76–86, New York, NY , USA, 2022. Association for Computing Machinery. doi:10.1145/3534678.3539394

  16. [16]

    TSMixer: An all-MLP architecture for time series forecasting.Transactions on Machine Learning Research, 2023

    Si-An Chen, Chun-Liang Li, Sercan O Arik, Nathanael Christian Yoder, and Tomas Pfister. TSMixer: An all-MLP architecture for time series forecasting.Transactions on Machine Learning Research, 2023

  17. [17]

    Beijing Multi-Site Air Quality

    Song Chen. Beijing Multi-Site Air Quality. UCI Machine Learning Repository, 2017. URL https://doi.org/10.24432/C5RK5G

  18. [18]

    Robusttsf: Towards theory and design of robust time series forecasting with anomalies

    Hao Cheng, Qingsong Wen, Yang Liu, and Liang Sun. Robusttsf: Towards theory and design of robust time series forecasting with anomalies. InInternational Conference on Learning Representations, 2024

  19. [19]

    Learning Phrase Representations using

    Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning Phrase Representations using RNN Encoder– Decoder for Statistical Machine Translation. InProceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, Doha, Qatar, 2014. Asso...

  20. [20]

    Certified Adversarial Robustness via Random- ized Smoothing

    Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified Adversarial Robustness via Random- ized Smoothing. InProceedings of the 36th International Conference on Machine Learning, volume 97 ofProceedings of Machine Learning Research, pages 1310–1320. PMLR, 2019

  21. [21]

    DAM: Towards a Foundation Model for Forecasting

    Luke Darlow, Qiwen Deng, Ahmed Hassan, Martin Asenov, Rajkarn Singh, Artjom Joosen, Adam Barker, and Amos Storkey. DAM: Towards a Foundation Model for Forecasting. In International Conference on Learning Representations, 2024

  22. [22]

    Embracing Data Irregularities in Multivariate Time Series with Recurrent and Graph Neural Networks

    Marcel Rodrigues De Barros, Thiago Lizier Rissi, Eduardo Faria Cabrera, Eduardo Aoun Tannuri, Edson Satoshi Gomi, Rodrigo Augusto Barreira, and Anna Helena Reali Costa. Embracing Data Irregularities in Multivariate Time Series with Recurrent and Graph Neural Networks. In Murilo C. Naldi and Reinaldo A. C. Bianchi, editors,Intelligent Systems, volume 14195...

  23. [23]

    Benchmark Datasets for Fault Detection and Classification in Sensor Data

    Bas De Bruijn, Tuan Anh Nguyen, Doina Bucur, and Kenji Tei. Benchmark Datasets for Fault Detection and Classification in Sensor Data. InProceedings of the 5th International Conference on Sensor Networks (SENSORNETS 2016), pages 185–195, Rome, Italy, 2016. SciTePress. doi:10.5220/0005637901850195

  24. [24]

    Measuring the Robustness of ML Models Against Data Quality Issues in Industrial Time Series Data

    Marcel Dix, Gianluca Manca, Kenneth Chigozie Okafor, Reuben Borrison, Konstantin Kirch- heim, Divyasheel Sharma, Kr Chandrika, Deepti Maduskar, and Frank Ortmeier. Measuring the Robustness of ML Models Against Data Quality Issues in Industrial Time Series Data. In2023 IEEE 21st International Conference on Industrial Informatics (INDIN), pages 1–8, Lemgo, ...

  25. [25]

    Duchi and Hongseok Namkoong

    John C. Duchi and Hongseok Namkoong. Learning models with uniform performance via distributionally robust optimization.The Annals of Statistics, 49(3):1378–1406, 2021. doi:10.1214/20-AOS2004. 11

  26. [26]

    Gintare Karolina Dziugaite, Alexandre Drouin, Brady Neal, Nitarshan Rajkumar, Ethan Caballero, Linbo Wang, Ioannis Mitliagkas, and Daniel M. Roy. In Search of Robust Measures of Generalization. InAdvances in Neural Information Processing Systems, volume 33, pages 11723–11733, 2020

  27. [27]

    Tibshirani.An Introduction to the Bootstrap

    Bradley Efron and R.J. Tibshirani.An Introduction to the Bootstrap. Chapman and Hall/CRC,

  28. [28]

    doi:10.1201/9780429246593

  29. [29]

    PyTorch Lightning

    William Falcon and The PyTorch Lightning team. PyTorch Lightning. Zenodo, jan 2026

  30. [30]

    Guar- anteeing Robustness Against Real-World Perturbations In Time Series Classification Using Conformalized Randomized Smoothing

    Nicola Franco, Jakob Spiegelberg, Jeanette Miriam Lorenz, and Stephan Günnemann. Guar- anteeing Robustness Against Real-World Perturbations In Time Series Classification Using Conformalized Randomized Smoothing. InProceedings of the Fortieth Conference on Uncer- tainty in Artificial Intelligence, volume 244 ofProceedings of Machine Learning Research, page...

  31. [31]

    Comparison of approaches to time- synchronous sampling in wireless sensor networks.Measurement, 56:203–214, 2014

    Jürgen Funck and Clemens Gühmann. Comparison of approaches to time- synchronous sampling in wireless sensor networks.Measurement, 56:203–214, 2014. doi:10.1016/j.measurement.2014.07.001

  32. [32]

    A survey of uncertainty in deep neural networks.Artificial Intelligence Review, 56(S1):1513–1589, 2023

    Jakob Gawlikowski, Cedrique Rovile Njieutcheu Tassi, Mohsin Ali, Jongseok Lee, Matthias Humt, Jianxiang Feng, Anna Kruspe, Rudolph Triebel, Peter Jung, Ribana Roscher, Muham- mad Shahzad, Wen Yang, Richard Bamler, and Xiao Xiang Zhu. A survey of uncertainty in deep neural networks.Artificial Intelligence Review, 56(S1):1513–1589, 2023. doi:10.1007/s10462-...

  33. [33]

    Monash time series forecasting archive

    Rakshitha W Godahewa, Christoph Bergmeir, Geoffrey Webb, Rob Hyndman, and Pablo Montero-Manso. Monash time series forecasting archive. In J. Vanschoren and S. Yeung, editors,Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, volume 1, 2021

  34. [34]

    Hansen, Thomas B

    Bolette D. Hansen, Thomas B. Hansen, Thomas B. Moeslund, and David G. Jensen. Data- Driven Drift Detection in Real Process Tanks: Bridging the Gap between Academia and Practice.Water, 14(6):926, 2022. doi:10.3390/w14060926

  35. [35]

    Targeted adversarial attacks on wind power forecasts.Machine Learning, 113(2):863–889, 2024

    René Heinrich, Christoph Scholz, Stephan V ogt, and Malte Lehna. Targeted adversarial attacks on wind power forecasts.Machine Learning, 113(2):863–889, 2024. doi:10.1007/s10994-023- 06396-9

  36. [36]

    Benchmarking Neural Network Robustness to Com- mon Corruptions and Perturbations

    Dan Hendrycks and Thomas Dietterich. Benchmarking Neural Network Robustness to Com- mon Corruptions and Perturbations. InInternational Conference on Learning Representations, 2019

  37. [37]

    AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

    Dan Hendrycks, Norman Mu, Ekin Dogus Cubuk, Barret Zoph, Justin Gilmer, and Balaji Lakshminarayanan. AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty. InInternational Conference on Learning Representations, 2020

  38. [38]

    2022 , month = jun, journal =

    Dan Hendrycks, Nicholas Carlini, John Schulman, and Jacob Steinhardt. Unsolved Problems in ML Safety.arXiv preprint arXiv:2109.13916, 2022

  39. [39]

    Forecast evaluation for data scientists: Common pitfalls and best practices.Data Mining and Knowledge Discovery, 37(2):788–832, 2023

    Hansika Hewamalage, Klaus Ackermann, and Christoph Bergmeir. Forecast evaluation for data scientists: Common pitfalls and best practices.Data Mining and Knowledge Discovery, 37(2):788–832, 2023. doi:10.1007/s10618-022-00894-5

  40. [40]

    Data losses and synchronization accord- ing to delay in PLC-based industrial automation systems.Heliyon, 10(18):e37560, 2024

    Ayah Hijazi, Mátyás Andó, and Zoltán Pödör. Data losses and synchronization accord- ing to delay in PLC-based industrial automation systems.Heliyon, 10(18):e37560, 2024. doi:10.1016/j.heliyon.2024.e37560

  41. [41]

    Analysis of data quality issues in real-world industrial data.Annual Conference of the PHM Society, 5(1), 2013

    Thomas Hubauer, Steffen Lamparter, Mikhail Roshchin, Nina Solomakhina, and Stuart Watson. Analysis of data quality issues in real-world industrial data.Annual Conference of the PHM Society, 5(1), 2013. doi:10.36001/phmconf.2013.v5i1.2198

  42. [42]

    Hyndman and George Athanasopoulos.Forecasting: Principles and Practice

    Rob J. Hyndman and George Athanasopoulos.Forecasting: Principles and Practice. OTexts, Melbourne, Australia, 3rd edition, 2021. 12

  43. [43]

    IEEE Standard Glossary of Software Engineering Terminology.IEEE Std 610.12-1990, pages 1–84, 1990

    IEEE. IEEE Standard Glossary of Software Engineering Terminology.IEEE Std 610.12-1990, pages 1–84, 1990. doi:10.1109/IEEESTD.1990.101064

  44. [44]

    Data Augmentation techniques in time series domain: A survey and taxonomy.Neural Computing and Applications, 35(14):10123–10145, 2023

    Guillermo Iglesias, Edgar Talavera, Angel González-Prieto, Alberto Mozo, and Sandra Gómez- Canaval. Data Augmentation techniques in time series domain: A survey and taxonomy.Neural Computing and Applications, 35(14):10123–10145, 2023. doi:10.1007/s00521-023-08459-3

  45. [45]

    ISO/IEC TR 24029-1:2021 — Artificial Intelligence (AI) — Assessment of the Robustness of Neural Networks — Part 1: Overview, 2021

    ISO/IEC. ISO/IEC TR 24029-1:2021 — Artificial Intelligence (AI) — Assessment of the Robustness of Neural Networks — Part 1: Overview, 2021. URL https://www.iso.org/ standard/77609.html. Technical Report

  46. [46]

    Artificial intelligence (AI) — Assessment of the robustness of neural networks — Part 2: Methodology for the use of formal methods, 2023

    ISO/IEC. Artificial intelligence (AI) — Assessment of the robustness of neural networks — Part 2: Methodology for the use of formal methods, 2023. URL https://www.iso.org/ standard/79804.html. ISO/IEC JTC 1/SC 42 (Artificial intelligence)

  47. [47]

    Artificial intelligence — Data quality for analytics and machine learning (ML) — Part 1: Overview, terminology, and examples, 2024

    ISO/IEC. Artificial intelligence — Data quality for analytics and machine learning (ML) — Part 1: Overview, terminology, and examples, 2024. URL https://www.iso.org/ standard/81088.html. Prepared by ISO/IEC JTC 1/SC 42

  48. [48]

    Artificial intelligence — Data quality for analytics and machine learning (ML) — Part 2: Data quality measures, 2024

    ISO/IEC. Artificial intelligence — Data quality for analytics and machine learning (ML) — Part 2: Data quality measures, 2024. URLhttps://www.iso.org/standard/81860.html. Prepared by ISO/IEC JTC 1/SC 42

  49. [49]

    Artificial intelligence (AI) — Assessment of the robustness of neural networks — Part 3: Methodology for the use of statistical methods, 2026

    ISO/IEC. Artificial intelligence (AI) — Assessment of the robustness of neural networks — Part 3: Methodology for the use of statistical methods, 2026. URL https://www.iso.org/ standard/86901.html. Draft circulated for comments and approval; subject to change

  50. [50]

    An empirical survey of data augmentation for time series classification with neural networks.PLOS ONE, 16(7):e0254841, 2021

    Brian Kenji Iwana and Seiichi Uchida. An empirical survey of data augmentation for time series classification with neural networks.PLOS ONE, 16(7):e0254841, 2021. doi:10.1371/journal.pone.0254841

  51. [51]

    Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness

    Jinkwan Jang, Hyungjin Park, Jinmyeong Choi, and Taesup Kim. Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness. InInternational Conference on Learning Representations, 2026

  52. [52]

    Benchmarking M-LTSF: Frequency and Noise-Based Evaluation of Multivariate Long Time Series Forecasting Models.arXiv preprint arXiv:2510.04900, 2025

    Nick Janssen, Melanie Schaller, and Bodo Rosenhahn. Benchmarking M-LTSF: Frequency and Noise-Based Evaluation of Multivariate Long Time Series Forecasting Models.arXiv preprint arXiv:2510.04900, 2025

  53. [53]

    Continuous detection of concept drift in industrial cyber-physical systems using closed loop incremental machine learning.Discover Artificial Intelligence, 1(1), 2021

    Dinithi Jayaratne, Daswin De Silva, Damminda Alahakoon, and Xinghuo Yu. Continuous detection of concept drift in industrial cyber-physical systems using closed loop incremental machine learning.Discover Artificial Intelligence, 1(1), 2021. doi:10.1007/s44163-021-00007- z

  54. [54]

    Full-Band General Audio Synthesis with Score-Based Diffusion

    Eun Som Jeon, Suhas Lohit, Rushil Anirudh, and Pavan Turaga. Robust Time Series Recovery and Classification Using Test-Time Noise Simulator Networks. InICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5, Rhodes Island, Greece, 2023. IEEE. doi:10.1109/ICASSP49357.2023.10096888

  55. [55]

    A Survey on Data Quality for Dependable Monitoring in Wireless Sensor Networks.Sensors, 17(9):2010, 2017

    Gonçalo Jesus, António Casimiro, and Anabela Oliveira. A Survey on Data Quality for Dependable Monitoring in Wireless Sensor Networks.Sensors, 17(9):2010, 2017. doi:10.3390/s17092010

  56. [56]

    Domain adaptation for time series forecasting via attention sharing

    Xiaoyong Jin, Youngsuk Park, Danielle Maddix, Hao Wang, and Yuyang Wang. Domain adaptation for time series forecasting via attention sharing. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato, editors,Proceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learn...

  57. [57]

    Practical aspects impacting time synchronization data quality in semiconductor manufacturing

    Naveen Kalappa, James Moyne, Jonathan Parrott, and Ya-Shian Li-Baboud. Practical aspects impacting time synchronization data quality in semiconductor manufacturing. In Proceedings of the 2006 IEEE-1588 Conference, Gaithersburg, MD, USA, 2006

  58. [58]

    Local geometry attention for time series forecasting under realistic corruptions

    Dongbin Kim, Youngjoo Park, Woojin Jeong, and Jaewook Lee. Local geometry attention for time series forecasting under realistic corruptions. In International Conference on Learning Representations, 2026

  59. [59]

    Battling the non-stationarity in time series forecasting via test-time adaptation

    HyunGi Kim, Siwon Kim, Jisoo Mok, and Sungroh Yoon. Battling the non-stationarity in time series forecasting via test-time adaptation. In Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence and Thirty-Seventh Conference on Innovative Applications of Artificial Intelligence and Fifteenth Symposium on Educational Advances in Artificial...

  60. [60]

    Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift

    Taesung Kim, Jinhee Kim, Yunwon Tae, Cheonbok Park, Jang-Ho Choi, and Jaegul Choo. Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift. In International Conference on Learning Representations, 2022

  61. [61]

    Neural network ensemble operators for time series forecasting

    Nikolaos Kourentzes, Devon K. Barrow, and Sven F. Crone. Neural network ensemble operators for time series forecasting. Expert Systems with Applications, 41(9):4235–4244, 2014. doi:10.1016/j.eswa.2013.12.011

  62. [62]

  63. [63]

    Improving resilience of sensors in planetary exploration using data-driven models

    Dileep Kumar, Manuel Dominguez-Pumar, Elisa Sayrol-Clols, Josefina Torres, Mercedes Marín, Javier Gómez-Elvira, Luis Mora, Sara Navarro, and Jose Rodríguez-Manfredi. Improving resilience of sensors in planetary exploration using data-driven models. Machine Learning: Science and Technology, 4(3):035041, 2023. doi:10.1088/2632-2153/acefaa

  64. [64]

    Modeling long- and short-term temporal patterns with deep neural networks

    Guokun Lai, Wei-Cheng Chang, Yiming Yang, and Hanxiao Liu. Modeling long- and short-term temporal patterns with deep neural networks. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR '18, pages 95–104, New York, NY, USA, 2018. Association for Computing Machinery. doi:10.1145/3209978.3210006

  65. [65]

    Globally-robust neural networks

    Klas Leino, Zifan Wang, and Matt Fredrikson. Globally-robust neural networks. In Proceedings of the 38th International Conference on Machine Learning (ICML), volume 139 of Proceedings of Machine Learning Research, pages 6212–6222. PMLR, 2021

  66. [66]

    Evaluating model performance under worst-case subpopulations

    Mike Li, Hongseok Namkoong, and Shangzhou Xia. Evaluating model performance under worst-case subpopulations. In Advances in Neural Information Processing Systems, volume 34, pages 17325–17334. Curran Associates, Inc., 2021

  67. [67]

    ReTimeCausal: EM-Augmented Additive Noise Models for Interpretable Causal Discovery in Irregular Time Series

    Weihong Li, Anpeng Wu, Kun Kuang, and Keting Yin. ReTimeCausal: EM-Augmented Additive Noise Models for Interpretable Causal Discovery in Irregular Time Series. arXiv preprint arXiv:2507.03310, 2025

  68. [68]

    Probabilistic Learning of Multivariate Time Series With Temporal Irregularity

    Yijun Li, Cheuk Hang Leung, and Qi Wu. Probabilistic Learning of Multivariate Time Series With Temporal Irregularity. IEEE Transactions on Knowledge and Data Engineering, 37(5):2874–2887, 2025. doi:10.1109/TKDE.2025.3544348

  69. [69]

    Robust multivariate time-series forecasting: Adversarial attacks and defense mechanisms

    Linbo Liu, Youngsuk Park, Trong Nghia Hoang, Hilaf Hasson, and Luke Huan. Robust multivariate time-series forecasting: Adversarial attacks and defense mechanisms. In International Conference on Learning Representations, 2023

  70. [70]

    Advances in Financial Machine Learning

    Marcos M. López de Prado. Advances in Financial Machine Learning. Wiley, Hoboken, New Jersey, 2018

  71. [71]

    Fault Type Diagnosis of the WWTP Dissolved Oxygen Sensor Based on Fisher Discriminant Analysis and Assessment of Associated Environmental and Economic Impact

    Alexandra-Veronica Luca, Melinda Simon-Várhelyi, Norbert-Botond Mihály, and Vasile-Mircea Cristea. Fault Type Diagnosis of the WWTP Dissolved Oxygen Sensor Based on Fisher Discriminant Analysis and Assessment of Associated Environmental and Economic Impact. Applied Sciences, 13(4):2554, 2023. doi:10.3390/app13042554

  72. [72]

    ModernTCN: A Modern Pure Convolution Structure for General Time Series Analysis

    Donghao Luo and Xue Wang. ModernTCN: A Modern Pure Convolution Structure for General Time Series Analysis. In International Conference on Learning Representations, 2024

  73. [73]

    E2GAN: End-to-End Generative Adversarial Network for Multivariate Time Series Imputation

    Yonghong Luo, Ying Zhang, Xiangrui Cai, and Xiaojie Yuan. E2GAN: End-to-End Generative Adversarial Network for Multivariate Time Series Imputation. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, pages 3094–3100, Macao, China, 2019. International Joint Conferences on Artificial Intelligence Organization. doi...

  74. [74]

    Towards deep learning models resistant to adversarial attacks

    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018

  75. [75]

    Explainable artificial intelligence: A survey of needs, techniques, applications, and future direction

    Melkamu Mersha, Khang Lam, Joseph Wood, Ali K. AlShami, and Jugal Kalita. Explainable artificial intelligence: A survey of needs, techniques, applications, and future direction. Neurocomputing, 599:128111, 2024. doi:10.1016/j.neucom.2024.128111

  76. [76]

    Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming

    Claudio Michaelis, Benjamin Mitzkus, Robert Geirhos, Evgenia Rusak, Oliver Bringmann, Alexander S. Ecker, Matthias Bethge, and Wieland Brendel. Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming. arXiv preprint arXiv:1907.07484, 2019

  77. [77]

    Resilience and Resilient Systems of Artificial Intelligence: Taxonomy, Models and Methods

    Viacheslav Moskalenko, Vyacheslav Kharchenko, Alona Moskalenko, and Borys Kuzikov. Resilience and Resilient Systems of Artificial Intelligence: Taxonomy, Models and Methods. Algorithms, 16(3):165, 2023. doi:10.3390/a16030165

  78. [78]

    Sensor network data fault types

    Kevin Ni, Nithya Ramanathan, Mohamed Nabil Hajj Chehade, Laura Balzano, Sheela Nair, Sadaf Zahedi, Eddie Kohler, Greg Pottie, Mark Hansen, and Mani Srivastava. Sensor network data fault types. ACM Transactions on Sensor Networks, 5(3):1–29, 2009. doi:10.1145/1525856.1525863

  79. [79]

    A time series is worth 64 words: Long-term forecasting with transformers

    Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. In International Conference on Learning Representations, 2023

  80. [80]

    Fault Detection in Wireless Sensor Networks through the Random Forest Classifier

    Zainib Noshad, Nadeem Javaid, Tanzila Saba, Zahid Wadud, Muhammad Qaiser Saleem, Mohammad Eid Alzahrani, and Osama E. Sheta. Fault Detection in Wireless Sensor Networks through the Random Forest Classifier. Sensors, 19(7):1568, 2019. doi:10.3390/s19071568

Showing first 80 references.