pith. machine review for the scientific record.

arxiv: 2605.10822 · v1 · submitted 2026-05-11 · 💻 cs.LG · eess.SP

Recognition: no theorem link

Benchmarking Sensor-Fault Robustness in Forecasting

Alexander Windmann, Gianluca Manca, Jens U. Brandt, Marcel Dix, Oliver Niggemann, Philipp Wittenberg

Pith reviewed 2026-05-12 05:21 UTC · model grok-4.3

classification 💻 cs.LG eess.SP
keywords sensor fault robustness · time series forecasting · benchmarking · cyber-physical systems · model evaluation · fault tolerance · CPS forecasting

The pith

Forecasting models chosen by clean MSE often degrade most under sensor faults, reversing rankings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard forecasting evaluation selects models by accuracy on clean sensor data, yet real cyber-physical systems regularly encounter noisy, biased, missing, or misaligned readings. This paper introduces SensorFault-Bench, a standardized protocol with eight fault scenarios, a severity model, and a disjoint fault-transfer split to measure both clean MSE and worst-scenario fault-time error across four real-world datasets. It establishes that architectures favored by clean MSE can degrade sharply under faults and that clean-MSE rankings frequently disagree with fault-time rankings. The protocol also shows that certain robustness methods, such as adversarial training, reduce degradation selectively depending on whether value or availability faults dominate. A sympathetic reader cares because deploying models that perform well only on perfect data risks costly failures once sensors deviate from nominal behavior.
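The protocol's headline metrics reduce to a small computation: clean MSE, worst-scenario fault-time MSE, and their ratio as the degradation score. A minimal sketch of that split between relative robustness and absolute error, with illustrative function names rather than the benchmark's actual API:

```python
def mse(y_true, y_pred):
    """Mean squared error over aligned forecast points."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def worst_scenario_degradation(clean_mse, fault_mse_by_scenario):
    """Separate relative robustness from absolute error.

    fault_mse_by_scenario maps scenario name -> fault-time MSE.
    Returns the worst scenario, its absolute fault-time MSE, and the
    degradation ratio (worst fault-time MSE / clean MSE).
    """
    worst = max(fault_mse_by_scenario, key=fault_mse_by_scenario.get)
    worst_mse = fault_mse_by_scenario[worst]
    return worst, worst_mse, worst_mse / clean_mse
```

A model can post the lowest clean MSE yet the highest degradation ratio, which is exactly the ranking reversal the paper reports.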

Core claim

The paper claims that forecasting architectures favored by clean mean squared error can degrade sharply under faults, and clean-MSE rankings can disagree with worst-scenario fault-time error rankings. It introduces SensorFault-Bench as a shared CPS-grounded sensor-fault stress-test protocol that reports worst-scenario degradation, clean MSE, and fault-time MSE using a standardized severity model and a disjoint fault-transfer split for explicit fault-training methods. Empirical evaluation on four datasets and eight scenarios shows selective degradation reductions from methods such as projected gradient descent adversarial training and fault augmentation, while the zero-shot foundation model (Chronos-2) matches or trails the last-value naive forecaster in clean MSE on the two single-target datasets and shows the largest worst-scenario degradation on ETTh1 and Traffic.

What carries the argument

SensorFault-Bench, the CPS-grounded sensor-fault stress-test protocol that uses a standardized severity model and disjoint fault-transfer split to separate relative robustness from absolute error.

Load-bearing premise

The chosen fault models, severity levels, and eight scenarios accurately represent the distribution of real sensor faults in operational CPS deployments, and the four datasets suffice to generalize the ranking disagreements.

What would settle it

Re-running the full protocol on a new industrial CPS dataset containing naturally recorded sensor faults and observing whether the disagreement between clean-MSE rankings and worst-scenario fault-time rankings disappears.

Figures

Figures reproduced from arXiv: 2605.10822 by Alexander Windmann, Gianluca Manca, Jens U. Brandt, Marcel Dix, Oliver Niggemann, Philipp Wittenberg.

Figure 1. Benchmark overview for SensorFault-Bench. The benchmark perturbs observed input …
Figure 2. Per-dataset architecture degradation by benchmark scenario.
Figure 3. Sample-level sensor-fault examples for Beijing Air Tiantan and Penmanshiel WT08 under …
Figure 4. Sample-level sensor-fault examples for ETTh1 and Traffic under the timing and availability …
Figure 5. Additional architecture trade-off views for the baseline setting. The top row plots clean MSE …
Figure 6. PGD adversarial training trajectories across the four datasets. Each panel plots clean MSE …
Original abstract

Cyber-physical system (CPS) forecasting models depend on sensor streams with noisy, biased, missing, or temporally misaligned readings, yet standard forecasting evaluation often selects models by nominal error without showing whether they remain robust under such faults. We introduce SensorFault-Bench, a shared CPS-grounded sensor-fault stress-test protocol for evaluating forecasting architectures and robustness-improvement methods, and an operational taxonomy organizing the method comparison. Across four real-world datasets and eight scored scenarios governed by a standardized severity model, it reports worst-scenario degradation, clean mean squared error (MSE), and worst-scenario fault-time MSE, separating relative robustness from absolute error. A disjoint fault-transfer split lets explicit fault-training methods train on adjacent fault families while evaluation uses separate benchmark scenarios. Empirically, forecasting architectures favored by clean MSE can degrade sharply under faults, and clean-MSE rankings can disagree with worst-scenario fault-time error rankings. Chronos-2, the evaluated zero-shot foundation-model representative, matches or trails the last-value naive forecaster in clean MSE on the two single-target datasets and has the largest worst-scenario degradation on ETTh1 and Traffic, where all channels are forecast targets. For the evaluated robustness-improvement method set, paired deltas show selective degradation reductions: projected gradient descent adversarial training and randomized training lead where value faults dominate observed degradation, while fault augmentation leads where availability faults dominate. SensorFault-Bench provides open-source code, documented data access, and reproduction and extension guides, so new datasets, architectures, and robustness-improvement methods can be evaluated under the same CPS sensor-fault robustness protocol.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper introduces SensorFault-Bench, a standardized benchmarking protocol for evaluating time-series forecasting architectures and robustness-improvement methods under sensor faults in cyber-physical systems. Using four real-world datasets, eight fault scenarios with a standardized severity model, and a disjoint fault-transfer split, it reports clean MSE, worst-scenario fault-time MSE, and degradation metrics. The central empirical finding is that architectures favored by clean MSE can degrade sharply under faults and that clean-MSE rankings disagree with worst-scenario fault-time rankings; specific results are given for models including Chronos-2 and for methods such as adversarial training and fault augmentation.

Significance. If the benchmark protocol and reported deltas hold, the work provides a valuable, reproducible, open-source tool for assessing fault robustness in CPS forecasting, where standard clean-MSE evaluation is shown to be insufficient. The explicit separation of relative robustness from absolute error, the taxonomy of methods, and the provision of code/data/reproduction guides are strengths that enable community extension. The findings directly challenge reliance on nominal error for model selection in operational settings.

major comments (2)
  1. [Results / Abstract] The abstract and results claim specific degradation patterns for Chronos-2 (largest worst-scenario degradation on ETTh1 and Traffic) and ranking disagreements; the results section must include the exact per-dataset, per-scenario MSE values, rank tables, and statistical tests supporting these reversals, as they are load-bearing for the central claim that clean rankings are unreliable.
  2. [Methods] The standardized severity model and the eight scored scenarios are central to the protocol; the methods section should provide explicit equations or pseudocode defining how severity is applied to value, availability, and temporal faults, because without this the reported deltas cannot be independently verified or extended.
minor comments (3)
  1. [Introduction] Clarify in the introduction or methods whether the four datasets were chosen to cover diverse CPS domains or simply for availability, and state any limitations on generalizability.
  2. [Methods] The taxonomy organizing robustness-improvement methods should be presented as a table or figure for quick reference, rather than only in text.
  3. [Results] Ensure all figures showing ranking disagreements include error bars or confidence intervals from multiple runs.
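On the last minor point, error bars over multiple runs need no special machinery; a percentile-bootstrap sketch over per-seed MSE values, assuming NumPy and with illustrative parameter names:

```python
import numpy as np

def bootstrap_ci(per_run_mse, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean MSE across independent runs.

    per_run_mse holds one MSE value per training run/seed.
    """
    rng = np.random.default_rng(seed)
    vals = np.asarray(per_run_mse, dtype=float)
    # Resample runs with replacement and record each resample's mean.
    boot_means = np.array([
        rng.choice(vals, size=vals.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

Plotting `[lo, hi]` per (model, scenario) cell would make the claimed ranking disagreements legible at a glance.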

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and constructive suggestions. We address each major comment below and have revised the manuscript to improve transparency and reproducibility.

Point-by-point responses
  1. Referee: [Results / Abstract] The abstract and results claim specific degradation patterns for Chronos-2 (largest worst-scenario degradation on ETTh1 and Traffic) and ranking disagreements; the results section must include the exact per-dataset, per-scenario MSE values, rank tables, and statistical tests supporting these reversals, as they are load-bearing for the central claim that clean rankings are unreliable.

    Authors: We agree that explicit per-dataset and per-scenario values, together with rank tables and statistical tests, are necessary to substantiate the central claim. In the revised manuscript we have expanded the Results section with full tables reporting exact MSE values for every model, dataset, and fault scenario. We have added explicit ranking tables that juxtapose clean-MSE orderings against worst-scenario fault-time orderings, and we include Wilcoxon signed-rank tests (with p-values) confirming the statistical significance of the observed ranking reversals, including the pronounced degradation of Chronos-2 on ETTh1 and Traffic. These additions make the evidence load-bearing and fully verifiable. revision: yes
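The ranking-reversal check behind those tables reduces to comparing two orderings of the same model set; a minimal sketch with hypothetical names, not the paper's code:

```python
def mse_rankings(clean_mse, fault_mse):
    """Order models under clean MSE and worst-scenario fault-time MSE.

    Both arguments map model name -> MSE (lower is better). Returns the
    two orderings and whether they disagree anywhere.
    """
    clean_order = sorted(clean_mse, key=clean_mse.get)
    fault_order = sorted(fault_mse, key=fault_mse.get)
    return clean_order, fault_order, clean_order != fault_order
```

The Wilcoxon signed-rank tests the authors add then ask whether the paired deltas behind such reversals are statistically significant rather than run-to-run noise.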

  2. Referee: [Methods] The standardized severity model and the eight scored scenarios are central to the protocol; the methods section should provide explicit equations or pseudocode defining how severity is applied to value, availability, and temporal faults, because without this the reported deltas cannot be independently verified or extended.

    Authors: We appreciate the emphasis on reproducibility. The revised Methods section now contains explicit equations and pseudocode for the severity model. Value faults are formalized as additive Gaussian noise with severity-controlled variance; availability faults as independent Bernoulli dropout with severity parameter p; temporal faults as bounded random shifts with severity-controlled offset. The eight scored scenarios are enumerated with their exact severity tuples, enabling direct replication and community extension of the benchmark. revision: yes
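Those three formalizations can be written down directly. The sketch below follows the rebuttal's description (Gaussian noise, Bernoulli dropout, bounded shift); the function name, defaults, and severity scaling are illustrative assumptions, not the benchmark's actual interface, and it assumes NumPy:

```python
import numpy as np

def inject_fault(x, kind, severity, rng=None):
    """Apply one sensor-fault family to a 1-D signal at a given severity."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x, dtype=float)
    if kind == "value":
        # Additive Gaussian noise with severity-controlled std.
        return x + rng.normal(0.0, severity * x.std(), size=x.shape)
    if kind == "availability":
        # Independent Bernoulli dropout with rate = severity.
        out = x.copy()
        out[rng.random(x.shape) < severity] = np.nan
        return out
    if kind == "temporal":
        # Bounded random shift; severity (an int here) bounds the offset.
        shift = int(rng.integers(-severity, severity + 1))
        return np.roll(x, shift)
    raise ValueError(f"unknown fault kind: {kind}")
```

Enumerating the eight scenarios as (kind, severity) tuples, as the revised Methods section does, then makes the whole stress test replayable from a single table.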

Circularity Check

0 steps flagged

No significant circularity

Full rationale

The paper is an empirical benchmarking study that introduces SensorFault-Bench, a protocol with explicitly defined fault scenarios, severity models, datasets, and standard MSE-based metrics. It reports observed degradations and ranking disagreements without any mathematical derivation chain, fitted parameters presented as predictions, or load-bearing self-citations that reduce claims to inputs by construction. All central results are directly checkable via open code and data, rendering the work self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper contributes an empirical evaluation framework rather than a derivation; it relies on standard statistical metrics and introduces new benchmark artifacts without additional free parameters or ungrounded entities.

axioms (1)
  • standard math Standard mean squared error is an appropriate base metric for forecasting evaluation
    Used for clean MSE, worst-scenario fault-time MSE, and degradation calculations.
invented entities (1)
  • SensorFault-Bench no independent evidence
    purpose: Shared CPS-grounded sensor-fault stress-test protocol and taxonomy
    Newly defined benchmark including severity model and fault-transfer split.

pith-pipeline@v0.9.0 · 5600 in / 1183 out tokens · 44664 ms · 2026-05-12T05:21:11.820351+00:00 · methodology


Reference graph

Works this paper leans on

125 extracted references · 125 canonical work pages · 1 internal anchor

  1. [1]

    Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence , shorttitle =

    Sajid Ali, Tamer Abuhmed, Shaker El-Sappagh, Khan Muhammad, Jose M. Alonso-Moral, Roberto Confalonieri, Riccardo Guidotti, Javier Del Ser, Natalia Díaz-Rodríguez, and Francisco Herrera. Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence.Information Fusion, 99:101805, 2023. doi:10.1016/j.i...

  2. [2]

    Andrade, Cecília Rocha, Ricardo Silva, João P

    José R. Andrade, Cecília Rocha, Ricardo Silva, João P. Viana, Ricardo J. Bessa, Clara Gouveia, B. Almeida, R. J. Santos, Miguel Louro, P. M. Santos, and A. F. Ribeiro. Data-Driven Anomaly Detection and Event Log Profiling of SCADA Alarms.IEEE Access, 10:73758–73773, 2022. doi:10.1109/ACCESS.2022.3190398

  3. [3]

    Chronos-2: From univariate to universal forecasting.arXiv preprint arXiv:2510.15821, 2025

    Abdul Fatir Ansari, Oleksandr Shchur, Jaris Küken, Andreas Auer, Boran Han, Pedro Mercado, Syama Sundar Rangapuram, Huibin Shen, Lorenzo Stella, Xiyuan Zhang, Mononito Goswami, Shubham Kapoor, Danielle C. Maddix, Pablo Guerron, Tony Hu, Junming Yin, Nick Erickson, Prateek Mutalik Desai, Hao Wang, Huzefa Rangwala, George Karypis, Yuyang Wang, and Michael B...

  4. [4]

    Multiple adaptive mechanisms for data-driven soft sensors.Computers & Chemical Engineering, 96:42–54, 2017

    Rashid Bakirov, Bogdan Gabrys, and Damien Fay. Multiple adaptive mechanisms for data-driven soft sensors.Computers & Chemical Engineering, 96:42–54, 2017. doi:10.1016/j.compchemeng.2016.08.017

  5. [5]

    Goebel, and Simon Curran

    Edward Balaban, Abhinav Saxena, Prasun Bansal, Kai F. Goebel, and Simon Curran. Modeling, Detection, and Disambiguation of Sensor Faults for Aerospace Applications.IEEE Sensors Journal, 9(12):1907–1917, 2009. doi:10.1109/JSEN.2009.2030284

  6. [6]

    Systematic Gener- alization in Neural Networks-based Multivariate Time Series Forecasting Models

    Hritik Bansal, Gantavya Bhatt, Pankaj Malhotra, and Prathosh AP. Systematic Gener- alization in Neural Networks-based Multivariate Time Series Forecasting Models. In 2021 International Joint Conference on Neural Networks (IJCNN), pages 1–8, 2021. doi:10.1109/IJCNN52387.2021.9534469

  7. [7]

    Jonathan T. Barron. A General and Adaptive Robust Loss Function. In2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4326–4334, 2019. doi:10.1109/CVPR.2019.00446

  8. [8]

    Barrow, Sven F

    Devon K. Barrow, Sven F. Crone, and Nikolaos Kourentzes. An evaluation of neural net- work ensembles and model selection for time series prediction. InThe 2010 International Joint Conference on Neural Networks (IJCNN), pages 1–8, Barcelona, Spain, 2010. IEEE. doi:10.1109/IJCNN.2010.5596686

  9. [9]

    Random Search for Hyper-Parameter Optimization

    James Bergstra and Yoshua Bengio. Random Search for Hyper-Parameter Optimization. Journal of Machine Learning Research, 13(10):281–305, 2012

  10. [10]

    Probably approximately global robustness certification

    Peter Blohm, Patrick Indri, Thomas Gärtner, and Sagar Malhotra. Probably approximately global robustness certification. InProceedings of the 42nd International Conference on Machine Learning, volume 267 ofProceedings of Machine Learning Research, pages 4570–

  11. [11]

    Resilient Neural Forecasting Systems

    Michael Bohlke-Schneider, Shubham Kapoor, and Tim Januschowski. Resilient Neural Forecasting Systems. InProceedings of the Fourth International Workshop on Data Man- agement for End-to-End Machine Learning, pages 1–5, Portland OR USA, 2020. ACM. doi:10.1145/3399579.3399869

  12. [12]

    Accounting for variance in machine learning benchmarks

    Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Nazanin Mohammadi Sepahvand, Edward Raff, Kanika Madan, Vikram V oleti, Samira Ebrahimi Kahou, Vincent Michalski, Tal Arbel, Chris Pal, Gael Varoquaux, and Pascal Vincent. Accounting for variance in machine learning benchmarks. In A. Smola, A. Dimakis, and...

  13. [13]

    Brandt, Noah C

    Jens U. Brandt, Noah C. Pütz, Marcus Greiff, Thomas Jonathan Lew, John Subosits, Marc Hilbert, and Thomas Bartz-Beielstein. From Faults to Features: Pretraining to Learn Robust Representations against Sensor Failures. InNeurIPS, 2025

  14. [14]

    Guiding the comparison of neural network local robustness: An empirical study

    Hao Bu and Meng Sun. Guiding the comparison of neural network local robustness: An empirical study. InArtificial Neural Networks and Machine Learning – ICANN 2023, vol- ume 14258 ofLecture Notes in Computer Science, pages 312–323, Cham, 2023. Springer. doi:10.1007/978-3-031-44192-9_25

  15. [15]

    Multi-Variate Time Series Forecasting on Variable Subsets

    Jatin Chauhan, Aravindan Raghuveer, Rishi Saket, Jay Nandy, and Balaraman Ravindran. Multi-Variate Time Series Forecasting on Variable Subsets. InProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’22, pages 76–86, New York, NY , USA, 2022. Association for Computing Machinery. doi:10.1145/3534678.3539394

  16. [16]

    TSMixer: An all-MLP architecture for time series forecasting.Transactions on Machine Learning Research, 2023

    Si-An Chen, Chun-Liang Li, Sercan O Arik, Nathanael Christian Yoder, and Tomas Pfister. TSMixer: An all-MLP architecture for time series forecasting.Transactions on Machine Learning Research, 2023

  17. [17]

    Beijing Multi-Site Air Quality

    Song Chen. Beijing Multi-Site Air Quality. UCI Machine Learning Repository, 2017. URL https://doi.org/10.24432/C5RK5G

  18. [18]

    Robusttsf: Towards theory and design of robust time series forecasting with anomalies

    Hao Cheng, Qingsong Wen, Yang Liu, and Liang Sun. Robusttsf: Towards theory and design of robust time series forecasting with anomalies. InInternational Conference on Learning Representations, 2024

  19. [19]

    Learning Phrase Representations using

    Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning Phrase Representations using RNN Encoder– Decoder for Statistical Machine Translation. InProceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, Doha, Qatar, 2014. Asso...

  20. [20]

    Certified Adversarial Robustness via Random- ized Smoothing

    Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified Adversarial Robustness via Random- ized Smoothing. InProceedings of the 36th International Conference on Machine Learning, volume 97 ofProceedings of Machine Learning Research, pages 1310–1320. PMLR, 2019

  21. [21]

    DAM: Towards a Foundation Model for Forecasting

    Luke Darlow, Qiwen Deng, Ahmed Hassan, Martin Asenov, Rajkarn Singh, Artjom Joosen, Adam Barker, and Amos Storkey. DAM: Towards a Foundation Model for Forecasting. In International Conference on Learning Representations, 2024

  22. [22]

    Embracing Data Irregularities in Multivariate Time Series with Recurrent and Graph Neural Networks

    Marcel Rodrigues De Barros, Thiago Lizier Rissi, Eduardo Faria Cabrera, Eduardo Aoun Tannuri, Edson Satoshi Gomi, Rodrigo Augusto Barreira, and Anna Helena Reali Costa. Embracing Data Irregularities in Multivariate Time Series with Recurrent and Graph Neural Networks. In Murilo C. Naldi and Reinaldo A. C. Bianchi, editors,Intelligent Systems, volume 14195...

  23. [23]

    Benchmark Datasets for Fault Detection and Classification in Sensor Data

    Bas De Bruijn, Tuan Anh Nguyen, Doina Bucur, and Kenji Tei. Benchmark Datasets for Fault Detection and Classification in Sensor Data. InProceedings of the 5th International Conference on Sensor Networks (SENSORNETS 2016), pages 185–195, Rome, Italy, 2016. SciTePress. doi:10.5220/0005637901850195

  24. [24]

    Measuring the Robustness of ML Models Against Data Quality Issues in Industrial Time Series Data

    Marcel Dix, Gianluca Manca, Kenneth Chigozie Okafor, Reuben Borrison, Konstantin Kirch- heim, Divyasheel Sharma, Kr Chandrika, Deepti Maduskar, and Frank Ortmeier. Measuring the Robustness of ML Models Against Data Quality Issues in Industrial Time Series Data. In2023 IEEE 21st International Conference on Industrial Informatics (INDIN), pages 1–8, Lemgo, ...

  25. [25]

    Duchi and Hongseok Namkoong

    John C. Duchi and Hongseok Namkoong. Learning models with uniform performance via distributionally robust optimization.The Annals of Statistics, 49(3):1378–1406, 2021. doi:10.1214/20-AOS2004. 11

  26. [26]

    Gintare Karolina Dziugaite, Alexandre Drouin, Brady Neal, Nitarshan Rajkumar, Ethan Caballero, Linbo Wang, Ioannis Mitliagkas, and Daniel M. Roy. In Search of Robust Measures of Generalization. InAdvances in Neural Information Processing Systems, volume 33, pages 11723–11733, 2020

  27. [27]

    Tibshirani.An Introduction to the Bootstrap

    Bradley Efron and R.J. Tibshirani.An Introduction to the Bootstrap. Chapman and Hall/CRC,

  28. [28]

    doi:10.1201/9780429246593

  29. [29]

    PyTorch Lightning

    William Falcon and The PyTorch Lightning team. PyTorch Lightning. Zenodo, jan 2026

  30. [30]

    Guar- anteeing Robustness Against Real-World Perturbations In Time Series Classification Using Conformalized Randomized Smoothing

    Nicola Franco, Jakob Spiegelberg, Jeanette Miriam Lorenz, and Stephan Günnemann. Guar- anteeing Robustness Against Real-World Perturbations In Time Series Classification Using Conformalized Randomized Smoothing. InProceedings of the Fortieth Conference on Uncer- tainty in Artificial Intelligence, volume 244 ofProceedings of Machine Learning Research, page...

  31. [31]

    Comparison of approaches to time- synchronous sampling in wireless sensor networks.Measurement, 56:203–214, 2014

    Jürgen Funck and Clemens Gühmann. Comparison of approaches to time- synchronous sampling in wireless sensor networks.Measurement, 56:203–214, 2014. doi:10.1016/j.measurement.2014.07.001

  32. [32]

    A survey of uncertainty in deep neural networks.Artificial Intelligence Review, 56(S1):1513–1589, 2023

    Jakob Gawlikowski, Cedrique Rovile Njieutcheu Tassi, Mohsin Ali, Jongseok Lee, Matthias Humt, Jianxiang Feng, Anna Kruspe, Rudolph Triebel, Peter Jung, Ribana Roscher, Muham- mad Shahzad, Wen Yang, Richard Bamler, and Xiao Xiang Zhu. A survey of uncertainty in deep neural networks.Artificial Intelligence Review, 56(S1):1513–1589, 2023. doi:10.1007/s10462-...

  33. [33]

    Monash time series forecasting archive

    Rakshitha W Godahewa, Christoph Bergmeir, Geoffrey Webb, Rob Hyndman, and Pablo Montero-Manso. Monash time series forecasting archive. In J. Vanschoren and S. Yeung, editors,Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, volume 1, 2021

  34. [34]

    Hansen, Thomas B

    Bolette D. Hansen, Thomas B. Hansen, Thomas B. Moeslund, and David G. Jensen. Data- Driven Drift Detection in Real Process Tanks: Bridging the Gap between Academia and Practice.Water, 14(6):926, 2022. doi:10.3390/w14060926

  35. [35]

    Targeted adversarial attacks on wind power forecasts.Machine Learning, 113(2):863–889, 2024

    René Heinrich, Christoph Scholz, Stephan V ogt, and Malte Lehna. Targeted adversarial attacks on wind power forecasts.Machine Learning, 113(2):863–889, 2024. doi:10.1007/s10994-023- 06396-9

  36. [36]

    Benchmarking Neural Network Robustness to Com- mon Corruptions and Perturbations

    Dan Hendrycks and Thomas Dietterich. Benchmarking Neural Network Robustness to Com- mon Corruptions and Perturbations. InInternational Conference on Learning Representations, 2019

  37. [37]

    AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

    Dan Hendrycks, Norman Mu, Ekin Dogus Cubuk, Barret Zoph, Justin Gilmer, and Balaji Lakshminarayanan. AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty. InInternational Conference on Learning Representations, 2020

  38. [38]

    2022 , month = jun, journal =

    Dan Hendrycks, Nicholas Carlini, John Schulman, and Jacob Steinhardt. Unsolved Problems in ML Safety.arXiv preprint arXiv:2109.13916, 2022

  39. [39]

    Forecast evaluation for data scientists: Common pitfalls and best practices.Data Mining and Knowledge Discovery, 37(2):788–832, 2023

    Hansika Hewamalage, Klaus Ackermann, and Christoph Bergmeir. Forecast evaluation for data scientists: Common pitfalls and best practices.Data Mining and Knowledge Discovery, 37(2):788–832, 2023. doi:10.1007/s10618-022-00894-5

  40. [40]

    Data losses and synchronization accord- ing to delay in PLC-based industrial automation systems.Heliyon, 10(18):e37560, 2024

    Ayah Hijazi, Mátyás Andó, and Zoltán Pödör. Data losses and synchronization accord- ing to delay in PLC-based industrial automation systems.Heliyon, 10(18):e37560, 2024. doi:10.1016/j.heliyon.2024.e37560

  41. [41]

    Analysis of data quality issues in real-world industrial data.Annual Conference of the PHM Society, 5(1), 2013

    Thomas Hubauer, Steffen Lamparter, Mikhail Roshchin, Nina Solomakhina, and Stuart Watson. Analysis of data quality issues in real-world industrial data.Annual Conference of the PHM Society, 5(1), 2013. doi:10.36001/phmconf.2013.v5i1.2198

  42. [42]

    Hyndman and George Athanasopoulos.Forecasting: Principles and Practice

    Rob J. Hyndman and George Athanasopoulos.Forecasting: Principles and Practice. OTexts, Melbourne, Australia, 3rd edition, 2021. 12

  43. [43]

    IEEE Standard Glossary of Software Engineering Terminology.IEEE Std 610.12-1990, pages 1–84, 1990

    IEEE. IEEE Standard Glossary of Software Engineering Terminology.IEEE Std 610.12-1990, pages 1–84, 1990. doi:10.1109/IEEESTD.1990.101064

  44. [44]

    Data Augmentation techniques in time series domain: A survey and taxonomy.Neural Computing and Applications, 35(14):10123–10145, 2023

    Guillermo Iglesias, Edgar Talavera, Angel González-Prieto, Alberto Mozo, and Sandra Gómez- Canaval. Data Augmentation techniques in time series domain: A survey and taxonomy.Neural Computing and Applications, 35(14):10123–10145, 2023. doi:10.1007/s00521-023-08459-3

  45. [45]

    ISO/IEC TR 24029-1:2021 — Artificial Intelligence (AI) — Assessment of the Robustness of Neural Networks — Part 1: Overview, 2021

    ISO/IEC. ISO/IEC TR 24029-1:2021 — Artificial Intelligence (AI) — Assessment of the Robustness of Neural Networks — Part 1: Overview, 2021. URL https://www.iso.org/ standard/77609.html. Technical Report

  46. [46]

    Artificial intelligence (AI) — Assessment of the robustness of neural networks — Part 2: Methodology for the use of formal methods, 2023

    ISO/IEC. Artificial intelligence (AI) — Assessment of the robustness of neural networks — Part 2: Methodology for the use of formal methods, 2023. URL https://www.iso.org/ standard/79804.html. ISO/IEC JTC 1/SC 42 (Artificial intelligence)

  47. [47]

    Artificial intelligence — Data quality for analytics and machine learning (ML) — Part 1: Overview, terminology, and examples, 2024

    ISO/IEC. Artificial intelligence — Data quality for analytics and machine learning (ML) — Part 1: Overview, terminology, and examples, 2024. URL https://www.iso.org/ standard/81088.html. Prepared by ISO/IEC JTC 1/SC 42

  48. [48]

    Artificial intelligence — Data quality for analytics and machine learning (ML) — Part 2: Data quality measures, 2024

    ISO/IEC. Artificial intelligence — Data quality for analytics and machine learning (ML) — Part 2: Data quality measures, 2024. URLhttps://www.iso.org/standard/81860.html. Prepared by ISO/IEC JTC 1/SC 42

  49. [49]

    Artificial intelligence (AI) — Assessment of the robustness of neural networks — Part 3: Methodology for the use of statistical methods, 2026

    ISO/IEC. Artificial intelligence (AI) — Assessment of the robustness of neural networks — Part 3: Methodology for the use of statistical methods, 2026. URL https://www.iso.org/ standard/86901.html. Draft circulated for comments and approval; subject to change

  50. [50]

    An empirical survey of data augmentation for time series classification with neural networks.PLOS ONE, 16(7):e0254841, 2021

    Brian Kenji Iwana and Seiichi Uchida. An empirical survey of data augmentation for time series classification with neural networks.PLOS ONE, 16(7):e0254841, 2021. doi:10.1371/journal.pone.0254841

  51. [51]

    Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness

    Jinkwan Jang, Hyungjin Park, Jinmyeong Choi, and Taesup Kim. Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness. InInternational Conference on Learning Representations, 2026

  52. [52]

    Benchmarking M-LTSF: Frequency and Noise-Based Evaluation of Multivariate Long Time Series Forecasting Models.arXiv preprint arXiv:2510.04900, 2025

    Nick Janssen, Melanie Schaller, and Bodo Rosenhahn. Benchmarking M-LTSF: Frequency and Noise-Based Evaluation of Multivariate Long Time Series Forecasting Models.arXiv preprint arXiv:2510.04900, 2025

  53. [53]

    Continuous detection of concept drift in industrial cyber-physical systems using closed loop incremental machine learning.Discover Artificial Intelligence, 1(1), 2021

    Dinithi Jayaratne, Daswin De Silva, Damminda Alahakoon, and Xinghuo Yu. Continuous detection of concept drift in industrial cyber-physical systems using closed loop incremental machine learning.Discover Artificial Intelligence, 1(1), 2021. doi:10.1007/s44163-021-00007- z

  54. [54]

    Full-Band General Audio Synthesis with Score-Based Diffusion

    Eun Som Jeon, Suhas Lohit, Rushil Anirudh, and Pavan Turaga. Robust Time Series Recovery and Classification Using Test-Time Noise Simulator Networks. InICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5, Rhodes Island, Greece, 2023. IEEE. doi:10.1109/ICASSP49357.2023.10096888

  55. [55]

    A Survey on Data Quality for Dependable Monitoring in Wireless Sensor Networks.Sensors, 17(9):2010, 2017

    Gonçalo Jesus, António Casimiro, and Anabela Oliveira. A Survey on Data Quality for Dependable Monitoring in Wireless Sensor Networks.Sensors, 17(9):2010, 2017. doi:10.3390/s17092010

  56. [56]

    Domain adaptation for time series forecasting via attention sharing

    Xiaoyong Jin, Youngsuk Park, Danielle Maddix, Hao Wang, and Yuyang Wang. Domain adaptation for time series forecasting via attention sharing. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato, editors,Proceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learn...

  57. [57]

    Practical aspects impacting time synchronization data quality in semiconductor manufacturing

    Naveen Kalappa, James Moyne, Jonathan Parrott, and Ya-Shian Li-Baboud. Practical aspects impacting time synchronization data quality in semiconductor manufacturing. In Proceedings of the 2006 IEEE-1588 Conference, Gaithersburg, MD, USA, 2006

  58. [58]

    Local geometry attention for time series forecasting under realistic corruptions

    Dongbin Kim, Youngjoo Park, Woojin Jeong, and Jaewook Lee. Local geometry attention for time series forecasting under realistic corruptions. In International Conference on Learning Representations, 2026

  59. [59]

    Battling the non-stationarity in time series forecasting via test-time adaptation

    HyunGi Kim, Siwon Kim, Jisoo Mok, and Sungroh Yoon. Battling the non-stationarity in time series forecasting via test-time adaptation. In Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence and Thirty-Seventh Conference on Innovative Applications of Artificial Intelligence and Fifteenth Symposium on Educational Advances in Artificial...

  60. [60]

    Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift

    Taesung Kim, Jinhee Kim, Yunwon Tae, Cheonbok Park, Jang-Ho Choi, and Jaegul Choo. Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift. In International Conference on Learning Representations, 2022

  61. [61]

    Neural network ensemble operators for time series forecasting

    Nikolaos Kourentzes, Devon K. Barrow, and Sven F. Crone. Neural network ensemble operators for time series forecasting. Expert Systems with Applications, 41(9):4235–4244, 2014. doi:10.1016/j.eswa.2013.12.011

  62. [62]

  63. [63]

    Improving resilience of sensors in planetary exploration using data-driven models

    Dileep Kumar, Manuel Dominguez-Pumar, Elisa Sayrol-Clols, Josefina Torres, Mercedes Marín, Javier Gómez-Elvira, Luis Mora, Sara Navarro, and Jose Rodríguez-Manfredi. Improving resilience of sensors in planetary exploration using data-driven models. Machine Learning: Science and Technology, 4(3):035041, 2023. doi:10.1088/2632-2153/acefaa

  64. [64]

    Modeling long- and short-term temporal patterns with deep neural networks

    Guokun Lai, Wei-Cheng Chang, Yiming Yang, and Hanxiao Liu. Modeling long- and short-term temporal patterns with deep neural networks. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR '18, pages 95–104, New York, NY, USA, 2018. Association for Computing Machinery. doi:10.1145/3209978.3210006

  65. [65]

    Globally-robust neural networks

    Klas Leino, Zifan Wang, and Matt Fredrikson. Globally-robust neural networks. In Proceedings of the 38th International Conference on Machine Learning (ICML), volume 139 of Proceedings of Machine Learning Research, pages 6212–6222. PMLR, 2021

  66. [66]

    Evaluating model performance under worst-case subpopulations

    Mike Li, Hongseok Namkoong, and Shangzhou Xia. Evaluating model performance under worst-case subpopulations. In Advances in Neural Information Processing Systems, volume 34, pages 17325–17334. Curran Associates, Inc., 2021

  67. [67]

    ReTimeCausal: EM-Augmented Additive Noise Models for Interpretable Causal Discovery in Irregular Time Series

    Weihong Li, Anpeng Wu, Kun Kuang, and Keting Yin. ReTimeCausal: EM-Augmented Additive Noise Models for Interpretable Causal Discovery in Irregular Time Series. arXiv preprint arXiv:2507.03310, 2025

  68. [68]

    Probabilistic Learning of Multivariate Time Series With Temporal Irregularity

    Yijun Li, Cheuk Hang Leung, and Qi Wu. Probabilistic Learning of Multivariate Time Series With Temporal Irregularity. IEEE Transactions on Knowledge and Data Engineering, 37(5):2874–2887, 2025. doi:10.1109/TKDE.2025.3544348

  69. [69]

    Robust multivariate time-series forecasting: Adversarial attacks and defense mechanisms

    Linbo Liu, Youngsuk Park, Trong Nghia Hoang, Hilaf Hasson, and Luke Huan. Robust multivariate time-series forecasting: Adversarial attacks and defense mechanisms. In International Conference on Learning Representations, 2023

  70. [70]

    Advances in Financial Machine Learning

    Marcos M. López de Prado. Advances in Financial Machine Learning. Wiley, Hoboken, New Jersey, 2018

  71. [71]

    Fault Type Diagnosis of the WWTP Dissolved Oxygen Sensor Based on Fisher Discriminant Analysis and Assessment of Associated Environmental and Economic Impact

    Alexandra-Veronica Luca, Melinda Simon-Várhelyi, Norbert-Botond Mihály, and Vasile-Mircea Cristea. Fault Type Diagnosis of the WWTP Dissolved Oxygen Sensor Based on Fisher Discriminant Analysis and Assessment of Associated Environmental and Economic Impact. Applied Sciences, 13(4):2554, 2023. doi:10.3390/app13042554

  72. [72]

    ModernTCN: A Modern Pure Convolution Structure for General Time Series Analysis

    Donghao Luo and Xue Wang. ModernTCN: A Modern Pure Convolution Structure for General Time Series Analysis. In International Conference on Learning Representations, 2024

  73. [73]

    E2GAN: End-to-End Generative Adversarial Network for Multivariate Time Series Imputation

    Yonghong Luo, Ying Zhang, Xiangrui Cai, and Xiaojie Yuan. E2GAN: End-to-End Generative Adversarial Network for Multivariate Time Series Imputation. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, pages 3094–3100, Macao, China, 2019. International Joint Conferences on Artificial Intelligence Organization. doi...

  74. [74]

    Towards deep learning models resistant to adversarial attacks

    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018

  75. [75]

    Explainable artificial intelligence: A survey of needs, techniques, applications, and future direction

    Melkamu Mersha, Khang Lam, Joseph Wood, Ali K. AlShami, and Jugal Kalita. Explainable artificial intelligence: A survey of needs, techniques, applications, and future direction. Neurocomputing, 599:128111, 2024. doi:10.1016/j.neucom.2024.128111

  76. [76]

    Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming

    Claudio Michaelis, Benjamin Mitzkus, Robert Geirhos, Evgenia Rusak, Oliver Bringmann, Alexander S. Ecker, Matthias Bethge, and Wieland Brendel. Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming. arXiv preprint arXiv:1907.07484, 2019

  77. [77]

    Resilience and Resilient Systems of Artificial Intelligence: Taxonomy, Models and Methods

    Viacheslav Moskalenko, Vyacheslav Kharchenko, Alona Moskalenko, and Borys Kuzikov. Resilience and Resilient Systems of Artificial Intelligence: Taxonomy, Models and Methods. Algorithms, 16(3):165, 2023. doi:10.3390/a16030165

  78. [78]

    Sensor network data fault types

    Kevin Ni, Nithya Ramanathan, Mohamed Nabil Hajj Chehade, Laura Balzano, Sheela Nair, Sadaf Zahedi, Eddie Kohler, Greg Pottie, Mark Hansen, and Mani Srivastava. Sensor network data fault types. ACM Transactions on Sensor Networks, 5(3):1–29, 2009. doi:10.1145/1525856.1525863

  79. [79]

    A time series is worth 64 words: Long-term forecasting with transformers

    Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. In International Conference on Learning Representations, 2023

  80. [80]

    Fault Detection in Wireless Sensor Networks through the Random Forest Classifier

    Zainib Noshad, Nadeem Javaid, Tanzila Saba, Zahid Wadud, Muhammad Qaiser Saleem, Mohammad Eid Alzahrani, and Osama E. Sheta. Fault Detection in Wireless Sensor Networks through the Random Forest Classifier. Sensors, 19(7):1568, 2019. doi:10.3390/s19071568

Showing first 80 references.