pith. machine review for the scientific record.

arxiv: 2605.14165 · v1 · pith:H7UOFT2Pnew · submitted 2026-05-13 · 💻 cs.CR

DSTAN-Med: Dual-Channel Spatiotemporal Attention with Physiological Plausibility Filtering for False Data Injection Attack Detection in IoT-Based Medical Devices

Pith reviewed 2026-05-15 04:53 UTC · model grok-4.3

classification 💻 cs.CR
keywords: false data injection · IoMT · attention mechanism · anomaly detection · physiological filtering · medical sensors · cyber-physical security

The pith

A dual-channel attention model plus a zero-parameter plausibility filter detects falsified vital signs on IoMT sensors, with sensitivity gains of 7.4-8.3 percentage points over the strongest Transformer baseline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that false data injection attacks on medical IoT streams are caught more reliably when sensor windows pass through separate self-attention paths for sensor identity and for time, rather than mixing both in one latent space. A residual 1D-CNN extracts local temporal patterns, while a zero-parameter domain-knowledge filter suppresses attack signatures that violate physiological bounds. Evaluated on ICU vital-sign, continuous-waveform, and wearable datasets, the full system lifts mean sensitivity 7.4-8.3 percentage points above the strongest Transformer baseline (TranAD), with the filter adding 3.1-4.2 percentage points of precision at negligible sensitivity cost. Ablations indicate that every component is needed; removing residual connections alone cuts sensitivity by 14.0 percentage points. The approach therefore supplies physiologically grounded detection that existing shared-latent models lack.

Core claim

DSTAN-Med routes multivariate sensor windows through independent sensor-wise and time-wise self-attention pathways on orthogonal tensor axes, augments them with a residual 1D-CNN block, and applies a zero-parameter Physiological Plausibility Filter that suppresses attack signatures violating domain-knowledge bounds, yielding statistically significant sensitivity gains on three IoMT corpora.

What carries the argument

Dual-channel Attention Mechanism (DAM) that separates sensor-wise (SWA) and time-wise (TWA) self-attention on orthogonal axes, combined with the Physiological Plausibility Filter (PPF).
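The separation of sensor-wise and time-wise attention can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single attention head, the random projections from the hypothetical `init_params` helper, and the merge by summation are all assumptions. The Figure 1 caption on this page only states that the two pathways are merged with a residual skip and layer normalisation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the rows of `tokens`."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def layer_norm(x, eps=1e-5):
    """Parameter-free layer normalisation over the last axis."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def init_params(n_time, n_sensors, rng):
    """Hypothetical helper: random square Q/K/V projections for each pathway."""
    def qkv(d):
        return tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    return {"time": qkv(n_sensors), "sensor": qkv(n_time)}

def dual_channel_attention(window, params):
    """window: (T, S) array of T time steps by S sensors. Returns a (T, S) array."""
    # Time-wise pathway: each token is one time step (a vector of S sensor values).
    twa = self_attention(window, *params["time"])
    # Sensor-wise pathway: each token is one sensor (a vector of T samples).
    swa = self_attention(window.T, *params["sensor"]).T
    # ASSUMPTION: pathways are merged by summation with a residual skip.
    return layer_norm(window + twa + swa)

rng = np.random.default_rng(0)
window = rng.normal(size=(8, 3))  # toy window: 8 time steps, 3 sensors
out = dual_channel_attention(window, init_params(8, 3, rng))
print(out.shape)
```

The point of the sketch is that the two pathways attend over different axes of the same window: transposing the input changes what counts as a token, so sensor relationships and temporal relationships get separate attention maps.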

If this is right

  • Each component is individually necessary; removing residual connections reduces sensitivity by 14 percentage points.
  • The Physiological Plausibility Filter contributes independent precision gains of 3.1-4.2 points with negligible sensitivity cost on all three datasets.
  • Sensitivity improvements remain significant at p < 0.01 under McNemar's test with Holm-Bonferroni correction.
  • The framework applies across ICU vital signs, continuous waveforms, and wearable biosensor signals.
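For readers unfamiliar with the significance procedure named above, the following is a minimal standard-library sketch of an exact two-sided McNemar test followed by a Holm-Bonferroni step-down correction. The discordant-pair counts in the usage lines are invented purely for illustration and are not results from the paper; the choice of the exact (binomial) variant is also an assumption, since the page does not say which variant the authors used.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: test cases detector A classified correctly and detector B did not.
    c: test cases detector B classified correctly and detector A did not.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Binomial(n, 0.5) probability of a split at least as extreme as k.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def holm_bonferroni(p_values, alpha=0.01):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject

# Illustrative (made-up) discordant counts for three datasets.
p_values = [mcnemar_exact(b, c) for b, c in [(40, 12), (35, 10), (22, 15)]]
print(holm_bonferroni(p_values, alpha=0.01))
```

McNemar's test only uses the cases where the two detectors disagree, which is why it suits paired comparisons on the same test windows; the Holm step then controls the family-wise error rate across the datasets.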

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Deployment on live hospital streams would need per-patient or per-condition bound tuning to avoid over-filtering rare but valid states.
  • The same orthogonal separation of sensor and temporal attention could be tested on non-medical multivariate sensor streams such as industrial control or environmental monitoring.
  • A natural next measurement is how often the filter rejects attacks that remain physiologically plausible, which the current synthetic-injection tests do not quantify.

Load-bearing premise

The fixed physiological bounds will not flag genuine but atypical patient states as attacks, and the synthetic injection patterns used for testing match the statistics of real-world false-data attacks.

What would settle it

Run the filter on a corpus of real hospital vital-sign records that contain verified unusual but valid physiological states; if the filter suppresses those states at a high rate, or if real FDI attacks that stay inside the bounds are missed, the central performance claim fails.
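The first half of that check can be sketched as a simple over-filtering measurement: given readings already verified as genuine, count how many a fixed-bound rule would flag. Everything here is hypothetical scaffolding. The two ranges are the example values quoted in the simulated authors' rebuttal on this page, not confirmed bounds from the paper, and the channel names and function names are invented for illustration.

```python
# ASSUMPTION: example ranges taken from the simulated rebuttal, not from the paper.
BOUNDS = {
    "heart_rate": (40.0, 220.0),   # bpm
    "systolic_bp": (70.0, 200.0),  # mmHg
}

def violates_bounds(reading, bounds=BOUNDS):
    """True if any bounded channel present in `reading` lies outside its range."""
    return any(
        not (lo <= reading[name] <= hi)
        for name, (lo, hi) in bounds.items()
        if name in reading
    )

def over_filter_rate(valid_readings, bounds=BOUNDS):
    """Fraction of verified-genuine readings that a fixed-bound check would flag."""
    if not valid_readings:
        return 0.0
    flagged = sum(violates_bounds(r, bounds) for r in valid_readings)
    return flagged / len(valid_readings)

# Toy records standing in for verified genuine states (values are illustrative).
records = [
    {"heart_rate": 72.0, "systolic_bp": 118.0},   # typical
    {"heart_rate": 35.0, "systolic_bp": 110.0},   # genuine bradycardia, below 40 bpm
    {"heart_rate": 150.0, "systolic_bp": 65.0},   # genuine hypotension, below 70 mmHg
]
print(over_filter_rate(records))
```

A rate well above zero on real, verified records would indicate that fixed bounds conflict with valid clinical states, which is the concern behind the load-bearing premise.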

Figures

Figures reproduced from arXiv: 2605.14165 by Md Mehedi Hasan, Md Zakir Hossain, Rafiqul Islam.

Figure 1: DSTAN-Med architecture. Top: end-to-end inference pipeline (input window augmented with class token, processed through N = 7 DAM–CNN blocks, classified by a linear head, and filtered at inference by the PPF). Bottom left: DAM block, with parallel SWA (sensor axis) and TWA (time axis) self-attention pathways merged with a residual skip and layer normalisation. Bottom right: CNN block, two 1D convolutional laye…

Figure 2: PhysioNet-2012 single-type FDI detection: Sensitivity (%) and F1 (%) across five methods (columns: four anomaly types; rows: Sensitivity and F1; …

Figure 4: PPF standalone contribution on PhysioNet-2012 (…

Figure 5: Ablation study (PhysioNet-2012, mixed-type). Seven configurations …
Original abstract

False data injection (FDI) attacks on Internet of Medical Things (IoMT) sensor streams falsify vital signs in transit, threatening patient safety and defeating clinical monitoring systems that lack cyber-physical anomaly detection capability. Existing deep learning detectors conflate inter-sensor spatial correlations with temporal dependencies in a shared latent space, preventing disentanglement of the distinct spatial and temporal signatures that FDI attacks imprint simultaneously; no current method exploits domain knowledge to constrain outputs against physiologically impossible attack patterns. We propose DSTAN-Med, a supervised framework comprising a Dual-channel Attention Mechanism (DAM) that routes multivariate sensor windows through independent sensor-wise (SWA) and time-wise (TWA) self-attention pathways operating on orthogonal tensor axes, a residual 1D-CNN block for local temporal feature extraction, and a zero-parameter Physiological Plausibility Filter (PPF) that suppresses attack signatures violating domain-knowledge bounds. Evaluated across three IoMT sensor datasets - PhysioNet/CinC 2012 (ICU vital signs), MIMIC-III Waveform (continuous ICU waveforms), and WESAD (wearable biosensor signals) - DSTAN-Med achieves mean sensitivity gains of 7.4-8.3 percentage points over the strongest Transformer baseline (TranAD), with improvements significant at p < 0.01 (McNemar's test, Holm-Bonferroni correction). The PPF contributes independent precision gains of 3.1-4.2 percentage points at negligible sensitivity cost across all three corpora. Ablation studies confirm that each component is individually necessary; removal of residual connections alone reduces sensitivity by 14.0 percentage points. The source code is publicly available at https://github.com/mehedi93hasan/DSTAN-MED.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents DSTAN-Med, a supervised deep learning framework for detecting false data injection (FDI) attacks on IoMT sensor streams. It comprises a Dual-channel Attention Mechanism (DAM) with independent sensor-wise (SWA) and time-wise (TWA) self-attention pathways on orthogonal tensor axes, a residual 1D-CNN block, and a zero-parameter Physiological Plausibility Filter (PPF) that suppresses physiologically implausible patterns. Evaluated on PhysioNet/CinC 2012, MIMIC-III Waveform, and WESAD datasets, it claims mean sensitivity gains of 7.4-8.3 percentage points over the TranAD baseline (p < 0.01 via McNemar's test with Holm-Bonferroni correction), plus independent 3.1-4.2 pp precision gains from the PPF, with ablations confirming each component's necessity (e.g., 14.0 pp sensitivity drop without residual connections). Source code is stated to be public.

Significance. If the central claims hold under scrutiny, the work would advance FDI detection in medical IoT by explicitly disentangling spatial and temporal attack signatures via orthogonal attention channels and constraining outputs with domain-knowledge physiological bounds. This could yield more reliable anomaly detection for vital-sign monitoring, with the public code release supporting reproducibility. The statistical testing and multi-corpus evaluation are strengths if the synthetic attack distributions are representative.

major comments (3)
  1. [Evaluation] Evaluation section: The process for generating and injecting synthetic FDI attacks into the three corpora is not described in sufficient detail (e.g., no equations or pseudocode for perturbation magnitudes, sensor selection, or temporal patterns), which is load-bearing for verifying whether the reported 7.4-8.3 pp sensitivity gains generalize beyond the evaluation artifacts to real-world FDI distributions.
  2. [Methods (PPF)] Methods (PPF description): The PPF is presented as zero-parameter with fixed domain-knowledge bounds, yet no procedure, reference values, or justification for bound selection is provided; this directly affects the claim that it contributes 3.1-4.2 pp precision gains at negligible sensitivity cost, as genuine extreme but valid states (e.g., sepsis-induced excursions) could be suppressed.
  3. [Results (Ablation studies)] Results (Ablation studies): The assertion that each component is individually necessary rests on reported drops such as the 14.0 pp sensitivity reduction without residual connections, but the exact data splits, training protocol, and confirmation that ablations isolate single factors are not elaborated, preventing independent verification of the post-hoc necessity claims.
minor comments (2)
  1. The GitHub link is given but the manuscript should explicitly state which commit or release tag corresponds to the exact experiments reported, including preprocessing and attack-injection scripts.
  2. [Abstract] Dataset descriptions in the abstract and evaluation could include specific version numbers, sampling rates, and any filtering applied to the raw PhysioNet, MIMIC-III, and WESAD streams for clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which highlight important aspects of reproducibility and transparency. We address each major comment point by point below. We will revise the manuscript to incorporate additional details and clarifications as outlined, strengthening the evaluation and methods sections without altering the core claims or results.

Point-by-point responses
  1. Referee: [Evaluation] Evaluation section: The process for generating and injecting synthetic FDI attacks into the three corpora is not described in sufficient detail (e.g., no equations or pseudocode for perturbation magnitudes, sensor selection, or temporal patterns), which is load-bearing for verifying whether the reported 7.4-8.3 pp sensitivity gains generalize beyond the evaluation artifacts to real-world FDI distributions.

    Authors: We agree that the synthetic FDI attack generation process requires more explicit detail to support reproducibility and to demonstrate that the sensitivity gains generalize appropriately. In the revised manuscript, we will add a dedicated subsection to the Evaluation section that fully describes the attack models. This will include: equations specifying perturbation magnitudes (e.g., additive Gaussian noise with sensor-specific standard deviations calibrated to realistic FDI scenarios); criteria for sensor selection (targeting subsets of vital signs such as heart rate, SpO2, and blood pressure); and temporal injection patterns (including burst durations and random onset times). We will also include pseudocode for the complete injection pipeline. These additions will allow independent verification that the attack distributions are representative of plausible real-world threats. revision: yes

  2. Referee: [Methods (PPF)] Methods (PPF description): The PPF is presented as zero-parameter with fixed domain-knowledge bounds, yet no procedure, reference values, or justification for bound selection is provided; this directly affects the claim that it contributes 3.1-4.2 pp precision gains at negligible sensitivity cost, as genuine extreme but valid states (e.g., sepsis-induced excursions) could be suppressed.

    Authors: We acknowledge that the PPF bound selection lacks sufficient justification and references in the current manuscript. In the revised Methods section, we will expand the PPF description to include: the specific physiological reference values and ranges used (e.g., heart rate 40–220 bpm, systolic blood pressure 70–200 mmHg), drawn from established clinical guidelines with appropriate citations; the procedure for selecting conservative bounds to balance precision gains against the risk of suppressing valid extremes; and an explicit discussion of limitations, including potential effects on rare but physiologically valid states such as sepsis-induced excursions. We will also note any sensitivity checks performed on bound variations. This will better substantiate the reported 3.1–4.2 pp precision improvements while transparently addressing the referee’s concern. revision: yes

  3. Referee: [Results (Ablation studies)] Results (Ablation studies): The assertion that each component is individually necessary rests on reported drops such as the 14.0 pp sensitivity reduction without residual connections, but the exact data splits, training protocol, and confirmation that ablations isolate single factors are not elaborated, preventing independent verification of the post-hoc necessity claims.

    Authors: We agree that the ablation study details are insufficient for independent verification. In the revised Results section, we will provide a more complete description of the ablation protocol, including: the exact data splits used (stratified 70/15/15 train/validation/test ratios applied consistently across all three datasets); the full training protocol (optimizer, learning rate schedule, batch size, maximum epochs, and early-stopping criteria); and explicit confirmation that each ablation isolates a single component by removing only that element while holding all other factors fixed. We will also add supplementary tables reporting mean performance with standard deviations across multiple runs to demonstrate that the observed drops (such as the 14.0 pp sensitivity reduction without residual connections) are attributable to the isolated factor rather than confounding variables. revision: yes
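The attack model outlined in the first response (additive Gaussian noise with sensor-specific standard deviations, a targeted sensor subset, a burst duration, and a random onset) can be sketched as follows. This is an illustration of that description only: the function name, argument names, and the toy values are assumptions, and the paper's actual injection pipeline is not described on this page.

```python
import numpy as np

def inject_fdi(window, sensor_idx, sigma, burst_len, rng):
    """Inject one additive-noise burst into a clean sensor window.

    window:     (T, S) array of T time steps by S sensors.
    sensor_idx: list of distinct column indices of the targeted sensors.
    sigma:      per-targeted-sensor noise standard deviations (same length).
    burst_len:  number of consecutive time steps to perturb.
    Returns (attacked copy of the window, length-T 0/1 label vector).
    """
    n_steps = window.shape[0]
    if not 0 < burst_len <= n_steps:
        raise ValueError("burst_len must be between 1 and the window length")
    # Random onset chosen so the whole burst fits inside the window.
    onset = int(rng.integers(0, n_steps - burst_len + 1))
    attacked = window.astype(float, copy=True)
    labels = np.zeros(n_steps, dtype=int)
    # Standard normal draws scaled column-wise by the sensor-specific sigmas.
    noise = rng.normal(size=(burst_len, len(sensor_idx))) * np.asarray(sigma)
    attacked[onset:onset + burst_len, sensor_idx] += noise
    labels[onset:onset + burst_len] = 1
    return attacked, labels

rng = np.random.default_rng(0)
clean = np.zeros((20, 3))  # toy window: 20 time steps, 3 sensors
attacked, labels = inject_fdi(clean, [0, 2], [5.0, 2.0], burst_len=6, rng=rng)
print(labels)
```

Returning per-step labels alongside the attacked window keeps the ground truth tied to the exact injected span, which is what a supervised detector and a reproducible evaluation both need.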

Circularity Check

0 steps flagged

No circularity: architecture uses standard components and zero-parameter domain filter; claims rest on empirical evaluation.

Full rationale

The paper introduces DSTAN-Med as a supervised framework with dual-channel self-attention (sensor-wise and time-wise on orthogonal axes), a residual 1D-CNN, and a zero-parameter Physiological Plausibility Filter applying fixed domain-knowledge bounds. No equations or derivations are presented that reduce a claimed prediction or result to a fitted parameter or self-referential definition by construction. The reported sensitivity gains and ablation results are obtained from direct evaluation on three public datasets with synthetic FDI injections; the PPF is explicitly zero-parameter and not tuned to the test data. No self-citation chains or uniqueness theorems are invoked to justify core choices. The derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that the chosen physiological bounds are both complete and safe to apply, and that the three evaluation datasets contain representative FDI attack patterns; no free parameters are explicitly named in the abstract.

axioms (1)
  • domain assumption: Physiological bounds used by the PPF are accurate and do not exclude valid clinical states.
    The filter is described as suppressing attack signatures that violate domain-knowledge bounds; correctness depends on the completeness of those bounds.

pith-pipeline@v0.9.0 · 5638 in / 1224 out tokens · 42180 ms · 2026-05-15T04:53:43.260585+00:00 · methodology


