MorphStrata: Layer-Specific Perturbations for Generating Morphence Students in Time-Series Moving Target Defense

Abhishek Bhardwaj; Anusri Nagarajan; Arnav Doshi; Jaydip Sen; Mohammad Masum; Robert Chun; Saptarshi Sengupta; Thanh Quynh Nhu Ta

arXiv: 2606.17435 · v2 · pith:MMPIMWTAnew · submitted 2026-06-16 · 💻 cs.LG


Pith reviewed 2026-06-27 01:50 UTC · model grok-4.3

classification 💻 cs.LG

keywords: MorphStrata · Moving Target Defense · adversarial robustness · time-series forecasting · Transformer · stochastic perturbations · FGSM BIM PGD attacks


The pith

MorphStrata generates heterogeneous student models via layer-specific stochastic perturbations on a Transformer teacher to strengthen moving target defense against gradient-based attacks in time-series forecasting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MorphStrata as a way to extend Morphence by injecting stochastic noise into randomly selected architectural blocks of a Transformer backbone. This produces student models with measurable structured heterogeneity that serve as randomized instances in a moving target defense setup. Evaluation on the Jena Climate, Electricity Load Diagrams (ECL), and Appliances Energy Prediction (AEP) datasets shows the approach keeps adversarial RMSE comparable to the baselines against FGSM, BIM, and PGD attacks across perturbation budgets, and lowers it on AEP. Gains on AEP reach up to 24.11 percent under FGSM and 97.97 percent under BIM at epsilon 0.5 relative to a static baseline, while added training time stays below 1 percent over the Morphence baseline in most experiments. A positive correlation is reported between pairwise L2 distance among students and defense effectiveness.
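To make "randomized instances in a moving target defense setup" concrete, here is a minimal sketch of MTD inference. It assumes each query is answered by a random subset of students whose forecasts are averaged; the averaging rule, the pool size, the subset size `m`, and the toy linear students are all illustrative assumptions rather than details taken from the paper.

```python
import random


def mtd_predict(students, x, rng, m=3):
    """Answer one query with m randomly drawn students and average them.

    students: list of callables mapping an input window to a forecast.
    Averaging is an assumed aggregation rule, chosen here for simplicity.
    """
    active = rng.sample(students, m)
    forecasts = [f(x) for f in active]
    return sum(forecasts) / m


def make_linear_student(weights):
    # Toy stand-in for a trained student model (hypothetical).
    return lambda window: sum(w * v for w, v in zip(weights, window))


rng = random.Random(7)
base_weights = [0.5, 0.3, 0.2]
pool = [
    make_linear_student([w + rng.gauss(0.0, 0.05) for w in base_weights])
    for _ in range(8)
]
window = [1.0, 2.0, 3.0]
# Repeated queries on the same window can hit different student subsets.
print([round(mtd_predict(pool, window, rng), 4) for _ in range(3)])
```

The point of the sketch is only that an attacker probing the system does not see one fixed set of weights: the responding subset changes from query to query.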

Core claim

MorphStrata maintains adversarial robustness as an MTD defense at marginal additional cost relative to existing baselines by using selective, layer-specific stochastic noise injection on randomly chosen architectural blocks of the Transformer teacher to create structured heterogeneity across student models.

What carries the argument

MorphStrata student generation strategy: selective, layer-specific stochastic noise injection on randomly selected Transformer blocks to induce structured heterogeneity.
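A rough sketch of what this generation step could look like, assuming parameters are grouped into named strata and each student receives Gaussian noise on a random subset of them. The stratum names loosely follow the components listed in the Figure 1 caption; the number of strata perturbed per student, the noise scale `sigma`, and the plain-list weight representation are hypothetical choices, not the authors' settings.

```python
import random

# Hypothetical stratum names, loosely following the Figure 1 caption.
STRATA = ("input_proj", "attention", "feed_forward", "layer_norm", "output_head")


def make_student(teacher, rng, n_strata=2, sigma=0.05):
    """Copy the teacher and add Gaussian noise only to randomly chosen strata.

    teacher: dict mapping stratum name -> list of float weights.
    n_strata and sigma are illustrative values, not the paper's settings.
    """
    chosen = set(rng.sample(sorted(teacher), n_strata))
    student = {}
    for name, weights in teacher.items():
        if name in chosen:
            student[name] = [w + rng.gauss(0.0, sigma) for w in weights]
        else:
            student[name] = list(weights)  # untouched stratum, copied as-is
    return student, chosen


rng = random.Random(0)
teacher = {name: [rng.uniform(-1.0, 1.0) for _ in range(8)] for name in STRATA}
students = [make_student(teacher, rng) for _ in range(4)]
for i, (_, chosen) in enumerate(students):
    print(f"student {i}: perturbed strata = {sorted(chosen)}")
```

Because every student shares the unperturbed strata with the teacher, the pool differs only where noise was injected, which is one way to read "structured" heterogeneity.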

If this is right

The generated ensemble maintains comparable or lower adversarial RMSE than vanilla Transformer and Morphence baselines across the tested datasets and attack regimes.
On high-entropy periodic data such as AEP, MorphStrata records the lowest RMSE for every attack and budget examined, with double-digit percentage reductions versus the static baseline.
Layer-targeted perturbation adds less than 1 percent to training time over the Morphence baseline in most experiments.
Higher pairwise L2 distance among the generated students correlates with stronger overall defense effectiveness.
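The pairwise L2 distance in the last point can be computed directly from student weights. The sketch below assumes students are dicts of flat weight lists and that the pool is summarized by the mean over all pairs; both the representation and the choice of the mean are assumptions made for illustration.

```python
import itertools
import math
import random


def flatten(params):
    """Concatenate all strata into one flat weight vector (fixed key order)."""
    return [w for name in sorted(params) for w in params[name]]


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(flatten(a), flatten(b))))


def mean_pairwise_l2(students):
    pairs = list(itertools.combinations(students, 2))
    return sum(l2_distance(a, b) for a, b in pairs) / len(pairs)


rng = random.Random(1)
base = {"attention": [0.0] * 16, "feed_forward": [0.0] * 16}


def noisy_copy(sigma):
    return {k: [w + rng.gauss(0.0, sigma) for w in v] for k, v in base.items()}


tight_pool = [noisy_copy(0.01) for _ in range(5)]   # small perturbations
spread_pool = [noisy_copy(0.10) for _ in range(5)]  # larger perturbations
print("tight :", round(mean_pairwise_l2(tight_pool), 4))
print("spread:", round(mean_pairwise_l2(spread_pool), 4))
```

A larger noise scale yields a more spread-out pool; whether that spread translates into lower adversarial RMSE is the empirical question the paper's correlation addresses.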

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same selective perturbation pattern could be adapted to other sequence models such as LSTMs without requiring full retraining of each student.
Pairwise L2 distance among students might serve as an inexpensive online metric to decide when to refresh the MorphStrata ensemble during deployment.
Because the overhead remains low, the method opens the possibility of scaling MTD to larger numbers of students on resource-constrained forecasting pipelines.

Load-bearing premise

Selective layer-specific stochastic perturbations on randomly chosen architectural blocks will reliably produce enough structured heterogeneity to drive defense gains across data distributions and threat models without dataset-specific tuning.

What would settle it

Running the same layer-targeted perturbation procedure on new time-series datasets or against additional attack variants and finding that RMSE improvements over Morphence vanish while pairwise L2 distances remain comparable would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.17435 by Abhishek Bhardwaj, Anusri Nagarajan, Arnav Doshi, Jaydip Sen, Mohammad Masum, Robert Chun, Saptarshi Sengupta, Thanh Quynh Nhu Ta.

**Figure 1.** MorphStrata pipeline. A cleanly trained base Transformer is used to generate student models by applying masked Gaussian perturbations to selected parameter strata, including input projection, attention, feed-forward, LayerNorm, and output-head components. The resulting student pool is adversarially trained under FGSM, BIM, and PGD. During moving target defense (MTD) inference, M students are stochastically…

**Figure 2.** RMSE under adversarial attacks on JENA.

**Figure 3.** RMSE under adversarial attacks on ECL.

**Figure 4.** RMSE under adversarial attacks on AEP.

**Figure 5.** RMSE under adversarial attacks on the Synthetic High Entropy Periodic dataset.

**Figure 6.** RMSE under adversarial attacks on the Synthetic Low Entropy Periodic dataset. Full numerical results for all radar charts provided in Appendix G.

Figures 2–6 are radar charts with one panel per attack (FGSM, BIM, PGD) at ε = 0.1, 0.2, 0.3, 0.5, comparing Base Model, Vanilla Ensemble, and MorphStrata.

6.1. Adversarial RMSE. Figures 2–6 summarize all dataset-attack conditions; full RMSE tables with standard deviations are in Appendix G. Both MTD ensembles substantially reduce adversarial RMSE over the static base model, particularly under BIM and PGD where the u…

**Figure 7.** Autocorrelation decay for AEP, Jena Climate, and ECL. AEP drops below the 0.1 threshold at 260 minutes; Jena and ECL remain strongly autocorrelated over the full measured window.

**Figure 8.** Normalized power spectral density for AEP, Jena Climate, and ECL. Spectral entropy is highest for AEP (diffuse) and lowest for Jena (concentrated periodic structure).

**Figure 9.** Target distribution and coefficient of variation. ECL has the highest normalized volatility; AEP has a right-skewed residential energy distribution; Jena has a smooth unimodal temperature distribution.

D. Synthetic Dataset Experiments. Two synthetic datasets test whether the behavioral differences observed across JENA, ECL, and AEP generalize to controlled temporal structure variations. D.1. Generation Meth…

Original abstract

Time-series forecasting models remain vulnerable to gradient-based adversarial attacks while existing defense mechanisms typically incur a trade-off in robustness for bounded response and compute cost. The problem is pronounced in Moving Target Defense where maintaining multiple randomized model instances substantially exacerbates the training overhead. In this work, we introduce MorphStrata, a student generation strategy with selective, layer-specific stochastic noise injection that extends the traditional Morphence defense. MorphStrata uses a Transformer backbone as the teacher and perturbs randomly selected architectural blocks to create structured heterogeneity across student models in response to varied data distributions and threat models. We evaluate against vanilla Transformer and Morphence backbones on a suite of benchmarks including the Jena Climate, Electricity Load Diagrams, and Appliances Energy Prediction using FGSM, BIM and PGD attacks across multiple attack strengths. Across datasets and attack regimes, the proposed ensemble maintains comparable adversarial RMSE. Specifically, for high entropy, periodic datasets as in the case of the AEP data, MorphStrata achieves the lowest RMSE across all attacks and perturbation budgets, improving over the static baseline by up to 24.11% and 97.97% under FGSM and BIM respectively at an epsilon value of 0.5 over 30 randomized trials. Targeting the layers to generate MorphStrata students accounts for less than 1% increase in train-times over the Morphence MTD baseline for most of the experiments, while accounting for double digit gains in adversarial RMSE reduction. We also observe a positive correlation between higher pairwise L2 distance (among generated students) and overall defense effectiveness. In summary, MorphStrata maintains adversarial robustness as an MTD defense at marginal cost deltas when compared to existing baselines.
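For readers unfamiliar with the attacks named in the abstract, the following sketch shows FGSM and BIM against a toy linear forecaster with a squared-error loss. The linear model, its weights, the step size `alpha`, and the step count are stand-ins chosen so the input gradient can be written by hand; the paper attacks Transformer forecasters, and PGD (which adds a random start inside the budget) is omitted for brevity.

```python
def sign(v):
    return (v > 0) - (v < 0)


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def grad_x(w, x, y):
    """Gradient of the squared error (w.x - y)^2 with respect to the input x."""
    err = predict(w, x) - y
    return [2.0 * err * wi for wi in w]


def fgsm(w, x, y, eps):
    """Single step of size eps along the sign of the input gradient."""
    g = grad_x(w, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def bim(w, x, y, eps, alpha=0.1, steps=10):
    """Iterated FGSM with step alpha, clipped to the L-infinity eps-ball around x."""
    x_adv = list(x)
    for _ in range(steps):
        g = grad_x(w, x_adv, y)
        x_adv = [xi + alpha * sign(gi) for xi, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv


w = [0.5, -0.3, 0.2]   # toy forecaster weights (hypothetical)
x = [1.0, 2.0, 3.0]    # clean input window
y = 0.4                # true next value
for eps in (0.1, 0.2, 0.3, 0.5):
    e_clean = abs(predict(w, x) - y)
    e_fgsm = abs(predict(w, fgsm(w, x, y, eps)) - y)
    e_bim = abs(predict(w, bim(w, x, y, eps)) - y)
    print(f"eps={eps}: clean={e_clean:.3f} fgsm={e_fgsm:.3f} bim={e_bim:.3f}")
```

On a linear model the gradient sign never changes, so BIM ends up at the same corner of the budget as FGSM; the two attacks separate only on nonlinear models such as the Transformers studied in the paper.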

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MorphStrata adds layer-targeted noise to Morphence and shows clear RMSE drops on AEP data at under 1% extra training cost.

The core takeaway is that selective perturbations on random Transformer blocks produce enough student diversity (measured by pairwise L2) to improve moving-target defense on time-series data without blowing up training time.

The new piece is the layer-specific stochastic injection rather than uniform noise across the whole model. The paper runs the same teacher-student setup as prior Morphence work but restricts the noise to randomly chosen architectural blocks. On the AEP dataset it reports the lowest RMSE across all attacks and perturbation budgets, with gains of up to 24.11% under FGSM and 97.97% under BIM over the static baseline at epsilon 0.5 across 30 trials. Overhead stays below 1% relative to the Morphence baseline on most runs, and the authors report a positive link between higher student L2 distance and defense strength.

The experiments cover three benchmark datasets, two synthetic datasets (Figures 5 and 6), and three attacks, which is more than the abstract alone suggested. The protocol includes concrete trial counts and bounded overhead numbers, so the main claims can be checked.

The soft spots are modest. The L2-robustness link is presented as an observation, not a controlled test, so it does not establish that the layer choice is the causal driver. Gains are clearest on the high-entropy AEP data; the other two datasets are described as comparable rather than markedly better. Everything is shown only on Transformer backbones, so the method's behavior on other architectures is unknown.

This is useful for groups already running MTD on forecasting models who need a low-cost way to increase ensemble diversity. It is not reshaping the broader field, but the empirical grounding is solid enough that a serious referee should see it. I would send it to review.

Referee Report

0 major / 3 minor

Summary. The paper introduces MorphStrata, an extension to Morphence for moving target defense in time-series forecasting. It applies selective layer-specific stochastic perturbations to randomly chosen architectural blocks of a Transformer teacher to generate heterogeneous student models. Evaluations on Jena Climate, Electricity Load Diagrams, and Appliances Energy Prediction datasets against FGSM, BIM, and PGD attacks report that MorphStrata maintains comparable adversarial RMSE across regimes and achieves the lowest RMSE on AEP, with up to 24.11% (FGSM) and 97.97% (BIM) improvement over the static baseline at epsilon=0.5 over 30 trials, less than 1% added training time versus Morphence in most experiments, and a positive correlation between pairwise student L2 distance and defense effectiveness.

Significance. If the results hold, MorphStrata supplies a practical, low-overhead mechanism for increasing MTD robustness in time-series models by inducing measurable structured heterogeneity. The work is strengthened by its use of multiple datasets, three attack types, bounded overhead claims, and an observational correlation that could serve as a design metric; these elements make the contribution concrete and potentially actionable for adversarial forecasting settings.

minor comments (3)

[Abstract, §4] The reported percentage improvements (24.11%, 97.97%) should explicitly state whether they represent maximum, mean, or median values across the 30 trials and whether any statistical test (e.g., paired t-test) was applied.
[§3] The procedure for randomly selecting architectural blocks and the exact noise distribution (variance, per-layer probability) are described at a high level; adding pseudocode or a precise algorithmic listing would improve reproducibility.
[Figures] L2-distance vs. RMSE plot: axis scales, trial count per point, and whether the correlation coefficient is Pearson or Spearman should be stated directly in the caption.
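The first and third comments ask for quantities that are cheap to report. As an illustration only, the sketch below computes mean, median, and maximum relative RMSE improvement over trials, plus Pearson and Spearman coefficients, on synthetic placeholder data. The improvement formula (relative reduction versus the static baseline) is an assumed reading of the percentages, and none of the numbers are the paper's results.

```python
import math
import random
import statistics


def improvement_pct(rmse_static, rmse_defended):
    """Relative RMSE reduction versus the static baseline, in percent."""
    return 100.0 * (rmse_static - rmse_defended) / rmse_static


def pearson(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Ranks starting at 1; assumes no ties, which holds for the toy data."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def spearman(xs, ys):
    return pearson(ranks(xs), ranks(ys))


# Synthetic placeholders for 30 trials; these are NOT the paper's numbers.
rng = random.Random(0)
static = [rng.uniform(0.20, 0.30) for _ in range(30)]
defended = [s * rng.uniform(0.70, 0.95) for s in static]
gains = [improvement_pct(s, d) for s, d in zip(static, defended)]
print(f"mean={statistics.fmean(gains):.2f}% "
      f"median={statistics.median(gains):.2f}% max={max(gains):.2f}%")

l2 = [rng.uniform(0.5, 2.0) for _ in range(30)]
gain_vs_l2 = [5.0 * d + rng.gauss(0.0, 1.0) for d in l2]
print(f"pearson={pearson(l2, gain_vs_l2):.3f} spearman={spearman(l2, gain_vs_l2):.3f}")
```

Reporting the mean or median alongside the maximum matters here because an "up to" figure can be driven by a single favorable trial.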

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of MorphStrata, the detailed summary of its contributions, and the recommendation for minor revision. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected

The manuscript presents MorphStrata as an empirical extension of Morphence via layer-specific stochastic perturbations on a Transformer teacher. All load-bearing claims consist of reported RMSE values from concrete randomized trials on fixed datasets (Jena, Electricity, AEP) under FGSM/BIM/PGD, plus an observational correlation between pairwise L2 distance and defense effectiveness. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear; the central argument is therefore self-contained experimental comparison rather than any reduction to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no mathematical derivations, fitted constants, or postulated entities; the contribution is described at the level of an empirical strategy without exposing free parameters, axioms, or invented constructs.

pith-pipeline@v0.9.1-grok · 5874 in / 1261 out tokens · 54093 ms · 2026-06-27T01:50:06.095486+00:00 · methodology


Appendix excerpts

A. Appendix. This appendix contains dataset pipelines, temporal structure analysis, synthetic dataset experiments and generation methodology, full RMSE tables, statistical heterogeneity and differential...

B.2. Electricity Load Diagrams (ECL). ECL captures electricity load at 15-minute resolution for a single meter. The task is multi-step ahead forecasting over a long historical context. Because the input history is long, input patching is applied before Transformer encoding to compress the sequence into a manageable length while preserving coarse temporal s...

E. Model and Training Details. All experiments use a shared Transformer architecture: input projection to d_model = 128, 4 attention heads, 4 encoder layers, feed-forward dimension 256, pre-norm (norm-first) configuration, dropout 0.1. The same architecture is used for the base model, vanilla students, and MorphStrata students across all three datasets, ens...
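The shared architecture in the Model and Training Details excerpt can be written down as a small config, which also gives a sense of how large each perturbable stratum is. This is a sketch assuming a standard encoder layer with bias terms (four attention projections, two feed-forward linear maps, two LayerNorms); the class and method names are hypothetical, and the input projection and output head are left out because their sizes depend on feature counts not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    d_model: int = 128
    n_heads: int = 4
    n_layers: int = 4
    d_ff: int = 256
    dropout: float = 0.1
    norm_first: bool = True

    def per_layer_params(self):
        """Parameter count per stratum for one standard encoder layer with biases."""
        d, ff = self.d_model, self.d_ff
        return {
            "attention": 4 * d * d + 4 * d,       # Q, K, V, output projections
            "feed_forward": 2 * d * ff + ff + d,  # two linear maps
            "layer_norm": 2 * 2 * d,              # two norms, scale and shift each
        }


cfg = EncoderConfig()
assert cfg.d_model % cfg.n_heads == 0  # each head gets d_model / n_heads dims
per_layer = cfg.per_layer_params()
total = cfg.n_layers * sum(per_layer.values())
for name, count in per_layer.items():
    print(f"{name:>12}: {count:>6} params per layer")
print(f"encoder total over {cfg.n_layers} layers: {total}")
```

Under these assumptions the attention and feed-forward strata are of similar size while LayerNorm is far smaller, so the same noise scale applied to different strata moves a student by very different L2 distances.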
