pith. machine review for the scientific record.

arxiv: 2604.07393 · v1 · submitted 2026-04-08 · 💻 cs.LG · cs.AI

Recognition: no theorem link

DSPR: Dual-Stream Physics-Residual Networks for Trustworthy Industrial Time Series Forecasting


Pith reviewed 2026-05-10 18:09 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords industrial time series forecasting · physics-informed neural networks · dual-stream architectures · dynamic graphs · regime shifts · transport delays · physical consistency · trustworthy forecasting

The pith

DSPR decouples stable temporal patterns from regime-dependent residual dynamics to forecast industrial time series with high accuracy and physical consistency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that industrial time series forecasting improves when a model explicitly separates the statistical evolution of individual variables from the residual dynamics that change with operating regimes. It does so by routing the residuals through an adaptive module for transport delays and a graph module that encodes physical priors to capture real interactions while ignoring spurious ones. A sympathetic reader would care because factories, power grids, and similar systems need forecasts that stay reliable when conditions shift and that never violate basic conservation rules, since those violations can break downstream control. If the separation works, forecasts become both more accurate on benchmarks and more usable for long-term autonomous operation.

Core claim

DSPR decouples stable temporal patterns from regime-dependent residual dynamics. The first stream models the statistical temporal evolution of individual variables. The second stream focuses on residual dynamics through two key mechanisms: an Adaptive Window module that estimates flow-dependent transport delays, and a Physics-Guided Dynamic Graph that incorporates physical priors to learn time-varying interaction structures while suppressing spurious correlations. Experiments on four industrial benchmarks spanning heterogeneous regimes demonstrate that DSPR consistently improves forecasting accuracy and robustness under regime shifts while maintaining strong physical plausibility, with Mean Conservation Accuracy exceeding 99% and Total Variation Ratio reaching up to 97.2%.

What carries the argument

Dual-stream architecture in which a statistical stream captures fixed temporal evolution while a residual stream combines an Adaptive Window for flow-dependent delays with a Physics-Guided Dynamic Graph for regime-dependent interactions.
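The division of labor between the two streams can be sketched in a few lines. The internals below (a damped-persistence statistical stream and a prior-masked residual propagation step) are illustrative stand-ins under assumed simplifications, not the paper's actual architecture:

```python
import numpy as np

def dual_stream_forecast(history, prior_mask, ar_coef=0.9):
    """Toy dual-stream one-step forecast (illustrative only).

    history: (T, C) array of past observations.
    prior_mask: (C, C) 0/1 matrix of physically plausible interactions.
    """
    # Stream 1: statistical evolution of each variable in isolation
    # (here a damped persistence forecast; DSPR uses a learned model).
    stat_pred = ar_coef * history[-1]

    # Residual = what the statistical stream failed to explain last step.
    residual = history[-1] - ar_coef * history[-2]

    # Stream 2: propagate residuals only along prior-allowed edges,
    # mimicking a physics-guided graph that suppresses spurious links.
    weights = prior_mask / np.maximum(prior_mask.sum(axis=1, keepdims=True), 1)
    res_pred = weights @ residual

    return stat_pred + res_pred

rng = np.random.default_rng(0)
hist = rng.normal(size=(16, 3))
mask = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]])
print(dual_stream_forecast(hist, mask).shape)  # (3,)
```

The point of the sketch is the routing, not the modules: the residual stream never sees raw series, only what the statistical stream leaves behind, and its cross-variable mixing is confined to prior-allowed edges.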

If this is right

  • Forecasting accuracy and robustness both rise across heterogeneous industrial regimes with regime shifts.
  • Physical plausibility is preserved at mean conservation accuracy above 99 percent and total variation ratio up to 97.2 percent.
  • Learned interaction structures and adaptive lags match known domain mechanisms such as flow-dependent transport delays.
  • The approach supports trustworthy long-term deployment in autonomous control systems by keeping predictions physically consistent.
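The two plausibility numbers above can be made operational. Since the paper's exact metric definitions are not reproduced on this page, the formulas below are one plausible reading (conservation accuracy as one minus the relative violation of a linear conservation constraint A·y = 0; total variation ratio as predicted-to-true TV), and should be treated as hypothetical:

```python
import numpy as np

def mean_conservation_accuracy(pred, cons_matrix):
    """Hypothetical reading of Mean Conservation Accuracy: how closely
    predictions satisfy linear conservation constraints A @ y = 0.
    pred: (T, C) predictions; cons_matrix: (K, C) constraint rows."""
    violation = np.abs(cons_matrix @ pred.T)                # (K, T)
    scale = np.abs(cons_matrix) @ np.abs(pred.T) + 1e-12    # normalizer
    return float(np.mean(1.0 - violation / scale))

def total_variation_ratio(pred, true):
    """Hypothetical Total Variation Ratio: predicted TV over true TV,
    near 1 when the forecast is neither over-smoothed nor noisy."""
    tv = lambda x: np.sum(np.abs(np.diff(x, axis=0)))
    return float(tv(pred) / (tv(true) + 1e-12))

# Toy balance: variable 2 is the sum of variables 0 and 1, so the
# constraint row A = [1, 1, -1] encodes y0 + y1 - y2 = 0.
t = np.linspace(0, 1, 50)
true = np.stack([np.sin(6 * t), np.cos(4 * t)], axis=1)
true = np.hstack([true, true.sum(axis=1, keepdims=True)])
A = np.array([[1.0, 1.0, -1.0]])
print(mean_conservation_accuracy(true, A))  # ≈ 1.0: constraint holds
```

Under this reading, "above 99 percent" means the average relative violation of the supplied conservation rows stays below one percent across the forecast horizon.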

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same decoupling could be tested on environmental or climate time series that also exhibit regime shifts and known conservation constraints.
  • The dynamic graphs produced during training might be examined as candidate causal maps for process engineers.
  • Adding an online update rule for the physics-guided graph could let the model track slow drifts in plant equipment without full retraining.

Load-bearing premise

The physical priors supplied to the dynamic graph correctly reflect the true regime-dependent interaction structures without introducing bias or suppressing valid correlations, and the adaptive window reliably recovers transport delays from data alone.
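The second half of this premise, that transport delays are recoverable from data alone, admits a cheap sanity check. The probe below is a plain cross-correlation lag scan on a synthetic delayed channel, not DSPR's Adaptive Window, and is offered only as a baseline for what "recoverable" means:

```python
import numpy as np

def estimate_delay(upstream, downstream, max_lag=20):
    """Estimate a transport delay by picking the lag that maximizes
    correlation between an upstream and a downstream signal.
    (A cross-correlation probe, not DSPR's Adaptive Window module.)"""
    best_lag, best_corr = 0, -np.inf
    for lag in range(max_lag + 1):
        a = upstream[: len(upstream) - lag]
        b = downstream[lag:]
        corr = np.corrcoef(a, b)[0, 1]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag

# Synthetic plant: downstream is upstream shifted by 7 steps plus noise.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = np.roll(x, 7) + 0.1 * rng.normal(size=500)
print(estimate_delay(x, y))  # 7
```

A fixed-lag scan like this fails exactly where the premise bites: when the delay drifts with flow rate within one window, a single best lag no longer exists, which is the regime-dependence the Adaptive Window is meant to handle.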

What would settle it

On a fresh industrial dataset containing documented regime shifts, if the model produces conservation accuracy below 95 percent or fails to beat standard neural-network baselines on predictive error, the central claim would be refuted.
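The criterion is mechanical enough to encode directly. The thresholds below are the ones stated above; the metric values passed in are placeholders, not results from the paper:

```python
def central_claim_refuted(conservation_accuracy, model_error, baseline_error):
    """Encode the stated falsifier: the claim fails if conservation
    accuracy drops below 95% OR the model does not beat the baseline
    on predictive error. Inputs are placeholders, not paper results."""
    fails_conservation = conservation_accuracy < 0.95
    fails_accuracy = model_error >= baseline_error
    return fails_conservation or fails_accuracy

# Hypothetical outcomes on a fresh dataset with documented regime shifts:
print(central_claim_refuted(0.991, model_error=0.42, baseline_error=0.55))  # False
print(central_claim_refuted(0.930, model_error=0.42, baseline_error=0.55))  # True
```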

Figures

Figures reproduced from arXiv: 2604.07393 by Guoqing Wang, Pengwei Yang, Tianyu Li, Yeran Zhang.

Figure 1. Fidelity collapse in state-of-the-art industrial fore…
Figure 2. Overview of the proposed DSPR framework. a) Overall Architecture: Decouples dynamics into statistical patterns and…
Figure 3. Regime adaptation visualization (L = 24, H = 24). Under High-Load transients (c), statistical baselines (TimeMixer, PatchTST) exhibit significant phase lag. DSPR (red) aligns tightly with ground truth, demonstrating that the Physics-Residual stream successfully adapts effective transport delays…
Figure 4. Fidelity validation on SCR dataset. DSPR (red) main…
Figure 6. Mechanism identification map. In the SDWPF tur…
Figure 5. Mechanism recovery in SCR. Distributions of…
Figure 7. Closed-loop response comparison over 4-hour win…
original abstract

Accurate forecasting of industrial time series requires balancing predictive accuracy with physical plausibility under non-stationary operating conditions. Existing data-driven models often achieve strong statistical performance but struggle to respect regime-dependent interaction structures and transport delays inherent in real-world systems. To address this challenge, we propose DSPR (Dual-Stream Physics-Residual Networks), a forecasting framework that explicitly decouples stable temporal patterns from regime-dependent residual dynamics. The first stream models the statistical temporal evolution of individual variables. The second stream focuses on residual dynamics through two key mechanisms: an Adaptive Window module that estimates flow-dependent transport delays, and a Physics-Guided Dynamic Graph that incorporates physical priors to learn time-varying interaction structures while suppressing spurious correlations. Experiments on four industrial benchmarks spanning heterogeneous regimes demonstrate that DSPR consistently improves forecasting accuracy and robustness under regime shifts while maintaining strong physical plausibility. It achieves state-of-the-art predictive performance, with Mean Conservation Accuracy exceeding 99% and Total Variation Ratio reaching up to 97.2%. Beyond forecasting, the learned interaction structures and adaptive lags provide interpretable insights that are consistent with known domain mechanisms, such as flow-dependent transport delays and wind-to-power scaling behaviors. These results suggest that architectural decoupling with physics-consistent inductive biases offers an effective path toward trustworthy industrial time-series forecasting. Furthermore, DSPR's demonstrated robust performance in long-term industrial deployment bridges the gap between advanced forecasting models and trustworthy autonomous control systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes DSPR, a dual-stream neural architecture for industrial time series forecasting. One stream models stable statistical temporal evolution of variables; the residual stream employs an Adaptive Window module to estimate flow-dependent transport delays and a Physics-Guided Dynamic Graph to learn time-varying interaction structures while incorporating physical priors and suppressing spurious correlations. On four heterogeneous industrial benchmarks, the method is claimed to achieve state-of-the-art predictive accuracy and robustness under regime shifts, with custom physical-plausibility metrics (Mean Conservation Accuracy >99%, Total Variation Ratio up to 97.2%) and interpretable insights consistent with domain mechanisms such as wind-to-power scaling.

Significance. If the performance gains and plausibility metrics are robustly attributable to the proposed components rather than implementation artifacts or metric design, the dual-stream decoupling with physics-consistent inductive biases would represent a meaningful advance toward trustworthy hybrid forecasting models for non-stationary industrial systems. The emphasis on regime-dependent residuals and interpretable structures addresses a recognized gap between pure data-driven accuracy and physical consistency in control-relevant applications.

major comments (3)
  1. [Experiments] Experiments section: no ablation studies are reported that isolate the contribution of the Adaptive Window module versus the Physics-Guided Dynamic Graph versus the dual-stream architecture itself. Without these, the central claim that the proposed mechanisms drive the reported SOTA accuracy, robustness, and >99% Mean Conservation Accuracy cannot be verified and may be confounded by other modeling choices.
  2. [Abstract / Experiments] Abstract and experimental evaluation: the custom metrics Mean Conservation Accuracy and Total Variation Ratio are introduced and used to support the physical-plausibility claim without reference to independent external benchmarks, sensitivity analysis, or comparison against standard conservation or variation measures. This raises a circularity concern because the metrics may embed the same priors used in the Physics-Guided Dynamic Graph.
  3. [Method] Method section (Adaptive Window and Physics-Guided Dynamic Graph): no validation is provided that the learned delays and interaction structures recover ground-truth regime-dependent mechanisms rather than data-tuned biases. The skeptic note correctly identifies that failure here would make the residual stream's inductive bias unverifiable and the reported gains dependent on untested domain assumptions.
minor comments (2)
  1. [Experiments] Implementation details (hyperparameters, training procedure, exact baseline configurations) are insufficiently specified to allow reproduction of the benchmark results.
  2. [Experiments] The paper should include statistical significance tests (e.g., paired t-tests or Wilcoxon tests with p-values) for the claimed improvements over baselines to substantiate the SOTA claim.
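When a stats package is unavailable, the requested check can be approximated with a paired sign-flip permutation test on per-run error differences. This is a generic nonparametric sketch of the referee's suggestion, not the paper's evaluation protocol:

```python
import numpy as np

def paired_permutation_test(errors_a, errors_b, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation test for paired error samples.
    Returns a p-value for H0: no mean difference between methods."""
    rng = np.random.default_rng(seed)
    diff = np.asarray(errors_a) - np.asarray(errors_b)
    observed = abs(diff.mean())
    # Under H0 each paired difference is symmetric around zero, so
    # randomly flipping signs simulates the null distribution.
    flips = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    null = np.abs((flips * diff).mean(axis=1))
    return float((np.sum(null >= observed) + 1) / (n_perm + 1))

# Toy check: method A is uniformly better on 20 paired runs.
rng = np.random.default_rng(2)
base = rng.uniform(0.5, 1.0, size=20)
p = paired_permutation_test(base - 0.2, base)
print(p < 0.05)  # True: the gap is significant
```

With only four benchmark datasets, per-dataset pairing over multiple seeds or horizons (as above) gives the test enough samples to be meaningful; a Wilcoxon signed-rank test would serve the same role where scipy is available.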

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments on our paper. We address each of the major concerns point by point below, providing clarifications and indicating revisions to the manuscript where necessary to strengthen the work.

point-by-point responses
  1. Referee: [Experiments] Experiments section: no ablation studies are reported that isolate the contribution of the Adaptive Window module versus the Physics-Guided Dynamic Graph versus the dual-stream architecture itself. Without these, the central claim that the proposed mechanisms drive the reported SOTA accuracy, robustness, and >99% Mean Conservation Accuracy cannot be verified and may be confounded by other modeling choices.

    Authors: We agree that ablation studies are essential to isolate the contributions of each proposed component. The original manuscript focused on overall performance but did not include systematic ablations. In the revised version, we will add detailed ablation experiments that evaluate the model with and without the Adaptive Window module, the Physics-Guided Dynamic Graph, and the dual-stream architecture. These will include quantitative impacts on forecasting accuracy, robustness under regime shifts, and the physical plausibility metrics. revision: yes

  2. Referee: [Abstract / Experiments] Abstract and experimental evaluation: the custom metrics Mean Conservation Accuracy and Total Variation Ratio are introduced and used to support the physical-plausibility claim without reference to independent external benchmarks, sensitivity analysis, or comparison against standard conservation or variation measures. This raises a circularity concern because the metrics may embed the same priors used in the Physics-Guided Dynamic Graph.

    Authors: The metrics are tailored to industrial time series to measure conservation of physical quantities and smoothness of variations, which are critical for trustworthiness but not directly assessed by standard error metrics. To address the circularity concern, we will revise the manuscript to include: (1) sensitivity analysis of the metrics to hyperparameter choices, (2) comparisons with standard measures such as mean absolute conservation error and total variation distance, and (3) explicit discussion of how the metrics are computed independently from the model training priors. We believe this will demonstrate their validity as external evaluation tools. revision: partial

  3. Referee: [Method] Method section (Adaptive Window and Physics-Guided Dynamic Graph): no validation is provided that the learned delays and interaction structures recover ground-truth regime-dependent mechanisms rather than data-tuned biases. The skeptic note correctly identifies that failure here would make the residual stream's inductive bias unverifiable and the reported gains dependent on untested domain assumptions.

    Authors: We provide evidence through qualitative alignment with domain expertise, such as recovered flow-dependent transport delays matching physical expectations and interaction graphs reflecting known wind-to-power relationships. However, in real-world industrial datasets, precise ground-truth for time-varying delays and interaction structures is typically unavailable, making quantitative recovery validation difficult without synthetic data. We will expand the method section with additional visualizations and case studies showing consistency with physical mechanisms, and discuss this as a limitation. If feasible, we may include experiments on synthetic data with known ground truth. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected in DSPR derivation chain

full rationale

The DSPR framework is presented as an empirical architecture that decouples statistical temporal modeling from residual dynamics via an Adaptive Window and Physics-Guided Dynamic Graph. No load-bearing equations, predictions, or uniqueness claims in the abstract or description reduce by construction to fitted inputs, self-citations, or ansatzes. Performance metrics (including custom conservation and variation ratios) are reported as experimental outcomes on external benchmarks rather than tautological re-expressions of model parameters. The derivation remains self-contained through architectural design choices and benchmark validation, with no evident self-definitional or fitted-input reductions.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 2 invented entities

The central claim rests on domain assumptions about physical interaction structures and transport delays plus two newly introduced architectural components whose effectiveness is shown empirically but not derived from first principles.

free parameters (2)
  • Adaptive window size and learning parameters
    Used to estimate flow-dependent transport delays; values are learned or selected during training on the target datasets.
  • Dynamic graph regularization weights
    Control how strongly physical priors suppress spurious correlations in the second stream.
axioms (2)
  • domain assumption Physical priors can be encoded into a dynamic graph to learn time-varying interaction structures while suppressing spurious correlations
    Invoked to justify the Physics-Guided Dynamic Graph mechanism.
  • domain assumption Transport delays in industrial flows are regime-dependent and can be adaptively estimated from observed data
    Basis for the Adaptive Window module in the residual stream.
invented entities (2)
  • Adaptive Window module no independent evidence
    purpose: Estimates flow-dependent transport delays in the residual dynamics stream
    New component introduced to handle non-stationary delays; no independent falsifiable prediction outside the model is provided.
  • Physics-Guided Dynamic Graph no independent evidence
    purpose: Incorporates physical priors to learn time-varying interaction structures
    New mechanism proposed to enforce physical plausibility; effectiveness demonstrated only within the paper's experiments.

pith-pipeline@v0.9.0 · 5560 in / 1671 out tokens · 83972 ms · 2026-05-10T18:09:01.243903+00:00 · methodology


Reference graph

Works this paper leans on

39 extracted references · 11 canonical work pages · 1 internal anchor

  1. [1]

    Tom Beucler, Michael Pritchard, Stephan Rasp, Jordan Ott, Pierre Baldi, and Pierre Gentine. 2021. Enforcing Analytic Constraints in Neural Networks Emulating Physical Systems. Physical Review Letters 126, 9 (2021). http://dx.doi.org/10.1103/PhysRevLett.126.098302

  2. [2]

    Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. 2016. Sparse Identification of Nonlinear Dynamics with Control (SINDYc). IFAC-PapersOnLine 49, 18 (2016), 710–715. doi:10.1016/j.ifacol.2016.10.249. 10th IFAC Symposium on Nonlinear Control Systems NOLCOS 2016

  3. [3]

    Shengze Cai, Zhiping Mao, Zhicheng Wang, Minglang Yin, and George Em Karniadakis. 2021. Physics-informed neural networks (PINNs) for fluid mechanics: A review. arXiv:2105.09506 [physics.flu-dyn] https://arxiv.org/abs/2105.09506

  4. [4]

    Wanlin Cai, Yuxuan Liang, Xianggen Liu, Jianshuai Feng, and Yuankai Wu. 2023. MSGNet: Learning Multi-Scale Inter-Series Correlations for Multivariate Time Series Forecasting. arXiv preprint arXiv:2401.00423 (2023)

  5. [5]

    E.F. Camacho and C. Bordons. 2007. Model Predictive Control (2nd ed.). Springer

  6. [6]

    Peng Han, Jin Wang, Di Yao, Shuo Shang, and Xiangliang Zhang. 2021. A Graph-based Approach for Trajectory Similarity Computation in Spatial Networks. In KDD '21. 556–564. https://doi.org/10.1145/3447548.3467337

  7. [7]

    Yifan Hu, Guibin Zhang, Peiyuan Liu, Disen Lan, Naiqi Li, Dawei Cheng, Tao Dai, Shu-Tao Xia, and Shirui Pan. 2025. TimeFilter: Patch-Specific Spatial-Temporal Graph Filtration for Time Series Forecasting. In Forty-second International Conference on Machine Learning. https://openreview.net/forum?id=490VcNtjh7

  8. [8]

    George Em Karniadakis, Ioannis G. Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang. 2021. Physics-informed machine learning. Nature Reviews Physics 3, 6 (2021), 422–440

  9. [9]

    Xiangjie Kong, Zhenghao Chen, Weiyao Liu, Kaili Ning, Lechao Zhang, Syauqie Muhammad Marier, Yichen Liu, Yuhao Chen, and Feng Xia. 2025. Deep learning for time series forecasting: a survey. International Journal of Machine Learning and Cybernetics 16, 7 (2025), 5079–5112

  10. [10]

    Nathan P. Lawrence, Seshu Kumar Damarla, Jong Woo Kim, Aditya Tulsyan, Faraz Amjad, Kai Wang, Benoit Chachuat, Jong Min Lee, Biao Huang, and R. Bhushan Gopaluni. 2024. Machine learning for industrial sensing and control: A survey and practical perspective. Control Engineering Practice 145 (2024), 105841

  11. [11]

    Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. 2018. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In International Conference on Learning Representations (ICLR '18)

  12. [12]

    Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. 2023. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. arXiv preprint arXiv:2310.06625 (2023)

  13. [13]

    Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. 2021. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence 3, 3 (2021), 218–229

  14. [14]

    Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. 2023. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. In International Conference on Learning Representations

  15. [15]

    M. Hashem Pesaran and Allan Timmermann. 1992. A Simple Nonparametric Test of Predictive Performance. Journal of Business & Economic Statistics 10, 4 (1992), 461–465. http://www.jstor.org/stable/1391822

  16. [16]

    S. Joe Qin and Thomas A. Badgwell. 2003. A survey of industrial model predictive control technology. Control Engineering Practice 11, 7 (2003), 733–764

  17. [17]

    Abdur Rahman and Md Mahmudul Hasan. 2017. Modeling and Forecasting of Carbon Dioxide Emissions in Bangladesh Using Autoregressive Integrated Moving Average (ARIMA) Models. Open Journal of Statistics 7, 4 (July 2017), 560–566. doi:10.4236/ojs.2017.74038

  18. [18]

    Maziar Raissi, Paris Perdikaris, and George E. Karniadakis. 2019. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378 (2019), 686–707

  19. [19]

    Cory A. Rieth, Ben D. Amsel, Randy Tran, and Maia B. Cook. 2017. Additional Tennessee Eastman Process Simulation Data for Anomaly Detection Evaluation. doi:10.7910/DVN/6C3JR1

  20. [20]

    Leonid I. Rudin, Stanley Osher, and Emad Fatemi. 1992. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena 60, 1 (1992), 259–268. https://www.sciencedirect.com/science/article/pii/016727899290242F

  21. [21]

    Christopher A. Sims. 1980. Macroeconomics and Reality. Econometrica 48, 1 (1980), 1–48. http://www.jstor.org/stable/1912017

  22. [22]

    Z. Skaf, T. Aliyev, L. Shead, and T. Steffen. 2014. The State of the Art in Selective Catalytic Reduction Control. In SAE 2014 World Congress and Exhibition

  23. [23]

    Shiyu Wang, Haixu Wu, Xiaoming Shi, Tengge Hu, Huakun Luo, Lintao Ma, James Y. Zhang, and Jun Zhou. 2024. TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting. In The Twelfth International Conference on Learning Representations

  24. [24]

    Jared Willard, Xiaowei Jia, Shaoming Xu, Michael Steinbach, and Vipin Kumar. 2022. Integrating scientific knowledge with machine learning for engineering and environmental systems. Comput. Surveys 55, 4 (2022), 1–37

  26. [26]

    Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. 2023. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In International Conference on Learning Representations

  27. [27]

    Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. 2021. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. In Advances in Neural Information Processing Systems

  28. [28]

    Zonghan Wu, Shirui Pan, Guodong Long, Jing Jiang, Xiaojun Chang, and Chengqi Zhang. 2020. Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

  29. [29]

    Z. Yang, P. Liu, W. Zhou, and Q. Wang. 2022. Deep learning-enhanced NMPC for DeNOx systems. IEEE Transactions on Control Systems Technology 30, 2 (2022), 589–603

  30. [30]

    Bing Yu, Haoteng Yin, and Zhanxing Zhu. 2018. Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI '18). AAAI Press, 3634–3640

  31. [31]

    Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. 2021. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In The Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Virtual Conference, Vol. 35. 11106–11115

  32. [32]

    Jingbo Zhou, Xinjiang Lu, Yixiong Xiao, Jiantao Su, Junfu Lyu, Yanjun Ma, and Dejing Dou. 2022. SDWPF: A Dataset for Spatial Dynamic Wind Power Forecasting Challenge at KDD Cup 2022. arXiv preprint arXiv:2208.04360 (2022)

  33. [33]

    Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. 2022. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proc. 39th International Conference on Machine Learning (ICML 2022), Baltimore, Maryland

  34. [34]

    Classical Methods. Linear MPC (ARX) [16]: The industrial standard ARX model for process control, serving as a robustness baseline limited by linearity

  35. [35]

    Transformer Variants. Informer [30]: Uses ProbSparse attention for efficient long-sequence forecasting. PatchTST [14]: Applies channel-independent patching to capture local semantics. iTransformer [12]: Inverts attention to embed variates as tokens for multivariate correlations. TimeMixer [23]: Uses multi-scale MLP mixing. Note: This serves as our Trend St…

  36. [36]

    CNN-based Methods. TimesNet [25]: Transforms 1D series into 2D tensors to apply convolutions for intra- and inter-period variations

  37. [37]

    Spectral & Graph Methods. MSGNet [4]: Leverages frequency-domain graph convolutions for multi-scale inter-series correlations. TimeFilter [7]: Uses learnable frequency filters to decompose temporal dynamics efficiently

  38. [38]

    Physics-Informed Methods. Physics-Guided NN (PG-NN): To compare "loss-level" vs. "architecture-level" integration, we augment the TimeMixer with a soft physical regularization term. The total loss is L_total = L_MSE + λ_phy ‖ŷ − f_cons(x)‖₂², where f_cons(·) represents conservation laws and λ_phy balances data fit with physical consistency

  39. [39]

    The Physics-Residual Stream uses d_emb = 64, adaptive window range ω_{t,c} ∈ [0, 20], and gating initialization α_init = 0. Loss weights are set to λ_phys = 10⁻² and λ_sparse = 10⁻⁴. Baseline models were reproduced following the Time-Series Library framework (https://github.com/thuml/Time-Series-Library). To facilitate reproducibility, the complete DSPR imp…