FinStressTS: A Parametric Synthetic Benchmark for Time-Series Forecasting in Finance

Haonan Chen; Jiaze Sun; Kelvin J.L. Koa; Ke-Wei Huang; Ruiyang Ni; Yize Liu

arxiv: 2606.03184 · v1 · pith:OTEJAMSHnew · submitted 2026-06-02 · 💱 q-fin.CP · cs.LG · q-fin.ST

Jiaze Sun, Kelvin J.L. Koa, Ruiyang Ni, Yize Liu, Haonan Chen, Ke-Wei Huang

Pith reviewed 2026-06-28 07:39 UTC · model grok-4.3

classification: 💱 q-fin.CP · cs.LG · q-fin.ST

keywords: synthetic benchmark · financial time series · forecasting evaluation · volatility clustering · regime switching · probabilistic forecasting · model diagnostics · jump processes

The pith

FinStressTS uses controlled parametric environments to show that autoregressive and linear models often outperform Transformers in volatility, tail, and jump settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates a synthetic benchmark called FinStressTS built from six explicit mechanism families to generate thirty diagnostic environments. Researchers can then measure how different forecasting models respond when the data-generating process is known exactly rather than hidden in real market records. A reader would care because real financial series entangle low signal-to-noise ratios, regime shifts, and heavy tails, so failures cannot be traced to any single cause. The evaluations compare fifteen models on both point and probabilistic tasks and report that performance depends on the active mechanism, that distributional alignment affects calibration, and that neural models usually need more samples than simpler baselines.

Core claim

FinStressTS comprises thirty diagnostic environments around six mechanism families—volatility clustering, multi-scale persistence, heavy-tailed shocks, regime switching, self-exciting jumps, and zero-inflated processes—and shows that autoregressive and linear models remain competitive or superior in several volatility-, tail-, and jump-driven settings while parametric probabilistic models calibrate well in stationary regimes.
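
To make the idea of a parametric mechanism family concrete, here is a minimal sketch of one possible volatility-clustering generator, a GARCH(1,1) process. The abstract gives no equations or parameter values, so the function name `simulate_garch11`, the parameters `omega`, `alpha`, `beta`, and their defaults are illustrative assumptions, not the paper's actual construction.

```python
import math
import random


def simulate_garch11(n, omega=0.05, alpha=0.10, beta=0.85, seed=0):
    """Toy GARCH(1,1) path (hypothetical parameters, not taken from the paper).

    Recursion: sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1},
    with r_t = sqrt(sigma2_t) * z_t and z_t standard normal.
    Returns the simulated returns and the conditional variances.
    """
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    rng = random.Random(seed)
    sigma2 = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    returns, variances = [], []
    for _ in range(n):
        r = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        returns.append(r)
        variances.append(sigma2)
        # Large past returns raise the next variance, which produces clustering.
        sigma2 = omega + alpha * r * r + beta * sigma2
    return returns, variances


returns, variances = simulate_garch11(1000)
print(len(returns), min(variances), max(variances))
```

Because the data-generating process is written down explicitly, the true conditional variance is available at every step, which is the property that makes attribution possible in a synthetic benchmark of this kind.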

What carries the argument

The six parametric mechanism families that generate the thirty diagnostic environments, each isolating one structural cause so that model errors can be attributed to a known data-generating process.

If this is right

Autoregressive and linear models are highly competitive in volatility-, tail-, and jump-driven environments.
Parametric probabilistic models such as DeepAR calibrate well in stationary settings, while flexible models help when distributions become multimodal or sparse.
Neural models often require more data to match simple baselines, with larger gains mainly when learning latent regimes or complex distributions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Model selection pipelines in finance could become conditional on detected mechanism type rather than fixed across all assets.
The same parametric construction could be reused to generate stress-test suites for regulatory capital calculations under known tail and jump scenarios.
Learning curves measured on these environments offer a direct way to quantify how much additional data a new architecture needs before it surpasses a linear baseline.
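
One way to read "how much additional data" off a pair of learning curves is to find the smallest training size from which a candidate model stays below the baseline error. The sketch below assumes errors have already been measured at a shared grid of training sizes; the helper name `crossover_size` and the numbers in the usage lines are placeholders for illustration, not results from the paper.

```python
def crossover_size(train_sizes, candidate_errors, baseline_errors):
    """Smallest training size from which the candidate beats the baseline
    at that size and at every larger measured size; None if it never does.
    (Hypothetical helper; lower error is assumed to be better.)
    """
    if not (len(train_sizes) == len(candidate_errors) == len(baseline_errors)):
        raise ValueError("all three sequences must have the same length")
    points = sorted(zip(train_sizes, candidate_errors, baseline_errors))
    crossover = None
    # Walk from the largest size downwards and stop at the first size
    # where the candidate no longer beats the baseline.
    for size, cand, base in reversed(points):
        if cand < base:
            crossover = size
        else:
            break
    return crossover


# Placeholder learning curves, invented purely to exercise the function.
sizes = [100, 200, 400, 800]
neural = [0.9, 0.7, 0.4, 0.3]
linear = [0.5, 0.5, 0.5, 0.5]
print(crossover_size(sizes, neural, linear))
```

Requiring the candidate to stay ahead at all larger sizes avoids reporting a crossover that is only a noisy dip at a small sample size.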

Load-bearing premise

The six parametric mechanism families accurately reproduce the latent structural causes present in real financial time series without adding correlations or artifacts absent from actual markets.

What would settle it

If the relative performance ordering of the fifteen models on real financial series differs systematically from their ordering on the matching FinStressTS environments, the claim that the benchmark isolates the relevant mechanisms would be undermined.
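
A simple way to operationalize this check is a rank correlation between the two model orderings. The sketch below computes Spearman's rho from per-model scores, assuming lower scores are better and there are no ties; the model labels and score values are placeholders, not numbers from the paper or from any real dataset.

```python
def ranks(scores):
    """Map each model to its rank (1 = lowest score), assuming no ties."""
    ordered = sorted(scores, key=scores.get)
    return {name: position + 1 for position, name in enumerate(ordered)}


def spearman_rho(scores_a, scores_b):
    """Spearman rank correlation between two score tables over the same models."""
    if set(scores_a) != set(scores_b):
        raise ValueError("both tables must cover the same models")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two models")
    rank_a, rank_b = ranks(scores_a), ranks(scores_b)
    d2 = sum((rank_a[m] - rank_b[m]) ** 2 for m in rank_a)
    # Closed form for untied ranks: 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Placeholder scores, invented only to show the call pattern.
synthetic_scores = {"model_a": 0.10, "model_b": 0.20, "model_c": 0.30, "model_d": 0.40}
real_scores = {"model_a": 0.15, "model_b": 0.12, "model_c": 0.33, "model_d": 0.41}
print(spearman_rho(synthetic_scores, real_scores))
```

A rho near 1 would indicate that the synthetic and real orderings agree; values near zero or below would be the systematic disagreement described above.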

Figures

Figures reproduced from arXiv: 2606.03184 by Haonan Chen, Jiaze Sun, Kelvin J.L. Koa, Ke-Wei Huang, Ruiyang Ni, Yize Liu.

**Figure 1.** Illustrative examples of synthetic time series generated by each mechanism family under the Level 1 diagnostic …

**Figure 2.** Data-efficiency learning curves for 10 models across six synthetic mechanism families. Each subplot shows …

**Figure 3.** CRPS as a function of training sample ratio across the six synthetic cases for each probabilistic model. Each line corresponds to one case (Case 1–6). Lower is better.

Original abstract

Financial forecasting is difficult due to low signal-to-noise ratios, latent factors, heavy tails, regime shifts, and jumps. Real-world benchmarks offer limited failure attribution: researchers can observe underperformance, but often cannot isolate why because mechanisms are unobservable and entangled. Real financial data reveal only one realized path, making it difficult to assess tail-risk calibration or data efficiency. We introduce FinStressTS, a mechanism-aware synthetic benchmark that links model behavior to controlled structural causes. FinStressTS comprises 30 diagnostic environments around six mechanism families: volatility clustering, multi-scale persistence, heavy-tailed shocks, regime switching, self-exciting jumps, and zero-inflated processes. We evaluate two tasks: point forecasting, using NMAE across five settings, and probabilistic forecasting, using CRPS under known data-generating mechanisms. We benchmark 15 models, from classical methods (HAR, VAR) to Transformer forecasters (PatchTST, iTransformer) and deep probabilistic architectures (DeepAR, TSFlow), and use learning curves to measure sample efficiency. Our evaluation reveals three insights. First, performance is mechanism-dependent: autoregressive and linear models are highly competitive, and often outperform Transformer-based models, in several volatility-, tail-, and jump-driven environments. Second, distributional alignment matters: parametric probabilistic models such as DeepAR calibrate well in stationary settings, while flexible models can help when distributions become multimodal or sparse. Third, neural models often require more data to match simple baselines, with larger gains mainly when learning latent regimes or complex distributions. FinStressTS provides an open framework for diagnosing failure modes and advancing risk-aware forecasting.
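
For readers unfamiliar with the two metrics, the following sketch shows one common way to compute them. The CRPS estimator uses the standard sample form E|X - y| - 0.5 * E|X - X'| for forecast samples X, X' and outcome y. The abstract does not say how NMAE is normalized, so dividing the mean absolute error by the mean absolute target is an assumption made here; the function names are likewise hypothetical.

```python
def crps_from_samples(samples, y):
    """Sample-based CRPS: mean|x_i - y| - 0.5 * mean|x_i - x_j|. Lower is better."""
    m = len(samples)
    if m == 0:
        raise ValueError("need at least one forecast sample")
    spread_to_outcome = sum(abs(x - y) for x in samples) / m
    spread_within = sum(abs(a - b) for a in samples for b in samples) / (m * m)
    return spread_to_outcome - 0.5 * spread_within


def nmae(y_true, y_pred):
    """MAE divided by the mean absolute target (assumed normalization)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    scale = sum(abs(t) for t in y_true) / len(y_true)
    if scale == 0.0:
        raise ValueError("targets are all zero; normalization is undefined")
    return mae / scale


print(crps_from_samples([0.0, 2.0], 1.0))
print(nmae([2.0, 4.0], [1.0, 5.0]))
```

With a single forecast sample the CRPS reduces to the absolute error, which is why it is often described as a probabilistic generalization of MAE.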

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

FinStressTS introduces a new synthetic benchmark with 30 controlled environments across six mechanism families, but the claimed clean attribution of model performance to specific mechanisms rests on unverified isolation.

The main thing to know is that this paper builds FinStressTS, a set of 30 synthetic environments drawn from six parametric families (volatility clustering, multi-scale persistence, heavy tails, regime switching, self-exciting jumps, zero-inflated processes) and then evaluates 15 models on point and probabilistic forecasting tasks plus sample-efficiency curves.

The work is new in packaging these environments together with standardized NMAE and CRPS protocols and in reporting the three mechanism-dependent insights: autoregressive and linear models often beat Transformers in volatility/tail/jump settings, parametric models calibrate better in stationary cases while flexible ones help with multimodal or sparse distributions, and neural models need more data except when learning latent regimes.

That controlled diagnostic angle is the useful part. Real financial series give only one path and entangled causes, so a benchmark that lets you turn mechanisms on and off has clear practical value for risk-aware model selection.

The soft spot is exactly the stress-test concern. If a self-exciting jump process automatically produces volatility clustering or regime switches create spurious heavy tails, then performance differences cannot be cleanly mapped back to one family. The abstract gives no equations, no parameter values, no moment or spectral checks for orthogonality, and no validation against real series. Without those, the attribution in the three insights is weaker than presented.

This is for researchers who build or compare time-series forecasters in finance and want better failure diagnostics than real data supplies. A reader focused on benchmark construction or model robustness would get concrete comparisons to think about.

It deserves peer review. The benchmark idea is worth referee time even if the isolation needs more evidence in the full text.

Referee Report

1 major / 1 minor

Summary. The paper introduces FinStressTS, a synthetic benchmark with 30 diagnostic environments built from six parametric mechanism families (volatility clustering, multi-scale persistence, heavy-tailed shocks, regime switching, self-exciting jumps, zero-inflated processes). It evaluates 15 models (HAR, VAR, PatchTST, iTransformer, DeepAR, TSFlow, etc.) on point forecasting via NMAE and probabilistic forecasting via CRPS under known DGPs, plus learning curves for sample efficiency, and reports three insights: mechanism-dependent performance (AR/linear models competitive in volatility/tail/jump settings), importance of distributional alignment for calibration, and higher data needs for neural models except in regime/complex-distribution cases.

Significance. If the environments isolate the claimed mechanisms, the benchmark supplies a controlled testbed that real financial series cannot, because the latter entangle latent factors and provide only one path. This would allow precise attribution of failure modes (e.g., poor tail calibration under jumps versus regime shifts) and targeted model improvement for risk-aware forecasting. The open framework and broad model coverage constitute a useful public resource for the field.

major comments (1)

[six parametric mechanism families and 30 diagnostic environments] The claim that 'performance is mechanism-dependent' (abstract) and the three insights require that each of the six families operates in isolation within its 30 environments. Self-exciting jumps (Hawkes-type) generate clustered large increments that automatically raise the autocorrelation of squared returns, thereby injecting an unintended volatility-clustering signature into a 'jumps-only' environment. Regime-switching constructions can similarly induce spurious persistence or heavy tails. The manuscript must supply explicit orthogonality diagnostics (e.g., moment or spectral comparisons across families) to confirm the diagnostic mapping is valid; absent such checks the attribution of model rankings to specific structural causes is compromised.
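
The contamination effect described in this comment can be illustrated with a small simulation. The sketch below uses a discrete-time, Hawkes-style toy process (a Bernoulli approximation, not a true continuous-time Hawkes process and not the paper's generator) and compares the lag-1 autocorrelation of squared increments with and without self-excitation. All parameter names and values are assumptions chosen for illustration.

```python
import random


def simulate_jump_increments(n, mu=0.02, alpha=0.15, decay=0.8,
                             jump_size=1.0, noise_sd=0.1, seed=0):
    """Increments with Bernoulli jumps whose probability is self-exciting.

    Each jump adds `alpha` to the jump probability; the excess decays by
    `decay` per step. alpha=0 gives memoryless jumps at probability `mu`.
    """
    rng = random.Random(seed)
    excess = 0.0
    increments = []
    for _ in range(n):
        jump_prob = min(mu + excess, 1.0)
        jumped = rng.random() < jump_prob
        x = rng.gauss(0.0, noise_sd)
        if jumped:
            x += jump_size if rng.random() < 0.5 else -jump_size
        increments.append(x)
        excess = decay * excess + (alpha if jumped else 0.0)
    return increments


def autocorr(xs, lag=1):
    """Sample autocorrelation at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[t] - mean) * (xs[t - lag] - mean) for t in range(lag, n))
    return cov / var


excited = simulate_jump_increments(20000, alpha=0.15, seed=1)
memoryless = simulate_jump_increments(20000, alpha=0.0, seed=1)
print("self-exciting:", autocorr([x * x for x in excited]))
print("memoryless:   ", autocorr([x * x for x in memoryless]))
```

In this toy setting the self-exciting series shows clearly positive autocorrelation in squared increments while the memoryless one stays near zero, which is the volatility-clustering signature the referee is concerned about.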

minor comments (1)

[abstract] The abstract states the benchmark design and high-level results but supplies no equations, parameter values, or validation statistics; readers must consult the full text for reproducibility details.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for highlighting the critical need to verify isolation of the six mechanism families. We agree that potential cross-contamination (e.g., Hawkes-induced volatility clustering) could weaken attribution of the reported insights, and we will strengthen the manuscript with the requested diagnostics.

Referee: [six parametric mechanism families and 30 diagnostic environments] The claim that 'performance is mechanism-dependent' (abstract) and the three insights require that each of the six families operates in isolation within its 30 environments. Self-exciting jumps (Hawkes-type) generate clustered large increments that automatically raise the autocorrelation of squared returns, thereby injecting an unintended volatility-clustering signature into a 'jumps-only' environment. Regime-switching constructions can similarly induce spurious persistence or heavy tails. The manuscript must supply explicit orthogonality diagnostics (e.g., moment or spectral comparisons across families) to confirm the diagnostic mapping is valid; absent such checks the attribution of model rankings to specific structural causes is compromised.

Authors: We agree that the validity of mechanism-dependent performance claims rests on demonstrating that each family primarily isolates its intended structure. While the parametric constructions were chosen to emphasize one dominant feature per family (e.g., Hawkes intensity for jumps, Markov switching for regimes), we acknowledge that secondary signatures such as elevated squared-return autocorrelation in jump environments or induced kurtosis in regime environments may exist. In the revision we will add a new subsection (Section 3.3) containing explicit orthogonality checks: (i) autocorrelation functions of raw and squared series, (ii) kurtosis and tail-index estimates, and (iii) spectral density comparisons across all 30 environments. These diagnostics will quantify the degree of unintended overlap; where necessary, we will adjust environment parameters to improve isolation. The three insights will be restated with reference to these checks.

Revision: yes
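
As a rough illustration of check (ii), the sketch below computes sample excess kurtosis and a Hill estimate of the tail index. It is a generic textbook version, not the authors' planned implementation; the function names and the choice of `k` (the number of upper order statistics used by the Hill estimator) are assumptions.

```python
import math
import random


def excess_kurtosis(xs):
    """Sample excess kurtosis m4 / m2^2 - 3 (about 0 for Gaussian data)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2) - 3.0


def hill_tail_index(xs, k):
    """Hill estimate of the tail index from the k largest absolute values.

    Smaller values indicate heavier tails. The result depends on k,
    so in practice one would inspect it over a range of k.
    """
    magnitudes = sorted((abs(x) for x in xs if x != 0.0), reverse=True)
    if not 0 < k < len(magnitudes):
        raise ValueError("k must satisfy 0 < k < number of non-zero observations")
    threshold = magnitudes[k]
    mean_log_excess = sum(math.log(m / threshold) for m in magnitudes[:k]) / k
    return 1.0 / mean_log_excess


rng = random.Random(0)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(20000)]
pareto = [rng.paretovariate(3.0) for _ in range(20000)]  # true tail index is 3
print("Gaussian excess kurtosis:", excess_kurtosis(gaussian))
print("Pareto Hill tail index:  ", hill_tail_index(pareto, k=1000))
```

Running both statistics on every environment would show, for example, whether a regime-switching series carries excess kurtosis comparable to a heavy-tailed one.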

Circularity Check

0 steps flagged

No circularity: benchmark generation and empirical evaluation are independent of fitted results

The paper constructs FinStressTS by specifying six parametric mechanism families (volatility clustering, regime switching, etc.) and 30 diagnostic environments, then runs standard point and probabilistic forecasting metrics (NMAE, CRPS) plus learning curves on 15 models. No equation or claim reduces a reported performance insight to a quantity defined by parameters fitted inside the same paper; the data-generating processes are fixed by construction before any model is applied, and the reported mechanism-dependent rankings are direct outputs of those evaluations rather than self-referential fits. No self-citation chain is invoked to justify uniqueness or an ansatz, and no known empirical pattern is merely renamed.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities beyond the high-level description of the six mechanism families; the benchmark itself is the primary new artifact introduced by the paper.


[1] [1]

Alexander Alexandrov, Konstantinos Benidis, Michael Bohlke-Schneider, Valentin Flunkert, Jan Gasthaus, Tim Januschowski, Danielle C Maddix, Syama Rangapu- ram, David Salinas, Jasper Schulz, et al. 2020. Gluonts: Probabilistic and neural time series modeling in python.Journal of Machine Learning Research21, 116 (2020), 1–6

2020

[2] [2]

Torben G Andersen, Tim Bollerslev, and Francis X Diebold. 2007. Roughing it up: Including jump components in the measurement, modeling, and forecasting of return volatility.The review of economics and statistics89, 4 (2007), 701–720

2007

[3] [3]

Torben G Andersen, Tim Bollerslev, Francis X Diebold, and Paul Labys. 2003. Modeling and forecasting realized volatility.Econometrica71, 2 (2003), 579–625

2003

[4] [4]

Andersen, Tim Bollerslev, Francis X

Torben G. Andersen, Tim Bollerslev, Francis X. Diebold, and Paul Labys. 2003. Modeling and Forecasting Realized Volatility.Econometrica71, 2 (2003), 579–625. doi:10.1111/1468-0262.00402

work page doi:10.1111/1468-0262.00402 2003

[5] [5]

Yihao Ang, Qiang Huang, Yifan Bao, Anthony KH Tung, and Zhiyong Huang

[6] [6]

TSGBench: Time Series Generation Benchmark.Proceedings of the VLDB Endowment17, 3 (2023), 305–318

2023

[7] [7]

Emmanuel Bacry, Iacopo Mastromatteo, and Jean-François Muzy. 2015. Hawkes Processes in Finance.Market Microstructure and Liquidity1, 1 (2015), 1550005. doi:10.1142/S2382626615500057

work page doi:10.1142/s2382626615500057 2015

[8] [8]

Tim Bollerslev. 1986. Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics31, 3 (1986), 307–327. doi:10.1016/0304-4076(86)90063-1

work page doi:10.1016/0304-4076(86)90063-1 1986

[9] [9]

Tim Bollerslev. 1987. A Conditionally Heteroskedastic Time Series Model for Speculative Prices and Rates of Return.The Review of Economics and Statistics69, 3 (1987), 542–547

1987

[10] [10]

George Box and GM Jenkins. 1976. Analysis: Forecasting and Control.San francisco(1976)

1976

[11] [11]

Weijun Chen, Shun Li, Xipu Yu, Heyuan Wang, Wei Chen, and Tengjiao Wang

[12] [12]

InProceedings of the Thirty-Third International Joint Conference on Artificial Intelligence

Automatic de-biased temporal-relational modeling for stock investment recommendation. InProceedings of the Thirty-Third International Joint Conference on Artificial Intelligence. 1999–2008

1999

[13] [13]

Rama Cont. 2001. Empirical properties of asset returns: stylized facts and statisti- cal issues.Quantitative Finance1, 2 (2001), 223–236. doi:10.1080/713665670

work page doi:10.1080/713665670 2001

[14] [14]

Rama Cont. 2001. Empirical properties of asset returns: stylized facts and statisti- cal issues.Quantitative finance1, 2 (2001), 223. KDD ’26, August 09–13, 2026, Jeju Island, Republic of Korea Jiaze Sun et al

2001

[15] [15]

Fulvio Corsi. 2009. A Simple Approximate Long-Memory Model of Realized Volatility.Journal of Financial Econometrics7, 2 (2009), 174–196. doi:10.1093/ jjfinec/nbp001

2009

[16] Fulvio Corsi. 2009. A simple approximate long-memory model of realized volatility. Journal of financial econometrics 7, 2 (2009), 174–196

[17] Adrien Cortés, Rémi Rehm, and Victor Letzelter. 2025. Winner-takes-all for Multivariate Probabilistic Time Series Forecasting. In ICML 2025: The 42nd International Conference on Machine Learning

[18] Yitong Duan, Lei Wang, Qizhong Zhang, and Jian Li. 2022. Factorvae: A probabilistic dynamic factor model based on variational autoencoder for predicting cross-sectional stock returns. In Proceedings of the AAAI conference on artificial intelligence, Vol. 36. 4468–4476

[19] Paul Embrechts, Claudia Klüppelberg, and Thomas Mikosch. 1997. Modelling Extremal Events for Insurance and Finance. Springer. doi:10.1007/978-3-642-33483-2

[20] Robert F Engle. 1982. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica: Journal of the econometric society (1982), 987–1007

[21] Robert F. Engle. 1982. Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of United Kingdom Inflation. Econometrica 50, 4 (1982), 987–1007. doi:10.2307/1912773

[22] Eugene F Fama and Kenneth R French. 1993. Common risk factors in the returns on stocks and bonds. Journal of financial economics 33, 1 (1993), 3–56

[23] Muhammad Hasan Ferdous, Emam Hossain, and Md Osman Gani. 2025. Timegraph: Synthetic benchmark datasets for robust time-series causal discovery. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2. 5425–5435

[24] Tilmann Gneiting and Matthias Katzfuss. 2014. Probabilistic forecasting. Annual Review of Statistics and Its Application 1, 1 (2014), 125–151

[25] Tilmann Gneiting and Adrian E. Raftery. 2007. Strictly Proper Scoring Rules, Prediction, and Estimation. J. Amer. Statist. Assoc. 102, 477 (2007), 359–378. doi:10.1198/016214506000001437

[26] Rakshitha Godahewa, Christoph Bergmeir, Geoffrey I Webb, Rob J Hyndman, and Pablo Montero-Manso. 2021. Monash time series forecasting archive. arXiv preprint arXiv:2105.06643 (2021)

[27] James D Hamilton. 1989. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica: Journal of the econometric society (1989), 357–384

[28] Alan G. Hawkes. 1971. Spectra of some self-exciting and mutually exciting point processes. Biometrika 58, 1 (1971), 83–90. doi:10.1093/biomet/58.1.83

[29] Yifan Hu, Yuante Li, Peiyuan Liu, Yuxia Zhu, Naiqi Li, Tao Dai, Shu-tao Xia, Dawei Cheng, and Changjun Jiang. 2025. Fintsb: A comprehensive and practical benchmark for financial time series forecasting. arXiv preprint arXiv:2502.18834 (2025)

[30] Yanfei Kang, Rob J Hyndman, and Feng Li. 2020. GRATIS: GeneRAting TIme Series with diverse and controllable characteristics. Statistical Analysis and Data Mining: The ASA Data Science Journal 13, 4 (2020), 354–376

[31] Marcel Kollovieh, Marten Lienen, David Lüdke, Leo Schwinn, and Stephan Günnemann. 2024. Flow matching with gaussian process priors for probabilistic time series forecasting. arXiv preprint arXiv:2410.03024 (2024)

[32] Diane Lambert. 1992. Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing. Technometrics 34, 1 (1992), 1–14. doi:10.1080/00401706.1992.10485228

[33] David A. Lesmond, Joseph P. Ogden, and Charles A. Trzcinka. 1999. A New Estimate of Transaction Costs. The Review of Financial Studies 12, 5 (1999), 1113–1141. doi:10.1093/rfs/12.5.1113

[34] Jingwei Liu, Ling Yang, Hongyan Li, and Shenda Hong. 2024. Retrieval-augmented diffusion models for time series forecasting. Advances in Neural Information Processing Systems 37 (2024), 2766–2786

[35] Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. 2023. itransformer: Inverted transformers are effective for time series forecasting. arXiv preprint arXiv:2310.06625 (2023)

[36] Yong Liu, Haixu Wu, Jianmin Wang, and Mingsheng Long. 2022. Non-stationary transformers: Exploring the stationarity in time series forecasting. Advances in neural information processing systems 35 (2022), 9881–9893

[37] Markus Löning, Anthony Bagnall, Sajaysurya Ganesh, Viktor Kazakov, Jason Lines, and Franz J Király. 2019. sktime: A unified interface for machine learning with time series. arXiv preprint arXiv:1909.07872 (2019)

[38] Helmut Lütkepohl. 2013. Introduction to multiple time series analysis. Springer Science & Business Media

[39] Spyros Makridakis and Michele Hibon. 2000. The M3-Competition: results, conclusions and implications. International journal of forecasting 16, 4 (2000), 451–476

[40] Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos. 2018. The M4 Competition: Results, findings, conclusion and way forward. International Journal of forecasting 34, 4 (2018), 802–808

[41] Spyros Makridakis, Evangelos Spiliotis, Ross Hollyman, Fotios Petropoulos, Norman Swanson, and Anil Gaba. 2024. The M6 forecasting competition: Bridging the gap between forecasting and investment decisions. International Journal of Forecasting (2024)

[42] James E Matheson and Robert L Winkler. 1976. Scoring rules for continuous probability distributions. Management science 22, 10 (1976), 1087–1096

[43] Y Nie. 2022. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. arXiv preprint arXiv:2211.14730 (2022)

[44] Alexander Nikitin, Letizia Iannucci, and Samuel Kaski. 2024. TSGM: a flexible framework for generative modeling of synthetic time series. Advances in Neural Information Processing Systems 37 (2024), 129042–129061

[45] Kin G. Olivares, Cristian Challú, Azul Garza, Max Mergenthaler Canseco, and Artur Dubrawski. 2022. NeuralForecast: User friendly state-of-the-art neural forecasting models. PyCon Salt Lake City, Utah, US 2022. https://github.com/Nixtla/neuralforecast

[46] Cemal Öztürk. 2024. Enhancing Financial Time-Series Analysis with TimeGAN: A Novel Approach. In 2024 9th International Conference on Computer Science and Engineering (UBMK). IEEE, 447–450

[47] Xiangfei Qiu, Jilin Hu, Lekui Zhou, Xingjian Wu, Junyang Du, Buang Zhang, Chenjuan Guo, Aoying Zhou, Christian S Jensen, Zhenli Sheng, et al. 2024. TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods. Proceedings of the VLDB Endowment 17, 9 (2024), 2363–2377

[48] Kashif Rasul, Calvin Seward, Ingmar Schuster, and Roland Vollgraf. 2021. Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting. In International conference on machine learning. PMLR, 8857–8868

[49] Stephen A Ross. 2013. The arbitrage theory of capital asset pricing. In Handbook of the fundamentals of financial decision making: Part I. World Scientific, 11–30

[50] David Salinas, Valentin Flunkert, Jan Gasthaus, and Tim Januschowski. 2020. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International journal of forecasting 36, 3 (2020), 1181–1191

[51] Yimiao Shao, Wenzhong Li, Kang Xia, Kaijie Lin, Mingkai Lin, and Sanglu Lu. 2025. QuantileFormer: Probabilistic Time Series Forecasting with a Pattern-Mixture Decomposed VAE Transformer. In Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence. 6147–6155

[52] Zezhi Shao, Fei Wang, Yongjun Xu, Wei Wei, Chengqing Yu, Zhao Zhang, Di Yao, Tao Sun, Guangyin Jin, Xin Cao, et al. 2024. Exploring progress in multivariate time series forecasting: Comprehensive benchmarking and heterogeneity analysis. IEEE Transactions on Knowledge and Data Engineering 37, 1 (2024), 291–305

[53] Sean J Taylor and Benjamin Letham. 2018. Forecasting at scale. The American Statistician 72, 1 (2018), 37–45

[54] Heyuan Wang, Tengjiao Wang, Shun Li, Jiayi Zheng, Shijie Guan, and Wei Chen. Adaptive Long-Short Pattern Transformer for Stock Investment Selection. In IJCAI. 3970–3977

[56] Yuxuan Wang, Haixu Wu, Jiaxiang Dong, Yong Liu, Mingsheng Long, and Jianmin Wang. 2024. Deep Time Series Models: A Comprehensive Survey and Benchmark. (2024)

[57] Yuxuan Wang, Haixu Wu, Jiaxiang Dong, Guo Qin, Haoran Zhang, Yong Liu, Yunzhong Qiu, Jianmin Wang, and Mingsheng Long. 2024. Timexer: Empowering transformers for time series forecasting with exogenous variables. Advances in Neural Information Processing Systems 37 (2024), 469–498

[58] Magnus Wiese, Robert Knobloch, Ralf Korn, and Peter Kretschmer. 2020. Quant GANs: deep generation of financial time series. Quantitative Finance 20, 9 (2020), 1419–1440

[59] Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. 2021. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Advances in neural information processing systems 34 (2021), 22419–22430

[60] Ailing Zeng, Muxi Chen, Lei Zhang, and Qiang Xu. 2023. Are transformers effective for time series forecasting? In Proceedings of the AAAI conference on artificial intelligence, Vol. 37. 11121–11128

[61] Liang Zeng, Lei Wang, Hui Niu, Ruchen Zhang, Ling Wang, and Jian Li. 2024. Trade when opportunity comes: price movement forecasting via locality-aware attention and iterative refinement labeling. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence. 6134–6142

[62] Jiawen Zhang, Xumeng Wen, Zhenwei Zhang, Shun Zheng, Jia Li, and Jiang Bian. 2024. ProbTS: Benchmarking point and distributional forecasting across diverse prediction horizons. Advances in Neural Information Processing Systems 37 (2024), 48045–48082

[64] Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. 2022. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In International conference on machine learning. PMLR, 27268–27286.