RAVEN: A Regime-Aware Variable-context Expert Network for Financial Time Series Forecasting

Cheng He; Defu Lian; Enhong Chen; Geng Hu; Jiajia Li; Patrick P. C. Lee; Xijie Liang; Zehao Chen; Zhenyu Guan

arxiv: 2606.24062 · v1 · pith:356DW43Unew · submitted 2026-06-23 · 💻 cs.LG · cs.AI


Cheng He, Zhenyu Guan, Xijie Liang, Defu Lian, Jiajia Li, Enhong Chen, Patrick P. C. Lee, Geng Hu, Zehao Chen


Pith reviewed 2026-06-26 00:57 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords financial time series forecasting · mixture of experts · variable context window · non-stationary processes · cumulative importance thresholding · regime-aware routing · log-return prediction


The pith

RAVEN determines per-sample context lengths for financial forecasts by routing nested windows to scale-specialized experts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Financial returns are non-stationary, so the amount of past data that best predicts the next step changes from one period to the next. Fixed-window models therefore feed every sample the same history length even when a shorter or longer prefix would be more informative. RAVEN scores recent patches by learned importance, then uses cumulative thresholding to carve out a ladder of nested windows whose lengths are chosen from the data. Each window feeds its own expert; a parallel global branch keeps long-range coherence, and a weighting step penalizes redundant expert outputs before they are combined. The resulting forecasts raise Pearson correlation on two equity indices (HS300, S&P500), cut MSE on fund sales forecasting, and score best on 14 of 16 metrics across four PEMS traffic benchmarks.

Core claim

The paper claims that a fixed context window is mismatched to the time-varying optimal look-back of non-stationary price processes. RAVEN addresses this by constructing a hierarchy of nested contiguous windows whose lengths are determined by the data itself: patches are scored by learned importance in reverse chronological order, Cumulative Importance Thresholding selects the nested prefix windows, each is routed to a scale-specialized expert, a Global Compressed Representation branch runs in parallel over the full context, and Correlation-Aware Weighting aligns the variable-length outputs before aggregation.

What carries the argument

Cumulative Importance Thresholding (CIT), which scores patches in reverse chronological order by learned importance and selects nested prefix windows at a cumulative threshold.
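A minimal sketch of the CIT step described above, assuming the learned importance scores are softmax-normalized over patches and that each nested window is the shortest most-recent prefix whose cumulative importance reaches its threshold. The thresholds (0.3, 0.6, 0.9) are the K=3 setting listed in Table VIII, read here as CIT thresholds; that reading, the softmax normalization, the hard (non-differentiable) cutoff, and the function names `cit_window_lengths` and `nested_windows` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def cit_window_lengths(scores, thresholds=(0.3, 0.6, 0.9)):
    """Hypothetical CIT sketch: one prefix length (in patches) per threshold.

    `scores` are per-patch importance logits in chronological order, so the
    last entry belongs to the most recent patch.
    """
    scores = np.asarray(scores, dtype=float)
    rev = scores[::-1]                       # most recent patch first
    weights = np.exp(rev - rev.max())        # softmax normalisation (assumption)
    weights /= weights.sum()
    cum = np.cumsum(weights)                 # cumulative importance, newest -> oldest
    n = len(scores)
    lengths = []
    for tau in sorted(thresholds):           # increasing thresholds give nested windows
        # index of the first prefix whose cumulative importance reaches tau
        k = int(np.searchsorted(cum, tau)) + 1
        lengths.append(min(k, n))            # guard against rounding near 1.0
    return lengths


def nested_windows(patches, lengths):
    """Slice the most recent `L` patches for each length; the windows are nested."""
    n = len(patches)
    return [patches[n - L:] for L in lengths]


# Usage: 12 patches of 4 values each, importance rising toward recent patches.
rng = np.random.default_rng(0)
patches = rng.normal(size=(12, 4))
scores = np.linspace(-2.0, 2.0, 12)          # oldest -> newest
lengths = cit_window_lengths(scores)
windows = nested_windows(patches, lengths)
print(lengths, [w.shape for w in windows])
```

Because every window is a most-recent prefix, each expert sees a superset of the history given to the expert before it, which is the structured overlap that CAW is meant to handle.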

If this is right

Pearson correlation rises 9.2 percent on HS300 and 20.2 percent on S&P500 cumulative log-return forecasts.
MSE drops 18.2 percent on fund sales forecasting.
Best results appear in 14 of 16 metrics across four PEMS traffic data sets.
Regime-dependent temporal dependencies are handled without a single fixed horizon.
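The headline numbers above are relative changes in Pearson correlation and MSE. The sketch below shows how such figures are typically computed, assuming the percentages are relative to the strongest baseline and that ICIR (reported alongside Figure 4) is the mean of per-period Pearson ICs divided by their standard deviation. The paper's exact protocol (cross-sectional versus time-series correlation, rebalancing frequency) is not given here, and the numbers in the usage line are made up.

```python
import numpy as np


def pearson(pred, target):
    """Pearson correlation between predictions and realised targets."""
    return float(np.corrcoef(pred, target)[0, 1])


def mse(pred, target):
    """Mean squared error."""
    pred, target = np.asarray(pred, dtype=float), np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))


def icir(preds_by_period, targets_by_period):
    """Mean per-period IC divided by its sample standard deviation (assumed definition)."""
    ics = np.array([pearson(p, t) for p, t in zip(preds_by_period, targets_by_period)])
    return float(ics.mean() / ics.std(ddof=1))


def relative_change(new, old):
    """Relative change of `new` versus `old`, e.g. 0.092 for a 9.2% gain."""
    return (new - old) / abs(old)


# Made-up illustration: a baseline correlation of 0.0500 rising to 0.0546.
print(round(relative_change(0.0546, 0.0500), 3))
```

For MSE the goal is a reduction, so an 18.2% improvement corresponds to a relative change of -0.182 under this convention.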

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same nested-window routing could be tested on other non-stationary series such as electricity load or weather variables.
If the importance scorer is replaced by a simple recency prior, the performance gap would reveal how much the learned scoring contributes.
Extending the hierarchy to include non-contiguous windows might further reduce redundancy among experts.
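To run the recency-prior ablation suggested above, CIT can be fed a fixed geometric prior instead of learned scores. Under that prior the cumulative importance has a closed form, so the window ladder follows directly from the decay rate. A sketch, where the decay `rho` and the function name are hypothetical choices rather than anything from the paper:

```python
import math


def recency_prior_lengths(n_patches, thresholds=(0.3, 0.6, 0.9), rho=0.8):
    """Window lengths CIT would pick if importance were a fixed geometric prior.

    Patch i (0 = most recent) gets weight proportional to rho**i with 0 < rho < 1,
    so the cumulative mass of the newest L patches is (1 - rho**L) / (1 - rho**n).
    """
    total = 1.0 - rho ** n_patches
    lengths = []
    for tau in sorted(thresholds):
        # smallest L with (1 - rho**L) / total >= tau
        L = math.ceil(math.log(1.0 - tau * total) / math.log(rho))
        lengths.append(min(max(L, 1), n_patches))
    return lengths


# Usage: the ladder depends only on rho and n_patches, never on the input sample.
print(recency_prior_lengths(15, rho=0.5))
print(recency_prior_lengths(15, rho=0.9))
```

Because this ladder is identical for every sample, a gap between this variant and learned CIT would measure the value of per-sample adaptivity rather than of nesting alone.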

Load-bearing premise

That importance scores learned on patches, when accumulated from the most recent patch backward, will produce window lengths that match each sample's time-varying optimal look-back.

What would settle it

An experiment that replaces the CIT-selected windows with randomly chosen nested windows of the same length distribution and measures whether forecasting gains disappear.
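The control proposed above can be set up by shuffling the CIT-selected ladders across samples: every sample still gets nested windows and the population of window lengths is unchanged, but the lengths no longer come from that sample's own importance scores. A sketch with hypothetical helper names and made-up ladders; how the masks would be wired into the experts is an assumption:

```python
import numpy as np


def shuffled_window_control(cit_lengths, seed=0):
    """Reassign each sample a window ladder taken from another sample.

    `cit_lengths` has shape (num_samples, K); each row is the nondecreasing
    ladder of CIT window lengths for one sample. Permuting rows keeps the
    joint length distribution exactly while breaking the link to the input.
    """
    cit_lengths = np.asarray(cit_lengths)
    rng = np.random.default_rng(seed)
    return cit_lengths[rng.permutation(len(cit_lengths))]


def window_masks(lengths, n_patches):
    """Boolean (K, n_patches) mask; row k marks the newest lengths[k] patches."""
    idx = np.arange(n_patches)
    return np.stack([idx >= n_patches - L for L in lengths])


# Usage with made-up CIT ladders for four samples and K = 3 experts.
observed = np.array([[1, 3, 8], [2, 5, 12], [1, 2, 4], [3, 6, 15]])
control = shuffled_window_control(observed, seed=42)
print(control)
print(window_masks(control[0], n_patches=15).sum(axis=1))
```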

Figures

Figures reproduced from arXiv: 2606.24062 by Cheng He, Defu Lian, Enhong Chen, Geng Hu, Jiajia Li, Patrick P. C. Lee, Xijie Liang, Zehao Chen, Zhenyu Guan.

**Figure 1.** CWT scalograms for multi-scale analysis. Financial data (a) exhibits non-stationary energy distribution with no fixed periodicity, while traffic data (b) shows stable, periodic patterns.

…horizons and to a linear functional form; the complementary horizons themselves, and the optimal way to combine them, remain hand-crafted. More recent deep-learning multi-period research, e.g. MLF [42], extends this multi-p…

**Figure 2.** Overview of RAVEN. The pipeline consists of three modules. Preprocess applies instance normalization, channel-independent processing, and patch partitioning to produce embedded patches $E = [e_1, \dots, e_N]$. Backbone operates via two parallel branches. (i) The local adaptive branch scores patch importance and accumulates scores in reverse chronological order against CIT-based thresholds, generating K nested con…

**Figure 3.** Production deployment pipeline of RAVEN in a quantitative trading system. The offline phase handles model training and backtesting validation on historical data. The online phase appends newly available market data after each close, generates return predictions via daily inference, optimizes portfolio allocations, and routes orders through pre-trade risk checks to execution venues. A production monitor tr…

**Figure 4.** Cumulative return advantage of RAVEN over baselines on HS300, from 2020 to 2024. ∆(·) denotes the cumulative return of RAVEN minus that of baseline (·). All curves show a persistent upward trend across varying market regimes.

…dataset-metric pairs. The consistently higher ICIR suggests that RAVEN produces not only stronger average predictions but also more stable predictive signals across rebalancing period…

**Figure 5.** Efficiency comparison on HS300 (batch size 512, look-back window 120). Training and inference time are reported in milliseconds per iteration; memory denotes peak GPU memory during training.

**Table VIII.** Effect of number of experts K on HS300 (Corr ↑).

| Year | K=2 (0.3, 0.9) | K=3 (0.3, 0.6, 0.9) | K=4 (0.3, 0.5, 0.7, 0.9) |
|---|---|---|---|
| 2020 | 0.0533 | 0.0567 | 0.0582 |
| 2021 | 0.0273 | 0.0292 | 0.0274 |
| 2022 | 0.0300 | 0.0422 | 0.0321 |
| 2023 | 0.0173 | 0.… | … |

**Figure 6.** Distributions of Mean Patch Importance Score (MPIS) $\tilde{s}_i$ on HS300. Each data point represents the annual average of the learned importance at a given patch index. Patch index 1 corresponds to the most recent time segment. (a) Annual mean importance profiles of stock 600176.SS across five years, illustrating temporal regime adaptation. (b) Annual mean importance profiles of four stocks within 2023, illustrat…

**Figure 7.** Empirical distributions of expert aggregation weights on HS300. Expert 1 corresponds to the short-horizon expert and Expert 3 to the long-horizon expert. Each bar shows the annual mean weight allocated to each expert. (a) Weight evolution of stock 605117.SS across five years, reflecting regime-driven reallocation. (b) Weight distribution across four stocks within 2023, reflecting asset-specific routing pre…
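The Preprocess module in the Figure 2 caption (instance normalization, channel-independent processing, patch partitioning) can be sketched as follows. The patch length, stride, and epsilon are hypothetical values; the normalization is a plain per-instance standardization in the spirit of reversible instance normalization [18], without its learnable affine parameters; and the learned projection that turns each patch into an embedding $e_i$ is omitted.

```python
import numpy as np


def preprocess(x, patch_len=8, stride=8, eps=1e-5):
    """Instance-normalise, split channels, and cut patches along time.

    x: look-back window of shape (T, C). Returns patches of shape
    (C, N, patch_len) plus the per-channel statistics needed to undo the
    normalisation on the forecast.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0, keepdims=True)
    std = x.std(axis=0, keepdims=True) + eps
    z = ((x - mean) / std).T                      # (C, T): each channel handled independently
    n = (z.shape[1] - patch_len) // stride + 1    # number of full patches
    starts = stride * np.arange(n)
    idx = starts[:, None] + np.arange(patch_len)[None, :]   # (N, patch_len) time indices
    return z[:, idx], mean, std


# Usage: a 120-step look-back (the HS300 setting in Figure 5) with 3 channels.
x = np.random.default_rng(0).normal(size=(120, 3))
patches, mean, std = preprocess(x)
print(patches.shape)   # -> (3, 15, 8)
```

Setting `stride` below `patch_len` yields overlapping patches; which variant RAVEN uses is not stated in the captions above.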

Original abstract

Financial time series forecasting presents structural challenges absent from standard benchmarks. Log-returns are non-stationary, exhibit exceptionally low signal-to-noise ratios (SNR), and are governed by regime-dependent temporal dependencies. We identify a key limitation of state-of-the-art (SOTA) time series models in financial settings: a fixed context window is mismatched to the time-varying optimal look-back of non-stationary price processes. We propose the Regime-Aware Variable-context Expert Network (RAVEN), a Mixture-of-Experts framework designed to adaptively determine the temporal context for each input sample. Instead of relying on a fixed look-back horizon, RAVEN constructs a hierarchy of nested contiguous windows whose lengths are determined by the data itself. Specifically, RAVEN scores patches by learned importance in reverse chronological order and applies the Cumulative Importance Thresholding (CIT) mechanism to derive nested prefix windows, each routed to a scale-specialized expert. A Global Compressed Representation (GCR) branch runs in parallel over the full context, preserving global temporal coherence that local experts cannot guarantee. Because the nested routing induces structured overlap among expert inputs, we introduce Correlation-Aware Weighting (CAW) to align variable-length expert outputs and penalize pairwise cosine similarity prior to aggregation. Experiments on cumulative log-return prediction (HS300, S&P500) and fund sales forecasting demonstrate that RAVEN achieves SOTA performance, improving Pearson correlation by 9.2% on HS300 and 20.2% on S&P500 and reducing MSE by 18.2% on fund sales forecasting, while achieving the best results in 14 of 16 metrics on four PEMS traffic benchmarks.
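The abstract specifies only that CAW aligns the variable-length expert outputs and penalizes pairwise cosine similarity before aggregation. One plausible reading, sketched below, down-weights experts whose outputs resemble the others and also returns an average squared-similarity penalty that could be added to the loss. The shared output dimension, the redundancy-adjusted softmax, the strength `lam`, and the function name are all assumptions, not the authors' formulation.

```python
import numpy as np


def caw_aggregate(expert_outputs, gate_logits, lam=1.0):
    """Hypothetical Correlation-Aware Weighting.

    expert_outputs: (K, d) array, one nonzero row per expert, assumed already
    aligned to a shared dimension d (e.g. by a per-expert projection head).
    gate_logits: (K,) router scores before the redundancy adjustment.
    Returns the aggregated output, the expert weights, and a similarity penalty.
    """
    h = np.asarray(expert_outputs, dtype=float)
    k = h.shape[0]
    unit = h / np.linalg.norm(h, axis=1, keepdims=True)
    sim = unit @ unit.T                      # pairwise cosine similarities
    np.fill_diagonal(sim, 0.0)               # ignore self-similarity
    redundancy = sim.sum(axis=1) / (k - 1)   # mean similarity to the other experts
    logits = np.asarray(gate_logits, dtype=float) - lam * redundancy
    w = np.exp(logits - logits.max())
    w /= w.sum()                             # softmax weights favour less redundant experts
    penalty = float((sim ** 2).sum() / (k * (k - 1)))
    return w @ h, w, penalty


# Usage: experts 1 and 2 produce identical outputs, expert 3 is orthogonal.
outputs = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
agg, weights, penalty = caw_aggregate(outputs, gate_logits=np.zeros(3))
print(weights, penalty)
```

With equal gate logits, the orthogonal expert receives the largest weight, which is the qualitative behaviour the abstract describes for penalizing redundant, overlapping experts.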

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

RAVEN's CIT-based nested windows and MoE routing target a real issue in non-stationary financial forecasting, but the abstract gives no evidence that the learned scores recover the right time-varying horizons.


The main new element is the combination of reverse-chronological patch scoring, Cumulative Importance Thresholding to build nested prefix windows, a parallel Global Compressed Representation branch, and the Correlation-Aware Weighting penalty to manage overlaps. This is a concrete architecture for letting context length vary per sample inside an MoE, aimed at regime shifts in log-returns.

It does a solid job naming the mismatch between fixed windows and non-stationary price processes, and the design choices (scale-specialized experts plus global coherence term) follow logically from that diagnosis. Extending the tests to both equity indices and traffic data is a reasonable way to check generality.

The soft spot is exactly the one in the stress-test note. Nothing supplied shows that the learned importances produce windows whose lengths track the sample-specific optimal look-back rather than some other pattern. Without ablations on the CIT step, diagnostics on window length distributions across regimes, or even basic dataset statistics, the reported Pearson and MSE gains cannot be attributed to the variable-context mechanism. They could come from capacity, the penalty, or other unmentioned factors. The abstract also omits error bars and any discussion of how the free parameters (patch scores, threshold) were set.

This is for people working on adaptive time-series models in finance or other regime-shifting domains. A reader interested in concrete proposals for variable context would find the architecture description useful even if the results need more checking.

I would send it to peer review. The idea is specific enough that referees can test whether the central assumption holds once the full experiments and code are available.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes RAVEN, a Mixture-of-Experts architecture for financial time series forecasting that adaptively determines per-sample temporal context via a Cumulative Importance Thresholding (CIT) mechanism: patches are scored by learned importance in reverse chronological order to form nested prefix windows routed to scale-specialized experts. A parallel Global Compressed Representation (GCR) branch preserves global coherence, and Correlation-Aware Weighting (CAW) aligns variable-length outputs while penalizing cosine similarity. The paper claims this addresses the mismatch of fixed context windows with regime-dependent dependencies in non-stationary, low-SNR log-return series, reporting SOTA results including 9.2% Pearson correlation improvement on HS300, 20.2% on S&P500, 18.2% MSE reduction on fund sales forecasting, and best performance in 14 of 16 metrics across four PEMS traffic benchmarks.

Significance. If the CIT mechanism reliably recovers time-varying optimal look-backs rather than arbitrary patterns, the variable-context MoE design could offer a practical way to handle regime shifts in financial forecasting where fixed windows are suboptimal. The addition of GCR and CAW to manage overlap and coherence is a reasonable engineering response to the nested routing. Evaluation on both financial and non-financial (PEMS) benchmarks provides some breadth, though significance hinges on confirming that reported gains are attributable to the adaptive context rather than capacity or regularization effects alone.

major comments (2)

[Abstract] Abstract (CIT mechanism paragraph): The assertion that learned reverse-chronological patch importances followed by Cumulative Importance Thresholding produce nested prefix windows whose lengths match each sample's time-varying optimal look-back is presented without any ablation, diagnostic (e.g., window-length histograms per regime), or sensitivity analysis on the threshold value; if this alignment does not hold, the variable-context routing, GCR, and CAW cannot explain the claimed Pearson/MSE gains and the improvements could arise from other factors.
[Abstract] Abstract (experimental claims): The reported metric improvements (9.2% Pearson on HS300, 20.2% on S&P500, 18.2% MSE on fund sales, best in 14/16 PEMS metrics) are stated without error bars, number of runs, dataset statistics, or explicit comparisons isolating the CIT variable-context component against strong fixed-window MoE baselines, which is required to establish that the architectural choices drive the SOTA results.
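The first diagnostic requested above, window-length histograms per regime, needs a regime label for each sample. The sketch below uses trailing-volatility terciles as a stand-in regime definition; the paper's own regime definition, if it has one, is not given here, and the helper names and data are hypothetical.

```python
import numpy as np


def volatility_regimes(trailing_vol, n_regimes=3):
    """Label each sample by quantile bucket of trailing volatility (hypothetical regimes)."""
    trailing_vol = np.asarray(trailing_vol, dtype=float)
    edges = np.quantile(trailing_vol, np.linspace(0, 1, n_regimes + 1)[1:-1])
    return np.searchsorted(edges, trailing_vol, side="right")


def length_histograms(lengths, regimes, max_len):
    """Count how often each window length is selected within each regime."""
    lengths, regimes = np.asarray(lengths), np.asarray(regimes)
    return {int(r): np.bincount(lengths[regimes == r], minlength=max_len + 1)
            for r in np.unique(regimes)}


# Usage with made-up data: first-expert window lengths for six samples.
vol = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
first_window = [1, 2, 2, 5, 5, 5]
regimes = volatility_regimes(vol)
print(length_histograms(first_window, regimes, max_len=5))
```

If CIT tracks regime-dependent look-backs, these histograms should shift systematically between regimes; near-identical histograms would support the referee's concern.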

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract and experimental claims. We address each point below and will revise the manuscript accordingly to strengthen the evidence for the CIT mechanism and the attribution of performance gains.


Referee: [Abstract] Abstract (CIT mechanism paragraph): The assertion that learned reverse-chronological patch importances followed by Cumulative Importance Thresholding produce nested prefix windows whose lengths match each sample's time-varying optimal look-back is presented without any ablation, diagnostic (e.g., window-length histograms per regime), or sensitivity analysis on the threshold value; if this alignment does not hold, the variable-context routing, GCR, and CAW cannot explain the claimed Pearson/MSE gains and the improvements could arise from other factors.

Authors: We agree that the abstract presents the CIT alignment claim without accompanying diagnostics. In the revised manuscript we will add (i) ablation studies removing or replacing CIT, (ii) histograms of selected window lengths stratified by detected market regimes, and (iii) sensitivity plots for the cumulative-importance threshold. These additions will directly test whether the learned nested prefixes correspond to regime-dependent optimal look-backs and will clarify the contribution of variable-context routing relative to GCR and CAW. revision: yes
Referee: [Abstract] Abstract (experimental claims): The reported metric improvements (9.2% Pearson on HS300, 20.2% on S&P500, 18.2% MSE on fund sales, best in 14/16 PEMS metrics) are stated without error bars, number of runs, dataset statistics, or explicit comparisons isolating the CIT variable-context component against strong fixed-window MoE baselines, which is required to establish that the architectural choices drive the SOTA results.

Authors: We acknowledge that the abstract omits error bars, run counts, and component-isolating ablations. The full paper already reports multiple random seeds and basic dataset statistics; we will augment the experimental section with (i) mean ± std over the reported seeds, (ii) explicit fixed-window MoE baselines that keep all other RAVEN components identical, and (iii) an ablation table that isolates the CIT variable-context branch. These results will be summarized in the abstract revision as well. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical architecture with no self-referential derivations or fitted predictions


The paper presents an empirical neural architecture (Mixture-of-Experts with CIT, GCR, CAW) whose performance claims rest on experimental results rather than any closed-form derivation. No equations are supplied that define a quantity in terms of itself, no fitted parameters are relabeled as predictions, and no load-bearing uniqueness theorems or ansatzes are imported via self-citation. The central mechanism (learned reverse-chronological patch importance + CIT) is an architectural choice whose validity is tested empirically on HS300, S&P500, and PEMS benchmarks; it does not reduce to the reported Pearson/MSE gains by construction. This is the normal case of a self-contained empirical contribution.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; full manuscript details on parameters, assumptions, and training are unavailable.

free parameters (2)

patch importance scoring weights
Learned parameters that determine which patches are retained by the CIT threshold.
CIT threshold value
Controls the length of each nested prefix window; must be either learned or set by hand.

axioms (1)

domain assumption: Financial log-returns are governed by regime-dependent temporal dependencies that a fixed context window cannot capture.
Stated directly in the abstract as the key limitation of SOTA models.

pith-pipeline@v0.9.1-grok · 5868 in / 1321 out tokens · 25853 ms · 2026-06-26T00:57:34.024337+00:00 · methodology


Reference graph

Works this paper leans on

47 extracted references · 17 canonical work pages

[1] S. K. Ankireddy, N. Seleznev, N. H. Nguyen, Y. Wu, S. Kumar, F. Huang, and C. B. Bruss, “Timesqueeze: Dynamic patching for efficient time series forecasting,” CoRR, vol. abs/2603.11352, 2026. [Online]. Available: https://doi.org/10.48550/arXiv.2603.11352

[2] A. F. Ansari, L. Stella, A. C. Türkmen, X. Zhang, P. Mercado, H. Shen, O. Shchur, S. S. Rangapuram, S. Pineda-Arango, S. Kapoor, J. Zschiegner, D. C. Maddix, H. Wang, M. W. Mahoney, K. Torkkola, A. G. Wilson, M. Bohlke-Schneider, and B. Wang, “Chronos: Learning the language of time series,” Trans. Mach. Learn. Res., vol. 2024, 2024. [Online]. Available: h...

[3] G. Bontempi, S. B. Taieb, and Y. L. Borgne, “Machine learning strategies for time series forecasting,” in Business Intelligence - Second European Summer School, eBISS 2012, Brussels, Belgium, July 15-21, 2012, Tutorial Lectures, ser. Lecture Notes in Business Information Processing, M. Aufaure and E. Zimányi, Eds. Springer, 2012, pp. 62–77. [Online]. Ava... (DOI: 10.1007/978-3-642-36318-4_3)

[4] C. Challu, K. G. Olivares, B. N. Oreshkin, F. G. Ramírez, M. M. Canseco, and A. Dubrawski, “NHITS: neural hierarchical interpolation for time series forecasting,” in Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposium on Educati... (DOI: 10.1609/aaai.v37i6.25854)

[5] P. Chen, Y. Zhang, Y. Cheng, Y. Shu, Y. Wang, Q. Wen, B. Yang, and C. Guo, “Pathformer: Multi-scale transformers with adaptive pathways for time series forecasting,” in The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024. [Online]. Available: https://openreview.net/forum?id...

[6] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, B. Krishnapuram, M. Shah, A. J. Smola, C. C. Aggarwal, D. Shen, and R. Rastogi, Eds. ACM, 2016, pp. 785–794. [Online]. Available: https://doi... (DOI: 10.1145/2939672.2939785)

[7] K. Cho, B. van Merrienboer, Ç. Gülçehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Specia... (DOI: 10.3115/v1/d14-1179)

[8] F. Corsi, “A simple approximate long-memory model of realized volatility,” Journal of Financial Econometrics, vol. 7, no. 2, pp. 174–196, 2009

[9] F. X. Diebold and R. S. Mariano, “Comparing predictive accuracy,” Journal of Business & Economic Statistics, vol. 20, no. 1, pp. 134–144, 2002

[10] J. L. Elman, “Finding structure in time,” Cogn. Sci., vol. 14, no. 2, pp. 179–211, 1990. [Online]. Available: https://doi.org/10.1207/s15516709cog1402_1

[11] W. Fedus, B. Zoph, and N. Shazeer, “Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity,” J. Mach. Learn. Res., vol. 23, pp. 120:1–120:39, 2022. [Online]. Available: https://jmlr.org/papers/v23/21-0998.html

[12] J. H. Friedman, “Greedy function approximation: a gradient boosting machine,” Annals of Statistics, pp. 1189–1232, 2001

[13] S. Gu, B. Kelly, and D. Xiu, “Empirical asset pricing via machine learning,” The Review of Financial Studies, vol. 33, no. 5, pp. 2223–2273, 2020

[14] S. Guo, Y. Lin, N. Feng, C. Song, and H. Wan, “Attention based spatial-temporal graph convolutional networks for traffic flow forecasting,” in The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances i... (DOI: 10.1609/aaai.v33i01.3301922)

[15] J. Hasbrouck, “Measuring the information content of stock trades,” The Journal of Finance, vol. 46, no. 1, pp. 179–207, 1991

[16] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997. [Online]. Available: https://doi.org/10.1162/neco.1997.9.8.1735

[17] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T. Liu, “Lightgbm: A highly efficient gradient boosting decision tree,” in Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Ferg...

[18] T. Kim, J. Kim, Y. Tae, C. Park, J. Choi, and J. Choo, “Reversible instance normalization for accurate time-series forecasting against distribution shift,” in The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022. [Online]. Available: https://openreview.net/forum?id=cGDAkQo1C0p

[19] T. Li, Z. Liu, Y. Shen, X. Wang, H. Chen, and S. Huang, “Master: Market-guided stock transformer for stock price forecasting,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 1, 2024, pp. 162–170

[20] H. Lin, D. Zhou, W. Liu, and J. Bian, “Learning multiple stock trading patterns with temporal routing adaptor and optimal transport,” in KDD ’21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, August 14-18, 2021, F. Zhu, B. C. Ooi, and C. Miao, Eds. ACM, 2021, pp. 1017–1026. [Online]. Available: https://do... (DOI: 10.1145/3447548.3467358)

[21] X. Liu, J. Liu, G. Woo, T. Aksu, Y. Liang, R. Zimmermann, C. Liu, J. Li, S. Savarese, C. Xiong, and D. Sahoo, “Moirai-moe: Empowering time series foundation models with sparse mixture of experts,” in Forty-second International Conference on Machine Learning, ICML 2025, Vancouver, BC, Canada, July 13-19, 2025, ser. Proceedings of Machine Learning Research...

[22] Y. Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long, “itransformer: Inverted transformers are effective for time series forecasting,” in The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024. [Online]. Available: https://openreview.net/forum?id=JePfAI8fah

[23] A. W. Lo, “The adaptive markets hypothesis: Market efficiency from an evolutionary perspective,” Journal of Portfolio Management, Forthcoming, 2004

[24] D. Luo and X. Wang, “Moderntcn: A modern pure convolution structure for general time series analysis,” in The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024. [Online]. Available: https://openreview.net/forum?id=vpJMJerXHU

[25] M. M. N. Murad, M. Aktukmak, and Y. Yilmaz, “Wpmixer: Efficient multi-resolution mixing for long-term time series forecasting,” in Thirty-Ninth AAAI Conference on Artificial Intelligence, Thirty-Seventh Conference on Innovative Applications of Artificial Intelligence, Fifteenth Symposium on Educational Advances in Artificial Intelligence, AAAI 2025, Phil... (DOI: 10.1609/aaai.v39i18.34156)

[26] Y. Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam, “A time series is worth 64 words: Long-term forecasting with transformers,” in The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. [Online]. Available: https://openreview.net/forum?id=Jbdc0vTOcol

[27] X. Piao, Z. Chen, T. Murayama, Y. Matsubara, and Y. Sakurai, “Fredformer: Frequency debiased transformer for time series forecasting,” in Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2024, Barcelona, Spain, August 25-29, 2024, R. Baeza-Yates and F. Bonchi, Eds. ACM, 2024, pp. 2400–2410. [Online]. Available: h... (DOI: 10.1145/3637528.3671928)

[28] O. B. Sezer, M. U. Gudelek, and A. M. Özbayoglu, “Financial time series forecasting with deep learning: A systematic literature review: 2005-2019,” Appl. Soft Comput., vol. 90, p. 106181, 2020. [Online]. Available: https://doi.org/10.1016/j.asoc.2020.106181

[29] M. A. Shabani, A. H. Abdi, L. Meng, and T. Sylvain, “Scaleformer: Iterative multi-scale refining transformers for time series forecasting,” in The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. [Online]. Available: https://openreview.net/forum?id=sCrnllCtjoE

[30] N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. V. Le, G. E. Hinton, and J. Dean, “Outrageously large neural networks: The sparsely-gated mixture-of-experts layer,” in 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017. [Online]. Available: https:/...

[31] X. Shi, S. Wang, Y. Nie, D. Li, Z. Ye, Q. Wen, and M. Jin, “Time-moe: Billion-scale time series foundation models with mixture of experts,” in The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025. [Online]. Available: https://openreview.net/forum?id=e1wDDFmlVu

[32] Y. Shi, Z. Fu, S. Chen, B. Zhao, W. Xu, C. Zhang, and J. Li, “Kronos: A foundation model for the language of financial markets,” in Fortieth AAAI Conference on Artificial Intelligence, Thirty-Eighth Conference on Innovative Applications of Artificial Intelligence, Sixteenth Symposium on Educational Advances in Artificial Intelligence, AAAI 2026, Singapor... (DOI: 10.1609/aaai.v40i30.39730)

[33] Y. Sun, Z. Xie, E. Eldele, D. Chen, Q. Hu, and M. Wu, “Learning pattern-specific experts for time series forecasting under patch-level distribution shift,” in Advances in Neural Information Processing Systems, D. Belgrave, C. Zhang, H. Lin, R. Pascanu, P. Koniusz, M. Ghassemi, and N. Chen, Eds., vol. 38. Curran Associates, Inc., 2025, pp. 91810–91844...

[34] X. Wang, T. Zhou, Q. Wen, J. Gao, B. Ding, and R. Jin, “CARD: channel aligned robust blend transformer for time series forecasting,” in The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024. [Online]. Available: https://openreview.net/forum?id=MJksrOhurE

[35] K. D. West, “Asymptotic inference about predictive ability,” Econometrica, vol. 64, no. 5, pp. 1067–1084, 1996

[36] H. Wu, T. Hu, Y. Liu, H. Zhou, J. Wang, and M. Long, “Timesnet: Temporal 2d-variation modeling for general time series analysis,” in The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. [Online]. Available: https://openreview.net/forum?id=ju_Uqw384Oq

[37] H. Wu, J. Xu, J. Wang, and M. Long, “Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting,” in Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, M. Ranzato, A. Beygelzimer, Y. N. Dauphin, P. Liang, and ...

[38]

W. Xu, W. Liu, L. Wang, Y. Xia, J. Bian, J. Yin, and T. Liu, “HIST: A graph-based framework for stock trend forecasting via mining concept-oriented shared information,” CoRR, vol. abs/2110.13716, 2021. [Online]. Available: https://arxiv.org/abs/2110.13716

[39]

W. Xu, D. Xiang, Y. Liu, X. Wang, Y. Ma, L. Zhang, C. Xu, and J. Zhang, “Finmultitime: A four-modal bilingual dataset for financial time-series analysis,” CoRR, vol. abs/2506.05019, 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2506.05019

[40]

X. Yang, W. Liu, D. Zhou, J. Bian, and T. Liu, “Qlib: An AI-oriented quantitative investment platform,” CoRR, vol. abs/2009.11189, 2020. [Online]. Available: https://arxiv.org/abs/2009.11189

[41]

A. Zeng, M. Chen, L. Zhang, and Q. Xu, “Are transformers effective for time series forecasting?” in Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2023, Washington, D...
doi:10.1609/aaai.v37i9.26317

[42]

X. Zhang, Z. Huang, Y. Wu, X. Lu, E. Qi, Y. Chen, Z. Xue, Q. Wang, P. Wang, and W. Wang, “Multi-period learning for financial time series forecasting,” in Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.1, KDD 2025, Toronto, ON, Canada, August 3-7, 2025, Y. Sun, F. Chierichetti, H. W. Lauw, C. Perlich, W. H. Tok,...
doi:10.1145/3690624.3709422

[43]

Y. Zhang and J. Yan, “Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting,” in The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. [Online]. Available: https://openreview.net/forum?id=vSVLM2j9eie

[44]

L. Zhao, S. Kong, and Y. Shen, “DoubleAdapt: A meta-learning approach to incremental learning for stock trend forecasting,” in Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023, Long Beach, CA, USA, August 6-10, 2023, A. K. Singh, Y. Sun, L. Akoglu, D. Gunopulos, X. Yan, R. Kumar, F. Ozcan, and J. Ye, Eds. AC...
doi:10.1145/3580305.3599315

[45]

H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, “Informer: Beyond efficient transformer for long sequence time-series forecasting,” in Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advan...
doi:10.1609/aaai.v35i12.17325

[46]

T. Zhou, Z. Ma, X. Wang, Q. Wen, L. Sun, T. Yao, W. Yin, and R. Jin, “FiLM: Frequency improved Legendre memory model for long-term time series forecasting,” in Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, S. Koyejo,...

[47]

T. Zhou, Z. Ma, Q. Wen, X. Wang, L. Sun, and R. Jin, “FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting,” in International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, ser. Proceedings of Machine Learning Research, K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvári, G. Niu, and S....