Pretrained Time-Series Foundation Models for Financial Return Forecasting

Miquel Noguer i Alonso; Rodolfo Pereira Franklin

arxiv: 2606.27100 · v1 · pith:4ZUB3OHMnew · submitted 2026-06-25 · 💱 q-fin.MF

Pith reviewed 2026-06-26 01:33 UTC · model grok-4.3

classification 💱 q-fin.MF

keywords: time-series foundation models · financial return forecasting · pretrained models · benchmark evaluation · Diebold-Mariano test · rolling-origin protocol · random walk benchmark

The pith

Pretrained time-series foundation models often rank first in equity return forecasts yet produce only sparse, small gains over random walks that pass statistical tests in just two cases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether large pretrained time-series models can forecast daily returns on liquid stocks better than models trained from scratch or a simple random walk. It runs a rolling-origin evaluation on five U.S. equities under equal context length and applies Diebold-Mariano tests to check whether any model beats the random walk. Pretrained models win most head-to-head comparisons, yet the absolute improvements remain tiny and reach statistical significance in only two model-asset pairs. The authors frame pretraining as an inductive prior that transfers useful structure without guaranteeing economic predictability in low signal-to-noise environments. They conclude the models lower the cost of building a forecaster but do not function as universal alpha engines.
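The rolling-origin evaluation with an equalized context can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `step` parameter, and the choice of a zero-return forecast as the random-walk benchmark are assumptions made for this sketch.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def rolling_origin_windows(
    n_obs: int, context: int, horizon: int, step: int
) -> Iterator[Tuple[range, range]]:
    """Yield (context indices, target indices) for each forecast origin.

    Every model sees exactly `context` past observations, which is one
    way to read the "equal context length" condition.
    """
    origin = context
    while origin + horizon <= n_obs:
        yield range(origin - context, origin), range(origin, origin + horizon)
        origin += step


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error over one forecast window."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def zero_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Assumed benchmark: a random walk in prices implies zero expected return."""
    return [0.0] * horizon


def evaluate(
    returns: Sequence[float],
    forecaster: Callable[[Sequence[float], int], List[float]],
    context: int,
    horizon: int,
    step: int,
) -> List[float]:
    """Return one MAE per forecast origin for the given forecaster."""
    errors = []
    for ctx, tgt in rolling_origin_windows(len(returns), context, horizon, step):
        history = [returns[i] for i in ctx]
        actual = [returns[i] for i in tgt]
        errors.append(mae(actual, forecaster(history, horizon)))
    return errors
```

Any candidate model can be passed as `forecaster`; comparing its per-origin errors with those of `zero_forecast` gives the loss series that a Diebold-Mariano test would consume.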

Core claim

Under a conservative rolling-origin protocol with equalized context on AAPL, AMZN, GOOG, JPM and META, pretrained models account for eight of ten task-level wins, with Moirai-2.0 and TimesFM-2.5 posting the best average ranks; however, only Chronos on AMZN and Moirai-2.0 on GOOG reject the null of equal or worse accuracy than a random walk at conventional significance levels, while the iTransformer trained locally wins both META tasks.
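To make "task-level wins" and "average rank" concrete, here is a small sketch of how both could be derived from a table of per-task errors. The data layout and the placeholder model and task names are hypothetical, and ties are broken arbitrarily by sort order rather than by averaged ranks.

```python
from typing import Dict, Tuple


def rank_models(
    task_errors: Dict[str, Dict[str, float]]
) -> Tuple[Dict[str, float], Dict[str, int]]:
    """Compute average rank and win count per model.

    task_errors maps task name -> {model name -> error}; lower is better.
    Every model is assumed to appear in every task.
    """
    rank_sums: Dict[str, int] = {}
    wins: Dict[str, int] = {}
    for errors in task_errors.values():
        ordered = sorted(errors, key=errors.get)  # best model first
        wins[ordered[0]] = wins.get(ordered[0], 0) + 1
        for rank, model in enumerate(ordered, start=1):
            rank_sums[model] = rank_sums.get(model, 0) + rank
    n_tasks = len(task_errors)
    avg_rank = {model: total / n_tasks for model, total in rank_sums.items()}
    return avg_rank, wins


# Placeholder numbers for illustration only; these are not results from the paper.
toy = {
    "task_1": {"model_a": 0.010, "model_b": 0.012, "model_c": 0.011},
    "task_2": {"model_a": 0.020, "model_b": 0.019, "model_c": 0.021},
}
avg_rank, wins = rank_models(toy)
```

A model can lead on average rank without winning any single task, which is why the two summaries are reported separately.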

What carries the argument

Pretraining as an inductive prior that supplies useful attention geometry and PAC-Bayes-style transfer without requiring asset-specific data, evaluated through equal-context rolling-origin forecasts and Diebold-Mariano tests against random-walk and scratch-trained baselines.
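For orientation, one standard PAC-Bayes bound (the McAllester form as tightened by Maurer) is reproduced here. It is a generic textbook statement and is not taken from the paper, whose own formulation may differ. It assumes a loss bounded in $[0,1]$ and $n$ i.i.d. samples, an assumption that financial return series violate, so it serves as intuition only. $P$ is a prior fixed before seeing the data, $Q$ is any posterior, and $L$ and $\hat{L}$ are the expected and empirical risks.

```latex
% With probability at least 1 - \delta over the sample, simultaneously for all Q:
\mathbb{E}_{h \sim Q}\big[L(h)\big]
  \;\le\;
\mathbb{E}_{h \sim Q}\big[\hat{L}(h)\big]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\!\big(2\sqrt{n}/\delta\big)}{2n}}
```

Read with $P$ centered on pretrained weights, the bound tightens when the adapted model $Q$ stays close to the prior (small $\mathrm{KL}$), which is the sense in which pretraining can help when $n$ is small. It says nothing about how low the empirical risk itself can go in a noisy market.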

If this is right

Pretrained models lower the data and compute needed to reach competitive forecast accuracy in new assets.
Local supervised training can still beat generic pretraining on particular stocks.
Model ranking order does not translate into reliable economic alpha once noise and multiple-testing effects are accounted for.
Information-theoretic limits on predictability remain binding even for the best-ranked pretrained models.
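The multiple-testing point in the list above can be illustrated with a Holm step-down correction. This is a sketch of one standard procedure; the source does not say which correction, if any, the paper applies, and the p-values below are placeholders.

```python
from typing import List, Sequence


def holm_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down procedure controlling the family-wise error rate.

    Returns a list of booleans aligned with p_values; True means the
    corresponding null hypothesis is rejected after correction.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * m
    for k, idx in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k).
        if p_values[idx] <= alpha / (m - k):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# Placeholder p-values for illustration only.
example = holm_reject([0.001, 0.04, 0.03, 0.2])
```

In this toy case three of the four raw p-values fall below 0.05, but only one survives the correction, which shows how a handful of nominal rejections across many model-asset pairs can shrink further.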

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same cost-saving role could appear in other low-signal domains such as energy demand or macroeconomic nowcasting where labeled data are scarce.
Adding explicit economic loss functions or position-sizing constraints during fine-tuning might change which models survive the Diebold-Mariano filter.
The observed pattern suggests that future benchmarks should report both ranking and economic metrics rather than ranking alone.

Load-bearing premise

The five chosen liquid equities together with the equal-context rolling-origin protocol and Diebold-Mariano testing give a representative picture of whether pretrained models can produce economically usable predictability.

What would settle it

A replication that expands to at least twenty equities across multiple sectors and frequencies and finds that at least half the pretrained models pass the Diebold-Mariano test against random walk with positive Sharpe improvement after transaction costs would falsify the central claim.
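The "Sharpe improvement after transaction costs" criterion could be checked along these lines. The sign-of-forecast trading rule, the proportional cost model, and the 252-day annualization are all assumptions of this sketch; the source does not specify a strategy.

```python
import math
import statistics
from typing import List, Sequence


def net_strategy_returns(
    forecasts: Sequence[float],
    realized: Sequence[float],
    cost_per_unit_turnover: float,
) -> List[float]:
    """Go long (+1) on a positive forecast and short (-1) on a negative one.

    A proportional cost is charged on every change in position, starting
    from a flat (zero) position.
    """
    net = []
    prev_pos = 0.0
    for f, r in zip(forecasts, realized):
        pos = 1.0 if f > 0 else (-1.0 if f < 0 else 0.0)
        cost = cost_per_unit_turnover * abs(pos - prev_pos)
        net.append(pos * r - cost)
        prev_pos = pos
    return net


def annualized_sharpe(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio with a zero risk-free rate (an assumption)."""
    sd = statistics.stdev(returns)
    if sd == 0:
        return 0.0
    return statistics.mean(returns) / sd * math.sqrt(periods_per_year)
```

Computing `annualized_sharpe` for a model's net returns and for the benchmark's net returns, then comparing the two, gives the economic counterpart to the statistical test.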

Figures

Figures reproduced from arXiv: 2606.27100 by Miquel Noguer i Alonso, Rodolfo Pereira Franklin.

**Figure 1.** A concrete Prompt-as-Prefix example that conditions a frozen LLM on task context, instructions, and summary statistics of the input series [Jin et al., 2024]. 3.12 TimeGPT and TimeGPT Long Horizon: TimeGPT [Garza et al., 2024] is a foundation model designed for time-series forecasting that leverages transfer learning to perform zero-shot inference. It provides a mapping $f_\theta : X \mapsto Y$ with X = {y[0:t], x[0:t+…

**Figure 3.** N-HiTS architecture. Source: Challu et al. [2022]. Like NBEATS, NHITS performs local nonlinear projections onto basis functions across multiple blocks, each comprising an MLP that generates backcast and forecast coefficients. Blocks are organized into stacks, each specializing in a distinct characteristic of the data with its own set of basis functions. A MaxPool layer is introduced at the input of each bl…

**Figure 4.** PatchTST illustrative multivariate forecasting case study. Source: Nie et al. [2023]. A univariate series of length $L$, denoted $x^{(i)}_{1:L}$ for $i = 1, \dots, M$, is fed into the backbone under a channel-independence scheme and produces prediction results $\hat{x}^{(i)} = (\hat{x}^{(i)}_{L+1}, \dots, \hat{x}^{(i)}_{L+T}) \in \mathbb{R}^{1 \times T}$. Each series is partitioned into (possibly overlapping) patches of length $P$ with stride $S$, yielding patch…

**Figure 5.** Representative actual-versus-predicted trajectories for a single 20-business-day forecast window. The plot illustrates the core difficulty of return forecasting: even the lowest-error models fail to track the realized path closely, which is why small MAE differences in the aggregate tables translate into the cautious overall interpretation of this benchmark. 5.1 Aggregate findings: The aggregate ranking ind…

Original abstract

Financial return forecasting is a difficult test case for time-series foundation models (TSFMs) due to low signal-to-noise ratios, structural breaks, heavy tails, and weak persistence. This paper benchmarks pretrained TSFMs against train-from-scratch neural baselines in a deliberately conservative financial setting. We evaluate TimeGPT/TimeGPT-LH, TimesFM-2.5, Moirai-2.0, Chronos, and Chronos-2 against NBEATS, NHITS, PatchTST, iTransformer, and KAN on five liquid U.S. equities (AAPL, AMZN, GOOG, JPM, META) using linear and log returns. Models are compared under an equalized context budget, a rolling-origin protocol, and against random-walk benchmarks. We provide a theoretical framing of pretraining as an inductive prior, linking PAC-Bayes transfer intuition, information-theoretic predictability limits, and attention geometry. This clarifies why strong model rankings need not imply economically meaningful predictability in noisy markets. Pragmatically, pretrained TSFMs dominate the ranking distribution, accounting for 8 of 10 task-level wins. Moirai-2.0 and TimesFM-2.5 achieve the strongest average ranks, leading tasks for AAPL, JPM, GOOG, and AMZN, while Chronos wins the remaining AMZN task. However, the iTransformer baseline wins both META tasks, showing local supervised learning can still outperform generic pretraining for specific assets. Crucially, gains over the random-walk benchmark are small and sparse. A one-sided Diebold-Mariano test rejects equal or inferior predictive accuracy only for Chronos on AMZN and Moirai-2.0 on GOOG. We conclude that TSFMs serve as useful practical priors that reduce model-development costs in low-data financial forecasting, but are not universal engines for statistically reliable alpha generation in realistic empirical deployment.
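The abstract's two target definitions, linear and log returns, can be written out as below. The function names are illustrative. Five assets with two return definitions each would give the ten tasks mentioned in the abstract, although that reading is an inference from the text rather than something stated outright.

```python
import math
from typing import List, Sequence


def linear_returns(prices: Sequence[float]) -> List[float]:
    """Simple returns: r_t = P_t / P_{t-1} - 1."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def log_returns(prices: Sequence[float]) -> List[float]:
    """Log returns: r_t = ln(P_t / P_{t-1}); these add up across time."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]
```

For daily moves the two definitions differ only slightly, but log returns sum over a multi-day horizon while linear returns compound, so errors measured on the two targets are not directly interchangeable.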

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

TSFMs top the rankings on these five stocks but beat the random walk with statistical significance in only two model-asset pairs under a deliberately conservative protocol.

The key point is that this paper shows pretrained TSFMs winning most head-to-head rankings against scratch-trained models and baselines, yet the actual edge over a random-walk null is small and statistically reliable in just two cases (Chronos on AMZN, Moirai on GOOG). That distinction is the real contribution.

What works is the deliberately conservative setup: equal context budgets, rolling-origin evaluation, one-sided Diebold-Mariano tests, and an explicit random-walk anchor. The PAC-Bayes and information-theoretic framing is used to explain why strong rankings need not translate into usable predictability in low-signal markets. The results line up with that framing rather than contradicting it. The asset-level variation (iTransformer beating everything on META) is also reported plainly.
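A one-sided Diebold-Mariano test against the random-walk anchor can be sketched in a few lines. This is a minimal version under stated assumptions: it uses the rectangular-kernel long-run variance with `horizon - 1` lags from the original test and a normal approximation, and omits small-sample corrections. The paper's exact variant is not given in the source.

```python
import math
from typing import Sequence, Tuple


def dm_test_one_sided(
    loss_benchmark: Sequence[float],
    loss_model: Sequence[float],
    horizon: int = 1,
) -> Tuple[float, float]:
    """Test H0: the model is no more accurate than the benchmark.

    Inputs are per-period losses (for example absolute errors). Returns the
    DM statistic and the one-sided p-value; a small p-value favors the model.
    """
    d = [b - m for b, m in zip(loss_benchmark, loss_model)]  # > 0: model better
    n = len(d)
    d_bar = sum(d) / n

    def autocov(lag: int) -> float:
        return sum((d[t] - d_bar) * (d[t - lag] - d_bar) for t in range(lag, n)) / n

    # Long-run variance of the loss differential.
    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, horizon))
    if long_run_var <= 0:
        raise ValueError("non-positive long-run variance estimate")

    stat = d_bar / math.sqrt(long_run_var / n)
    p_value = 0.5 * math.erfc(stat / math.sqrt(2.0))  # P(Z > stat), Z ~ N(0, 1)
    return stat, p_value
```

Because the alternative is one-sided, a model that is merely different from the random walk, or worse than it, does not produce a rejection.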

The main limitation is scope. Five liquid U.S. equities is a narrow slice, and the paper does not claim broader coverage. Without released code or exact data splits it is harder to replicate the exact numbers, though the protocol itself is standard. No load-bearing circularity appears in the reported tests.

This is worth a referee's time and should interest groups working on foundation models in finance or on the limits of predictability in noisy series. It gives a clear empirical benchmark rather than another optimistic claim. I would send it to review.

Referee Report

0 major / 2 minor

Summary. The manuscript benchmarks pretrained time-series foundation models (TimeGPT, TimesFM-2.5, Moirai-2.0, Chronos, Chronos-2) against train-from-scratch baselines (NBEATS, NHITS, PatchTST, iTransformer, KAN) for linear and log return forecasting on five liquid U.S. equities. Using an equal-context rolling-origin protocol and one-sided Diebold-Mariano tests against random-walk benchmarks, it reports that TSFMs achieve 8 of 10 task-level ranking wins (led by Moirai-2.0 and TimesFM-2.5) but statistically significant gains over the random walk occur in only two cases (Chronos on AMZN, Moirai-2.0 on GOOG). The paper frames pretraining as an inductive prior and concludes that TSFMs reduce model-development costs in low-signal settings without delivering reliable alpha.

Significance. If the results hold, the work supplies a conservative, protocol-grounded demonstration that TSFMs function as practical priors for financial forecasting while underscoring the distinction between ranking dominance and statistically reliable predictability. The explicit linkage of PAC-Bayes transfer ideas to attention geometry and information-theoretic limits, together with the external random-walk anchor and Diebold-Mariano testing, strengthens the pragmatic takeaway.

minor comments (2)

The evaluation is restricted to five equities; while the paper correctly labels the setting conservative, a brief discussion of how results might generalize to a broader cross-section (e.g., small-cap or international assets) would clarify scope.
The manuscript notes the absence of public code artifacts; releasing the rolling-origin evaluation scripts would directly support the reproducibility claim already implicit in the protocol description.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of the manuscript, the assessment of its significance, and the recommendation to accept. The report correctly identifies the core empirical result (ranking dominance with sparse statistical gains over the random walk) and the pragmatic framing of TSFMs as inductive priors.

Circularity Check

0 steps flagged

No significant circularity detected

The paper presents an empirical benchmarking study of pretrained TSFMs versus baselines on five equities under a rolling-origin protocol with Diebold-Mariano tests against an external random-walk benchmark. No load-bearing derivations, equations, or self-citations reduce the central claims (model rankings and sparse statistical significance) to fitted inputs or prior author results by construction. The theoretical framing on inductive priors is interpretive context rather than a closed derivation loop, and all performance metrics are anchored to independent data and standard external tests.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard time-series evaluation assumptions and statistical test validity without introducing new fitted parameters or postulated entities.

axioms (2)

domain assumption: The Diebold-Mariano test assumptions hold for the forecast errors in this financial setting.
Invoked to interpret the two rejections of the null as evidence of superior accuracy.
domain assumption: The equalized context budget and rolling-origin protocol remove confounding differences in information access across models.
Central to the claim that pretrained models dominate the ranking distribution.

pith-pipeline@v0.9.1-grok · 5880 in / 1263 out tokens · 25812 ms · 2026-06-26T01:33:42.292398+00:00 · methodology

Reference graph

Works this paper leans on

17 extracted references · 9 linked inside Pith

[1]

URLhttps://arxiv.org/abs/2403.07815. Abdul Fatir Ansari, Oleksandr Shchur, Jaris Küken, Andreas Auer, Boran Han, Pedro Mercado, Syama Sundar Rangapuram, Huibin Shen, Lorenzo Stella, Xiyuan Zhang, Mononito Goswami, Shubham Kapoor, Danielle C. Maddix, Pablo Guerron, Tony Hu, Junming Yin, Nick Erickson, Prateek Mutalik Desai, Hao Wang, Huzefa Rangwala, Georg...

Pith/arXiv arXiv
[2]

Vladimir I

URL https://arxiv.org/abs/2510.15821. Vladimir I. Arnold. On functions of three variables.Doklady Akademii Nauk SSSR, 114:679–681,

Pith/arXiv arXiv
[3]

Olivier Catoni.PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning, volume 56 ofInstitute of Mathematical Statistics Lecture Notes – Monograph Series

URL https://arxiv.org/abs/2005.14165. Olivier Catoni.PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning, volume 56 ofInstitute of Mathematical Statistics Lecture Notes – Monograph Series. Institute of Mathematical Statistics,

Pith/arXiv arXiv 2005
[4]

Ching Chang, Wei-Yao Wang, Wen-Chih Peng, and Tien-Fu Chen

URLhttps://arxiv.org/abs/2201.12886. Ching Chang, Wei-Yao Wang, Wen-Chih Peng, and Tien-Fu Chen. Llm4ts: Aligning pre-trained llms as data-efficient time-series forecasters,

arXiv
[5]

Kuo-Tsai Chen

URLhttps://arxiv.org/abs/2308.08469. Kuo-Tsai Chen. Integration of paths—a faithful representation of paths by noncommutative formal power series.Transactions of the American Mathematical Society, 89(2):395–407,

arXiv
[6]

Thomas M

URLhttps://arxiv.org/abs/1603.03788. Thomas M. Cover and Joy A. Thomas.Elements of Information Theory. Wiley-Interscience, 2nd edition,

arXiv
[7]

Carl de Boor.A Practical Guide to Splines, volume 27 ofApplied Mathematical Sciences

URLhttps://arxiv.org/abs/2310.10688. Carl de Boor.A Practical Guide to Splines, volume 27 ofApplied Mathematical Sciences. Springer, revised edition,

Pith/arXiv arXiv
[8]

Tilmann Gneiting and Adrian E

URL https://arxiv.org/abs/2310.03589. Tilmann Gneiting and Adrian E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378,

arXiv
[9]

URLhttps://arxiv.org/abs/2310.01728. J. L. Kelly. A new interpretation of information rate.Bell System Technical Journal, 35(4):917–926,

Pith/arXiv arXiv
[10]

Version cited in manuscript as Moirai 2.0

URL https://arxiv.org/abs/2511.11698. Version cited in manuscript as Moirai 2.0. Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. itransformer: Inverted transformers are effective for time series forecasting,

arXiv
[11]

URL https://arxiv.org/abs/2310.06625. Terry J. Lyons. Differential equations driven by rough signals. Revista Matemática Iberoamericana, 14(2):215–310,

Pith/arXiv arXiv
[12]

URLhttps://arxiv.org/abs/2303.08774. Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio. N-beats: Neural basis expansion analysis for interpretable time series forecasting,

Pith/arXiv arXiv
[13]

Felix Otto

URL https://arxiv.org/abs/1905.10437. Felix Otto. The geometry of dissipative evolution equations: the porous medium equation. Communications in Partial Differential Equations, 26(1–2):101–174,

arXiv 1905
[14]

URL https://arxiv.org/abs/2312.11805. Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. Llama: Open and efficient foundation language models,

Pith/arXiv arXiv
[15]

Cédric Villani.Topics in Optimal Transportation, volume 58 ofGraduate Studies in Mathematics

URLhttps://arxiv.org/abs/2302.13971. Cédric Villani.Topics in Optimal Transportation, volume 58 ofGraduate Studies in Mathematics. American Mathematical Society,

Pith/arXiv arXiv
[16]

Hao Xue and Flora D

URLhttps://arxiv.org/abs/2406.02496. Hao Xue and Flora D. Salim. Promptcast: A new prompt-based learning paradigm for time series forecasting,

arXiv
[17]

URLhttps://arxiv.org/abs/2210.08964. 37

arXiv
