Tyan-WP: A Wind Power Foundation Model for Ultra-Short-Term Probabilistic Forecasting

Ao Luo; Bin Li; Bo Wang; Hongwei Zhao; Jiahui Huang; Lei Liu; Ruibo Guo; Tengyuan Liu; Zhao Wang

arxiv: 2606.08630 · v1 · pith:MFRJCZ4Nnew · submitted 2026-06-07 · 💻 cs.LG · cs.AI

Pith reviewed 2026-06-27 18:51 UTC · model grok-4.3

keywords wind power forecasting · foundation model · probabilistic forecasting · zero-shot learning · time series models · meteorological covariates · generalization

The pith

A foundation model pretrained on over 126000 U.S. wind sites delivers accurate zero-shot probabilistic forecasts at new locations without local training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Tyan-WP as the first wind power foundation model for ultra-short-term probabilistic forecasting. It claims that pretraining on a massive U.S. dataset combined with static site embeddings and a power-aware meteorological fusion module allows the model to outperform both site-specific supervised models and generic large time series models. A sympathetic reader would care because this approach targets the practical problem of data scarcity at newly commissioned wind farms, enabling faster grid integration without lengthy site-specific data collection. The work emphasizes cross-geography generalization, including to U.K. sites, as evidence that the model captures transferable patterns rather than location-specific ones.

Core claim

Tyan-WP is pretrained on a large-scale dataset covering more than 126,000 U.S. sites over seven years. It adds a static site embedding built from coordinate, terrain, and ecoregion metadata, plus a power-aware meteorological fusion (PAMF) module that models interactions between historical power and meteorological covariates. With these components it achieves zero-shot ultra-short-term probabilistic forecasting that surpasses eight site-specific supervised time series models on 10 in-domain sites and eleven generic large time series models on 127 in-domain sites, and it also generalizes to six real U.K. sites.

What carries the argument

Static site embedding using coordinate, terrain, and ecoregion metadata together with the power-aware meteorological fusion (PAMF) module that models interactions between historical power and meteorological covariates.
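To make the idea of fusing power history with meteorological covariates concrete, here is a minimal sketch of one generic mechanism, cross-attention, in which power tokens attend over meteorological tokens. It is not the paper's PAMF design, which this page does not specify. The function names `cross_attention_fuse` and `add_site_embedding`, the absence of learned projections, and the additive site embedding are all assumptions made purely for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(power_tokens, met_tokens):
    """Hypothetical fusion: each power token attends over all met tokens.

    Both inputs are lists of equal-length float vectors (e.g. patch
    embeddings). The output is the power token plus its attended
    meteorological context (a residual connection). A real model would
    use learned query/key/value projections; none are used here.
    """
    dim = len(power_tokens[0])
    fused = []
    for q in power_tokens:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in met_tokens
        ]
        weights = softmax(scores)
        context = [
            sum(w * v[d] for w, v in zip(weights, met_tokens))
            for d in range(dim)
        ]
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused


def add_site_embedding(tokens, site_vec):
    """Hypothetical conditioning: add one static site vector to every token."""
    return [[t + s for t, s in zip(tok, site_vec)] for tok in tokens]


if __name__ == "__main__":
    power = [[1.0, 0.0], [0.0, 1.0]]       # illustrative power patch embeddings
    met = [[1.0, 0.0], [0.5, 0.5]]         # illustrative meteorological embeddings
    site = [0.1, -0.1]                     # illustrative static site embedding
    print(add_site_embedding(cross_attention_fuse(power, met), site))
```

The sketch only shows the data flow (power queries, meteorological keys and values, a site vector shared across time); the actual module may differ in every detail.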

If this is right

Tyan-WP reduces MAE by 19.9 percent, RMSE by 16.6 percent, CRPS by 22.2 percent, and AQL by 21.7 percent while raising R² by 16.7 percent relative to the compared baselines.
The model outperforms eight site-specific supervised time series models on 10 in-domain sites and eleven generic large time series models on 127 in-domain sites under a unified evaluation protocol.
Tyan-WP demonstrates strong cross-geography generalization when tested on six real U.K. sites without target-site training.
Accurate zero-shot forecasting without target-site training supplies a practical route for rapid turbine onboarding and probabilistic risk management at new wind farms.
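For readers who want to see what the metrics in the list above measure, the following is a minimal sketch using standard textbook definitions. It assumes that AQL denotes the average pinball (quantile) loss over a set of quantile levels and that CRPS is estimated from forecast samples; this page does not give the paper's exact definitions, so the paper may compute them differently. All function names are illustrative.

```python
import math


def mae(y, yhat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def rmse(y, yhat):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def r2(y, yhat):
    """Coefficient of determination."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def pinball(obs, q_pred, tau):
    """Pinball loss of a single quantile forecast at level tau."""
    diff = obs - q_pred
    return max(tau * diff, (tau - 1.0) * diff)


def average_quantile_loss(y, quantile_preds, taus):
    """Assumed AQL: mean pinball loss over time steps and quantile levels.

    quantile_preds[t][k] is the forecast of quantile taus[k] at step t.
    """
    total = 0.0
    for obs, preds in zip(y, quantile_preds):
        total += sum(pinball(obs, p, tau) for p, tau in zip(preds, taus))
    return total / (len(y) * len(taus))


def crps_from_samples(obs, samples):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    n = len(samples)
    term1 = sum(abs(x - obs) for x in samples) / n
    term2 = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term1 - 0.5 * term2


def relative_reduction(baseline, model):
    """Percent reduction of an error metric relative to a baseline."""
    return 100.0 * (baseline - model) / baseline


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the paper.
    y = [0.2, 0.5, 0.9]
    yhat = [0.25, 0.45, 0.8]
    print(mae(y, yhat), rmse(y, yhat), r2(y, yhat))
    print(relative_reduction(0.10, 0.08))
```

A percentage such as "reduces MAE by 19.9 percent" corresponds to `relative_reduction` applied to a baseline error and the model's error; which baseline is used for each figure is a detail of the paper's protocol.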

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Grid operators could shorten the commissioning timeline for new wind farms by deploying the model immediately upon turbine installation rather than waiting for months of local data.
Probabilistic outputs from such models might feed directly into reserve sizing and market bidding decisions, lowering the cost of integrating variable wind generation.
Similar static embedding and covariate-fusion designs could be tested on solar or other renewable forecasting tasks where site metadata also influences output.

Load-bearing premise

The large-scale U.S. pretraining dataset is sufficiently representative of new sites including U.K. locations so that the static site embedding and PAMF module produce genuine zero-shot generalization rather than U.S.-specific patterns.

What would settle it

If the representativeness assumption does not hold, forecasts on a fresh collection of sites whose terrain, climate, or ecoregion distributions lie substantially outside the U.S. pretraining distribution would fail to show the reported error reductions.

Figures

Figures reproduced from arXiv: 2606.08630 by Ao Luo, Bin Li, Bo Wang, Hongwei Zhao, Jiahui Huang, Lei Liu, Ruibo Guo, Tengyuan Liu, Zhao Wang.

**Figure 1.** Geographical distribution of all sites in the WTK dataset, covering the continental United States, categorized by pretraining and evaluation pools. Unseen sites within the domain (used for evaluation) are selected randomly and represent various ecological regions, including inland and coastal areas.

**Figure 2.** Visualization of power curve and wind rose at a representative site in the WTK dataset. Left: joint distribution of wind speed and turbine-level power at site 9094, with marginal histograms and a two-dimensional bin-count colour scale summarising 736,416 valid samples. Right: wind rose summarising the directional frequency and wind-speed distribution at the same site. The point forecast is taken as the me…

**Figure 3.** The overall framework of Tyan-WP. The model takes historical power sequences, historical meteorological sequences, timestamp information, and static site metadata as inputs, without using future meteorological variables. Dynamic sequences are mapped into dual-branch patch embeddings, while timestamps and static metadata are mapped into time-level calendar embeddings and site-level geography-ecology embeddi…

**Figure 4.** Visualization of power curve and wind rose at a representative site in the Kelmarsh dataset. Left: joint distribution of wind speed and turbine-level power for turbine KWF1, with marginal histograms and a two-dimensional bin-count colour scale summarising 309,984 valid samples after clipping 32,212 negative-power records. Right: wind rose summarising the directional frequency and wind-speed distribution f…

**Figure 5.** A comparison of site-level deterministic distributions for zero-shot Tyan-WP and site-specific supervised TSM baselines at the WTK 10-site.

**Figure 6.** A comparison of site-level probabilistic distributions for zero-shot Tyan-WP and site-specific supervised TSM baselines at the WTK 10-site.

**Figure 7.** A comparison of site-level deterministic distributions of zero-shot forecasting for Tyan-WP and generic LTSMs at the WTK 127-site.

**Figure 8.** A comparison of site-level probabilistic distributions of zero-shot forecasting for Tyan-WP and generic LTSMs at the WTK 127-site. … 22.2%, and AQL by 21.7%, while increasing R² by 16.7%. Compared to LTSMs pretrained on extensive time series corpora, these improvements indicate that pretraining on domain-specific large-scale wind power sequence data, along with the two domain-specific module designs, static…

**Figure 9.** A comparison of site-level deterministic distributions of zero-shot forecasting for Tyan-WP and generic LTSMs at the Kelmarsh 6-site. … importantly, this benchmark involves a pronounced domain shift from the simulated U.S. WTK source domain to the real U.K. SCADA target domain, leading to substantial differences in data distribution and measurement characteristics. Relative to the strongest generic LTSM base…

**Figure 10.** A comparison of site-level probabilistic distributions of zero-shot forecasting for Tyan-WP and generic LTSMs at the Kelmarsh 6-site. 5.4. Ablation study. To individually evaluate the contribution of each architectural component, we evaluate four structural ablation schemes on the in-domain WTK 127-site and out-of-domain Kelmarsh 6-site zero-shot benchmarks. These variants sequentially replace the power an…

Original abstract

Global wind power capacity, especially in China, is booming, with new farms spanning diverse terrains and climates. The industry urgently needs accurate wind power foundation models to shorten commissioning and accelerate grid connection. This is because site-specific time series models (TSMs) are not well suited to data-scarce scenarios and generalize poorly, while generic large time series models (LTSMs) are mostly limited to univariate inputs and cannot fully exploit static site attributes or the dependencies between power and meteorological covariates, leading to insufficient accuracy. To fill this gap, we propose Tyan-WP, the first wind power foundation model for ultra-short-term probabilistic forecasting. Pretrained on a large-scale wind power dataset covering more than 126,000 U.S. sites over seven years, Tyan-WP further improves zero-shot forecasting through two domain-specific module designs: static site embedding using coordinate, terrain, and ecoregion metadata, and a power-aware meteorological fusion (PAMF) module that models interactions between historical power and meteorological covariates. Under a unified evaluation protocol, Tyan-WP surpasses eight site-specific supervised TSMs on 10 in-domain sites and outperforms eleven generic LTSMs on 127 in-domain sites, reducing MAE by 19.9%, RMSE by 16.6%, CRPS by 22.2%, and AQL by 21.7%, while raising R² by 16.7%. It further demonstrates strong cross-geography generalization on six real U.K. sites. These results show that the wind power foundation model can achieve accurate zero-shot forecasting without target-site training, providing a practical pathway for rapid turbine onboarding and probabilistic risk management at new wind farms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Tyan-WP adds useful domain-specific modules to a large US wind pretraining setup and reports clear in-domain gains, but the UK zero-shot results rest on an unverified transfer assumption without shift diagnostics.

Tyan-WP is a foundation model pretrained on 126k US wind sites that adds two targeted pieces: static embeddings drawn from site coordinates, terrain, and ecoregion data, plus a power-aware meteorological fusion module that combines historical power with meteorological covariates. These let it run zero-shot probabilistic forecasts that beat eight site-specific TSMs on 10 in-domain sites and eleven generic LTSMs on 127 sites, with the reported drops in MAE, RMSE, CRPS, and AQL plus the R² lift.

The architecture choices are the real addition. Most LTSMs stay univariate or treat static features lightly; here the embeddings and fusion module directly tackle wind-specific structure that site models usually handle only after local training. The pretraining scale is also larger than typical energy time-series work, which gives the zero-shot angle some grounding.

The evaluation protocol is a plus for comparability, but the abstract leaves gaps on splits, baseline sizing, and statistical tests. The UK results on six sites are the weakest part: the paper states strong cross-geography performance yet supplies no covariate histograms, embedding distances, or error breakdowns that would show the US data actually bridges the distribution shift. Without those, the gains could partly reflect overlap rather than robust transfer.

This is for groups working on renewables forecasting or domain-adapted time-series models who need fast deployment at new sites. A reader focused on practical zero-shot methods will find the modules worth looking at.

It should go to peer review. The scale and the two modules give it enough substance for referees to evaluate, even if the transfer checks need tightening.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Tyan-WP, a foundation model for ultra-short-term probabilistic wind power forecasting pretrained on data from over 126,000 U.S. sites spanning seven years. It incorporates a static site embedding module (using coordinate, terrain, and ecoregion metadata) and a power-aware meteorological fusion (PAMF) module to model interactions between power and meteorological covariates. The central claims are that Tyan-WP outperforms eight site-specific supervised TSMs on 10 in-domain sites and eleven generic LTSMs on 127 in-domain sites (with reported reductions of 19.9% MAE, 16.6% RMSE, 22.2% CRPS, 21.7% AQL and 16.7% R² increase) under a unified protocol, while also demonstrating strong zero-shot cross-geography generalization on six real U.K. sites without target-site training.

Significance. If the empirical claims hold after verification, the work would be significant for the wind energy sector by offering a practical pathway to accurate probabilistic forecasting at new sites with limited data, shortening commissioning times. The scale of the U.S. pretraining corpus (126k sites) represents a concrete strength for foundation-model-style transfer in this domain.

major comments (2)

[Abstract and §4 (Experimental Setup)] The unified evaluation protocol reports specific metric deltas but supplies no information on data splits, statistical significance tests, baseline hyperparameter matching, or controls for model size differences. These omissions are load-bearing for the in-domain superiority claims.
[§5.3 (U.K. Generalization Results)] The claim of strong cross-geography zero-shot transfer on six U.K. sites rests on the static site embedding and PAMF module bridging U.S. and U.K. distributions, yet no quantitative domain-shift diagnostics (covariate histograms, embedding-space distances, or per-site error breakdowns) are provided to compare the 126k-site U.S. pretraining distribution against the U.K. test sites. This directly affects the generalization claim.
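As an illustration of the kind of significance test the first comment asks for, here is a minimal sketch of an exact paired sign-flip permutation test on per-site errors. This is one distribution-free option chosen for illustration, not a test the paper is known to use; the function name and the per-site numbers in the usage example are hypothetical.

```python
from itertools import product


def paired_sign_flip_pvalue(errors_a, errors_b):
    """Two-sided exact sign-flip permutation test on paired per-site errors.

    Under the null hypothesis that the two models are interchangeable,
    the sign of each per-site difference is equally likely to be + or -.
    The p-value is the share of sign assignments whose absolute summed
    difference is at least as large as the observed one. Enumeration is
    exact, so keep the number of sites small (2**n assignments).
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # tolerance for float round-off
            extreme += 1
        total += 1
    return extreme / total


if __name__ == "__main__":
    # Hypothetical per-site MAE values for 10 sites, not from the paper.
    baseline = [0.12, 0.10, 0.15, 0.11, 0.13, 0.14, 0.09, 0.16, 0.12, 0.10]
    model = [0.10, 0.09, 0.12, 0.10, 0.11, 0.11, 0.08, 0.13, 0.10, 0.09]
    print(paired_sign_flip_pvalue(baseline, model))
```

With only 10 sites the smallest attainable two-sided p-value is 2/1024, which is one reason per-site breakdowns matter alongside any aggregate test.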

minor comments (1)

[Throughout] Ensure all acronyms (TSM, LTSM, PAMF, AQL) are defined on first use and used consistently in figure captions and tables.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. The comments highlight important aspects of the evaluation protocol and generalization analysis that merit clarification and expansion. We address each major comment below, indicating where revisions will be made to strengthen the manuscript.

Referee: [Abstract and §4 (Experimental Setup)] The unified evaluation protocol reports specific metric deltas but supplies no information on data splits, statistical significance tests, baseline hyperparameter matching, or controls for model size differences. These omissions are load-bearing for the in-domain superiority claims.

Authors: We agree that the current description of the unified evaluation protocol in §4 is insufficiently detailed. In the revised manuscript we will expand this section to explicitly document: (i) the precise train/validation/test split ratios and temporal partitioning used for the 10-site and 127-site in-domain experiments, (ii) the results of statistical significance tests (paired t-tests or Wilcoxon signed-rank tests with p-values) on all reported metric improvements, (iii) the hyperparameter search procedure applied uniformly to both Tyan-WP and the eight site-specific TSM baselines, and (iv) a table comparing parameter counts and computational budgets across all compared models to address capacity differences. These additions will be incorporated without changing the numerical results already reported.

Revision: yes

Referee: [§5.3 (U.K. Generalization Results)] The claim of strong cross-geography zero-shot transfer on six U.K. sites rests on the static site embedding and PAMF module bridging U.S. and U.K. distributions, yet no quantitative domain-shift diagnostics (covariate histograms, embedding-space distances, or per-site error breakdowns) are provided to compare the 126k-site U.S. pretraining distribution against the U.K. test sites. This directly affects the generalization claim.

Authors: We concur that quantitative domain-shift analysis would better substantiate the cross-geography zero-shot claims. In the revised §5.3 we will add: (i) side-by-side histograms and Kolmogorov-Smirnov tests for key meteorological covariates between the U.S. pretraining corpus and the six U.K. sites, (ii) Euclidean or cosine distances in the learned static embedding space between U.S. and U.K. site embeddings, and (iii) per-site MAE, CRPS and error distribution plots for the U.K. zero-shot forecasts. These diagnostics will be computed from the existing trained model and data and included to directly illustrate the bridging effect of the static embeddings and PAMF module.

Revision: yes
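Two of the diagnostics named in this response can be sketched in a few lines. The code below computes a two-sample Kolmogorov-Smirnov statistic for one covariate and a cosine distance between two embedding vectors. It is a minimal illustration under assumed inputs: the function names, the sample values, and the embedding vectors are hypothetical, and no p-value is computed.

```python
import math


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every value <= x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical wind-speed samples (m/s) for a source and a target site.
    us_wind = [3.1, 4.5, 5.2, 6.8, 7.4, 8.0, 9.3]
    uk_wind = [4.0, 5.5, 6.1, 7.7, 8.9, 10.2, 11.5]
    print(ks_statistic(us_wind, uk_wind))

    # Hypothetical static site embeddings.
    print(cosine_distance([0.2, 0.7, -0.1], [0.3, 0.5, 0.1]))
```

A KS statistic near 0 indicates similar covariate distributions and a value near 1 indicates almost no overlap; embedding distances are only informative relative to the spread of distances among the pretraining sites themselves.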

Circularity Check

0 steps flagged

No circularity: empirical evaluation on held-out data after separate pretraining

The paper reports standard empirical results from pretraining on a large separate U.S. dataset (>126k sites) followed by evaluation on held-out in-domain sites and cross-geography U.K. sites. No equations, self-citations, or fitted parameters reduce the reported metrics (MAE, RMSE, CRPS, etc.) to quantities computed on the test sets by construction. The evaluation protocol is independent of the training data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Central claim rests on the representativeness of the 126,000-site U.S. dataset for cross-geography generalization and on the two domain-specific modules capturing relevant interactions; abstract provides no explicit free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5866 in / 1152 out tokens · 19874 ms · 2026-06-27T18:51:30.293805+00:00 · methodology

[1] Global Wind Energy Council. Global wind report 2026. 2026. https://www.gwec.net/reports/globalwindreport

[2] National Energy Administration. China’s newly installed wind and solar capacity exceeded 430 GW in 2025, reaching a record high. 2026. https://www.nea.gov.cn/20260212/d9f714e91a7f40d39282d87e384ea94a/c.html

[3] National Energy Administration. National Energy Administration releases 2025 national power statistics. 2026. https://www.nea.gov.cn/20260129/6874f211acd0417eab7ac10c3061a7c2/c.html

[4] Hu J., Hu W., Cao D., Huang Y., Chen J., Li Y., et al. Bayesian averaging-enabled transfer learning method for probabilistic wind power forecasting of newly built wind farms. Appl Energy 2024;355:122185. https://doi.org/10.1016/j.apenergy.2023.122185

[5] Wang Y., Xu H., Zou R., Zhang L., Zhang F. A deep asymmetric Laplace neural network for deterministic and probabilistic wind power forecasting. Renew Energy 2022;196:497–517. https://doi.org/10.1016/j.renene.2022.07.009

[6] Soman S. S., Zareipour H., Malik O., Mandal P. A review of wind power and wind speed forecasting methods with different time horizons. In: North American Power Symposium 2010. IEEE; 2010. p. 1–8

[7] Jung J., Broadwater R. P. Current status and future advances for wind speed and power forecasting. Renew Sustain Energy Rev 2014;31:762–. https://doi.org/10.1016/j.rser.2013.12.054

[9] Liu L., Liu J., Ye Y., Liu H., Chen K., Li D., et al. Ultra-short-term wind power forecasting based on deep Bayesian model with uncertainty. Renew Energy 2023;205:598–607. https://doi.org/10.1016/j.renene.2023.01.038

[10] Wang Y., Hu Q., Li L., Foley A., Srinivasan D. Approaches to wind power curve modeling: A review and discussion. Renew Sustain Energy Rev 2019;116:109422. https://doi.org/10.1016/j.rser.2019.109422

[11] Liu L., Wang X., Dong X., Chen K., Chen Q., Li B. Interpretable feature-temporal transformer for short-term wind power forecasting with multivariate time series. Appl Energy 2024;374:124035. https://doi.org/10.1016/j.apenergy.2024.124035

[12] Bulaevskaya V., Wharton S., Clifton A., Qualley G., Miller W. O. Wind power curve modeling in complex terrain using statistical models. J Renew Sustain Energy 2015;7:013103. https://doi.org/10.1063/1.4904430

[13] Gallego-Castillo C., Cuerva-Tejero Á., Lopez-Garcia O. A review on the recent history of wind power ramp forecasting. Renew Sustain Energy Rev 2015;52:1148–1157. https://doi.org/10.1016/j.rser.2015.07.154

[14] Wang J., Li Y. A review of wind speed and wind power forecasting with deep neural networks. Appl Energy 2021;304:117766. https://doi.org/10.1016/j.apenergy.2021.117766

[15] Wang H.-z., Li G.-q., Wang G.-b., Peng J.-c., Jiang H., Liu Y.-t. Deep learning based ensemble approach for probabilistic wind power forecasting. Appl Energy 2017;188:56–70. https://doi.org/10.1016/j.apenergy.2016.11.111

[16] Zhou T., Ma Z., Wen Q., Wang X., Sun L., Jin R. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In: International Conference on Machine Learning. 2022. p. 27268–27286

[17] Zeng A., Chen M., Zhang L., Xu Q. Are transformers effective for time series forecasting? In: AAAI Conference on Artificial Intelligence

[18] Liu Y., Wu H., Wang J., Long M. Non-stationary transformers: Exploring the stationarity in time series forecasting. In: Advances in Neural Information Processing Systems. 2022. p. 9881–9893

[19] Wu H., Hu T., Liu Y., Zhou H., Wang J., Long M. TimesNet: Temporal 2D-variation modeling for general time series analysis. In: International Conference on Learning Representations. 2023

[20] Nie Y., Nguyen N. H., Sinthong P., Kalagnanam J. A time series is worth 64 words: Long-term forecasting with transformers. In: International Conference on Learning Representations. 2023

[21] Liu Y., Hu T., Zhang H., Wu H., Wang S., Ma L., et al. iTransformer: Inverted transformers are effective for time series forecasting. In: International Conference on Learning Representations. 2024

[22] Wang Y., Wu H., Dong J., Liu Y., Qiu Y., Zhang H., et al. TimeXer: Empowering transformers for time series forecasting with exogenous variables. In: Advances in Neural Information Processing Systems. 2024

[23] Wang S., Wu H., Shi X., Hu T., Luo H., Ma L., et al. TimeMixer: Decomposable multiscale mixing for time series forecasting. In: International Conference on Learning Representations. 2024

[24] Goswami M., Szafer K., Choudhry A., Cai Y., Li S., Dubrawski A. MOMENT: A family of open time-series foundation models. In: International Conference on Machine Learning. 2024

[25] Shi X., Wang S., Nie Y., Li D., Ye Z., Wen Q., et al. Time-MoE: Billion-scale time series foundation models with mixture of experts. arXiv preprint arXiv:2409.16040. 2025. https://arxiv.org/abs/2409.16040

[26] Liu Y., Zhang H., Li C., Huang X., Wang J., Long M. Timer: Generative pre-trained transformers are large time series models. arXiv preprint arXiv:2402.02368. 2024. https://arxiv.org/abs/2402.02368

[27] Liu Y., Qin G., Huang X., Wang J., Long M. Timer-XL: Long-context transformers for unified time series forecasting. arXiv preprint arXiv:2410.04803. 2024. https://arxiv.org/abs/2410.04803

[28] Liu Y., Su X., Wang S., Zhang H., Liu H., Wang Y., et al. Timer-S1: A billion-scale time series foundation model with serial scaling. arXiv preprint arXiv:2603.04791. 2026. https://arxiv.org/abs/2603.04791

[29] Liu Y., Qin G., Shi Z., Huang X., Wang J., Long M. Sundial: A family of highly capable time series foundation models. arXiv preprint arXiv:2502.00816. 2025. https://arxiv.org/abs/2502.00816

[30] Auer A., Podest P., Klotz D., Böck S., Klambauer G., Hochreiter S. TiRex: Zero-shot forecasting across long and short horizons with enhanced in-context learning. arXiv preprint arXiv:2505.23719. 2025. https://arxiv.org/abs/2505.23719

[31] Das A., Kong W., Sen R., Zhou Y. A decoder-only foundation model for time-series forecasting. arXiv preprint arXiv:2310.10688. 2024. https://arxiv.org/abs/2310.10688

[32] Ansari A. F., Stella L., Turkmen C., Zhang X., Mercado P., Shen H., et al. Chronos: Learning the language of time series. Trans Mach Learn Res 2024. https://arxiv.org/abs/2403.07815

[33] Ansari A. F., Shchur O., Küken J., Auer A., Han B., Mercado P., et al. Chronos-2: From univariate to universal forecasting. arXiv preprint arXiv:2510.15821. 2025. https://arxiv.org/abs/2510.15821

[34] Woo G., Liu C., Kumar A., Xiong C., Savarese S., Sahoo D. Unified training of universal time series forecasting transformers. In: International Conference on Machine Learning. 2024

[35] Liu X., Liu J., Woo G., Aksu T., Liang Y., Zimmermann R., et al. Moirai-MoE: Empowering time series foundation models with sparse mixture of experts. In: International Conference on Machine Learning. 2025

[36] Liu C., Aksu T., Liu J., Liu X., Yan H., Pham Q., et al. Moirai 2.0: When less is more for time series forecasting. arXiv preprint arXiv:2511.11698. 2025. https://arxiv.org/abs/2511.11698

[37] Yu R., Gu C., Stiasny J., Wen Q., Dilov W. S., Qi L., et al. PriceFM: Foundation model for probabilistic electricity price forecasting. arXiv preprint arXiv:2508.04875. 2025. https://arxiv.org/abs/2508.04875

[38] Shi Y., Fu Z., Chen S., Zhao B., Xu W., Zhang C., et al. Kronos: A foundation model for the language of financial markets. arXiv preprint arXiv:2508.02739. 2025. https://arxiv.org/abs/2508.02739

[39] Li H., Deng B., Xu C., Feng Z., Schlegel V., Huang Y.-H., et al. MIRA: Medical time series foundation model for real-world health data. arXiv preprint arXiv:2506.07584. 2025. https://arxiv.org/abs/2506.07584

[40] Draxl C., Clifton A., Hodge B.-M., McCaa J. The wind integration national dataset (WIND) toolkit. Appl Energy 2015;151:355–366. https://doi.org/10.1016/j.apenergy.2015.03.121

[41] Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A. N., et al. Attention is all you need. In: Advances in Neural Information Processing Systems. 2017. p. 5998–6008

[43] https://arxiv.org/abs/2104.09864

[44] Shazeer N., Mirhoseini A., Maziarz K., Davis A., Le Q., Hinton G., et al. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. In: International Conference on Learning Representations. 2017

[45] Fedus W., Zoph B., Shazeer N. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. J Mach Learn Res 2022;23:1–39. https://arxiv.org/abs/2101.03961

[46] Koenker R., Bassett G. Regression quantiles. Econometrica 1978;46:33–50. https://doi.org/10.2307/1913643

[47] Plumley C., Takeuchi R. Kelmarsh wind farm data. Zenodo. 2025. https://doi.org/10.5281/zenodo.16807551

[48] Messner J. W., Pinson P., Browell J., Bjerregård M. B., Schicker I. Evaluation of wind power forecasts – an up-to-date view. Wind Energy 2020;23:1461–1481. https://doi.org/10.1002/we.2497

[49] Gneiting T., Raftery A. E. Strictly proper scoring rules, prediction, and estimation. J Am Stat Assoc 2007;102:359–378. https://doi.org/10.1198/016214506000001437

[50] Bi K., Xie L., Zhang H., Chen X., Gu X., Tian Q. Accurate medium-range global weather forecasting with 3D neural networks. Nature 2023;619:533–538. https://doi.org/10.1038/s41586-023-06185-3

[51] Chen K., Han T., Ling F., Gong J., Bai L., Wang X., et al. The operational medium-range deterministic weather forecasting can be extended beyond a 10-day lead time. Commun Earth Environ 2025;6:518. https://doi.org/10.1038/s43247-025-02502-y

Huang, Luo et al.: Preprint submitted to Elsevier