
arxiv: 2606.08630 · v1 · pith:MFRJCZ4Nnew · submitted 2026-06-07 · 💻 cs.LG · cs.AI

Tyan-WP: A Wind Power Foundation Model for Ultra-Short-Term Probabilistic Forecasting

Pith reviewed 2026-06-27 18:51 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords wind power forecasting, foundation model, probabilistic forecasting, zero-shot learning, time series models, meteorological covariates, generalization

The pith

A foundation model pretrained on over 126,000 U.S. wind sites delivers accurate zero-shot probabilistic forecasts at new locations without local training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Tyan-WP as the first wind power foundation model for ultra-short-term probabilistic forecasting. It claims that pretraining on a massive U.S. dataset combined with static site embeddings and a power-aware meteorological fusion module allows the model to outperform both site-specific supervised models and generic large time series models. A sympathetic reader would care because this approach targets the practical problem of data scarcity at newly commissioned wind farms, enabling faster grid integration without lengthy site-specific data collection. The work emphasizes cross-geography generalization, including to U.K. sites, as evidence that the model captures transferable patterns rather than location-specific ones.

Core claim

Tyan-WP is pretrained on a large-scale dataset covering more than 126,000 U.S. sites over seven years. It combines a static site embedding built from coordinate, terrain, and ecoregion metadata with a power-aware meteorological fusion (PAMF) module that models interactions between historical power and meteorological covariates. With these, it achieves zero-shot ultra-short-term probabilistic forecasting that surpasses eight site-specific supervised time series models on 10 in-domain sites and eleven generic large time series models on 127 in-domain sites, and it also generalizes to six real U.K. sites.

What carries the argument

Static site embedding using coordinate, terrain, and ecoregion metadata together with the power-aware meteorological fusion (PAMF) module that models interactions between historical power and meteorological covariates.

If this is right

  • Tyan-WP reduces MAE by 19.9 percent, RMSE by 16.6 percent, CRPS by 22.2 percent, and AQL by 21.7 percent, and raises R² by 16.7 percent, in the comparison against generic large time series models on the 127 in-domain sites.
  • The model outperforms eight site-specific supervised time series models on 10 in-domain sites and eleven generic large time series models on 127 in-domain sites under a unified evaluation protocol.
  • Tyan-WP demonstrates strong cross-geography generalization when tested on six real U.K. sites without target-site training.
  • Accurate zero-shot forecasting without target-site training supplies a practical route for rapid turbine onboarding and probabilistic risk management at new wind farms.
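
The percentage claims in this list are relative changes in standard point-forecast metrics. A minimal sketch of how such figures are usually computed follows. The toy arrays, the function names, and the use of a single baseline are illustrative assumptions; this is not the paper's evaluation code, and the paper may aggregate across sites and horizons differently.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination (undefined if y_true is constant)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def relative_change_pct(baseline, model):
    """Signed percentage change of `model` relative to `baseline`.
    Negative values mean a reduction (good for error metrics)."""
    return 100.0 * (model - baseline) / abs(baseline)

# Hypothetical normalized power values, for illustration only.
y_true = [0.10, 0.35, 0.60, 0.80]
baseline_pred = [0.20, 0.25, 0.50, 0.95]
model_pred = [0.15, 0.30, 0.55, 0.85]

for name, metric in (("MAE", mae), ("RMSE", rmse), ("R2", r2)):
    base = metric(y_true, baseline_pred)
    ours = metric(y_true, model_pred)
    print(f"{name}: baseline={base:.4f} model={ours:.4f} "
          f"change={relative_change_pct(base, ours):+.1f}%")
```

Under this convention an error metric such as MAE should show a negative change and R² a positive one, which matches the direction of the reported numbers.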

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Grid operators could shorten the commissioning timeline for new wind farms by deploying the model immediately upon turbine installation rather than waiting for months of local data.
  • Probabilistic outputs from such models might feed directly into reserve sizing and market bidding decisions, lowering the cost of integrating variable wind generation.
  • Similar static embedding and covariate-fusion designs could be tested on solar or other renewable forecasting tasks where site metadata also influences output.

Load-bearing premise

The large-scale U.S. pretraining dataset is sufficiently representative of new sites including U.K. locations so that the static site embedding and PAMF module produce genuine zero-shot generalization rather than U.S.-specific patterns.

What would settle it

If the representativeness assumption does not hold, forecasts on a fresh collection of sites whose terrain, climate, or ecoregion distributions lie substantially outside the U.S. pretraining distribution would show markedly smaller gains over the baselines than the reported reductions, or none at all.

Figures

Figures reproduced from arXiv: 2606.08630 by Ao Luo, Bin Li, Bo Wang, Hongwei Zhao, Jiahui Huang, Lei Liu, Ruibo Guo, Tengyuan Liu, Zhao Wang.

Figure 1. Geographical distribution of all sites in the WTK dataset, covering the continental United States, categorized by pretraining and evaluation pools. Unseen sites within the domain (used for evaluation) are selected randomly and represent various ecological regions, including inland and coastal areas.
Figure 2. Visualization of power curve and wind rose at a representative site in the WTK dataset. Left: joint distribution of wind speed and turbine-level power at site 9094, with marginal histograms and a two-dimensional bin-count colour scale summarising 736,416 valid samples. Right: wind rose summarising the directional frequency and wind-speed distribution at the same site. The point forecast is taken as the me…
Figure 3. The overall framework of Tyan-WP. The model takes historical power sequences, historical meteorological sequences, timestamp information, and static site metadata as inputs, without using future meteorological variables. Dynamic sequences are mapped into dual-branch patch embeddings, while timestamps and static metadata are mapped into time-level calendar embeddings and site-level geography-ecology embeddi…
Figure 4. Visualization of power curve and wind rose at a representative site in the Kelmarsh dataset. Left: joint distribution of wind speed and turbine-level power for turbine KWF1, with marginal histograms and a two-dimensional bin-count colour scale summarising 309,984 valid samples after clipping 32,212 negative-power records. Right: wind rose summarising the directional frequency and wind-speed distribution f…
Figure 5. A comparison of site-level deterministic distributions for zero-shot Tyan-WP and site-specific supervised TSM baselines at the WTK 10-site.
Figure 6. A comparison of site-level probabilistic distributions for zero-shot Tyan-WP and site-specific supervised TSM baselines at the WTK 10-site.
Figure 7. A comparison of site-level deterministic distributions of zero-shot forecasting for Tyan-WP and generic LTSMs at the WTK 127-site.
Figure 8. A comparison of site-level probabilistic distributions of zero-shot forecasting for Tyan-WP and generic LTSMs at the WTK 127-site.
Accompanying text (truncated): … 22.2%, and AQL by 21.7%, while increasing R² by 16.7%. Compared to LTSMs pretrained on extensive time series corpora, these improvements indicate that pretraining on domain-specific large-scale wind power sequence data, along with the two domain-specific module designs, static…
Figure 9. A comparison of site-level deterministic distributions of zero-shot forecasting for Tyan-WP and generic LTSMs at the Kelmarsh 6-site.
Accompanying text (truncated): … importantly, this benchmark involves a pronounced domain shift from the simulated U.S. WTK source domain to the real U.K. SCADA target domain, leading to substantial differences in data distribution and measurement characteristics. Relative to the strongest generic LTSM base…
Figure 10. A comparison of site-level probabilistic distributions of zero-shot forecasting for Tyan-WP and generic LTSMs at the Kelmarsh 6-site.
Accompanying text (truncated): 5.4. Ablation study. To individually evaluate the contribution of each architectural component, we evaluate four structural ablation schemes on the in-domain WTK 127-site and out-of-domain Kelmarsh 6-site zero-shot benchmarks. These variants sequentially replace the power an…
Original abstract

Global wind power capacity, especially in China, is booming, with new farms spanning diverse terrains and climates. The industry urgently needs accurate wind power foundation models to shorten commissioning and accelerate grid connection. This is because site-specific time series models (TSMs) are not well suited to data-scarce scenarios and generalize poorly, while generic large time series models (LTSMs) are mostly limited to univariate inputs and cannot fully exploit static site attributes or the dependencies between power and meteorological covariates, leading to insufficient accuracy. To fill this gap, we propose Tyan-WP, the first wind power foundation model for ultra-short-term probabilistic forecasting. Pretrained on a large-scale wind power dataset covering more than 126,000 U.S. sites over seven years, Tyan-WP further improves zero-shot forecasting through two domain-specific module designs: static site embedding using coordinate, terrain, and ecoregion metadata, and a power-aware meteorological fusion (PAMF) module that models interactions between historical power and meteorological covariates. Under a unified evaluation protocol, Tyan-WP surpasses eight site-specific supervised TSMs on 10 in-domain sites and outperforms eleven generic LTSMs on 127 in-domain sites, reducing MAE by 19.9%, RMSE by 16.6%, CRPS by 22.2%, and AQL by 21.7%, while raising R² by 16.7%. It further demonstrates strong cross-geography generalization on six real U.K. sites. These results show that the wind power foundation model can achieve accurate zero-shot forecasting without target-site training, providing a practical pathway for rapid turbine onboarding and probabilistic risk management at new wind farms.
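
The two probabilistic metrics named in the abstract can be made concrete with a short sketch. It assumes that AQL denotes the pinball (quantile) loss averaged over a set of quantile levels, and that CRPS is estimated from forecast samples using the standard form E|X - y| - 0.5 E|X - X'|. Both are common conventions, but the paper's exact definitions are not given on this page, so the function names and toy inputs below are hypothetical.

```python
def pinball_loss(y, q_pred, tau):
    """Pinball loss of a predicted tau-quantile q_pred for observation y."""
    diff = y - q_pred
    return max(tau * diff, (tau - 1.0) * diff)

def average_quantile_loss(y_true, quantile_preds, taus):
    """Mean pinball loss over observations and quantile levels.
    quantile_preds[i][k] is the forecast of quantile taus[k] for y_true[i]."""
    total = 0.0
    for y, preds in zip(y_true, quantile_preds):
        total += sum(pinball_loss(y, q, tau) for q, tau in zip(preds, taus)) / len(taus)
    return total / len(y_true)

def crps_from_samples(y, samples):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    n = len(samples)
    spread_to_obs = sum(abs(s - y) for s in samples) / n
    spread_within = sum(abs(a - b) for a in samples for b in samples) / (2.0 * n * n)
    return spread_to_obs - spread_within

# Hypothetical forecast for one time step of normalized power.
taus = [0.1, 0.5, 0.9]
print(average_quantile_loss([0.42], [[0.30, 0.40, 0.55]], taus))
print(crps_from_samples(0.42, [0.30, 0.38, 0.40, 0.47, 0.55]))
```

With a single sample the CRPS estimate collapses to the absolute error, which is why CRPS is often read as a probabilistic generalization of MAE.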

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Tyan-WP, a foundation model for ultra-short-term probabilistic wind power forecasting pretrained on data from over 126,000 U.S. sites spanning seven years. It incorporates a static site embedding module (using coordinate, terrain, and ecoregion metadata) and a power-aware meteorological fusion (PAMF) module to model interactions between power and meteorological covariates. The central claims are that Tyan-WP outperforms eight site-specific supervised TSMs on 10 in-domain sites and eleven generic LTSMs on 127 in-domain sites under a unified protocol (with reported reductions of 19.9% in MAE, 16.6% in RMSE, 22.2% in CRPS, and 21.7% in AQL, and a 16.7% increase in R²), while also demonstrating strong zero-shot cross-geography generalization on six real U.K. sites without target-site training.

Significance. If the empirical claims hold after verification, the work would be significant for the wind energy sector by offering a practical pathway to accurate probabilistic forecasting at new sites with limited data, shortening commissioning times. The scale of the U.S. pretraining corpus (126k sites) represents a concrete strength for foundation-model-style transfer in this domain.

major comments (2)
  1. Abstract and §4 (Experimental Setup): the unified evaluation protocol reports specific metric deltas but supplies no information on data splits, statistical significance tests, baseline hyperparameter matching, or controls for model size differences. These omissions are load-bearing for the in-domain superiority claims.
  2. §5.3 (U.K. Generalization Results): the claim of strong cross-geography zero-shot transfer on six U.K. sites rests on the static site embedding and PAMF module bridging U.S. and U.K. distributions, yet no quantitative domain-shift diagnostics (covariate histograms, embedding-space distances, or per-site error breakdowns) are provided to compare the 126k-site U.S. pretraining distribution against the U.K. test sites. This directly affects the generalization claim.
minor comments (1)
  1. Throughout: ensure all acronyms (TSM, LTSM, PAMF, AQL) are defined on first use and used consistently in figure captions and tables.
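
Major comment 1 asks for statistical significance tests on the per-site metric improvements. One way such a check might look for the 10-site comparison is an exact two-sided Wilcoxon signed-rank test on paired per-site errors, sketched below. The per-site MAE values are invented for illustration, the function names are hypothetical, and the enumeration over all sign patterns is only practical for small site counts; for the 127-site benchmark a normal approximation or a statistics library would be the more realistic choice.

```python
from itertools import product

def average_ranks(values):
    """1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_signed_rank_exact(errors_a, errors_b):
    """Exact two-sided Wilcoxon signed-rank test on paired samples.
    Returns (W_plus, p_value). Zero differences are discarded."""
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    expected = sum(ranks) / 2.0          # mean of W_plus under the null
    observed_dev = abs(w_plus - expected)
    extreme = 0
    # Under the null, every sign pattern of the ranks is equally likely.
    for signs in product((0, 1), repeat=n):
        w = sum(r for s, r in zip(signs, ranks) if s)
        if abs(w - expected) >= observed_dev - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n

# Hypothetical per-site MAE for a baseline and for the evaluated model.
baseline_mae = [0.12, 0.15, 0.11, 0.20, 0.18, 0.14, 0.16, 0.13, 0.19, 0.17]
model_mae = [0.10, 0.12, 0.10, 0.15, 0.14, 0.13, 0.12, 0.11, 0.13, 0.12]
w_plus, p_value = wilcoxon_signed_rank_exact(baseline_mae, model_mae)
print(f"W+ = {w_plus}, exact two-sided p = {p_value:.5f}")
```

A rank-based test is a reasonable default here because per-site error differences are unlikely to be normally distributed across heterogeneous sites.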

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. The comments highlight important aspects of the evaluation protocol and generalization analysis that merit clarification and expansion. We address each major comment below, indicating where revisions will be made to strengthen the manuscript.

  1. Referee: Abstract and §4 (Experimental Setup): the unified evaluation protocol reports specific metric deltas but supplies no information on data splits, statistical significance tests, baseline hyperparameter matching, or controls for model size differences. These omissions are load-bearing for the in-domain superiority claims.

    Authors: We agree that the current description of the unified evaluation protocol in §4 is insufficiently detailed. In the revised manuscript we will expand this section to explicitly document: (i) the precise train/validation/test split ratios and temporal partitioning used for the 10 in-domain and 127 in-domain experiments, (ii) the results of statistical significance tests (paired t-tests or Wilcoxon signed-rank tests with p-values) on all reported metric improvements, (iii) the hyperparameter search procedure applied uniformly to both Tyan-WP and the eight site-specific TSM baselines, and (iv) a table comparing parameter counts and computational budgets across all compared models to address capacity differences. These additions will be incorporated without changing the numerical results already reported.

    Revision: yes

  2. Referee: §5.3 (U.K. Generalization Results): the claim of strong cross-geography zero-shot transfer on six U.K. sites rests on the static site embedding and PAMF module bridging U.S. and U.K. distributions, yet no quantitative domain-shift diagnostics (covariate histograms, embedding-space distances, or per-site error breakdowns) are provided to compare the 126k-site U.S. pretraining distribution against the U.K. test sites. This directly affects the generalization claim.

    Authors: We concur that quantitative domain-shift analysis would better substantiate the cross-geography zero-shot claims. In the revised §5.3 we will add: (i) side-by-side histograms and Kolmogorov-Smirnov tests for key meteorological covariates between the U.S. pretraining corpus and the six U.K. sites, (ii) Euclidean or cosine distances in the learned static embedding space between U.S. and U.K. site embeddings, and (iii) per-site MAE, CRPS and error distribution plots for the U.K. zero-shot forecasts. These diagnostics will be computed from the existing trained model and data and included to directly illustrate the bridging effect of the static embeddings and PAMF module.

    Revision: yes
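
Two of the domain-shift diagnostics promised in response 2 are simple enough to sketch. The block below computes a two-sample Kolmogorov-Smirnov statistic for one covariate and a cosine distance between two embedding vectors. The wind-speed samples and embedding values are made up, the function names are hypothetical, and only the KS statistic is computed (no p-value), so this is an illustration of the quantities rather than the authors' planned analysis.

```python
import math

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both CDFs past every observation equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def cosine_distance(u, v):
    """1 - cosine similarity; 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical wind-speed samples (m/s) from a source site and a target site.
us_wind_speed = [3.1, 4.8, 5.5, 6.2, 7.9, 8.4, 9.0, 11.3]
uk_wind_speed = [4.0, 5.1, 6.6, 7.2, 7.8, 9.5, 10.1, 12.7]
print("KS statistic:", ks_statistic(us_wind_speed, uk_wind_speed))

# Hypothetical static site embeddings.
us_embedding = [0.21, -0.40, 0.88, 0.05]
uk_embedding = [0.18, -0.35, 0.80, 0.30]
print("cosine distance:", cosine_distance(us_embedding, uk_embedding))
```

A KS statistic near 0 suggests the covariate is distributed similarly at both sites, while values near 1 indicate almost disjoint ranges; comparing U.K. embedding distances against the spread of distances among U.S. sites would show whether the U.K. sites fall inside the pretraining cloud.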

Circularity Check

0 steps flagged

No circularity: empirical evaluation on held-out data after separate pretraining


The paper reports standard empirical results from pretraining on a large separate U.S. dataset (>126k sites) followed by evaluation on held-out in-domain sites and cross-geography U.K. sites. No equations, self-citations, or fitted parameters reduce the reported metrics (MAE, RMSE, CRPS, etc.) to quantities computed on the test sets by construction. The evaluation protocol is independent of the training data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Central claim rests on the representativeness of the 126,000-site U.S. dataset for cross-geography generalization and on the two domain-specific modules capturing relevant interactions; abstract provides no explicit free parameters, axioms, or invented entities.


