Tyan-WP: A Wind Power Foundation Model for Ultra-Short-Term Probabilistic Forecasting
Pith reviewed 2026-06-27 18:51 UTC · model grok-4.3
The pith
A foundation model pretrained on over 126,000 U.S. wind sites delivers accurate zero-shot probabilistic forecasts at new locations without local training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Tyan-WP is pretrained on a large-scale dataset covering more than 126,000 U.S. sites over seven years. It adds two domain-specific components: a static site embedding built from coordinate, terrain, and ecoregion metadata, and a power-aware meteorological fusion (PAMF) module that models interactions between historical power and meteorological covariates. With these components it achieves zero-shot ultra-short-term probabilistic forecasting that surpasses eight site-specific supervised time series models (TSMs) on 10 in-domain sites and eleven generic large time series models (LTSMs) on 127 in-domain sites, and it also generalizes to six real U.K. sites.
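To make the static site embedding concrete, here is a minimal Python sketch under stated assumptions. This page does not give the paper's encoder, ecoregion taxonomy, feature scalings, or dimensions, so every name below (`StaticSiteEmbedding`, `ECOREGIONS`, the elevation and slope scales) is hypothetical. Coordinates are mapped to a point on the unit sphere so that longitude wrap-around does not create artificial distance, terrain descriptors are rescaled, and the ecoregion is looked up in an embedding table with an "unknown" fallback. That fallback is one plausible way a site outside the U.S. ecoregion vocabulary, such as a U.K. site, could still be embedded.

```python
import math
import random

# Hypothetical ecoregion vocabulary; the paper's actual taxonomy is not stated here.
ECOREGIONS = ["great_plains", "eastern_temperate_forests", "north_american_deserts", "unknown"]

def coordinate_features(lat_deg, lon_deg):
    """Map latitude/longitude to a 3-D unit vector so nearby sites get nearby features."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return [math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat)]

def terrain_features(elevation_m, slope_deg):
    """Rescale terrain descriptors to roughly unit range (scales are assumptions)."""
    return [elevation_m / 1000.0, slope_deg / 45.0]

class StaticSiteEmbedding:
    """Hypothetical sketch: concatenate coordinate and terrain features with an
    ecoregion vector, then apply one linear projection. Weights are random (untrained)."""

    def __init__(self, dim=8, eco_dim=4, seed=0):
        rng = random.Random(seed)
        self.eco_table = {name: [rng.gauss(0.0, 0.1) for _ in range(eco_dim)] for name in ECOREGIONS}
        in_dim = 3 + 2 + eco_dim
        scale = 1.0 / math.sqrt(in_dim)
        self.weight = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(dim)]
        self.bias = [0.0] * dim

    def __call__(self, lat_deg, lon_deg, elevation_m, slope_deg, ecoregion):
        # Ecoregions outside the vocabulary fall back to the shared "unknown" vector.
        eco = self.eco_table.get(ecoregion, self.eco_table["unknown"])
        x = coordinate_features(lat_deg, lon_deg) + terrain_features(elevation_m, slope_deg) + eco
        return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.weight, self.bias)]

embed = StaticSiteEmbedding(dim=8)
vec = embed(41.0, -100.0, elevation_m=900.0, slope_deg=3.0, ecoregion="great_plains")
```

In a trained model the ecoregion table and projection would be learned jointly with the forecaster; here they are random and only illustrate the shapes and data flow.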
What carries the argument
Static site embedding using coordinate, terrain, and ecoregion metadata together with the power-aware meteorological fusion (PAMF) module that models interactions between historical power and meteorological covariates.
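This page does not describe PAMF's internals. One common way to let historical power "attend to" meteorological covariates is cross-attention, with power patches as queries and meteorological patches as keys and values. The sketch below assumes that design, omits learned projections for brevity, and uses the hypothetical name `cross_attention_fusion`; it illustrates the idea of power-conditioned fusion and is not the authors' module.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_fusion(power_tokens, met_tokens):
    """Hypothetical fusion step: each historical-power token (query) attends over the
    meteorological tokens (used as both keys and values), and the attended context
    is added back to the power token as a residual."""
    d = len(power_tokens[0])
    fused = []
    for q in power_tokens:
        # Scaled dot-product scores between this power token and every met token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in met_tokens]
        weights = softmax(scores)
        context = [sum(w * v[j] for w, v in zip(weights, met_tokens)) for j in range(d)]
        fused.append([qi + cj for qi, cj in zip(q, context)])
    return fused

power = [[1.0, 0.0], [0.0, 1.0]]               # two historical power patches, dim 2
met = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]     # three meteorological patches
out = cross_attention_fusion(power, met)
```

Because the power tokens act as queries, the weighting over meteorological inputs changes with recent power behaviour, which is one reading of "power-aware" fusion.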
If this is right
- Tyan-WP reduces MAE by 19.9 percent, RMSE by 16.6 percent, CRPS by 22.2 percent, and AQL by 21.7 percent, and raises R² by 16.7 percent, relative to the compared baselines.
- The model outperforms eight site-specific supervised time series models on 10 in-domain sites and eleven generic large time series models on 127 in-domain sites under a unified evaluation protocol.
- Tyan-WP demonstrates strong cross-geography generalization when tested on six real U.K. sites without target-site training.
- Accurate zero-shot forecasting without target-site training supplies a practical route for rapid turbine onboarding and probabilistic risk management at new wind farms.
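The reported metrics can be computed from forecasts with a few lines of Python. This is a sketch under assumptions: AQL is taken to mean average quantile (pinball) loss, CRPS is approximated from a set of predicted quantiles as twice the mean pinball loss over evenly spaced quantile levels, and each percentage change is computed against a single baseline value. The page does not say whether the paper compares against the best baseline or an average over baselines.

```python
import math

def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def r2(y, yhat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def pinball(y, q, tau):
    """Quantile (pinball) loss rho_tau(u) = u * (tau - 1{u < 0}) with u = y - q."""
    u = y - q
    return u * (tau - (1.0 if u < 0 else 0.0))

def avg_quantile_loss(y, qpreds, taus):
    """Mean pinball loss over time steps and quantile levels (assumed meaning of AQL)."""
    total = sum(pinball(yt, qt[k], tau) for yt, qt in zip(y, qpreds) for k, tau in enumerate(taus))
    return total / (len(y) * len(taus))

def crps_from_quantiles(y, qpreds, taus):
    """CRPS = 2 * integral of pinball loss over tau, approximated on the given levels."""
    return 2.0 * avg_quantile_loss(y, qpreds, taus)

def relative_reduction(baseline, model):
    """Percent reduction for lower-is-better metrics (MAE, RMSE, CRPS, AQL)."""
    return 100.0 * (baseline - model) / baseline

def relative_increase(baseline, model):
    """Percent increase for higher-is-better metrics such as R^2."""
    return 100.0 * (model - baseline) / baseline

y = [0.2, 0.5, 0.8]                    # normalized power, illustrative values only
yhat = [0.25, 0.45, 0.7]
taus = [0.1, 0.5, 0.9]
qpreds = [[0.1, 0.25, 0.4], [0.35, 0.45, 0.6], [0.55, 0.7, 0.9]]
scores = {"MAE": mae(y, yhat), "RMSE": rmse(y, yhat), "R2": r2(y, yhat),
          "AQL": avg_quantile_loss(y, qpreds, taus), "CRPS": crps_from_quantiles(y, qpreds, taus)}
```

Because R² is higher-is-better, its 16.7 percent figure corresponds to a relative increase rather than a reduction.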
Where Pith is reading between the lines
- Grid operators could shorten the commissioning timeline for new wind farms by deploying the model immediately upon turbine installation rather than waiting for months of local data.
- Probabilistic outputs from such models might feed directly into reserve sizing and market bidding decisions, lowering the cost of integrating variable wind generation.
- Similar static embedding and covariate-fusion designs could be tested on solar or other renewable forecasting tasks where site metadata also influences output.
Load-bearing premise
The large-scale U.S. pretraining dataset is representative enough of new sites, including U.K. locations, that the static site embedding and PAMF module yield genuine zero-shot generalization rather than reproducing U.S.-specific patterns.
What would settle it
If the representativeness assumption does not hold, zero-shot forecast errors on a fresh collection of sites whose terrain, climate, or ecoregion distributions lie substantially outside the U.S. pretraining distribution would be markedly higher than on in-distribution sites, and the reported error reductions over the baselines would not carry over to those sites.
Original abstract
Global wind power capacity, especially in China, is booming, with new farms spanning diverse terrains and climates. The industry urgently needs accurate wind power foundation models to shorten commissioning and accelerate grid connection. This is because site-specific time series models (TSMs) are not well suited to data-scarce scenarios and generalize poorly, while generic large time series models (LTSMs) are mostly limited to univariate inputs and cannot fully exploit static site attributes or the dependencies between power and meteorological covariates, leading to insufficient accuracy. To fill this gap, we propose Tyan-WP, the first wind power foundation model for ultra-short-term probabilistic forecasting. Pretrained on a large-scale wind power dataset covering more than 126,000 U.S. sites over seven years, Tyan-WP further improves zero-shot forecasting through two domain-specific module designs: static site embedding using coordinate, terrain, and ecoregion metadata, and a power-aware meteorological fusion (PAMF) module that models interactions between historical power and meteorological covariates. Under a unified evaluation protocol, Tyan-WP surpasses eight site-specific supervised TSMs on 10 in-domain sites and outperforms eleven generic LTSMs on 127 in-domain sites, reducing MAE by 19.9%, RMSE by 16.6%, CRPS by 22.2%, and AQL by 21.7%, while raising R² by 16.7%. It further demonstrates strong cross-geography generalization on six real U.K. sites. These results show that the wind power foundation model can achieve accurate zero-shot forecasting without target-site training, providing a practical pathway for rapid turbine onboarding and probabilistic risk management at new wind farms.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Tyan-WP, a foundation model for ultra-short-term probabilistic wind power forecasting pretrained on data from over 126,000 U.S. sites spanning seven years. It incorporates a static site embedding module (using coordinate, terrain, and ecoregion metadata) and a power-aware meteorological fusion (PAMF) module to model interactions between power and meteorological covariates. The central claims are that, under a unified protocol, Tyan-WP outperforms eight site-specific supervised TSMs on 10 in-domain sites and eleven generic LTSMs on 127 in-domain sites (with reported reductions of 19.9% in MAE, 16.6% in RMSE, 22.2% in CRPS, and 21.7% in AQL, and a 16.7% increase in R²), and that it demonstrates strong zero-shot cross-geography generalization on six real U.K. sites without target-site training.
Significance. If the empirical claims hold after verification, the work would be significant for the wind energy sector by offering a practical pathway to accurate probabilistic forecasting at new sites with limited data, shortening commissioning times. The scale of the U.S. pretraining corpus (126k sites) represents a concrete strength for foundation-model-style transfer in this domain.
major comments (2)
- [Abstract and §4] Experimental Setup: the unified evaluation protocol reports specific metric deltas but gives no information on data splits, statistical significance tests, baseline hyperparameter matching, or controls for model-size differences. These details are load-bearing for the in-domain superiority claims.
- [§5.3] U.K. Generalization Results: the claim of strong cross-geography zero-shot transfer on six U.K. sites rests on the static site embedding and PAMF module bridging the U.S. and U.K. distributions, yet no quantitative domain-shift diagnostics (covariate histograms, embedding-space distances, or per-site error breakdowns) compare the 126k-site U.S. pretraining distribution with the U.K. test sites. This omission directly weakens the generalization claim.
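A paired, nonparametric significance test of the kind the referee asks for can be run on per-site errors. The sketch below implements the Wilcoxon signed-rank test with a normal approximation in plain Python; the per-site MAE values in the example are invented for illustration and are not results from the paper. With only 10 sites the normal approximation is coarse, so an exact test (for example `scipy.stats.wilcoxon`) would be preferable in practice.

```python
import math

def wilcoxon_signed_rank(errors_a, errors_b):
    """Two-sided Wilcoxon signed-rank test on paired errors (normal approximation).

    Zero differences are dropped; tied |differences| get averaged ranks, but the
    variance is not tie-corrected. Returns (w_plus, z, p_value), where w_plus is the
    rank sum of pairs with errors_a > errors_b.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return w_plus, z, p_value

# Illustrative per-site MAE values (not from the paper): baseline vs. candidate model.
baseline_mae = [0.12, 0.15, 0.11, 0.14, 0.13, 0.16, 0.10, 0.12, 0.15, 0.14]
model_mae = [0.8 * e for e in baseline_mae]
w_plus, z, p = wilcoxon_signed_rank(baseline_mae, model_mae)
```

A paired test suits this setting because every model is scored on the same sites, so site-to-site difficulty cancels out of the differences.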
minor comments (1)
- [Throughout] Ensure all acronyms (TSM, LTSM, PAMF, AQL) are defined on first use and used consistently in figure captions and tables.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. The comments highlight important aspects of the evaluation protocol and generalization analysis that merit clarification and expansion. We address each major comment below, indicating where revisions will be made to strengthen the manuscript.
- Referee: [Abstract and §4] Experimental Setup: the unified evaluation protocol reports specific metric deltas but gives no information on data splits, statistical significance tests, baseline hyperparameter matching, or controls for model-size differences. These details are load-bearing for the in-domain superiority claims.
  Authors: We agree that the current description of the unified evaluation protocol in §4 is insufficiently detailed. In the revised manuscript we will expand this section to explicitly document: (i) the precise train/validation/test split ratios and temporal partitioning used for the 10-site and 127-site in-domain experiments, (ii) the results of statistical significance tests (paired t-tests or Wilcoxon signed-rank tests with p-values) on all reported metric improvements, (iii) the hyperparameter search procedure applied uniformly to both Tyan-WP and the eight site-specific TSM baselines, and (iv) a table comparing parameter counts and computational budgets across all compared models to address capacity differences. These additions will not change the numerical results already reported.
  Revision: yes.
- Referee: [§5.3] U.K. Generalization Results: the claim of strong cross-geography zero-shot transfer on six U.K. sites rests on the static site embedding and PAMF module bridging the U.S. and U.K. distributions, yet no quantitative domain-shift diagnostics (covariate histograms, embedding-space distances, or per-site error breakdowns) compare the 126k-site U.S. pretraining distribution with the U.K. test sites. This omission directly weakens the generalization claim.
  Authors: We concur that quantitative domain-shift analysis would better substantiate the cross-geography zero-shot claims. In the revised §5.3 we will add: (i) side-by-side histograms and Kolmogorov-Smirnov tests for key meteorological covariates between the U.S. pretraining corpus and the six U.K. sites, (ii) Euclidean or cosine distances in the learned static embedding space between U.S. and U.K. site embeddings, and (iii) per-site MAE, CRPS, and error distribution plots for the U.K. zero-shot forecasts. These diagnostics will be computed from the existing trained model and data and included to directly illustrate the bridging effect of the static embeddings and PAMF module.
  Revision: yes.
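The diagnostics promised in the response are straightforward to compute. Below is a plain-Python sketch of the two-sample Kolmogorov-Smirnov statistic and of a cosine distance from a target-site embedding to its nearest pretraining site; p-values for the KS test are omitted (a library routine such as `scipy.stats.ks_2samp` provides them). The wind-speed samples and embedding vectors in the example are hypothetical placeholders, not data from the U.S. or U.K. sites, and `nearest_pretraining_distance` is an illustrative name.

```python
import math
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs.
    Checking every pooled observation suffices because both ECDFs are step functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        gap = abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        d = max(d, gap)
    return d

def cosine_distance(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)

def nearest_pretraining_distance(target_embedding, pretraining_embeddings):
    """Cosine distance from a target (e.g. U.K.) site embedding to its closest pretraining site."""
    return min(cosine_distance(target_embedding, e) for e in pretraining_embeddings)

# Hypothetical hub-height wind speeds (m/s) and 3-D site embeddings, for illustration only.
us_wind = [6.1, 7.4, 8.2, 5.9, 9.0, 7.7, 6.8]
uk_wind = [8.9, 10.2, 9.5, 11.0, 8.4]
d_ks = ks_statistic(us_wind, uk_wind)
us_embeddings = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.1, 0.1, 0.95]]
uk_embedding = [0.7, 0.5, 0.1]
gap = nearest_pretraining_distance(uk_embedding, us_embeddings)
```

Reporting the nearest-neighbour distance per U.K. site, next to that site's zero-shot error, would show whether errors grow as sites move away from the pretraining distribution.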
Circularity Check
No circularity: empirical evaluation on held-out data after separate pretraining
The paper reports standard empirical results from pretraining on a large separate U.S. dataset (>126k sites) followed by evaluation on held-out in-domain sites and cross-geography U.K. sites. No equations, self-citations, or fitted parameters reduce the reported metrics (MAE, RMSE, CRPS, etc.) to quantities computed on the test sets by construction. The evaluation protocol is independent of the training data.
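The no-circularity argument depends on evaluation data never entering pretraining. A minimal sketch of how that separation can be enforced and checked is shown below, assuming a site-level holdout plus a chronological train/validation/test split within each series. The 70/10/20 percentages, the site-ID format, and the helper names are hypothetical, since the page does not report the paper's actual split.

```python
import random

def site_holdout_split(site_ids, n_eval, seed=0):
    """Randomly hold out n_eval whole sites for evaluation; the rest go to pretraining."""
    shuffled = sorted(site_ids)  # sort first so the split is reproducible for a given seed
    random.Random(seed).shuffle(shuffled)
    return set(shuffled[n_eval:]), set(shuffled[:n_eval])

def assert_disjoint(pretrain_sites, eval_sites):
    """Raise if any site appears in both the pretraining and evaluation sets."""
    overlap = pretrain_sites & eval_sites
    if overlap:
        raise ValueError(f"{len(overlap)} site(s) appear in both splits, e.g. {sorted(overlap)[:3]}")

def temporal_split(timestamps, train_pct=70, val_pct=10):
    """Chronological split: no validation or test step precedes a training step."""
    ts = sorted(timestamps)
    n_train = len(ts) * train_pct // 100
    n_val = len(ts) * val_pct // 100
    return ts[:n_train], ts[n_train:n_train + n_val], ts[n_train + n_val:]

sites = [f"site_{i:06d}" for i in range(1000)]  # hypothetical site IDs
pretrain, held_out = site_holdout_split(sites, n_eval=127)
assert_disjoint(pretrain, held_out)
train, val, test = temporal_split(range(100))
```

Splitting by whole sites, rather than by random time steps, is what makes an in-domain evaluation genuinely zero-shot for the held-out sites.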