Probabilistic Low-Voltage Peak Load Forecasting with Time Series Foundation Models Evaluated on Application-Oriented Metrics
Pith reviewed 2026-07-03 17:21 UTC · model grok-4.3
The pith
Time series foundation models outperform baselines on probabilistic low-voltage peak load forecasts and tie accuracy to grid cost-risk trade-offs via a new metric.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The foundation models, Chronos-2 in particular, outperform the six baselines across the 200 feeders. The new application-oriented metric shows how improved peak prediction supports lower-cost asset decisions while controlling failure risk, and the models adapt to higher uncertainty when weather covariates are omitted.
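The claim that the models adapt to increased uncertainty without weather covariates is usually checked by asking whether their prediction intervals widen while staying calibrated. Winkler's interval score [69] scores both at once: it adds the interval width to a penalty for observations that fall outside the interval. The sketch below is illustrative only. It assumes central 80% intervals (alpha = 0.2) built from the 0.1 and 0.9 quantiles, and the toy numbers are made up rather than taken from the paper.

```python
"""Winkler interval score and empirical coverage for central prediction intervals.

Illustrative sketch: alpha = 0.2 (central 80% interval) and the toy numbers
in the usage block are assumptions, not results from the paper.
"""
from typing import Sequence


def winkler_score(lower: float, upper: float, y: float, alpha: float) -> float:
    """Interval score for one central (1 - alpha) interval; lower is better."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    score = upper - lower  # width term rewards sharp intervals
    if y < lower:
        score += (2.0 / alpha) * (lower - y)  # penalty for a miss below the interval
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)  # penalty for a miss above the interval
    return score


def mean_winkler(lowers: Sequence[float], uppers: Sequence[float],
                 ys: Sequence[float], alpha: float) -> float:
    """Average interval score over all forecast steps."""
    scores = [winkler_score(l, u, y, alpha) for l, u, y in zip(lowers, uppers, ys)]
    return sum(scores) / len(scores)


def coverage(lowers: Sequence[float], uppers: Sequence[float],
             ys: Sequence[float]) -> float:
    """Share of observations that fall inside their interval."""
    hits = sum(1 for l, u, y in zip(lowers, uppers, ys) if l <= y <= u)
    return hits / len(ys)


if __name__ == "__main__":
    ys = [10.0, 12.0, 15.0, 11.0]  # toy observed net load in kW (hypothetical)
    # Hypothetical run with weather covariates: narrow intervals, one miss.
    with_weather = ([9.0, 11.0, 12.0, 10.0], [11.0, 13.0, 14.0, 12.0])
    # Hypothetical ablation without weather: wider intervals, no misses.
    no_weather = ([8.0, 10.0, 11.0, 9.0], [12.0, 14.0, 16.0, 13.0])
    for name, (lows, highs) in [("with weather", with_weather), ("no weather", no_weather)]:
        print(name, mean_winkler(lows, highs, ys, alpha=0.2), coverage(lows, highs, ys))
```

In this toy case the wider no-weather intervals score slightly better because they avoid the single costly miss. On real data the comparison can go either way, which is why coverage and mean width are worth reporting next to the score.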
What carries the argument
The novel application-oriented metric that converts peak-forecast skill into an explicit cost-reduction versus failure-risk trade-off for grid asset planning and operation.
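This page does not give the paper's exact formulation of the metric. As a purely hypothetical illustration of the kind of trade-off it describes, the sketch below treats a forecast quantile of the daily peak as a planning limit. Mean unused headroom serves as a cost proxy, and the share of days on which the observed peak exceeds the limit serves as a risk proxy; sweeping the quantile level traces the trade-off. The name `trade_off_curve`, the proxies and the toy numbers are all assumptions, not the authors' method.

```python
"""Hypothetical cost-risk sweep over quantile-based planning limits.

This is NOT the paper's metric. Assumptions: each day has quantile forecasts
of its peak (kW), the chosen quantile is used as the planning limit, unused
headroom is the cost proxy and limit exceedance is the risk proxy.
"""
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TradeOffPoint:
    level: float            # quantile level used as the planning limit
    mean_headroom: float    # average unused capacity in kW (cost proxy)
    exceedance_rate: float  # share of days with the peak above the limit (risk proxy)


def trade_off_curve(quantile_forecasts: List[Dict[float, float]],
                    actual_peaks: List[float]) -> List[TradeOffPoint]:
    """Evaluate every quantile level present in the first day's forecast."""
    if not actual_peaks or len(quantile_forecasts) != len(actual_peaks):
        raise ValueError("need one quantile forecast per observed daily peak")
    n = len(actual_peaks)
    curve = []
    for level in sorted(quantile_forecasts[0]):
        limits = [qf[level] for qf in quantile_forecasts]
        headroom = sum(max(0.0, lim - peak) for lim, peak in zip(limits, actual_peaks))
        exceeded = sum(1 for lim, peak in zip(limits, actual_peaks) if peak > lim)
        curve.append(TradeOffPoint(level, headroom / n, exceeded / n))
    return curve


if __name__ == "__main__":
    # Two toy days with median and 90% quantile forecasts of the daily peak.
    forecasts = [{0.5: 40.0, 0.9: 50.0}, {0.5: 30.0, 0.9: 38.0}]
    peaks = [45.0, 28.0]
    for point in trade_off_curve(forecasts, peaks):
        print(point)
```

Raising the quantile level buys fewer exceedances with more idle headroom. A sharper, well-calibrated forecast would shift the whole curve toward lower cost at the same risk, which is presumably the intuition behind linking peak-forecast skill to asset decisions.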
If this is right
- Operators can use Chronos-2 outputs to reduce asset over-provisioning while keeping failure risk within acceptable bounds.
- Foundation models remain effective for low-voltage forecasting even when weather forecasts are unavailable or unreliable.
- Peak-focused evaluation reveals advantages that standard error metrics miss.
- The same models can support probabilistic planning without extensive manual feature engineering.
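Standard error metrics average over every time step, so a forecast can score well while flattening or misplacing the daily peak. Below is a minimal sketch of two peak-focused checks: the daily peak magnitude error and the peak timing error. The function name, the fixed `steps_per_day` resolution and the toy series are assumptions for illustration, and the paper may define its peak metrics differently.

```python
"""Hypothetical daily peak magnitude and timing errors for a point forecast."""
from typing import List, Tuple


def daily_peak_errors(actual: List[float], forecast: List[float],
                      steps_per_day: int) -> Tuple[float, float]:
    """Return (mean absolute peak magnitude error, mean peak timing error in steps)."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have equal length")
    if steps_per_day <= 0 or len(actual) % steps_per_day != 0:
        raise ValueError("series must cover a whole number of days")
    n_days = len(actual) // steps_per_day
    magnitude_errors, timing_errors = [], []
    for day in range(n_days):
        start, end = day * steps_per_day, (day + 1) * steps_per_day
        a, f = actual[start:end], forecast[start:end]
        # Index of the daily maximum; ties resolve to the earliest step.
        t_actual = max(range(steps_per_day), key=a.__getitem__)
        t_forecast = max(range(steps_per_day), key=f.__getitem__)
        magnitude_errors.append(abs(f[t_forecast] - a[t_actual]))
        # Simple absolute offset within the day; ignores peaks that straddle midnight.
        timing_errors.append(abs(t_forecast - t_actual))
    return sum(magnitude_errors) / n_days, sum(timing_errors) / n_days


if __name__ == "__main__":
    actual = [1.0, 3.0, 2.0, 1.0, 2.0, 2.0, 5.0, 1.0]  # two toy days, 4 steps each
    forecast = [1.0, 2.0, 4.0, 1.0, 2.0, 5.0, 3.0, 1.0]
    print(daily_peak_errors(actual, forecast, steps_per_day=4))
```

On the second toy day the peak magnitude is exact but the peak arrives one step early, which a magnitude-only check would not reveal.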
Where Pith is reading between the lines
- The metric could be adapted to medium-voltage or transmission-level planning problems where similar cost-risk tensions exist.
- Retraining the foundation models on more diverse feeder data might further improve generalization beyond the tested 200 sites.
- The observed robustness to missing weather data suggests these models could integrate easily into systems with incomplete sensor coverage.
Load-bearing premise
The 200 real-world low-voltage feeders are representative of the wider population and the new metric correctly reflects the actual cost-risk trade-off that operators face.
What would settle it
Running the same models and metric on a substantially larger or differently distributed set of feeders and checking whether the implied cost-risk balance matches recorded planning costs and observed failure events.
Original abstract
Low-voltage load forecasting is an important component in current and future energy systems with a high degree of electrification and decentralized generation. However, current forecasting methods require significant manual effort, often lack uncertainty estimation and proper peak prediction, and they are often not adequately evaluated in terms of grid requirements. In the present study, we provide an extensive evaluation of short-term net load forecasts of 200 real-world low-voltage feeders with a focus on the rapidly evolving time series foundation models. Our study compares Chronos-Bolt, Chronos-2 and TabPFN-TS to six baseline models and demonstrates superior performance, in particular for Chronos-2. An ablation study, in which weather covariates are omitted, shows that time series foundation models adapt to increased uncertainty, despite the importance of weather information. A novel application-oriented metric links the model's forecasting capabilities in peak prediction to the trade-off in grid asset planning and operation between cost reduction and minimizing the risk of failure.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper evaluates short-term probabilistic net load forecasting on 200 real-world low-voltage feeders, comparing three time series foundation models (Chronos-Bolt, Chronos-2, TabPFN-TS) against six baselines. It reports superior performance for Chronos-2, presents an ablation study omitting weather covariates to show adaptation to uncertainty, and introduces a novel application-oriented metric intended to connect peak-prediction accuracy to the cost-risk trade-off in grid asset planning and operation.
Significance. If the empirical superiority and metric hold under scrutiny, the work would be significant for demonstrating practical utility of foundation models in energy systems with real feeder data and an application-specific evaluation. The ablation study and use of 200 feeders are strengths that support the central empirical claim.
major comments (2)
- [Abstract] Abstract: the novel application-oriented metric is presented as linking peak prediction to the explicit cost-risk trade-off faced by grid operators, yet no derivation, mapping to cost or failure-probability functions, or external validation against operator decisions is supplied. This is load-bearing for the claim of practical utility.
- [Results] Results / Discussion: the claim that the 200 feeders support generalization of Chronos-2 superiority requires explicit discussion of representativeness and sampling; without it, performance on this specific set does not establish broader applicability to the population of low-voltage feeders.
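The first major comment asks how peak forecasts map to cost and failure probability. One textbook route, offered here as an assumption rather than as the authors' formulation, is the newsvendor argument. Let c_o be the cost per kW of unused capacity, c_u the cost per kW of shortfall (a proxy for failure consequences), r the planning limit and Y the uncertain peak with a continuous predictive CDF F.

```latex
\begin{aligned}
\mathbb{E}[C(r)] &= c_o\,\mathbb{E}\big[(r-Y)^+\big] + c_u\,\mathbb{E}\big[(Y-r)^+\big],\\
\frac{d\,\mathbb{E}[C(r)]}{dr} &= c_o\,F(r) - c_u\,\big(1-F(r)\big) = 0
\;\Longrightarrow\; r^\ast = F^{-1}(\tau^\ast),\qquad \tau^\ast = \frac{c_u}{c_u+c_o},\\
\Pr\big(Y > r^\ast\big) &= 1-\tau^\ast = \frac{c_o}{c_u+c_o},\\
c_u\,(y-r)^+ + c_o\,(r-y)^+ &= (c_u+c_o)\,\rho_{\tau^\ast}(y-r),\qquad
\rho_\tau(u) = u\,\big(\tau - \mathbb{1}\{u<0\}\big).
\end{aligned}
```

Under these assumptions, the cost-optimal limit is the \tau^\ast quantile of the peak forecast, the implied failure risk is 1 - \tau^\ast, and the realized cost equals (c_u + c_o) times the pinball loss at that level. A better-scoring quantile forecast therefore lowers expected cost at a fixed risk level. The review does not say whether the paper's metric takes this form.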
minor comments (2)
- [Methods] Methods: baseline implementations, exact probabilistic metrics (e.g., CRPS variants), statistical significance tests, and train/test splits should be described with sufficient detail for reproducibility.
- Ensure figure captions and table footnotes clearly define all abbreviations and units used in the application-oriented metric.
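Two parts of the reproducibility request are easy to pin down in code. The sketch below assumes forecasts are reported as a small set of quantiles. It approximates CRPS as twice the average pinball loss over those quantile levels, which becomes exact only in the limit of a dense, evenly spaced grid. It then compares two models with a paired sign-flip permutation test over per-feeder scores. Function names, the seed and the number of permutations are illustrative choices, and the paper may use other CRPS variants or significance tests.

```python
"""Quantile-based CRPS approximation and a paired sign-flip permutation test.

Illustrative sketch; names, seed and permutation count are assumptions.
"""
import random
from typing import Dict, Sequence


def pinball_loss(y: float, q: float, level: float) -> float:
    """Quantile (pinball) loss of forecast quantile q at the given level."""
    diff = y - q
    return level * diff if diff >= 0 else (level - 1.0) * diff


def approx_crps(y: float, quantiles: Dict[float, float]) -> float:
    """Twice the mean pinball loss over the levels; coarse grids ignore the tails."""
    if not quantiles:
        raise ValueError("need at least one quantile")
    losses = [pinball_loss(y, q, level) for level, q in quantiles.items()]
    return 2.0 * sum(losses) / len(losses)


def paired_sign_flip_test(scores_a: Sequence[float], scores_b: Sequence[float],
                          n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided p-value for 'mean per-feeder score difference is zero'."""
    if not scores_a or len(scores_a) != len(scores_b):
        raise ValueError("need equally long, non-empty score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        # Under the null, each feeder's difference is equally likely to have either sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed - 1e-12:  # tolerance for round-off
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    print(approx_crps(10.0, {0.1: 8.0, 0.5: 10.0, 0.9: 12.0}))
    model_a = [0.8, 0.9, 1.1, 0.7, 0.95]  # toy per-feeder CRPS values (hypothetical)
    model_b = [1.0, 1.0, 1.2, 0.9, 1.00]
    print(paired_sign_flip_test(model_a, model_b))
```

Pairing by feeder removes between-feeder variation in difficulty, which matters when 200 feeders differ widely in load level and volatility.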
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the two major comments below and will revise the manuscript to incorporate additional details where needed.
- Referee: [Abstract] Abstract: the novel application-oriented metric is presented as linking peak prediction to the explicit cost-risk trade-off faced by grid operators, yet no derivation, mapping to cost or failure-probability functions, or external validation against operator decisions is supplied. This is load-bearing for the claim of practical utility.
Authors: We agree that the presentation of the novel metric would be strengthened by an explicit derivation and mapping to cost and failure-probability functions. In the revised manuscript we will expand the relevant section (and update the abstract accordingly) to include the mathematical formulation of the metric, its connection to asset cost-risk trade-offs, and a clearer statement of its assumptions and limitations. No external validation against real operator decisions was performed, as the metric is intended as a proxy; we will note this explicitly. revision: yes
- Referee: [Results] Results / Discussion: the claim that the 200 feeders support generalization of Chronos-2 superiority requires explicit discussion of representativeness and sampling; without it, performance on this specific set does not establish broader applicability to the population of low-voltage feeders.
Authors: We concur that an explicit discussion of dataset representativeness is required. In the revision we will add a paragraph in the Results/Discussion section describing the sampling procedure used to select the 200 feeders, key characteristics (geographic distribution, load types, voltage levels), and the limitations on generalizing Chronos-2 superiority beyond this sample to the full population of low-voltage feeders. revision: yes
Circularity Check
No circularity: purely empirical comparison on held-out data
The paper conducts an empirical model comparison of Chronos variants and TabPFN-TS against six baselines on 200 real-world low-voltage feeders, with an ablation and a novel application-oriented metric for peak prediction. No equations, derivations, or self-citations are load-bearing; performance claims rest on external held-out data and baselines rather than any reduction of predictions to fitted inputs or self-defined quantities. The novel metric is introduced without derivation from the paper's own parameters, satisfying the self-contained empirical benchmark criterion.
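Held-out evaluation of this kind usually means every test window lies strictly after the data the model sees. A minimal rolling-origin sketch follows. The expanding context, the step size and the function name `rolling_origins` are assumptions for illustration, not the paper's documented split. For zero-shot foundation models the context is only the history passed at inference, whereas trained baselines would also be fitted on it.

```python
"""Hypothetical rolling-origin splits for leakage-free forecast evaluation."""
from typing import Iterator, Tuple


def rolling_origins(n_obs: int, min_context: int, horizon: int,
                    step: int) -> Iterator[Tuple[range, range]]:
    """Yield (context, target) index ranges with an expanding context window.

    Each target range starts exactly where its context ends, so within a split
    no observation used as input is also scored as a forecast target.
    """
    if min(min_context, horizon, step) <= 0:
        raise ValueError("min_context, horizon and step must be positive")
    origin = min_context
    while origin + horizon <= n_obs:
        yield range(0, origin), range(origin, origin + horizon)
        origin += step


if __name__ == "__main__":
    # 10 toy steps, at least 4 steps of history, 3-step horizon, new origin every 2 steps.
    for context, target in rolling_origins(10, min_context=4, horizon=3, step=2):
        print(f"context 0..{context.stop - 1} -> target {target.start}..{target.stop - 1}")
```

For day-ahead forecasts at 15-minute resolution, an assumed configuration would be horizon=96 and step=96, giving one forecast origin per day.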
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The 200 low-voltage feeders constitute a representative sample for evaluating general forecasting performance and the new metric.
Reference graph
Works this paper leans on
- [1] Abdul Fatir Ansari, Caner Turkmen, Oleksandr Shchur, and Lorenzo Stella. 2024. Fast and accurate zero-shot forecasting with Chronos-Bolt and AutoGluon. AWS Artificial Intelligence. (Dec. 2024). Retrieved Apr. 24, 2026 from https://aws.amazon.com/de/blogs/machine-learning/fast-and-accurate-zero-shot-forecasting-with-chronos-bolt-and-autogluon/
- [2] Yvenn Amara-Ouali, Bachir Hamrouche, Guillaume Principato, and Yannig Goude. 2025. Quantifying the Uncertainty of Electric Vehicle Charging with Probabilistic Load Forecasting. World Electric Vehicle Journal, 16, 2, (Feb. 2025). doi:10.3390/wevj16020088
- [4] Abdul Fatir Ansari et al. 2025. Chronos-2: From Univariate to Universal Forecasting. (Oct. 2025). arXiv: 2510.15821 [cs]
- [5] Abdul Fatir Ansari et al. 2024. Chronos: Learning the Language of Time Series. (Nov. 2024). arXiv: 2403.07815 [cs]
- [6] Jason Ansel et al. 2024. PyTorch 2: Faster machine learning through dynamic python bytecode transformation and graph compilation. In 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (ASPLOS ’24). ACM, (Apr. 2024). doi:10.1145/3620665.3640366
- [8] Marcel Arpogaus, Marcus Voss, Beate Sick, Mark Nigge-Uricher, and Oliver Dürr. 2023. Short-Term Density Forecasting of Low-Voltage Load Using Bernstein-Polynomial Normalizing Flows. IEEE Transactions on Smart Grid, 14, 6, (Nov. 2023), 4902–4911. doi:10.1109/TSG.2023.3254890
- [11] Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, and Fahad Shahbaz Khan. 2025. Foundation Models Defining a New Era in Vision: A Survey and Outlook. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47, 4, (Apr. 2025), 2245–2264. doi:10.1109/TPAMI.2024.3506283
- [13] Cristian Bodnar et al. 2025. A foundation model for the Earth system. Nature, 641, 8065, (May 2025), 1180–1187. doi:10.1038/s41586-025-09005-y
- [14] Rishi Bommasani, Drew A Hudson, Ehsan Adeli, Russ Altman, and Simran Arora. 2021. On the Opportunities and Risks of Foundation Models. (Aug. 2021). arXiv: 2108.07258 [cs]
- [15] Huseyin K. Cakmak and Veit Hagenmeyer. 2022. Using Open Data for Modeling and Simulation of the All Electrical Society in eASiMOV. In 2022 Open Source Modelling and Simulation of Energy Systems (OSMSES). 2022 Open Source Modelling and Simulation of Energy Systems (OSMSES). IEEE, Aachen, Germany, (Apr. 2022), 1–6. doi:10.1109/OSMSES54027.2022.9769145
- [16] Zhaojing Cao, Can Wan, Zijun Zhang, Furong Li, and Yonghua Song. 2020. Hybrid Ensemble Deep Learning for Deterministic and Probabilistic Low-Voltage Load Forecasting. IEEE Transactions on Power Systems, 35, 3, (May 2020), 1881–. doi:10.1109/TPWRS.2019.2946701
- [18] Ching Chang, Wei-Yao Wang, Wen-Chih Peng, and Tien-Fu Chen. 2024. LLM4TS: Aligning Pre-Trained LLMs as Data-Efficient Time-Series Forecasters. ACM Transactions on Intelligent Systems and Technology, 16, 3, 1–20. arXiv: 2308.08469 [cs]. doi:10.1145/3719207
- [19] Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’16: The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, San Francisco California USA, (Aug. 2016), 785–794. doi:10.1145/2939672.2939785
- [21] Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. 2024. A decoder-only foundation model for time-series forecasting. (Apr. 2024). arXiv: 2310.10688 [cs]
- [24] Vijay Ekambaram, Arindam Jati, Pankaj Dayama, Sumanta Mukherjee, Nam H. Nguyen, Wesley M. Gifford, Chandra Reddy, and Jayant Kalagnanam. 2024. Tiny time mixers (TTMs): Fast pre-trained models for enhanced zero/few-shot forecasting of multivariate time series. In Advances in Neural Information Processing Systems. A. Globerson, L. Mackey, D. Belgrave, A. F...
- [25] Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. 2023. TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. KDD ’23: The 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM, Long Beach CA...
- [26] Epoch AI. 2026. Data on AI models. (Apr. 2026). Retrieved Apr. 24, 2026 from https://epoch.ai/data/ai-models
- [27] Anthony Faustine, Nuno Jardim Nunes, and Lucas Pereira. 2025. Efficiency Through Simplicity: MLP-Based Approach for Net-Load Forecasting With Uncertainty Estimates in Low-Voltage Distribution Networks. IEEE Transactions on Power Systems, 40, 1, (Jan. 2025), 46–56. doi:10.1109/TPWRS.2024.3400123
- [28] Kun Feng, Shaocheng Lan, Yuchen Fang, Wenchao He, Lintao Ma, Xingyu Lu, and Kan Ren. 2025. Kairos: Towards Adaptive and Generalizable Time Series Foundation Models. (Sept. 2025). arXiv: 2509.25826 [cs]
- [31] Ciaran Gilbert, Jethro Browell, and Bruce Stephen. 2023. Probabilistic load forecasting for the low voltage network: Forecast fusion and daily peaks. Sustainable Energy, Grids and Networks, 34, (June 2023), 100998. doi:10.1016/j.segan.2023.100998
- [32] J.M. González-Sopeña, V. Pakrashi, and B. Ghosh. 2021. An overview of performance evaluation metrics for short-term statistical wind power forecasting. Renewable and Sustainable Energy Reviews, 138, (Mar. 2021), 110515. doi:10.1016/j.rser.2020.110515
- [35] Stephen Haben, Siddharth Arora, Georgios Giasemidis, Marcus Voss, and Danica Vukadinovic Greetham. 2021. Review of Low Voltage Load Forecasting: Methods, Applications, and Recommendations. Applied Energy, 304, (Dec. 2021), 117798. doi:10.1016/j.apenergy.2021.117798
- [36] Stephen Haben, Georgios Giasemidis, Florian Ziel, and Siddharth Arora. 2019. Short term load forecasting and the effect of temperature at the low voltage level. International Journal of Forecasting, 35, 4, (Oct. 2019), 1469–1484. doi:10.1016/j.ijforecast.2018.10.007
- [37] Stephen Haben, Jonathan Ward, Danica Vukadinovic Greetham, Colin Singleton, and Peter Grindrod. 2014. A new error measure for forecasts of household-level, high resolution electrical energy consumption. International Journal of Forecasting, 30, 2, (Apr. 2014), 246–256. doi:10.1016/j.ijforecast.2013.08.002
- [38] Hendrik F. Hamann et al. 2024. Foundation models for the electric power grid. Joule, 8, 12, (Dec. 2024), 3245–3258. doi:10.1016/j.joule.2024.11.002
- [39] Benedikt Heidrich, Matthias Hertel, Oliver Neumann, Veit Hagenmeyer, and Ralf Mikut. 2024. Using conditional Invertible Neural Networks to perform mid-term peak load forecasting. IET Smart Grid, 7, 4, (Apr. 2024), 460–472. doi:10.1049/stg2.12169
- [40] Matthias Hertel, Sebastian Pütz, Jonathan Kolar, Benjamin Schäfer, Ralf Mikut, and Veit Hagenmeyer. 2026. A Benchmark for Electrical Load Forecasting across Grid Levels: Time-Series Transformers outperform established Methods. In 15th DACH+ Conference on Energy Informatics. Linz, Austria, (Sept. 2026). Accepted.
- [41] Klaus Heuck, Klaus-Dieter Dettmann, and Detlef Schulz. 2013. Elektrische Energieversorgung: Erzeugung, Übertragung und Verteilung elektrischer Energie für Studium und Praxis. Springer Fachmedien Wiesbaden, Wiesbaden. doi:10.1007/978-3-8348-2174-4
- [42] Noah Hollmann, Samuel Müller, Katharina Eggensperger, and Frank Hutter. 2023. TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second. (Sept. 2023). arXiv: 2207.01848 [cs]
- [44] Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. 2025. Accurate predictions on small data with a tabular foundation model. Nature, 637, 8045, (Jan. 2025), 319–326. doi:10.1038/s41586-024-08328-6
- [46] Haowen Hou and F. Richard Yu. 2024. RWKV-TS: Beyond Traditional Recurrent Neural Network for Time Series Tasks. (Jan. 2024). arXiv: 2401.09093 [cs]
- [47] Ming Jin et al. 2024. Time-LLM: Time Series Forecasting by Reprogramming Large Language Models. (Jan. 2024). arXiv: 2310.01728 [cs]
- [48] Yuxuan Liang, Haomin Wen, Yuqi Nie, Yushan Jiang, Ming Jin, Dongjin Song, Shirui Pan, and Qingsong Wen. 2024. Foundation Models for Time Series Analysis: A Tutorial and Survey. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. KDD ’24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM, Barce...
- [49] Bryan Lim, Sercan O. Arik, Nicolas Loeff, and Tomas Pfister. 2020. Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting. (Sept. 2020). arXiv: 1912.09363 [stat]
- [52] Yong Liu, Guo Qin, Zhiyuan Shi, Zhi Chen, Caiyin Yang, Xiangdong Huang, Jianmin Wang, and Mingsheng Long. 2025. Sundial: A Family of Highly Capable Time Series Foundation Models. (May 2025). arXiv: 2502.00816 [cs]
- [54] Stefan Meisenbacher, Johannes Galenzowski, Kevin Förderer, Wolfgang Suess, Simon Waczowicz, Ralf Mikut, and Veit Hagenmeyer. 2025. Automation Level Taxonomy for Time Series Forecasting Services: Guideline for Real-World Smart Grid Applications. In Energy Informatics. Vol. 15271. Bo Nørregaard Jørgensen, Zheng Grace Ma, Fransisco Danang Wijaya, Roni Irnawan...
- [55] Marcel Meyer, Sascha Kaltenpoth, Kevin Zalipski, Henrik Albers, and Oliver Müller. 2025. TS-Arena Technical Report – A Pre-registered Live Forecasting Platform. (Dec. 2025). arXiv: 2512.20761 [cs]
- [56] Marcel Meyer, David Zapata, Sascha Kaltenpoth, and Oliver Müller. 2025. Benchmarking Time Series Foundation Models for Short-Term Household Electricity Load Forecasting. IEEE Access, 13, 218141–218153. doi:10.1109/ACCESS.2025.3648056
- [57] A. Moreno-Munoz, J. J. G. De La Rosa, R. Posadillo, and V. Pallares. 2008. Short term forecasting of solar radiation. In 2008 IEEE International Symposium on Industrial Electronics. 2008 IEEE International Symposium on Industrial Electronics (ISIE 2008). IEEE, Cambridge, UK, (June 2008), 1537–1541. doi:10.1109/ISIE.2008.4676880
- [58] Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. 2023. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. (Mar. 2023). arXiv: 2211.14730 [cs]
- [59] Bijan Nouri, Yann Fabel, Niklas Blum, Dominik Schnaus, Luis F. Zarzalejo, Andreas Kazantzidis, and Stefan Wilbert. 2024. Ramp Rate Metric Suitable for Solar Forecasting. Solar RRL, 8, 24, (Dec. 2024), 2400468. doi:10.1002/solr.202400468
- [60] Kashif Rasul et al. 2023. Lag-Llama: Towards Foundation Models for Time Series Forecasting. In R0-FoMo: Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models at NeurIPS 2023. NeurIPS 2023.
- [61] Johannes Schneider, Christian Meske, and Pauline Kuss. 2024. Foundation Models: A New Paradigm for Artificial Intelligence. Business & Information Systems Engineering, 66, 2, (Apr. 2024), 221–231. doi:10.1007/s12599-024-00851-0
- [63] Shreyashi Shukla and Tao Hong. 2024. BigDEAL Challenge 2022: Forecasting peak timing of electricity demand. IET Smart Grid, 7, 4, 442–459. doi:10.1049/stg2.12162
- [65] Manuel Treutlein, Marc Schmidt, Roman Hahn, Matthias Hertel, Benedikt Heidrich, Ralf Mikut, and Veit Hagenmeyer. 2025. Generating peak-aware pseudo-measurements for low-voltage feeders using metadata of distribution system operators. IET Smart Grid, 8, 1, (Jan. 2025), e12210. doi:10.1049/stg2.12210
- [67] Dorina Werling, Benedikt Heidrich, Hüseyin K. Çakmak, and Veit Hagenmeyer. 2022. Towards line-restricted dispatchable feeders using probabilistic forecasts for PV-dominated low-voltage distribution grids. In Proceedings of the Thirteenth ACM International Conference on Future Energy Systems. E-Energy ’22: The Thirteenth ACM International Conference on Future Energy Systems. ACM, Virtual Event, (June 2022), 395–400. doi:10.1145/3538637.3538868
- [69] Robert L. Winkler. 1972. A decision-theoretic approach to interval estimation. Journal of the American Statistical Association, 67, 337, 187–191. doi:10.1080/01621459.1972.10481224
- [71] Tian Zhou, PeiSong Niu, Xue Wang, Liang Sun, and Rong Jin. 2023. One Fits All: Power General Time Series Analysis by Pretrained LM. (Oct. 2023). arXiv: 2302.11939 [cs]
A Time Series Foundation Models
A.1 Paradigm Change
Foundation models are large-scale models trained on broad data frequently using self-supervision [11]. They are often characterized by em...