A Physics-Aware Framework for Short-Term GPU Power Forecasting of AI Data Centers
Pith reviewed 2026-05-10 15:22 UTC · model grok-4.3
The pith
A physics-informed DLinear model forecasts AI data center GPU power 5-80 minutes ahead while respecting thermal dynamics and outperforming prior methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We derive time-dependent ordinary differential equations from a multi-node lumped thermal resistance-capacitance network based on Newton's law of cooling; these ODEs interlink GPU power consumption, compute and memory utilization, and temperature. We incorporate the resulting physics constraints into a DLinear architecture to obtain PI-DLinear. When trained and evaluated on real AI data center measurements, PI-DLinear yields short-term forecasts (5-80 minutes) whose accuracy exceeds that of tested state-of-the-art models and whose profiles remain physically consistent under power throttling and load transients.
What carries the argument
PI-DLinear, formed by embedding newly derived time-dependent ODEs from a multi-node lumped thermal RC network into the DLinear time-series model so that power, utilization, and temperature predictions must satisfy the thermal dynamics.
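The embedding described above can be sketched as a composite training loss: a data-fit term plus a penalty on the residual of the thermal ODE evaluated on the model's own predictions. The single-node RC form used here, C·dT/dt = P − (T − T_amb)/R, is an illustrative simplification of the paper's multi-node network, and the constants R, C, T_AMB, DT, and the weight lam are assumed placeholder values, not values from the paper.

```python
import numpy as np

# Assumed illustrative constants (not from the paper):
R, C, T_AMB, DT = 0.05, 400.0, 25.0, 60.0  # K/W, J/K, deg C, seconds per step

def physics_residual(temp, power):
    """ODE residual of a forecast trace: zero when C*dT/dt = P - (T - T_amb)/R holds."""
    dT_dt = np.diff(temp) / DT
    return C * dT_dt - (power[:-1] - (temp[:-1] - T_AMB) / R)

def pi_loss(pred_power, true_power, pred_temp, lam=1e-4):
    """Data-fit MSE plus a weighted mean-squared physics residual."""
    mse = np.mean((pred_power - true_power) ** 2)
    res = physics_residual(pred_temp, pred_power)
    return mse + lam * np.mean(res ** 2)
```

A forecast that sits exactly at the RC steady state (T = T_amb + P·R, constant power) has zero residual, so the penalty vanishes and only the data-fit term remains.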
If this is right
- Forecasts remain physically consistent during power throttling and load transient events.
- Averaged across look-back and prediction windows, accuracy improves by 0.782%-39.08% in MSE, 0.993%-51.82% in MAE, and 0.370%-22.28% in RMSE relative to tested SOTA models.
- The approach supports reliable short-term power-demand predictions over horizons of 5 to 80 minutes on real AI data center data.
- Better anticipation of power fluctuations helps mitigate risks to electricity-grid stability.
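The reported percentage gains are straightforward relative reductions in error metrics. A minimal sketch of how such numbers would be computed, using synthetic forecasts rather than the paper's data:

```python
import numpy as np

def forecast_errors(y_true, y_pred):
    """Return (MSE, MAE, RMSE) for a forecast against ground truth."""
    err = y_pred - y_true
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(mse)
    return mse, mae, rmse

def pct_improvement(baseline, candidate):
    """Percent reduction of each metric relative to the baseline model."""
    return tuple(100.0 * (b - c) / b for b, c in zip(baseline, candidate))
```

For example, halving a baseline's absolute error halves MAE and RMSE (50% gains) but quarters MSE (75% gain), which is why the MSE range quoted above is wider than the RMSE range.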
Where Pith is reading between the lines
- The same ODE-based physics injection could be tested inside other linear or attention-based forecasters to see whether accuracy and consistency gains transfer.
- Grid operators could feed these forecasts into real-time balancing algorithms to reduce the impact of AI-facility peaks.
- Applying the thermal RC network to different hardware generations or cooling setups would test whether the derived equations remain valid beyond the original dataset.
- Extending the model to include additional variables such as cooling-fan speed or ambient conditions might capture finer transient behavior.
Load-bearing premise
The multi-node lumped thermal resistance-capacitance network based on Newton's law of cooling correctly captures how GPU power, compute and memory utilization, and temperature evolve together through the derived ODEs.
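The premise can be exercised with a forward-Euler integration of a single-node lumped RC model, C·dT/dt = P(t) − (T − T_amb)/R, which is Newton's law of cooling with a heat-capacity term. This is a minimal sketch of the modeling idea, not the paper's multi-node network; R, C, and T_AMB are assumed illustrative constants.

```python
import numpy as np

# Assumed illustrative constants (not from the paper):
R, C, T_AMB = 0.05, 400.0, 25.0  # K/W, J/K, deg C

def simulate_temperature(power, dt=1.0, t0=T_AMB):
    """Integrate die temperature for a power trace (one value per dt-second step)."""
    temps = [t0]
    for p in power:
        dT = (p - (temps[-1] - T_AMB) / R) / C * dt
        temps.append(temps[-1] + dT)
    return np.array(temps)
```

Under a constant 300 W load the temperature relaxes, with time constant R·C = 20 s, toward the steady state T_amb + P·R = 40 °C; this exponential coupling between power and temperature is what the derived ODEs impose on the forecasts.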
What would settle it
If measured GPU power, utilization, and temperature traces during load transients show that the model's forecasts violate the temperature-power relationships required by the derived ODEs, the claim that the physics is respected would be falsified.
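That falsification test can be operationalized as a residual check: flag any forecast window whose joint temperature/power trace violates the RC ODE beyond a tolerance. The single-node form, constants, and tolerance below are assumed placeholders for illustration.

```python
import numpy as np

# Assumed illustrative constants (not from the paper):
R, C, T_AMB = 0.05, 400.0, 25.0  # K/W, J/K, deg C

def violates_physics(temp, power, dt=60.0, tol=50.0):
    """True if any step's ODE residual |C*dT/dt - (P - (T - T_amb)/R)| exceeds tol watts."""
    residual = C * np.diff(temp) / dt - (power[:-1] - (temp[:-1] - T_AMB) / R)
    return bool(np.any(np.abs(residual) > tol))
```

A steady-state trace passes, while an abrupt temperature jump at constant power produces a large residual and is flagged, which is exactly the kind of measured violation that would falsify the claim.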
Original abstract
AI data centers experience rapid fluctuations in power demand due to the heterogeneity of computational tasks that they have to support. For example, the power profile of inference and training of large language models (LLMs) is quite distinct and big divergences can result in the instability of the underlying electricity grid. In this paper we propose, to the best of our knowledge, the first physics-informed DLinear time-series model that can accurately forecast power utilization of an AI data center 5-80 minutes (short-term forecasting) into the future. The physics, based on a multi-node lumped thermal resistance-capacitance (RC) network consistent with Newton's law of cooling, is captured using newly derived time-dependent ordinary differential equations (ODE) that separately models and interlinks power consumption with the GPU compute and memory utilization and temperature. The resulting model, that we refer to as PI-DLinear, trained and evaluated on a real AI data center dataset and is not only more accurate than the state-of-the-art (SOTA) models tested, but the forecast profile respects the underlying physics under power throttling and load transient events. Relative to the SOTA transformer-based and non-transformer-based models, improvements in forecasting accuracy (averaged across all look-back and prediction windows) range from 0.782%-39.08% for MSE, 0.993%-51.82% for MAE, and 0.370%-22.28% for RMSE.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce the first physics-informed DLinear (PI-DLinear) model for short-term (5-80 min) forecasting of GPU power utilization in AI data centers. It derives time-dependent ODEs from a multi-node lumped RC thermal network based on Newton's law of cooling to interlink power consumption, GPU compute/memory utilization, and temperature. The model is trained and tested on real AI data center data, showing improved accuracy over SOTA transformer and non-transformer models with average gains of 0.782%-39.08% in MSE, 0.993%-51.82% in MAE, and 0.370%-22.28% in RMSE across look-back and prediction windows, while ensuring forecasts respect physical constraints during power throttling and load transient events.
Significance. If the physics-informed component successfully enforces consistency without sacrificing accuracy, this framework could have significant impact on power management and grid stability for AI data centers handling variable workloads like LLM training and inference. The combination of a simple, stable DLinear backbone with physics constraints via ODEs offers an efficient alternative to complex transformers, and the use of real-world data strengthens the practical relevance. Credit is due for the explicit derivation of the ODEs and the focus on physically plausible outputs.
minor comments (1)
- [Abstract] Grammatical issue in the sentence describing the model: 'The resulting model, that we refer to as PI-DLinear, trained and evaluated on a real AI data center dataset and is not only more accurate...' should be rephrased for clarity, e.g., 'The resulting model, referred to as PI-DLinear, is trained and evaluated on a real AI data center dataset and is not only more accurate...'.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work on the PI-DLinear model and for recommending minor revision. The recognition of the novelty in deriving time-dependent ODEs from the lumped RC thermal network, the efficiency of the DLinear backbone, and the practical value of real AI data center data is appreciated. No specific major comments were raised in the report.
Circularity Check
No significant circularity; derivation grounded in independent physics and data-driven training
full rationale
The paper derives time-dependent ODEs from Newton's law of cooling and a standard multi-node lumped RC network, which are external physical principles not defined in terms of the target forecasts. These ODEs interlink power, utilization, and temperature as an approximation whose parameters are fitted or taken from hardware specs. The PI-DLinear model embeds this physics-informed structure into the DLinear backbone and trains it on real data-center traces; reported accuracy gains are measured against external SOTA baselines using MSE/MAE/RMSE on held-out windows. No equation reduces the forecast output to a re-expression of the input data or a self-citation chain. The central claim therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- RC network parameters (the thermal resistances and capacitances of the multi-node model, fitted or taken from hardware specifications)
axioms (2)
- domain assumption Newton's law of cooling governs heat transfer in the GPU system.
- domain assumption A multi-node lumped-parameter approximation sufficiently captures GPU thermal dynamics for short-term forecasting.