pith. machine review for the scientific record.

arxiv: 2605.04074 · v1 · submitted 2026-04-14 · 💻 cs.LG · cs.AI · cs.CE · cs.DC · cs.ET · cs.OS

Recognition: unknown

A Physics-Aware Framework for Short-Term GPU Power Forecasting of AI Data Centers

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:22 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CE · cs.DC · cs.ET · cs.OS
keywords physics-informed forecasting · time-series prediction · GPU power forecasting · AI data centers · thermal RC network · Newton's law of cooling · DLinear model · short-term forecasting

The pith

A physics-informed DLinear model forecasts AI data center GPU power 5-80 minutes ahead while respecting thermal dynamics and outperforming prior methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PI-DLinear, the first DLinear-based time-series model that embeds physics from a multi-node lumped thermal resistance-capacitance network consistent with Newton's law of cooling. It derives time-dependent ODEs that explicitly link GPU power consumption to compute utilization, memory utilization, and temperature. On real AI data center traces, the model produces short-term forecasts that are more accurate than transformer and non-transformer baselines while remaining consistent with physical behavior during throttling and load changes. A reader would care because large, rapid swings in data-center power can destabilize the electricity grid, and forecasts that both reduce error and obey known thermal laws offer a practical way to anticipate those swings.

Core claim

We derive time-dependent ordinary differential equations from a multi-node lumped thermal resistance-capacitance network based on Newton's law of cooling; these ODEs interlink GPU power consumption, compute and memory utilization, and temperature. We incorporate the resulting physics constraints into a DLinear architecture to obtain PI-DLinear. When trained and evaluated on real AI data center measurements, PI-DLinear yields short-term forecasts (5-80 minutes) whose accuracy exceeds that of tested state-of-the-art models and whose profiles remain physically consistent under power throttling and load transients.
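The abstract does not reproduce the governing equations, but the Figure 7 caption (heat input P split by a factor α between GPU and memory nodes, thermal capacitances and resistances, Kirchhoff's Current Law applied at each node) suggests a coupled two-node form roughly like the following; the symbols C_g, C_m, R_g, R_m, R_gm, and α are our illustrative notation, not necessarily the paper's:

```latex
C_g \frac{dT_g}{dt} = \alpha P(t) - \frac{T_g - T_{\mathrm{amb}}}{R_g} - \frac{T_g - T_m}{R_{gm}},
\qquad
C_m \frac{dT_m}{dt} = (1-\alpha)\, P(t) - \frac{T_m - T_{\mathrm{amb}}}{R_m} + \frac{T_g - T_m}{R_{gm}}
```

Each node balances stored heat (C dT/dt) against injected power and Newton's-law losses through the resistances; the paper's derivation then links P(t) itself to compute and memory utilization.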

What carries the argument

PI-DLinear, formed by embedding newly derived time-dependent ODEs from a multi-node lumped thermal RC network into the DLinear time-series model so that power, utilization, and temperature predictions must satisfy the thermal dynamics.
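For readers unfamiliar with the backbone, DLinear is small enough to sketch. The version below (NumPy, single channel, moving-average trend plus seasonal remainder, each with its own linear head) follows the decomposition described in the Figure 2 caption; the kernel size and the untrained weights here are placeholders, not the paper's configuration:

```python
import numpy as np

def dlinear_forecast(x, W_t, W_s, kernel=25):
    """Minimal DLinear-style forecast for one channel.

    x        : (L,) look-back window
    W_t, W_s : (H, L) linear heads for the trend and seasonal parts
    kernel   : moving-average window for the trend decomposition
    """
    L = len(x)
    # Trend = centred moving average, edges padded by replication.
    pad = kernel // 2
    xp = np.concatenate([np.full(pad, x[0]), x, np.full(pad, x[-1])])
    trend = np.convolve(xp, np.ones(kernel) / kernel, mode="valid")[:L]
    seasonal = x - trend
    # Each component is projected to the horizon by its own linear layer,
    # then the two projections are summed.
    return W_t @ trend + W_s @ seasonal

# Untrained illustration: L=8 look-back, H=3 horizon.
rng = np.random.default_rng(0)
x = rng.normal(size=8)
W_t = np.ones((3, 8)) / 8    # every horizon step sees the mean trend
W_s = np.zeros((3, 8))
y = dlinear_forecast(x, W_t, W_s, kernel=3)
```

PI-DLinear's contribution is not this backbone but the physics constraints layered on top of it, which act on the power channel extracted from the multivariate forecast.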

If this is right

  • Forecasts remain physically consistent during power throttling and load transient events.
  • Averaged across look-back and prediction windows, accuracy improves by 0.782%-39.08% in MSE, 0.993%-51.82% in MAE, and 0.370%-22.28% in RMSE relative to tested SOTA models.
  • The approach supports reliable short-term power-demand predictions over horizons of 5 to 80 minutes on real AI data center data.
  • Better anticipation of power fluctuations helps mitigate risks to electricity-grid stability.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same ODE-based physics injection could be tested inside other linear or attention-based forecasters to see whether accuracy and consistency gains transfer.
  • Grid operators could feed these forecasts into real-time balancing algorithms to reduce the impact of AI-facility peaks.
  • Applying the thermal RC network to different hardware generations or cooling setups would test whether the derived equations remain valid beyond the original dataset.
  • Extending the model to include additional variables such as cooling-fan speed or ambient conditions might capture finer transient behavior.

Load-bearing premise

The multi-node lumped thermal resistance-capacitance network based on Newton's law of cooling correctly captures how GPU power, compute and memory utilization, and temperature evolve together through the derived ODEs.

What would settle it

If measured GPU power, utilization, and temperature traces during load transients show that the model's forecasts violate the temperature-power relationships required by the derived ODEs, the claim that the physics is respected would be falsified.
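One concrete way to run that test, sketched under an assumed single-node simplification (the constants R, C, T_amb, and the step size below are placeholders, not values from the paper): finite-difference the forecast temperature trace and measure how far it departs from the Newton's-law energy balance.

```python
import numpy as np

def ode_residual(T, P, R, C, T_amb, dt):
    """Residual of C*dT/dt = P - (T - T_amb)/R along a forecast trace.

    T, P : (H,) forecast temperature and power sampled every dt seconds
    Returns (H-1,) residuals; near-zero means the trace obeys the ODE.
    """
    dT_dt = np.diff(T) / dt
    rhs = P[:-1] - (T[:-1] - T_amb) / R
    return C * dT_dt - rhs

# Synthetic trace built to follow the ODE exactly (forward-Euler rollout),
# so its residuals should vanish up to floating-point error.
R, C, T_amb, dt = 0.5, 2000.0, 25.0, 300.0   # illustrative units, 5-min steps
P = np.full(16, 400.0)                        # constant 400 W load
T = np.empty(16)
T[0] = 30.0
for k in range(15):
    T[k + 1] = T[k] + dt / C * (P[k] - (T[k] - T_amb) / R)
res = ode_residual(T, P, R, C, T_amb, dt)
```

A forecast whose residuals stay near zero through a throttling event supports the physical-consistency claim; persistently large residuals during load transients would falsify it.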

Figures

Figures reproduced from arXiv: 2605.04074 by Ali Ghrayeb, Haitham Abu-Rub, Mohammad AlShaikh Saleh, Sanjay Chawla, Sertac Bayhan.

Figure 1
Figure 1: Comparison of DLinear and PI-DLinear across power throttling, transient recovery, and post-event stability regimes. PI-DLinear consistently achieves lower prediction error, enabling accurate throttling characterization, faster recovery from sudden AI load changes, and stable forecasting behavior. view at source ↗
Figure 2
Figure 2: PI-DLinear Architecture. The base DLinear model (top) decomposes the input look-back window X ∈ ℝ^{L×C} into seasonal/remainder (X_s) and trend (X_t) components, which are independently projected to the forecast horizon via linear layers H_s and H_t, then summed to produce the full multivariate forecast Ŷ ∈ ℝ^{H×C}, from which the power channel ŷ ∈ ℝ^H is extracted. Our physics-informed extension introduces thr… view at source ↗
Figure 3
Figure 3: Relationships observed in the MIT Supercloud dataset: power vs. utilization (left), power vs. temperature… view at source ↗
Figure 4
Figure 4: Computational job distribution across the AI workloads present in the MIT Supercloud dataset, namely, vision… view at source ↗
Figure 5
Figure 5: PI-DLinear forecasting results shown as heatmaps for input sequence length… view at source ↗
Figure 6
Figure 6: Bubble chart showing model efficiency comparison under input-240-predict-80 for the top 4 models. PI… view at source ↗
Figure 7
Figure 7: Equivalent RC thermal circuit for coupled GPU-Memory system. Current sources represent heat input from electrical power dissipation (P split by factor α). Capacitors represent thermal mass (ability to store thermal energy). Resistors represent thermal resistance to heat flow. Applying Kirchhoff's Current Law at each node yields the governing ODEs. view at source ↗
Figure 8
Figure 8: Learned linear projection weights of PI-DLinear over the look-back window for each forecast step. The… view at source ↗
read the original abstract

AI data centers experience rapid fluctuations in power demand due to the heterogeneity of computational tasks that they have to support. For example, the power profile of inference and training of large language models (LLMs) is quite distinct and big divergences can result in the instability of the underlying electricity grid. In this paper we propose, to the best of our knowledge, the first physics-informed DLinear time-series model that can accurately forecast power utilization of an AI data center 5-80 minutes (short-term forecasting) into the future. The physics, based on a multi-node lumped thermal resistance-capacitance (RC) network consistent with Newton's law of cooling, is captured using newly derived time-dependent ordinary differential equations (ODE) that separately models and interlinks power consumption with the GPU compute and memory utilization and temperature. The resulting model, that we refer to as PI-DLinear, trained and evaluated on a real AI data center dataset and is not only more accurate than the state-of-the-art (SOTA) models tested, but the forecast profile respects the underlying physics under power throttling and load transient events. Relative to the SOTA transformer-based and non-transformer-based models, improvements in forecasting accuracy (averaged across all look-back and prediction windows) range from 0.782%-39.08% for MSE, 0.993%-51.82% for MAE, and 0.370%-22.28% for RMSE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The paper claims to introduce the first physics-informed DLinear (PI-DLinear) model for short-term (5-80 min) forecasting of GPU power utilization in AI data centers. It derives time-dependent ODEs from a multi-node lumped RC thermal network based on Newton's law of cooling to interlink power consumption, GPU compute/memory utilization, and temperature. The model is trained and tested on real AI data center data, showing improved accuracy over SOTA transformer and non-transformer models with average gains of 0.782%-39.08% in MSE, 0.993%-51.82% in MAE, and 0.370%-22.28% in RMSE across look-back and prediction windows, while ensuring forecasts respect physical constraints during power throttling and load transient events.

Significance. If the physics-informed component successfully enforces consistency without sacrificing accuracy, this framework could have significant impact on power management and grid stability for AI data centers handling variable workloads like LLM training and inference. The combination of a simple, stable DLinear backbone with physics constraints via ODEs offers an efficient alternative to complex transformers, and the use of real-world data strengthens the practical relevance. Credit is due for the explicit derivation of the ODEs and the focus on physically plausible outputs.

minor comments (1)
  1. [Abstract] Grammatical issue in the sentence describing the model: 'The resulting model, that we refer to as PI-DLinear, trained and evaluated on a real AI data center dataset and is not only more accurate...' should be rephrased for clarity, e.g., 'The resulting model, referred to as PI-DLinear, is trained and evaluated on a real AI data center dataset and is not only more accurate...'.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work on the PI-DLinear model and for raising only a single minor comment. The recognition of the novelty in deriving time-dependent ODEs from the lumped RC thermal network, the efficiency of the DLinear backbone, and the practical value of real AI data center data is appreciated. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity; derivation grounded in independent physics and data-driven training

full rationale

The paper derives time-dependent ODEs from Newton's law of cooling and a standard multi-node lumped RC network, which are external physical principles not defined in terms of the target forecasts. These ODEs interlink power, utilization, and temperature as an approximation whose parameters are fitted or taken from hardware specs. The PI-DLinear model embeds this physics-informed structure into the DLinear backbone and trains it on real data-center traces; reported accuracy gains are measured against external SOTA baselines using MSE/MAE/RMSE on held-out windows. No equation reduces the forecast output to a re-expression of the input data or a self-citation chain. The central claim therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the validity of the derived ODEs from a multi-node lumped RC thermal model. Because only the abstract is available, the exact number and values of any fitted parameters inside the DLinear component or RC network remain unknown.

free parameters (1)
  • RC network parameters
    Thermal resistance and capacitance values in the multi-node model are likely either derived from hardware specifications or fitted from data; the abstract does not specify.
axioms (2)
  • domain assumption Newton's law of cooling governs heat transfer in the GPU system.
    Basis for constructing the lumped RC network and deriving the time-dependent ODEs.
  • domain assumption A multi-node lumped-parameter approximation sufficiently captures GPU thermal dynamics for short-term forecasting.
    Invoked to interlink power, utilization, and temperature in the ODEs.

pith-pipeline@v0.9.0 · 5591 in / 1542 out tokens · 48727 ms · 2026-05-10T15:22:10.372606+00:00 · methodology


Reference graph

Works this paper leans on

30 extracted references · 25 canonical work pages
