Recognition: 2 Lean theorem links
Thermal-GEMs: Generalized Models for Building Thermal Dynamics
Pith reviewed 2026-05-10 19:11 UTC · model grok-4.3
The pith
Multi-source transfer learning models reduce building thermal forecasting errors by up to 63% versus single-source methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors demonstrate that multi-source TL models pretrained on multiple source buildings deliver up to 63% lower forecasting errors than single-source TL when applied to real buildings. They also identify a data-volume trade-off: multi-source TL models require thermal data from 16-32 source buildings spanning one year to consistently outperform TSFMs pretrained on diverse time series, when performance is measured by mean absolute error. These outcomes supply guidance for choosing between building-focused pretraining and general foundation models according to the number of available source buildings.
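The comparison metric throughout is mean absolute error (MAE), and the headline 63% figure is a relative reduction against the single-source baseline. A minimal sketch of both computations (the numeric values are hypothetical, for illustration only, not the paper's data):

```python
def mae(y_true, y_pred):
    """Mean absolute error over a forecast horizon."""
    assert len(y_true) == len(y_pred) and y_true
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def error_reduction(mae_baseline, mae_model):
    """Relative reduction versus a baseline, e.g. 0.63 for '63% lower error'."""
    return 1.0 - mae_model / mae_baseline

# Hypothetical per-step indoor-temperature errors (degrees C), illustrative only.
single_source = mae([21.0, 21.5, 22.0], [21.8, 22.3, 22.8])  # 0.8
multi_source = mae([21.0, 21.5, 22.0], [21.3, 21.8, 22.3])   # ~0.3
print(error_reduction(single_source, multi_source))           # ~0.625
```

The paper's 63% is a reduction of exactly this form, computed on held-out real-world data rather than toy arrays.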
What carries the argument
Four state-of-the-art multi-source transfer learning architectures pretrained exclusively on building thermal time series, evaluated against time series foundation models through ablations on synthetic and real-world datasets.
If this is right
- Multi-source TL models can be deployed directly for accurate real-world building thermal forecasting.
- When data from 16-32 source buildings over one year is available, multi-source TL should be preferred over TSFMs for lower mean absolute error.
- Single-source TL is consistently outperformed, confirming the value of pretraining across multiple buildings.
- Modeling strategy selection for new buildings can be guided by counting how many source buildings' datasets are accessible for pretraining.
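Taken together, the bullets above amount to a simple selection heuristic. A sketch based on the paper's reported threshold (the cutoff uses the lower bound of the stated 16-32 range, so treat it as illustrative rather than exact):

```python
def choose_pretraining(n_source_buildings: int, data_span_years: float) -> str:
    """Pick a pretraining strategy for a new target building.

    Heuristic only: the paper reports that 16-32 source buildings with
    ~1 year of data are needed for multi-source TL to consistently beat
    TSFMs on MAE; the lower bound of that range is used as the cutoff here.
    """
    if n_source_buildings >= 16 and data_span_years >= 1.0:
        return "multi-source TL"
    return "TSFM"

print(choose_pretraining(32, 1.0))  # multi-source TL
print(choose_pretraining(8, 2.0))   # TSFM
```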
Where Pith is reading between the lines
- Organizations could pool thermal data across many buildings to create reusable models that lower the data-collection burden for each new site.
- Hybrid pretraining that mixes building-specific data with broader time series might close remaining performance gaps without needing the full 16-32 building threshold.
- The identified data threshold may indicate scaling behavior that applies to other specialized time-series domains facing competition from general foundation models.
- Future evaluations on larger or more diverse building stocks could refine the exact number of source buildings required.
Load-bearing premise
The four chosen multi-source TL architectures and the particular synthetic and real-world datasets tested are representative enough that the observed error reductions and 16-32 building threshold apply to other buildings and operating conditions.
What would settle it
The central claims would be falsified by applying the same models to a fresh collection of buildings with different construction types or climates and finding either error reductions well below 63%, or no consistent advantage for multi-source TL until far more than 32 source buildings are included.
Original abstract
Data-driven models for building thermal dynamics are a scalable approach for enabling energy-efficient operation through fault detection & diagnosis or advanced control. To obtain accurate models, measurement data from a target building spanning months to years are required. Transfer Learning (TL) mitigates this challenge by employing pretrained models based on single or multiple source buildings. General multi-source TL models promise to outperform single-source TL, but alternative multi-source modeling architectures remain to be explored, and evaluation on real-world data is missing. Moreover, time series foundation models (TSFM) have emerged as candidates for the best-performing general models. Hence, we conduct a first, comprehensive assessment of general modeling approaches for building thermal dynamics, including multi-source TL and TSFMs. Our assessment includes ablations using four state-of-the-art multi-source TL architectures and evaluations on synthetic as well as real-world data. We demonstrate that multi-source TL models are highly effective in accurately modeling buildings in real-world applications, yielding up to 63% lower forecasting errors compared to single-source TL. Moreover, our results suggest a trade-off between multi-source TL models exclusively pretrained with building data and TSFMs pretrained with a multitude of different time series, revealing that data from 16-32 source buildings must be available over 1 year for pretraining multi-source TL models to consistently outperform TSFMs as evaluated using the mean absolute error. These findings provide practical guidance for selecting modeling strategies based on the number of source buildings available for pretraining multi-source TL models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a comprehensive empirical assessment of generalized models for building thermal dynamics, comparing multi-source transfer learning (TL) approaches against time series foundation models (TSFMs). It evaluates four state-of-the-art multi-source TL architectures through ablations on synthetic and real-world datasets, claiming up to 63% lower forecasting errors relative to single-source TL and identifying a practical threshold of 16-32 source buildings over 1 year for multi-source TL to consistently outperform TSFMs when measured by mean absolute error.
Significance. If the empirical findings hold under broader conditions, the work provides actionable guidance for practitioners selecting modeling strategies in building energy systems when target data is scarce. A clear strength is the inclusion of ablations across multiple architectures and dual evaluation on synthetic plus real-world data, yielding falsifiable performance numbers from held-out tests rather than self-referential fits. This directly addresses data requirements for accurate thermal dynamics modeling in fault detection and control applications.
major comments (1)
- [§5] Multi-source TL vs. TSFM comparison: The central trade-off claim that data from 16-32 source buildings over 1 year is required for multi-source TL to consistently outperform TSFMs rests on evaluations with four specific architectures and one synthetic generator plus a real-world corpus. The manuscript should add sensitivity analyses to source-building sampling strategies and synthetic data fidelity (e.g., effects of occupancy stochasticity or sensor noise) because shifts in these factors could move the MAE crossover point by more than a factor of two, undermining the reported threshold as a general guideline.
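The referee's worry concerns where the MAE crossover sits as experimental factors vary. A toy sketch of locating that crossover from a curve of MAE versus source-building count (the curves are invented to illustrate how noise could shift the threshold; they are not the paper's measurements):

```python
def crossover_buildings(tsfm_mae, tl_mae_by_n):
    """Smallest source-building count at which multi-source TL beats the TSFM,
    or None if it never does within the tested range."""
    for n in sorted(tl_mae_by_n):
        if tl_mae_by_n[n] < tsfm_mae:
            return n
    return None

# Invented MAE curves under two hypothetical noise settings.
clean = {4: 0.9, 8: 0.7, 16: 0.45, 32: 0.30}
noisy = {4: 1.1, 8: 0.9, 16: 0.60, 32: 0.45}
print(crossover_buildings(0.5, clean))  # 16
print(crossover_buildings(0.5, noisy))  # 32
```

With these invented curves, moderate sensor noise doubles the crossover point, which is exactly the factor-of-two shift the referee flags.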
minor comments (3)
- [Abstract] The stated quantitative results (63% error reduction, 16-32 building threshold) are given without any reference to model architectures, training details, error bars, or data characteristics, making it difficult for readers to assess support for the claims at first reading.
- [Methods] Clarify how the four chosen multi-source TL architectures were selected and whether they exhaustively represent the space of general multi-source models, as the weakest assumption in the evaluation is their representativeness.
- [Results] Include statistical significance tests or confidence intervals alongside the MAE values in the tables and figures to support statements of 'consistent' outperformance across the 16-32 building range.
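On the last point, a percentile bootstrap over per-sample absolute errors is one standard way to attach a confidence interval to an MAE. A self-contained sketch (error values and names are illustrative, not the paper's):

```python
import random

def bootstrap_mae_ci(abs_errors, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean absolute error.

    Resamples per-sample absolute errors with replacement and returns the
    (alpha/2, 1 - alpha/2) percentiles of the resampled means.
    """
    rng = random.Random(seed)
    n = len(abs_errors)
    means = sorted(
        sum(rng.choice(abs_errors) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Illustrative absolute errors from a held-out test set (degrees C).
errors = [0.1] * 50 + [0.5] * 50
lo, hi = bootstrap_mae_ci(errors)
print(f"MAE = {sum(errors) / len(errors):.2f}, 95% CI in [{lo:.2f}, {hi:.2f}]")
```

Non-overlapping intervals of this kind across the 16-32 building range would substantiate the word "consistent".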
Simulated Author's Rebuttal
We thank the referee for their thorough review and valuable suggestions. We have carefully considered the major comment and provide our response below. We believe the manuscript can be strengthened by addressing the points raised.
Point-by-point responses
Referee: [§5] Multi-source TL vs. TSFM comparison: The central trade-off claim that data from 16-32 source buildings over 1 year is required for multi-source TL to consistently outperform TSFMs rests on evaluations with four specific architectures and one synthetic generator plus a real-world corpus. The manuscript should add sensitivity analyses to source-building sampling strategies and synthetic data fidelity (e.g., effects of occupancy stochasticity or sensor noise) because shifts in these factors could move the MAE crossover point by more than a factor of two, undermining the reported threshold as a general guideline.
Authors: We appreciate the referee's concern regarding the robustness of our reported threshold. Our study evaluates the performance across four state-of-the-art multi-source TL architectures using both a synthetic data generator and a real-world dataset from multiple buildings. The real-world data naturally incorporates stochastic occupancy patterns and sensor noise, providing a realistic testbed. The synthetic data is used to control variables and isolate effects. We do not present the 16-32 building threshold as an absolute general guideline but as a practical observation from our experiments, as stated in the abstract ('our results suggest'). To address the comment, we will revise the manuscript to include an expanded discussion on the potential sensitivities to sampling strategies and data fidelity, including qualitative analysis of how variations might affect the crossover point. However, conducting exhaustive quantitative sensitivity analyses would require generating new datasets and retraining models, which is computationally intensive and beyond the current scope; we note this as a limitation and direction for future work. This partial revision maintains the integrity of our empirical findings while acknowledging the referee's valid point.
Revision: partial
Circularity Check
No circularity: purely empirical held-out evaluation on independent data
Full rationale
The paper conducts ablations and evaluations of multi-source TL architectures and TSFMs on synthetic and real-world building data, reporting forecasting errors (MAE) on held-out test sets. No derivation, equation, or claim reduces by construction to fitted parameters, self-citations, or renamed inputs; the 16-32 building threshold and 63% error reduction are direct outputs of the described experiments rather than tautological restatements of the training procedure.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "We demonstrate that multi-source TL models are highly effective... data from 16-32 source buildings must be available over 1 year for pretraining multi-source TL models to consistently outperform TSFMs as evaluated using the mean absolute error."
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat · unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "A1: Long Short-Term Memory (LSTM)... A2: Transformer... A3: Mamba... A4: xLSTM..."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.