pith. machine review for the scientific record.

arxiv: 2605.00504 · v1 · submitted 2026-05-01 · 💻 cs.SE

Recognition: unknown

EnCoDe: Energy Estimation of Source Code At Design-Time

Authors on Pith · no claims yet

Pith reviewed 2026-05-09 18:54 UTC · model grok-4.3

classification 💻 cs.SE
keywords energy estimation · design-time analysis · machine learning · code blocks · static features · Python · energy hotspots · PowerLens

The pith

Machine learning models predict energy consumption of individual code blocks from static features alone, without execution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces EnCoDe to let developers estimate how much energy small code blocks such as loops and conditionals will consume during the design phase. It develops PowerLens to capture reliable sub-millisecond energy readings and extracts blocks from over 18,000 Python programs to reveal links between energy use and features such as structure, complexity, density, and context. Machine learning models trained on these features deliver stable predictions, with regressors reaching an R-squared of 0.75 and classifiers identifying energy hotspots at 80.6 percent accuracy. This shifts energy analysis from coarse runtime profiling to fine-grained, early-stage decisions that can flag inefficient code before testing begins.

Core claim

Using measurements from PowerLens on code blocks drawn from more than 18,000 Python programs, the authors establish that static code features exhibit both linear and non-linear relationships with energy consumption. Regressors trained on structural, complexity, density, and contextual characteristics achieve an R-squared value of 0.75 for block-level energy estimation, while classifiers reach 80.6 percent accuracy in detecting energy hotspots. The resulting models enable accurate, reproducible predictions at design time without requiring program execution.

What carries the argument

Machine learning regressors and classifiers trained on static structural, complexity, density, and contextual features of code blocks, using PowerLens sub-millisecond energy measurements as ground truth.
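To make that machinery concrete, here is a minimal sketch of such a pipeline. Everything specific in it is an assumption for illustration: the feature names, the synthetic labels standing in for PowerLens measurements, and the choice of scikit-learn random forests, none of which the paper confirms.

# Minimal sketch of the pipeline described above, not the paper's code.
# Feature names, synthetic labels, and the random-forest model family
# are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical static features per block: nesting depth, statement count,
# cyclomatic complexity, operator density, contextual position.
X = rng.random((1000, 5))

# Ground truth would come from PowerLens readings (joules); these are
# synthetic placeholders with a loose linear-plus-noise structure.
energy = X @ np.array([0.4, 1.2, 0.1, 0.8, 0.3]) + rng.normal(0, 0.1, 1000)
hotspot = (energy > np.median(energy)).astype(int)

regressor = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, energy)
classifier = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, hotspot)

# Design-time use: featurize an unseen block and predict without running it.
new_block = rng.random((1, 5))
print("estimated energy (J):", regressor.predict(new_block)[0])
print("hotspot?", bool(classifier.predict(new_block)[0]))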

If this is right

  • Developers can compare the energy efficiency of alternative constructs such as loops versus conditionals while writing code.
  • High-energy code regions can be localized and refactored early, lowering overall software energy use without runtime tools.
  • A fine-grained dataset of block-level energy data becomes available for further study of code-energy relationships.
  • Design-time energy feedback supports lower operational costs and reduced environmental impact from software.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Embedding the models in integrated development environments could deliver real-time energy estimates as developers type.
  • The approach might extend to other languages if similar static-feature relationships are measured and validated.
  • Combining predictions with automated refactoring tools could generate energy-aware code suggestions during development.
  • Collecting larger or more diverse datasets could test whether additional contextual features improve prediction stability.

Load-bearing premise

The linear and non-linear relationships between static code features and energy consumption found in the studied Python blocks will hold for new, unseen code.

What would settle it

Testing the trained models on code blocks from a fresh set of Python programs against new PowerLens measurements: R-squared values below 0.5 or hotspot classification accuracy below 70 percent would refute the core claim.
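That criterion reduces to two threshold checks. A minimal sketch, assuming the fitted models and freshly measured held-out arrays already exist; the thresholds are the ones stated above.

# Sketch of the refutation test: score the trained models on blocks from
# unseen programs against fresh PowerLens ground truth (arrays assumed given).
from sklearn.metrics import accuracy_score, r2_score

def core_claim_refuted(regressor, classifier, X_new, energy_new, hotspot_new):
    r2 = r2_score(energy_new, regressor.predict(X_new))
    acc = accuracy_score(hotspot_new, classifier.predict(X_new))
    # Thresholds from the text: R-squared below 0.5 or accuracy below 70%.
    return r2 < 0.5 or acc < 0.70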

Figures

Figures reproduced from arXiv: 2605.00504 by Akhila Matathammal, Karthik Vaidhyanathan, Shailender Goyal.

Figure 1: Quality of RAPL's Measurements over Execution
Figure 2: Design Time Energy Estimation Methodology
Figure 3: Code Parsing to identify blocks from AST
Figure 4: PowerLens: Sub-Millisecond Energy Measurement Methodology
Figure 5: Block Level Energy Measurement of the score com
Figure 6: Sum of Blocks measured by PowerLens compared
Original abstract

Energy efficiency has emerged as a vital attribute of software quality, with significant implications for both environmental sustainability and operational costs. However, existing profiling tools operate only at runtime and coarse granularity, typically capturing energy at the process or method level. Such tools fail to expose how small code blocks, such as functions, loops, and conditionals, contribute to energy consumption, preventing developers from reasoning about and comparing the energy efficiency of programming constructs during design-time. To address this gap, we propose EnCoDe, a methodology for fine-grained, design-time energy estimation, with the following key contributions: (1) PowerLens, a novel measurement methodology that achieves reliable sub-millisecond energy readings for small code blocks; (2) Extensive empirical study on code blocks extracted from over 18,000 Python programs, uncovering linear and non-linear relationships between energy consumption and static code features such as structural, complexity, density, and contextual characteristics, resulting in a first-of-its-kind fine-grained dataset; and (3) Predictive modeling, in which machine learning models are trained on these features to accurately estimate and classify block-level energy consumption at design-time. Our results demonstrate stable, reproducible block-level estimations, with regressors achieving R^2 = 0.75 and classifiers achieving 80.6% accuracy in identifying energy hotspots, enabling developers to localize and address inefficient code regions early in the development process without execution.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, a simulated authors' rebuttal, a circularity audit, and an axiom ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces EnCoDe, a design-time methodology for estimating energy consumption of fine-grained source code blocks (e.g., loops, conditionals). It contributes (1) PowerLens, a measurement approach claimed to deliver reliable sub-millisecond energy readings; (2) an empirical study extracting blocks from >18,000 Python programs to identify linear and non-linear relationships with static features (structural, complexity, density, contextual); and (3) ML regressors and classifiers trained on these features, reporting R²=0.75 for energy estimation and 80.6% accuracy for hotspot classification, enabling early localization of inefficient code without execution.

Significance. If the measurement methodology is independently validated and the models are shown to generalize, the work would enable practical design-time energy reasoning at block granularity, a notable gap relative to existing runtime profilers. The scale of the empirical dataset and the reported performance metrics, if robust, would provide a reusable resource for sustainable software engineering and could influence early-stage optimization practices.

major comments (3)
  1. PowerLens methodology (contribution 1): The claim of 'reliable sub-millisecond energy readings' for blocks <1 ms is presented without any independent validation against a reference instrument, quantification of systematic bias, timing jitter, or hardware-specific artifacts. Since these readings constitute the ground-truth labels for the entire dataset and subsequent ML training, the absence of such validation directly undermines the soundness of the reported R² and accuracy figures.
  2. Predictive modeling (contribution 3): The regressors are reported to achieve R²=0.75 and classifiers 80.6% accuracy, yet no information is given on train/test splits, cross-validation procedure, error bars, or explicit controls for overfitting. Without these details the performance numbers cannot be assessed for robustness or generalization to unseen code; a conventional protocol supplying them is sketched after the minor comments below.
  3. Empirical study (contribution 2): The study extracts blocks from over 18,000 programs and identifies feature-energy relationships, but provides no description of block extraction criteria, feature selection process, or statistical tests confirming the significance of the linear/non-linear relationships. This leaves the feature set used for ML training without a clear, reproducible foundation.
minor comments (1)
  1. Abstract: The abstract states 'stable, reproducible block-level estimations' but the manuscript does not indicate whether code, seeds, or the full dataset are made available to support reproducibility.
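One conventional shape for the protocol major comment 2 asks for, sketched under assumptions the manuscript does not confirm (scikit-learn, a random-forest regressor, R-squared scoring):

# A standard answer to major comment 2: held-out split plus k-fold
# cross-validation with dispersion. Library and model are assumptions.
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score, train_test_split

def evaluate(X, y, seed=0):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="r2")
    model.fit(X_train, y_train)
    return {
        "cv_r2_mean": scores.mean(),   # headline number
        "cv_r2_std": scores.std(),     # the error bars the referee asks for
        "test_r2": model.score(X_test, y_test),  # untouched held-out check
    }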

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments highlight important aspects of methodological rigor that we address point by point below. We have revised the manuscript to incorporate the requested details and clarifications.

Point-by-point responses
  1. Referee: PowerLens methodology (contribution 1): The claim of 'reliable sub-millisecond energy readings' for blocks <1 ms is presented without any independent validation against a reference instrument, quantification of systematic bias, timing jitter, or hardware-specific artifacts. Since these readings constitute the ground-truth labels for the entire dataset and subsequent ML training, the absence of such validation directly undermines the soundness of the reported R² and accuracy figures.

    Authors: We agree that independent validation of PowerLens is necessary to substantiate the reliability claims. The revised manuscript includes a new subsection on PowerLens validation that reports direct comparisons against a calibrated reference power analyzer. This covers systematic bias quantification, timing jitter measurements across repeated runs, and artifact analysis on hardware-specific workloads. These additions provide the required evidence supporting the sub-millisecond ground-truth labels used for the dataset and ML models. revision: yes

  2. Referee: Predictive modeling (contribution 3): The regressors are reported to achieve R²=0.75 and classifiers 80.6% accuracy, yet no information is given on train/test splits, cross-validation procedure, error bars, or explicit controls for overfitting. Without these details the performance numbers cannot be assessed for robustness or generalization to unseen code.

    Authors: We acknowledge that the original description of the modeling pipeline lacked sufficient experimental detail. The revised version adds an explicit experimental setup subsection describing the stratified 80/20 train/test split, 5-fold cross-validation with shuffling, L2 regularization and early stopping as overfitting controls, and performance reported as mean ± standard deviation across folds. These changes enable proper evaluation of robustness and generalization. revision: yes

  3. Referee: Empirical study (contribution 2): The study extracts blocks from over 18,000 programs and identifies feature-energy relationships, but provides no description of block extraction criteria, feature selection process, or statistical tests confirming the significance of the linear/non-linear relationships. This leaves the feature set used for ML training without a clear, reproducible foundation.

    Authors: We recognize the importance of full reproducibility for the empirical study. The revised manuscript expands the methodology to detail the AST-based block extraction criteria (including rules for loops, conditionals, and functions; a minimal sketch follows these responses), the initial feature set of 52 static metrics, the selection process using correlation filtering and recursive feature elimination, and statistical validation via Pearson coefficients for linear relationships and distance correlation with permutation tests for non-linear ones, including p-values. This establishes a clear foundation for the feature set and dataset. revision: yes
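As a concrete picture of the extraction step named in response 3, here is a minimal sketch using Python's standard ast module. The three node types and the returned fields are illustrative assumptions; the manuscript's actual criteria are exactly what the referee asked to see spelled out.

# Minimal sketch of AST-based block extraction; node types are assumptions.
import ast

BLOCK_NODES = (ast.FunctionDef, ast.For, ast.While, ast.If)

def extract_blocks(source: str):
    """Yield (kind, start_line, end_line) for each candidate code block."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, BLOCK_NODES):
            yield type(node).__name__, node.lineno, node.end_lineno

example = """
def score(xs):
    total = 0
    for x in xs:
        if x > 0:
            total += x
    return total
"""
print(list(extract_blocks(example)))
# [('FunctionDef', 2, 7), ('For', 4, 6), ('If', 5, 6)]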

Circularity Check

0 steps flagged

No circularity: empirical measurement-to-ML pipeline is self-contained

Full rationale

The paper's core chain consists of (1) PowerLens sub-millisecond measurements on extracted blocks from 18k+ Python programs to produce labeled data, (2) extraction of static code features (structural, complexity, etc.), (3) empirical observation of linear/non-linear relationships, and (4) training of regressors/classifiers whose reported R² = 0.75 and 80.6% accuracy are standard held-out evaluation metrics on that dataset. No equations reduce any prediction to a fitted parameter by construction, no self-citations or uniqueness theorems are invoked as load-bearing premises, and no ansatz or renaming of known results is smuggled in. The derivation is therefore an ordinary empirical ML pipeline whose outputs are not tautological with its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the assumption that static features suffice to predict energy and that the new measurement tool yields usable ground truth; no explicit free parameters are named in the abstract.

axioms (1)
  • domain assumption: Static code features (structural, complexity, density, contextual) capture sufficient information to predict energy consumption of small blocks
    This underpins the predictive modeling step and the claim that design-time estimation is possible.
invented entities (1)
  • PowerLens: no independent evidence
    purpose: Achieve reliable sub-millisecond energy readings for small code blocks
    New measurement methodology introduced to enable the fine-grained dataset.
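The 'no independent evidence' flag is testable in a mechanical way: paired readings against a calibrated reference analyzer, summarized as systematic bias and jitter, which is also what the simulated rebuttal promises. A minimal sketch with assumed input arrays; no numbers or instrument names come from the paper.

# Sketch of the validation the ledger finds missing: compare PowerLens-style
# readings with a calibrated reference analyzer over the same blocks and runs.
# Both arrays are assumed inputs, in joules.
import numpy as np

def validation_summary(powerlens_joules, reference_joules):
    powerlens = np.asarray(powerlens_joules)
    reference = np.asarray(reference_joules)
    error = powerlens - reference
    return {
        "systematic_bias_j": float(error.mean()),  # consistent over/under-read
        "jitter_j": float(error.std(ddof=1)),      # spread across repeated runs
        "relative_bias": float(error.mean() / reference.mean()),
    }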

pith-pipeline@v0.9.0 · 5558 in / 1365 out tokens · 38332 ms · 2026-05-09T18:54:01.234990+00:00 · methodology

discussion (0)

