EnCoDe: Energy Estimation of Source Code At Design-Time
Pith reviewed 2026-05-09 18:54 UTC · model grok-4.3
The pith
Machine learning models predict energy consumption of individual code blocks from static features alone, without execution.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using measurements from PowerLens on code blocks drawn from more than 18,000 Python programs, the authors establish that static code features exhibit both linear and non-linear relationships with energy consumption. Regressors trained on structural, complexity, density, and contextual characteristics achieve an R-squared value of 0.75 for block-level energy estimation, while classifiers reach 80.6 percent accuracy in detecting energy hotspots. The resulting models enable accurate, reproducible predictions at design time without requiring program execution.
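To make "static features" concrete: counts of this kind can be computed from a Python AST without ever running the code. The features below are illustrative stand-ins for the paper's structural and complexity characteristics, not its actual feature set:

```python
import ast

def static_features(source: str) -> dict:
    """Count a few structural features of a code block without executing it.

    Illustrative only: the paper trains on a richer set of structural,
    complexity, density, and contextual features; these counts stand in.
    """
    tree = ast.parse(source)
    features = {"loops": 0, "conditionals": 0, "calls": 0, "max_depth": 0}

    def walk(node, depth):
        features["max_depth"] = max(features["max_depth"], depth)
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.For, ast.While)):
                features["loops"] += 1
            elif isinstance(child, ast.If):
                features["conditionals"] += 1
            elif isinstance(child, ast.Call):
                features["calls"] += 1
            walk(child, depth + 1)

    walk(tree, 0)
    return features

block = """
for i in range(10):
    if i % 2 == 0:
        print(i)
"""
print(static_features(block))
```

A regressor would consume vectors like these as inputs, with PowerLens readings as labels.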
What carries the argument
Machine learning regressors and classifiers trained on static structural, complexity, density, and contextual features of code blocks, using PowerLens sub-millisecond energy measurements as ground truth.
If this is right
- Developers can compare the energy efficiency of alternative constructs such as loops versus conditionals while writing code.
- High-energy code regions can be localized and refactored early, lowering overall software energy use without runtime tools.
- A fine-grained dataset of block-level energy data becomes available for further study of code-energy relationships.
- Design-time energy feedback supports lower operational costs and reduced environmental impact from software.
Where Pith is reading between the lines
- Embedding the models in integrated development environments could deliver real-time energy estimates as developers type.
- The approach might extend to other languages if similar static-feature relationships are measured and validated.
- Combining predictions with automated refactoring tools could generate energy-aware code suggestions during development.
- Collecting larger or more diverse datasets could test whether additional contextual features improve prediction stability.
Load-bearing premise
The linear and non-linear relationships between static code features and energy consumption found in the studied Python blocks will hold for new, unseen code.
What would settle it
Testing the trained models on code blocks from a fresh set of Python programs and obtaining R-squared values below 0.5 or hotspot classification accuracy below 70 percent against new PowerLens measurements.
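That test reduces to two scalar checks on held-out predictions. A minimal sketch of the computation, with fabricated placeholder values rather than measured data:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot

def accuracy(labels_true, labels_pred):
    """Fraction of hotspot labels predicted correctly."""
    hits = sum(t == p for t, p in zip(labels_true, labels_pred))
    return hits / len(labels_true)

# Placeholder values standing in for new PowerLens measurements and
# model outputs on unseen code blocks.
energy_true = [1.0, 2.0, 3.0, 4.0, 5.0]
energy_pred = [1.1, 1.9, 3.2, 3.8, 5.1]
hot_true = [0, 0, 1, 1, 1]
hot_pred = [0, 1, 1, 1, 1]

r2 = r_squared(energy_true, energy_pred)
acc = accuracy(hot_true, hot_pred)
# The paper's claim would be undermined if r2 < 0.5 or acc < 0.70
# on genuinely fresh programs.
print(r2, acc)
```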
Original abstract
Energy efficiency has emerged as a vital attribute of software quality, with significant implications for both environmental sustainability and operational costs. However, existing profiling tools operate only at runtime and coarse granularity, typically capturing energy at the process or method level. Such tools fail to expose how small code blocks, such as functions, loops, and conditionals, contribute to energy consumption, preventing developers from reasoning about and comparing the energy efficiency of programming constructs during design-time. To address this gap, we propose EnCoDe, a methodology for fine-grained, design-time energy estimation, with the following key contributions: (1) PowerLens, a novel measurement methodology that achieves reliable sub-millisecond energy readings for small code blocks; (2) Extensive empirical study on code blocks extracted from over 18,000 Python programs, uncovering linear and non-linear relationships between energy consumption and static code features such as structural, complexity, density, and contextual characteristics, resulting in a first-of-its-kind fine-grained dataset; and (3) Predictive modeling, in which machine learning models are trained on these features to accurately estimate and classify block-level energy consumption at design-time. Our results demonstrate stable, reproducible block-level estimations, with regressors achieving R^2 = 0.75 and classifiers achieving 80.6% accuracy in identifying energy hotspots, enabling developers to localize and address inefficient code regions early in the development process without execution.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces EnCoDe, a design-time methodology for estimating energy consumption of fine-grained source code blocks (e.g., loops, conditionals). It contributes (1) PowerLens, a measurement approach claimed to deliver reliable sub-millisecond energy readings; (2) an empirical study extracting blocks from >18,000 Python programs to identify linear and non-linear relationships with static features (structural, complexity, density, contextual); and (3) ML regressors and classifiers trained on these features, reporting R²=0.75 for energy estimation and 80.6% accuracy for hotspot classification, enabling early localization of inefficient code without execution.
Significance. If the measurement methodology is independently validated and the models are shown to generalize, the work would enable practical design-time energy reasoning at block granularity, a notable gap relative to existing runtime profilers. The scale of the empirical dataset and the reported performance metrics, if robust, would provide a reusable resource for sustainable software engineering and could influence early-stage optimization practices.
major comments (3)
- [PowerLens methodology] PowerLens methodology (contribution 1): The claim of 'reliable sub-millisecond energy readings' for blocks <1 ms is presented without any independent validation against a reference instrument, quantification of systematic bias, timing jitter, or hardware-specific artifacts. Since these readings constitute the ground-truth labels for the entire dataset and subsequent ML training, the absence of such validation directly undermines the soundness of the reported R² and accuracy figures.
- [Predictive modeling] Predictive modeling (contribution 3): The regressors are reported to achieve R²=0.75 and classifiers 80.6% accuracy, yet no information is given on train/test splits, cross-validation procedure, error bars, or explicit controls for overfitting. Without these details the performance numbers cannot be assessed for robustness or generalization to unseen code.
- [Empirical study] Empirical study (contribution 2): The study extracts blocks from 18,000 programs and identifies feature-energy relationships, but provides no description of block extraction criteria, feature selection process, or statistical tests confirming the significance of the linear/non-linear relationships. This leaves the feature set used for ML training without a clear, reproducible foundation.
minor comments (1)
- [Abstract] The abstract states 'stable, reproducible block-level estimations' but the manuscript does not indicate whether code, seeds, or the full dataset are made available to support reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. The comments highlight important aspects of methodological rigor that we address point by point below. We have revised the manuscript to incorporate the requested details and clarifications.
Point-by-point responses
Referee: [PowerLens methodology] PowerLens methodology (contribution 1): The claim of 'reliable sub-millisecond energy readings' for blocks <1 ms is presented without any independent validation against a reference instrument, quantification of systematic bias, timing jitter, or hardware-specific artifacts. Since these readings constitute the ground-truth labels for the entire dataset and subsequent ML training, the absence of such validation directly undermines the soundness of the reported R² and accuracy figures.
Authors: We agree that independent validation of PowerLens is necessary to substantiate the reliability claims. The revised manuscript includes a new subsection on PowerLens validation that reports direct comparisons against a calibrated reference power analyzer. This covers systematic bias quantification, timing jitter measurements across repeated runs, and artifact analysis on hardware-specific workloads. These additions provide the required evidence supporting the sub-millisecond ground-truth labels used for the dataset and ML models. revision: yes
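The validation the authors describe, systematic bias against a reference instrument and jitter across repeated runs, amounts to simple statistics over paired readings. A sketch with fabricated numbers, since the actual PowerLens data is not reproduced here:

```python
from statistics import mean, stdev

# Fabricated paired readings in millijoules: PowerLens vs. a calibrated
# reference analyzer, over repeated runs of the same code block.
powerlens = [2.10, 2.14, 2.08, 2.12, 2.11]
reference = [2.00, 2.02, 1.99, 2.01, 2.00]

# Systematic bias: mean difference between the instrument and the reference.
bias = mean(p - r for p, r in zip(powerlens, reference))

# Jitter: run-to-run spread of the instrument's own readings.
jitter = stdev(powerlens)

print(f"bias = {bias:.3f} mJ, jitter = {jitter:.3f} mJ")
```

For sub-millisecond blocks, both quantities would need to be small relative to the block energies themselves for the labels to be trustworthy.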
Referee: [Predictive modeling] Predictive modeling (contribution 3): The regressors are reported to achieve R²=0.75 and classifiers 80.6% accuracy, yet no information is given on train/test splits, cross-validation procedure, error bars, or explicit controls for overfitting. Without these details the performance numbers cannot be assessed for robustness or generalization to unseen code.
Authors: We acknowledge that the original description of the modeling pipeline lacked sufficient experimental detail. The revised version adds an explicit experimental setup subsection describing the stratified 80/20 train/test split, 5-fold cross-validation with shuffling, L2 regularization and early stopping as overfitting controls, and performance reported as mean ± standard deviation across folds. These changes enable proper evaluation of robustness and generalization. revision: yes
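The stratified 80/20 split described in the response can be sketched with the standard library alone. The data here is synthetic and the authors' pipeline is not shown, so this illustrates only the protocol, not their code:

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.2, seed=42):
    """Split indices 80/20 while preserving the label distribution.

    Sketch of the stratified split described in the revised experimental
    setup; not the authors' actual implementation.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for label, idxs in by_label.items():
        rng.shuffle(idxs)
        n_test = max(1, round(len(idxs) * test_frac))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return train_idx, test_idx

# 80 non-hotspot blocks (label 0) and 20 hotspot blocks (label 1):
# the split keeps the 4:1 ratio in both partitions.
labels = [0] * 80 + [1] * 20
train, test = stratified_split(labels)
print(len(train), len(test))
```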
Referee: [Empirical study] Empirical study (contribution 2): The study extracts blocks from 18,000 programs and identifies feature-energy relationships, but provides no description of block extraction criteria, feature selection process, or statistical tests confirming the significance of the linear/non-linear relationships. This leaves the feature set used for ML training without a clear, reproducible foundation.
Authors: We recognize the importance of full reproducibility for the empirical study. The revised manuscript expands the methodology to detail the AST-based block extraction criteria (including rules for loops, conditionals, and functions), the initial feature set of 52 static metrics, the selection process using correlation filtering and recursive feature elimination, and statistical validation via Pearson coefficients for linear relationships and distance correlation with permutation tests for non-linear ones, including p-values. This establishes a clear foundation for the feature set and dataset. revision: yes
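The AST-based block extraction outlined in the response can be sketched with Python's `ast` module. The node types chosen here follow the rebuttal's wording (functions, loops, conditionals) rather than any published specification of the authors' extraction rules:

```python
import ast

# Node types treated as "blocks", following the rebuttal's wording.
BLOCK_NODES = (ast.FunctionDef, ast.For, ast.While, ast.If)

def extract_blocks(source: str):
    """Yield (node type, source text) for every block in a module."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, BLOCK_NODES):
            yield type(node).__name__, ast.get_source_segment(source, node)

program = """\
def hot(n):
    total = 0
    for i in range(n):
        if i % 3 == 0:
            total += i
    return total
"""
blocks = list(extract_blocks(program))
for kind, text in blocks:
    print(kind)
```

Each extracted segment would then be measured with PowerLens and featurized for training.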
Circularity Check
No circularity: empirical measurement-to-ML pipeline is self-contained
full rationale
The paper's core chain consists of (1) PowerLens sub-millisecond measurements on extracted blocks from 18k+ Python programs to produce labeled data, (2) extraction of static code features (structural, complexity, etc.), (3) empirical observation of linear/non-linear relationships, and (4) training of regressors/classifiers whose reported R^2 = 0.75 and 80.6% accuracy are standard held-out evaluation metrics on that dataset. No equations reduce any prediction to a fitted parameter by construction, no self-citations or uniqueness theorems are invoked as load-bearing premises, and no ansatz or renaming of known results is smuggled in. The derivation is therefore an ordinary empirical ML pipeline whose outputs are not tautological with its inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Static code features (structural, complexity, density, contextual) capture sufficient information to predict energy consumption of small blocks.
invented entities (1)
- PowerLens (no independent evidence)
Reference graph
Works this paper leans on
- [1] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. 1986. Compilers: Principles, Techniques, and Tools. Addison-Wesley. https://api.semanticscholar.org/CorpusID:278028060
- [2] Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, and Charles Sutton. 2018. A Survey of Machine Learning for Big Code and Naturalness. ACM Comput. Surv. 51, 4, Article 81 (July 2018), 37 pages. doi:10.1145/3212695
- [3] Eman Abdullah AlOmar, Salma Abdullah AlOmar, and Mohamed Wiem Mkaouer.
- [4] On the Use of Static Analysis to Engage Students with Software Quality Improvement: An Experience with PMD. In 2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET). IEEE/ACM, 179–191. doi:10.1109/ICSE-SEET58685.2023.00023
- [5] Hamza Mustafa Alvi, Hammad Majeed, Hasan Mujtaba, and Mirza Omer Beg.
- [6] MLEE: Method Level Energy Estimation — A machine learning approach. Sustainable Computing: Informatics and Systems 32 (2021), 100594. doi:10.1016/j.suscom.2021.100594
- [7] Lotfi Belkhir and Ahmed Elmeligi. 2018. Assessing ICT global emissions footprint: Trends to 2040 & recommendations. Journal of Cleaner Production 177 (2018), 448–463. doi:10.1016/j.jclepro.2017.12.239
- [8]
- [9] Davide Chicco, Matthijs J. Warrens, and Giuseppe Jurman. 2021. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Computer Science 7 (2021). https://api.semanticscholar.org/CorpusID:236196832
- [10]
- [11] Roberto Di Cosmo. 2018. Software Heritage: Why and How We Collect, Preserve and Share All the Software Source Code. In 2018 IEEE/ACM 40th International Conference on Software Engineering: Software Engineering in Society (ICSE-SEIS). 2–2.
- [12] Charlotte Freitag, Mike Berners-Lee, Kelly Widdicks, Bran Knowles, Gordon S. Blair, and Adrian Friday. 2021. The real climate and transformative impact of ICT: A critique of estimates, trends, and regulations. Patterns 2, 9 (2021), 100340. doi:10.1016/j.patter.2021.100340
- [13] Marcus Hähnel, Björn Döbel, Marcus Völp, and Hermann Härtig. 2012. Measuring energy consumption for short code paths using RAPL. SIGMETRICS Perform. Eval. Rev. 40, 3 (Jan. 2012), 13–17. doi:10.1145/2425248.2425252
- [14] Maurice H. Halstead. 1977. Elements of Software Science (Operating and programming systems series). Elsevier Science Inc., USA.
- [15] Abram Hindle. 2015. Green mining: a methodology of relating software change and configuration to power consumption. Empirical Softw. Engg. 20, 2 (April 2015), 374–409. doi:10.1007/s10664-013-9276-6
- [16]
- [17] K. N. Khan, M. Hirki, T. Niemi, J. K. Nurminen, and Z. Ou. 2018. RAPL in Action: Experiences in Using RAPL for Power Measurements. ACM Transactions on Modeling and Performance Evaluation of Computing Systems 3, 2 (2018), 1–26. doi:10.1145/3177754
- [18] Ching Kin Keong, Koh Tieng Wei, Abdul Azim Abd. Ghani, and Khaironi Yatim Sharif. 2015. Toward using software metrics as indicator to measure power consumption of mobile application: A case study. In 2015 9th Malaysian Software Engineering Conference (MySEC). 172–177. doi:10.1109/MySEC.2015.7475216
- [19] Sung Une Lee, Niroshinie Fernando, Kevin Lee, and Jean-Guy Schneider. 2024. A survey of energy concerns for software engineering. Journal of Systems and Software 210 (2024), 111944. doi:10.1016/j.jss.2023.111944
- [20]
- [21] T.J. McCabe. 1976. A Complexity Measure. IEEE Transactions on Software Engineering SE-2, 4 (1976), 308–320. doi:10.1109/TSE.1976.233837
- [22] Lev Mukhanov, Dimitrios S. Nikolopoulos, and Bronis R. De Supinski. 2015. ALEA: Fine-Grain Energy Profiling with Basic Block Sampling. In 2015 International Conference on Parallel Architecture and Compilation (PACT). 87–98. doi:10.1109/PACT.2015.16
- [23] Hira Noman, Naeem Ahmed Mahoto, Sania Bhatti, Hamad Ali Abosaq, Mana Saleh Al Reshan, and Asadullah Shaikh. 2022. An Exploratory Study of Software Sustainability at Early Stages of Software Development. Sustainability 14, 14 (2022). doi:10.3390/su14148596
- [24] Candy Pang, Abram Hindle, Bram Adams, and Ahmed E. Hassan. 2016. What Do Programmers Know about Software Energy Consumption? IEEE Software 33, 3 (2016), 83–89. doi:10.1109/MS.2015.83
- [25] Karl Pearson. 1896. Mathematical Contributions to the Theory of Evolution. III. Regression, Heredity, and Panmixia. Philosophical Transactions of the Royal Society of London Series A 187 (Jan. 1896), 253–318. doi:10.1098/rsta.1896.0007
- [26] Gustavo Pinto and Fernando Castor. 2017. Energy efficiency: A new concern for application software developers. Commun. ACM 60 (11 2017), 68–75. doi:10.1145/3154384
- [27] Saurabhsingh Rajput, Alexander Brandt, Vadim Elisseev, and Tushar Sharma.
- [28] FlipFlop: A Static Analysis-based Energy Optimization Framework for GPU Kernels. arXiv:2601.13345 [cs.SE] https://arxiv.org/abs/2601.13345
- [29] Saurabhsingh Rajput, Tim Widmayer, Ziyuan Shang, Maria Kechagia, Federica Sarro, and Tushar Sharma. 2024. Enhancing Energy-Awareness in Deep Learning through Fine-Grained Energy Measurement. ACM Trans. Softw. Eng. Methodol. 33, 8, Article 211 (Dec. 2024), 34 pages. doi:10.1145/3680470
- [30] Pooja Rani, Jonas Zellweger, Veronika Kousadianos, Luis Cruz, Timo Kehrer, and Alberto Bacchelli. 2024. Energy Patterns for Web: An Exploratory Study. In Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Society (Lisbon, Portugal) (ICSE-SEIS'24). Association for Computing Machinery, New York, NY, USA, 12–22.
- [31] Nurzihan Fatema Reya, Abtahi Ahmed, Tashfia Rifa Zaman, and Md. Motaharul Islam. 2023. GreenPy: Evaluating Application-Level Energy Efficiency in Python for Green Computing. Annals of Emerging Technologies in Computing (2023). https://api.semanticscholar.org/CorpusID:259509821
- [32] Ana Ribeiro, João F. Ferreira, and Alexandra Mendes. 2021. EcoAndroid: An Android Studio Plugin for Developing Energy-Efficient Java Mobile Applications. In 2021 IEEE 21st International Conference on Software Quality, Reliability and Security (QRS). 62–69. doi:10.1109/QRS54544.2021.00017
- [33]
- [34] Marina Sokolova and Guy Lapalme. 2009. A systematic analysis of performance measures for classification tasks. Information Processing & Management 45, 4 (2009), 427–437. doi:10.1016/j.ipm.2009.03.002
- [35] C. Spearman. 1904. The Proof and Measurement of Association Between Two Things. American Journal of Psychology 15 (1904), 88–103.
- [36] Weisong Sun, Chunrong Fang, Yun Miao, Yudu You, Mengzhe Yuan, Yuchen Chen, Quanjun Zhang, An Guo, Xiang Chen, Yang Liu, and Zhenyu Chen. 2023. Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We? arXiv:2312.00413 [cs.SE] https://arxiv.org/abs/2312.00413
- [37] John W. Tukey. 1977. Exploratory Data Analysis. Addison-Wesley Pub. Co., Reading, Mass.
- [38] Roberto Verdecchia, Luis Cruz, June Sallou, Michelle Lin, James Wickenden, and Estelle Hotellier. 2022. Data-Centric Green AI: An Exploratory Empirical Study. In 2022 International Conference on ICT for Sustainability (ICT4S). IEEE, 35–45. doi:10.1109/ict4s55073.2022.00015
- [39] Fadi Wedyan, Rachael Morrison, and Osama Sam Abuomar. 2023. Integration and Unit Testing of Software Energy Consumption. In 2023 Tenth International Conference on Software Defined Systems (SDS). 60–64. doi:10.1109/SDS59856.2023.10329262
- [40] N. Wirth. 1995. A plea for lean software. Computer 28, 2 (1995), 64–68. doi:10.1109/2.348001
- [41] Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn Regnell, and Anders Wesslén. 2012. Experimentation in Software Engineering. Springer Publishing Company, Incorporated.
- [42] Włodzimierz Wysocki, Ireneusz Miciuła, and Przemysław Plecka. 2025. Methods of Improving Software Energy Efficiency: A Systematic Literature Review and the Current State of Applied Methods in Practice. Electronics 14, 7 (2025). doi:10.3390/electronics14071331
- [43] Thomas Zaragoza, Adel Noureddine, and Ernesto Exposito. 2025. A systematic mapping study on software-based feedback for energy consumption. Renewable and Sustainable Energy Reviews 222 (2025), 115889. doi:10.1016/j.rser.2025.115889