Evaluation of ML Resource Utilization Requires Model Life Cycle Assessment

Clara Na; Constantine Samaras; Emma Strubell; Jared Fernandez; Yonatan Bisk

arxiv: 2606.07632 · v1 · pith:YME3IG6Xnew · submitted 2026-05-31 · 💻 cs.LG

Jared Fernandez, Clara Na, Yonatan Bisk, Constantine Samaras, Emma Strubell

Pith reviewed 2026-06-28 17:24 UTC · model grok-4.3

keywords: life cycle assessment · machine learning efficiency · energy consumption · environmental impact · AI resource utilization · model pipeline

The pith

Accounting for AI's full environmental costs requires life cycle assessment of the entire model pipeline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that evaluating a single training run or a single inference prediction is no longer sufficient for assessing AI efficiency, because development and deployment pipelines have grown more complex. It proposes applying life cycle assessment to capture embodied hardware costs and every operational stage from development to deployment. This matters because researchers, developers, policymakers, and users need accurate resource accounting to understand the barriers to scaling AI systems. Without it, the true energy requirements and downstream impacts remain unaccounted for.

Core claim

Proper accounting of the energy requirements and environmental impact of AI systems requires life cycle assessment of the machine learning model development and deployment pipeline to incorporate embodied costs of physical computing hardware and operational costs in training and inference.

What carries the argument

Life cycle assessment frameworks applied to the full AI system pipeline, from hardware production through all stages of model development and deployment.
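
As a rough illustration of what aggregating over the pipeline could look like, the sketch below sums per-stage costs and reports the largest contributor. The `Stage` class, the stage names, and every number are hypothetical placeholders chosen for readability; none of them come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One life cycle stage with an illustrative cost in kg CO2e."""
    name: str
    kg_co2e: float


def life_cycle_total(stages: list[Stage]) -> float:
    """Sum costs over every stage rather than a single training run."""
    return sum(stage.kg_co2e for stage in stages)


def hotspot(stages: list[Stage]) -> Stage:
    """Return the stage that contributes the most to the total."""
    return max(stages, key=lambda stage: stage.kg_co2e)


# Placeholder values for illustration only; they are not measurements.
pipeline = [
    Stage("hardware manufacturing (embodied)", 40.0),
    Stage("development and failed runs", 25.0),
    Stage("final training run", 20.0),
    Stage("post-training", 5.0),
    Stage("inference (deployment)", 110.0),
]

total = life_cycle_total(pipeline)
final_run_share = 20.0 / total  # share captured by a single-run metric

print(f"total: {total:.1f} kg CO2e")
print(f"hotspot: {hotspot(pipeline).name}")
print(f"final training run share: {final_run_share:.0%}")
```

With these made-up inputs, a metric that reports only the final training run would capture a small fraction of the total, which is the kind of gap the paper is concerned with.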

If this is right

Evaluations will include costs across the entire life cycle rather than isolated components.
Barriers to building systems at scale can be more accurately assessed.
Downstream impacts of AI systems will be better incorporated into efficiency metrics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

New data collection methods may be needed to apply LCA effectively to ML.
This could influence how infrastructure for AI is designed and reported.

Load-bearing premise

Life cycle assessment methods from other domains can be directly applied to ML pipelines without major new methodological development or unavailable data.

What would settle it

Empirical evidence showing that the total resource costs of an AI system are dominated by or accurately represented by a single training run or inference prediction, rendering full pipeline assessment unnecessary.

Figures

Figures reproduced from arXiv: 2606.07632 by Clara Na, Constantine Samaras, Emma Strubell, Jared Fernandez, Yonatan Bisk.

**Figure 1.** LCA enables aggregation across ML model development and deployment life cycles of increasing complexity. The pre- and post-training pipelines of modern LLMs (e.g., OLMo with the Tulu post-training recipe; Walsh et al., 2025; Lambert et al., 2025) have significantly more stages than classical train-test settings, and a larger variety of methods for conducting inference (Welleck et al., 2024). total demands …

**Figure 2.** CO2e emissions of OLMo2 7B training and inference (Morrison et al., 2025; Walsh et al., 2025). Increasing inference efficiency via offline batching reduces the unit cost, as does amortization of embodied costs over model use. Decomposing resource use across life cycle stages enables identification of the significant issues (i.e., the life cycle stage that contributes most to total costs). 3.5…
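
The amortization effect described in the Figure 2 caption can be sketched with a simple unit-cost calculation. This is a minimal sketch under stated assumptions: the function `per_query_cost`, its parameters, and the numbers are hypothetical, and the linear model (fixed costs spread evenly over queries plus a constant marginal cost) is a simplification, not the paper's method.

```python
def per_query_cost(
    embodied_kg: float,
    training_kg: float,
    operational_kg_per_query: float,
    num_queries: int,
) -> float:
    """Unit cost in kg CO2e: fixed costs amortized over queries plus marginal cost."""
    if num_queries <= 0:
        raise ValueError("num_queries must be positive")
    fixed_kg = embodied_kg + training_kg
    return fixed_kg / num_queries + operational_kg_per_query


# Placeholder inputs for illustration only; they are not measurements.
low_use = per_query_cost(100.0, 400.0, 0.001, 1_000)
high_use = per_query_cost(100.0, 400.0, 0.001, 1_000_000)

# A more efficient serving setup (e.g., offline batching) is modeled here
# simply as a lower marginal cost per query; the factor is an assumption.
batched = per_query_cost(100.0, 400.0, 0.0005, 1_000_000)

print(low_use, high_use, batched)
```

Under this model, the unit cost falls both as the model is used more and as the marginal cost per query drops, and it approaches the marginal cost once fixed costs are spread over enough queries.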

Original abstract

Proper accounting of the energy requirements and environmental impact of artificial intelligence (AI) systems is necessary for researchers, developers, policy makers, and users to assess the barriers to building systems at scale. With the growing complexity of pipelines and underlying infrastructure needed to develop and deploy AI systems, previous approaches for evaluating AI efficiency which focus on the costs of a single training run or an individual inference prediction are no longer sufficient. In this position paper, we enunciate the need for applying life cycle assessment to evaluate the costs of the machine learning model development and deployment pipeline to properly account for the required resources and downstream impact. Life cycle assessments enable the incorporation of costs across the full life cycle of an AI system and its underlying infrastructure, from the embodied costs associated with the physical computing hardware through the operational costs in training and inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This is a position paper arguing single-run metrics miss the full costs of ML pipelines and that life cycle assessment should replace them, but it provides no methods or data.

Colleague,

The main thing to know is that this paper claims single training-run or inference metrics no longer capture AI resource use because pipelines have grown more complex, and it recommends life cycle assessment as the fix. It is a short position statement with no experiments, measurements, or derivations.

What it does well is identify a genuine gap. Current efficiency numbers often stop at one forward pass or one training job, yet real work includes data pipelines, repeated experiments, model versioning, and the hardware that gets built and discarded. Including embodied costs of servers and chips makes sense on paper, and the authors are right that policy and reporting standards will eventually need something broader.

The soft spots are straightforward. The paper asserts that existing LCA frameworks can be applied directly but gives no outline for ML-specific problems such as allocating shared cloud resources across many users, handling the cost of failed or exploratory runs, or sourcing the required inventory data at the right granularity. It also offers no comparison showing how much the numbers would actually change under LCA versus current practice. Without that, the recommendation stays general.

This piece is aimed at people already working on AI sustainability, corporate reporting, or hardware policy. A practitioner looking for a concrete protocol or dataset will not find one here.

I would send it to peer review. The underlying concern is real and timely, and referees could usefully press on the adaptation questions. It is not a finished method, but it is clear enough to start a discussion.

Referee Report

2 major / 0 minor

Summary. This position paper claims that due to growing complexity of ML pipelines and infrastructure, single training-run or single-inference metrics are no longer sufficient for evaluating AI energy use and environmental impact. It advocates applying life cycle assessment (LCA) frameworks to incorporate embodied hardware costs plus operational costs across the full model development and deployment pipeline.

Significance. If the position holds, it would encourage the community to move beyond narrow efficiency metrics toward holistic sustainability accounting for AI. The paper correctly flags a potential mismatch between current practice and pipeline reality, but supplies no new data, derivations, or empirical comparisons, so its contribution is awareness-raising rather than resolution of the identified gap.

major comments (2)

[Abstract] The assertion that single-run metrics are 'no longer sufficient' rests solely on the unquantified claim of 'growing complexity of pipelines' with no supporting evidence, comparisons, or citations, which is load-bearing for the entire argument.
[Abstract] The recommendation to apply LCA is made without any outline of ML-specific adaptations (e.g., iterative hyperparameter search, data provenance, model versioning, or shared-infrastructure attribution) or indication that required inventory data exist at the needed granularity; this directly engages the stress-test concern and is load-bearing for the proposed solution.
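
To make the shared-infrastructure attribution question concrete, here is one possible allocation rule, sketched under stated assumptions: embodied hardware cost is split in proportion to device-hours used. The function `allocate_embodied`, the consumer names, and the numbers are hypothetical, and proportional allocation is only one of several conventions an LCA could adopt; the paper does not prescribe it.

```python
def allocate_embodied(
    total_embodied_kg: float,
    device_hours: dict[str, float],
) -> dict[str, float]:
    """Split an embodied cost across consumers in proportion to device-hours."""
    total_hours = sum(device_hours.values())
    if total_hours <= 0:
        raise ValueError("total device-hours must be positive")
    return {
        consumer: total_embodied_kg * hours / total_hours
        for consumer, hours in device_hours.items()
    }


# Placeholder usage log for illustration only; failed or exploratory runs
# are tracked as their own line item instead of being dropped.
usage = {"team_a": 600.0, "team_b": 300.0, "failed_runs": 100.0}
shares = allocate_embodied(1000.0, usage)
print(shares)
```

Keeping failed runs as an explicit entry means their share of the hardware cost stays visible in the inventory rather than being silently absorbed or omitted.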

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our position paper. We address each major comment below, proposing revisions where the feedback identifies opportunities to strengthen the argument.

Referee: [Abstract] The assertion that single-run metrics are 'no longer sufficient' rests solely on the unquantified claim of 'growing complexity of pipelines' with no supporting evidence, comparisons, or citations, which is load-bearing for the entire argument.

Authors: We agree that the abstract would be strengthened by supporting citations or brief evidence for the claim of growing pipeline complexity. In the revised version we will add references to studies documenting increases in ML pipeline scale, iterative development practices, and infrastructure demands. Revision: yes.

Referee: [Abstract] The recommendation to apply LCA is made without any outline of ML-specific adaptations (e.g., iterative hyperparameter search, data provenance, model versioning, or shared-infrastructure attribution) or indication that required inventory data exist at the needed granularity; this directly engages the stress-test concern and is load-bearing for the proposed solution.

Authors: As a position paper our intent is to advocate for the adoption of LCA rather than to deliver a complete implementation guide. We nevertheless accept that a high-level indication of adaptations would improve clarity. We will add a short paragraph outlining ML-specific considerations such as iterative hyperparameter tuning, data provenance tracking, and attribution in shared environments. Regarding inventory data, we will reference existing hardware embodied-cost databases while noting that fine-grained ML operational inventories are still developing. Revision: partial.

Circularity Check

0 steps flagged

No circularity; position paper contains no derivations or load-bearing self-references

This is a position paper advocating broader use of life-cycle assessment for ML without any equations, fitted quantities, or mathematical derivations. The central argument—that single-run metrics are insufficient and LCA frameworks should be applied—rests on conceptual reasoning about pipeline complexity rather than any chain that reduces by construction to self-definition, fitted inputs renamed as predictions, or a self-citation whose validity depends on the present work. No enumerated circularity pattern is present, and the paper is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that LCA is both feasible and necessary for ML; no free parameters, new entities, or ad-hoc axioms are introduced.

axioms (1)

Domain assumption: Life cycle assessment methods from other domains can be applied to AI systems to capture embodied and operational costs across the full pipeline.
Invoked in the abstract when stating that LCA enables incorporation of costs from hardware through training and inference.

Reference graph

Works this paper leans on

208 extracted references · 16 canonical work pages · 1 internal anchor

[1] GPT-4 technical report.
[2] Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve.
[3] The De-democratization of AI: Deep Learning and the Compute Divide in Artificial Intelligence Research. arXiv:2010.15581.
[4] The Theory of Parsing, Translation and Compiling.
[5] Powering Intelligence: Analyzing Artificial Intelligence and Data Center Energy Consumption.
[6] AI and Compute.
[7] A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data. Journal of Machine Learning Research.
[8] Andrew, Galen and Gao, Jianfeng. Scalable training of … 2007.
[9] Publications Manual.
[10] Qwen technical report.
[11] Amazon EC2 Update-Inf1 Instances with AWS Inferentia Chips for High Performance Cost-Effective Inferencing.
[12] Compute-efficient deep learning: algorithmic trends and opportunities. J. Mach. Learn. Res.
[13] Benoit Courty and Victor Schmidt and Sasha Luccioni and Goyal-Kamal and MarionCoutarel and Boris Feld and Jérémy Lecourt and LiamConnell and Amine Saboni and Inimaz and supatomic and Mathilde Léval and Luis Blanche and Alexis Cruveiller and ouminasara and Franklin Zhao and Aditya Joshi and Alexis Bogroff and Hugues de Lavoreille and Niko Laskaris and Edoa... doi:10.5281/zenodo.11171501
[14] Findings of the 2014 Workshop on Statistical Machine Translation. Proceedings of the Ninth Workshop on Statistical Machine Translation, 2014.
[15] Life-cycle assessment of semiconductors.
[16] Language Models are Few-Shot Learners. arXiv:2005.14165.
[17] Alternation. Journal of the Association for Computing Machinery, 28(1).
[18] Accelerating large language model decoding with speculative sampling.
[19] Efficient and Economic Large Language Model Inference with Attention Offloading.
[20] Magicdec: Breaking the latency-throughput tradeoff for long context generation with speculative decoding.
[21] The rising costs of training frontier AI models.
[22] Environmental life-cycle assessment.
[23] Life-cycle assessment: principles and practice.
[24] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv:2501.12948.
[25] The Efficiency Misnomer. International Conference on Learning Representations.
[26] Measuring the carbon intensity of AI in cloud instances. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 2022.
[27] The Llama 3 herd of models.
[28] Modeling PFAS in Semiconductor Manufacturing to Quantify Trade-offs in Energy Efficiency and Environmental Impact of Computing Systems. arXiv preprint arXiv:2505.06727.
[29] LLMCarbon: Modeling the End-to-End Carbon Footprint of Large Language Models. The Twelfth International Conference on Learning Representations.
[30] The Framework Tax: Disparities Between Inference Efficiency in NLP Research and Deployment. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023.
[31] Gradient Localization Improves Lifelong Pretraining of Language Models. Findings of the Association for Computational Linguistics: EMNLP 2024, 2024.
[32] Efficient Hardware Scaling and Diminishing Returns in Large-Scale Training of Language Models. Transactions on Machine Learning Research, 2025.
[33] Fernandez, Jared and Na, Clara and Tiwari, Vashisth and Bisk, Yonatan and Luccioni, Sasha and Strubell, Emma. Energy Considerations of Large Language Model Inference and Efficiency Optimizations. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.acl-long.1563
[34] Evaluating the Environmental Impact of Language Models with Life Cycle Assessment.
[35] The real climate and transformative impact of ICT: A critique of estimates, trends, and regulations. Patterns, 2021.
[36] Green AI: Do deep learning frameworks have different costs? Proceedings of the 44th International Conference on Software Engineering.
[37] Consequential Life Cycle Optimization: General Conceptual Framework and Application to Algal Renewable Diesel Production. ACS Sustainable Chemistry & Engineering, 5(7). doi:10.1021/acssuschemeng.7b00631
[38] Alphabet plans massive capex hike, reports cloud revenue growth slowed.
[39] OLMo: Accelerating the science of language models.
[40] DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning.
[41] Chasing carbon: The elusive environmental footprint of computing. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021.
[42] Algorithms on Strings, Trees and Sequences.
[43] The Unpaid Toll: Quantifying the Public Health Impact of AI.
[44] Emerging approaches, challenges and opportunities in life cycle assessment. Science, 344(6188). doi:10.1126/science.1248361
[45] Training compute-optimal large language models. Proceedings of the 36th International Conference on Neural Information Processing Systems.
[46] The Curious Case of Neural Text Degeneration. International Conference on Learning Representations.
[47] Parameter-efficient transfer learning for NLP. International Conference on Machine Learning, 2019.
[48] Edward J Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen. Lo… 2022.
[49] GPipe: Efficient training of giant neural networks using pipeline parallelism.
[50] Prompt-Based Length Controlled Generation with Multiple Control Types. Findings of the Association for Computational Linguistics: ACL 2024, 2024. doi:10.18653/v1/2024.findings-acl.63
[51] In-datacenter performance analysis of a tensor processing unit. Proceedings of the 44th Annual International Symposium on Computer Architecture.
[52] Aligning artificial intelligence with climate change mitigation. Nature Climate Change.
[53] Scaling laws for neural language models.
[54] Our house is on fire: The climate emergency and computing's responsibility. Communications of the ACM.
[55] Efficient Memory Management for Large Language Model Serving with PagedAttention.
[56] Tülu 3: Pushing frontiers in open language model post-training.
[57] Tulu 3: Pushing Frontiers in Open Language Model Post-Training. arXiv:2411.15124.
[58] Surveying (Dis)Parities and Concerns of Compute Hungry NLP Research. arXiv:2306.16900.
[59] Forever Chemicals PFAS Global Impact and Activities, Cascading Consequences of Colossal Systems Failure: Long-Term Health Effects, Food-Systems, Eco-Systems.
[60] AWS to offer NVIDIA's T4 GPUs for AI inferencing.
[61] Fast inference from transformers via speculative decoding. International Conference on Machine Learning.
[62] Making AI less 'thirsty'. Communications of the ACM, 2025.
[63] PyTorch distributed: experiences on accelerating data parallel training. Proceedings of the VLDB Endowment, 2020.
[64] Making AI Less “Thirsty”: Uncovering and Addressing the Secret Water Footprint of AI Models.
[65] Carbon in Motion: Characterizing Open-Sora on the Sustainability of Generative AI for Video Generation. ACM SIGENERGY Energy Informatics Review, 2024.
[66] Sprout: Green Generative AI with Carbon-Efficient LLM Inference. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024.
[67] Ecoserve: Designing carbon-aware AI inference systems. arXiv preprint arXiv:2502.05043.
[68] Optimizing Speculative Decoding for Serving Large Language Models Using Goodput.
[69] The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources.
[70] From efficiency gains to rebound effects: The problem of Jevons' paradox in AI's polarized environmental debate. Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, 2025.
[71] Estimating the carbon footprint of BLOOM, a 176B parameter language model. Journal of Machine Learning Research, 24(253).
[72] Light bulbs have energy ratings—so why can’t AI chatbots? Nature.
[73] Power hungry processing: Watts driving the cost of AI deployment? Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 2024.
[74] Learning word vectors for sentiment analysis. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies.
[75] Learning Word Vectors for Sentiment Analysis. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies.
[76] A call for research on storage emissions.
[77] How data centers and the energy sector can sate AI’s hunger for power.
[78] Pointer Sentinel Mixture Models. arXiv:1609.07843.
[79] Model cards for model reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency.
[80] How Green Can AI Be? A Study of Trends in Machine Learning Environmental Impacts. arXiv:2412.17376.

Showing first 80 references.

[1] [1]

Gpt-4 technical report , author =

[2] [2]

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve , author =

[3] [3]

2010.15581 , archiveprefix =

The De-democratization of AI: Deep Learning and the Compute Divide in Artificial Intelligence Research , author =. 2010.15581 , archiveprefix =

arXiv 2010

[4] [4]

The Theory of Parsing, Translation and Compiling , author =

[5] [5]

Powering Intelligence: Analyzing Artificial Intelligence and Data Center Energy Consumption , author =

[6] [6]

AI and Compute , author =

[7] [7]

Journal of Machine Learning Research , publisher =

A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , author =. Journal of Machine Learning Research , publisher =

[8] [8]

Scalable training of

Andrew, Galen and Gao, Jianfeng , year = 2007, booktitle =. Scalable training of

2007

[9] [9]

Publications Manual , author =

[10] [10]

Qwen technical report , author =

[11] [11]

Amazon EC2 Update-Infl Instances with AWS Inferentia Chips for High Performance Cost-Effective Inferencing , author =

[12] [12]

Compute-efficient deep learning: algorithmic trends and opportunities , author =. J. Mach. Learn. Res. , publisher =

[13] [13]

Courty, V

Benoit Courty and Victor Schmidt and Sasha Luccioni and Goyal-Kamal and MarionCoutarel and Boris Feld and Jérémy Lecourt and LiamConnell and Amine Saboni and Inimaz and supatomic and Mathilde Léval and Luis Blanche and Alexis Cruveiller and ouminasara and Franklin Zhao and Aditya Joshi and Alexis Bogroff and Hugues de Lavoreille and Niko Laskaris and Edoa...

work page doi:10.5281/zenodo.11171501

[14] [14]

Proceedings of the Ninth Workshop on Statistical Machine Translation , publisher =

Findings of the 2014 Workshop on Statistical Machine Translation , author =. Proceedings of the Ninth Workshop on Statistical Machine Translation , publisher =

2014

[15] [15]

Life-cycle assessment of semiconductors , author =

[16] [16]

2005.14165 , archiveprefix =

Language Models are Few-Shot Learners , author =. 2005.14165 , archiveprefix =

Pith/arXiv arXiv 2005

[17] [17]

Journal of the Association for Computing Machinery , volume = 28, number = 1, pages =

Alternation , author =. Journal of the Association for Computing Machinery , volume = 28, number = 1, pages =

[18] [18]

Accelerating large language model decoding with speculative sampling , author =

[19] [19]

Efficient and Economic Large Language Model Inference with Attention Offloading , author =

[20] [20]

Magicdec: Breaking the latency-throughput tradeoff for long context generation with speculative decoding , author =

[21] [21]

The rising costs of training frontier AI models , author =

[22] [22]

Environmental life-cycle assessment , author =

[23] [23]

Life-cycle assessment: principles and practice , author =

[24] [24]

2501.12948 , archiveprefix =

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning , author =. 2501.12948 , archiveprefix =

Pith/arXiv arXiv

[25] [25]

International Conference on Learning Representations , url =

The Efficiency Misnomer , author =. International Conference on Learning Representations , url =

[26] [26]

Proceedings of the 2022 ACM conference on fairness, accountability, and transparency , pages =

Measuring the carbon intensity of ai in cloud instances , author =. Proceedings of the 2022 ACM conference on fairness, accountability, and transparency , pages =

2022

[27] [27]

The llama 3 herd of models , author =

[28] [28]

arXiv preprint arXiv:2505.06727 , year =

Modeling PFAS in Semiconductor Manufacturing to Quantify Trade-offs in Energy Efficiency and Environmental Impact of Computing Systems , author =. arXiv preprint arXiv:2505.06727 , year =

arXiv

[29] [29]

The Twelfth International Conference on Learning Representations , year =

LLMCarbon: Modeling the End-to-End Carbon Footprint of Large Language Models , author=. The Twelfth International Conference on Learning Representations , year =

[30] [30]

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , pages =

The Framework Tax: Disparities Between Inference Efficiency in NLP Research and Deployment , author =. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , pages =

2023

[31] [31]

Findings of the Association for Computational Linguistics: EMNLP 2024 , pages =

Gradient Localization Improves Lifelong Pretraining of Language Models , author =. Findings of the Association for Computational Linguistics: EMNLP 2024 , pages =

2024

[32] [32]

Transactions on Machine Learning Research , issn=

Efficient Hardware Scaling and Diminishing Returns in Large-Scale Training of Language Models , author=. Transactions on Machine Learning Research , issn=. 2025 , url=

2025

[33] [33]

Energy Considerations of Large Language Model Inference and Efficiency Optimizations

Fernandez, Jared and Na, Clara and Tiwari, Vashisth and Bisk, Yonatan and Luccioni, Sasha and Strubell, Emma. Energy Considerations of Large Language Model Inference and Efficiency Optimizations. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.acl-long.1563

work page doi:10.18653/v1/2025.acl-long.1563 2025

[34] [34]

Evaluating the Environmental Impact of Language Models with Life Cycle Assessment , author =

[35] [35]

Patterns , volume =

The real climate and transformative impact of ICT: A critique of estimates, trends, and regulations , author =. Patterns , volume =. 2021 , publisher =

2021

[36] [36]

Proceedings of the 44th International Conference on Software Engineering , pages =

Green ai: Do deep learning frameworks have different costs? , author =. Proceedings of the 44th International Conference on Software Engineering , pages =

[37] [37]

ACS Sustainable Chemistry & Engineering , volume = 5, number = 7, pages =

Consequential Life Cycle Optimization: General Conceptual Framework and Application to Algal Renewable Diesel Production , author =. ACS Sustainable Chemistry & Engineering , volume = 5, number = 7, pages =. doi:10.1021/acssuschemeng.7b00631 , url =

work page doi:10.1021/acssuschemeng.7b00631

[38] [38]

Alphabet plans massive capex hike, reports cloud revenue growth slowed , author =

[39] [39]

Olmo: Accelerating the science of language models , author =

[40] [40]

Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning , author =

[41] Chasing carbon: The elusive environmental footprint of computing. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA).

[42] Algorithms on Strings, Trees and Sequences.

[43] The Unpaid Toll: Quantifying the Public Health Impact of AI.

[44] Emerging approaches, challenges and opportunities in life cycle assessment. Science, 344(6188). doi:10.1126/science.1248361

[45] Training compute-optimal large language models. Proceedings of the 36th International Conference on Neural Information Processing Systems.

[46] The Curious Case of Neural Text Degeneration. International Conference on Learning Representations.

[47] Parameter-efficient transfer learning for NLP. International Conference on Machine Learning, 2019.

[48] Edward J Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen. Lo. 2022.

[49] GPipe: Efficient training of giant neural networks using pipeline parallelism.

[50] Prompt-Based Length Controlled Generation with Multiple Control Types. Findings of the Association for Computational Linguistics: ACL 2024. doi:10.18653/v1/2024.findings-acl.63

[51] In-datacenter performance analysis of a tensor processing unit. Proceedings of the 44th Annual International Symposium on Computer Architecture.

[52] Aligning artificial intelligence with climate change mitigation. Nature Climate Change.

[53] Scaling laws for neural language models.

[54] Our house is on fire: The climate emergency and computing's responsibility. Communications of the ACM.

[55] Efficient Memory Management for Large Language Model Serving with PagedAttention.

[56] Tülu 3: Pushing frontiers in open language model post-training.

[57] Tulu 3: Pushing Frontiers in Open Language Model Post-Training. arXiv:2411.15124

[58] Surveying (Dis)Parities and Concerns of Compute Hungry NLP Research. arXiv:2306.16900

[59] Forever Chemicals PFAS Global Impact and Activities, Cascading Consequences of Colossal Systems Failure: Long-Term Health Effects, Food-Systems, Eco-Systems.

[60] AWS to offer NVIDIA's T4 GPUs for AI inferencing.

[61] Fast inference from transformers via speculative decoding. International Conference on Machine Learning.

[62] Making AI less 'thirsty'. Communications of the ACM, 2025.

[63] PyTorch distributed: experiences on accelerating data parallel training. Proceedings of the VLDB Endowment, 2020.

[64] Making AI Less “Thirsty”: Uncovering and Addressing the Secret Water Footprint of AI Models.

[65] Carbon in Motion: Characterizing Open-Sora on the Sustainability of Generative AI for Video Generation. ACM SIGENERGY Energy Informatics Review, 2024.

[66] Sprout: Green Generative AI with Carbon-Efficient LLM Inference. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing.

[67] EcoServe: Designing carbon-aware AI inference systems. arXiv preprint arXiv:2502.05043.

[68] Optimizing Speculative Decoding for Serving Large Language Models Using Goodput.

[69] The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources.

[70] From efficiency gains to rebound effects: The problem of Jevons' paradox in AI's polarized environmental debate. Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency.

[71] Estimating the carbon footprint of BLOOM, a 176B parameter language model. Journal of Machine Learning Research, 24(253).

[72] Light bulbs have energy ratings—so why can’t AI chatbots? Nature.

[73] Power hungry processing: Watts driving the cost of AI deployment? Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency.

[74] Learning word vectors for sentiment analysis. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies.

[75] Learning Word Vectors for Sentiment Analysis. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies.

[76] A call for research on storage emissions.

[77] How data centers and the energy sector can sate AI’s hunger for power.

[78] Pointer Sentinel Mixture Models. arXiv:1609.07843

[79] Model cards for model reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency.

[80] How Green Can AI Be? A Study of Trends in Machine Learning Environmental Impacts. arXiv:2412.17376