Holistically evaluating the environmental impact of creating language models. arXiv, 2025
8 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (8)
verdicts: UNVERDICTED (8)
roles: background (1)
polarities: support (1)
citing papers explorer
- The Hidden Cost of Thinking: Energy Use and Environmental Impact of LMs Beyond Pretraining
  Full development of 7B and 32B Olmo 3 models used 12.3 GWh datacenter energy and emitted 4,251 tCO2eq, with development overheads accounting for 82% of compute and reasoning models costing 17x more to post-train than instruction-tuned ones.
- Certificates without Electrons? Theory and Evidence on Impacts from AI-Driven Power Demand
  Game-theoretic modeling and difference-in-differences analysis using LLM releases show AI data center demand increases fossil generation, wholesale prices, and outages near data centers unless mitigated by behind-the-meter capacity.
- Towards Resource-Efficient LLMs: End-to-End Energy Accounting of Distillation Pipelines
  An end-to-end energy measurement framework for LLM distillation pipelines reveals hidden teacher-side costs and yields selection guidelines plus an open-source harness.
- Watt Counts: Energy-Aware Benchmark for Sustainable LLM Inference on Heterogeneous GPU Architectures
  Watt Counts supplies over 5,000 energy measurements across 50 LLMs and 10 GPUs and shows that hardware-aware selection can reduce server-scenario energy use by up to 70 percent with little effect on user experience.
- Transparent Screening for LLM Inference and Training Impacts
  The paper proposes a transparent proxy framework for estimating LLM inference and training environmental impacts from natural-language application descriptions.
- From Cradle to Cloud: A Life Cycle Review of AI's Environmental Footprint
  A review of AI sustainability studies finds inconsistent life cycle definitions and predominant reliance on coarse CO2e proxies, with limited coverage of water, materials, and multi-impact assessments.
- The Environmental Cost of LLMs in AIED: Reporting and Practices
  Survey of AIED 2025 papers shows widespread LLM use with minimal reporting of computational or environmental costs, paired with a proposed open-source measurement method and formula for frontier models.
- Sustainable Code Generation Using Large Language Models: A Systematic Literature Review
  A systematic review finds research on the sustainability of LLM-generated code to be limited, fragmented, and without accepted frameworks for measurement or benchmarking.