The Environmental Cost of LLMs in AIED: Reporting and Practices

Aditi Haiman; André Helgert; Büsra Yapici; Daniel Flood; Lachlan McGinness; Luca Häckert; Lukas Erle; Sabrina C. Eimler

arxiv: 2606.11215 · v1 · pith:UX425VKBnew · submitted 2026-05-03 · 💻 cs.CY · cs.AI

Pith reviewed 2026-07-01 00:50 UTC · model grok-4.3

keywords Large Language Models · AI in Education · Environmental Impact · Carbon Footprint · Reporting Practices · Computational Cost · Machine Learning

The pith

Most AIED projects use LLMs but few report computational resources and almost none discuss environmental impacts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews every submission to the AIED 2025 conference proceedings and finds that LLMs appear in most projects. Reporting of the computational resources consumed remains rare, and discussion of environmental impacts as an ethical matter is almost nonexistent. To close the gap, the authors supply open-source software that measures carbon footprints on both local machines and cloud hardware. They also give a simple formula that estimates the computational expense of frontier LLMs even when the exact parameter count is unknown.

Core claim

The central claim is that the absence of standardized measurement and reporting procedures leaves the computational and environmental costs of LLMs hidden in AIED work; supplying accessible software for carbon-footprint tracking and a parameter-light estimation formula will let researchers document these costs routinely and treat them as ethical concerns.

What carries the argument

An open-source measurement method that combines software for local and cloud carbon-footprint tracking with a formula for estimating LLM computational expense without known parameter counts.

If this is right

Researchers gain concrete software that calculates carbon emissions for both local hardware and cloud runs.
A formula allows cost estimates for large models when the number of parameters is not published.
Systematic reporting of these costs becomes feasible for any ML-based AIED system.
Environmental impact can be treated as a standard ethical consideration in future AIED studies.
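
As a rough illustration of what such a carbon calculation involves, the sketch below multiplies the energy drawn during a run by the carbon intensity of the electricity used. It is not the paper's software. The function name, the optional PUE (power usage effectiveness) multiplier for data-centre overhead, and every number in the usage note are assumptions chosen for illustration.

```python
def carbon_footprint_kg(avg_power_watts: float,
                        runtime_hours: float,
                        grid_intensity_g_per_kwh: float,
                        pue: float = 1.0) -> float:
    """Estimate emissions in kg CO2e for one run.

    avg_power_watts: average hardware power draw (assumed, measured, or from a spec sheet)
    runtime_hours: wall-clock duration of the run
    grid_intensity_g_per_kwh: carbon intensity of the electricity, in g CO2e per kWh
    pue: overhead multiplier for cooling and infrastructure (1.0 = no overhead)
    """
    # Energy in kWh: watts * hours / 1000, scaled by the overhead factor.
    energy_kwh = avg_power_watts * runtime_hours / 1000.0 * pue
    # Convert grams to kilograms.
    return energy_kwh * grid_intensity_g_per_kwh / 1000.0
```

With hypothetical inputs of 300 W average draw, a 10 hour run, and a grid intensity of 400 g CO2e per kWh, the energy used is 3 kWh and the estimate is about 1.2 kg CO2e. Raising the PUE to 1.5 scales the result to about 1.8 kg.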

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the method spreads, other AI conferences may adopt similar reporting norms.
Repeated use of the tools could identify which common AIED tasks carry the largest hidden environmental loads.
Wider availability of the measurements might encourage creation of smaller, lower-cost models specifically for education settings.

Load-bearing premise

That the main reason for low reporting rates is the lack of easy measurement tools, and that supplying those tools will produce noticeably higher rates of transparent reporting.

What would settle it

A follow-up review of AIED conference papers published after the tools are released that shows no rise in the percentage of papers reporting computational resources or environmental impacts.

Figures

Figures reproduced from arXiv: 2606.11215 by Aditi Haiman, André Helgert, Büsra Yapici, Daniel Flood, Lachlan McGinness, Luca Häckert, Lukas Erle, Sabrina C. Eimler.

**Figure 1.** Diagram of decoder-only transformer architecture demonstrating the number of parameters and FLOPs required for the first token generated by a Large Language Model after receiving input context. Grey arrows indicate the matrix dimensions that can be used to find the number of learned parameters within the model. Blue arrows indicate the number of FLOPs required to generate the first token.
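
The kind of counting this caption describes can be approximated with widely used rules of thumb from Kaplan et al. (2020), "Scaling Laws for Neural Language Models": roughly 12 · n_layers · d_model² non-embedding parameters for a standard decoder-only transformer, and roughly 2 FLOPs per parameter per token in a forward pass. The sketch below is not the paper's formula. The function names are hypothetical, the feed-forward width is assumed to be 4 × d_model, and embeddings, biases, normalization layers, and the context-length-dependent attention term are ignored. Mixture-of-experts or grouped-query attention models would need a different count.

```python
def approx_non_embedding_params(n_layers: int, d_model: int) -> int:
    """Approximate learned parameters in the transformer blocks.

    Per layer: 4 * d_model^2 for the attention projections (Q, K, V, output)
    plus 8 * d_model^2 for a feed-forward block with 4x expansion.
    """
    return 12 * n_layers * d_model ** 2


def approx_forward_flops(n_params: int, n_tokens: int) -> int:
    """Approximate forward-pass FLOPs: one multiply and one add per parameter per token."""
    return 2 * n_params * n_tokens
```

For a hypothetical model with 32 layers and d_model = 4096, `approx_non_embedding_params(32, 4096)` gives 6,442,450,944 parameters, and processing 1,000 tokens costs roughly 1.3 × 10^13 FLOPs under these assumptions.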

read the original abstract

Large Language Model (LLM) usage in recent years has become increasingly widespread in the Artificial Intelligence in Education (AIED) community. While LLMs offer unique avenues for learners and educators, using LLMs comes with computational and environmental costs. These costs are mostly hidden due to a lack of standardised procedures to measure and report these impacts. To address this gap, we first conducted a literature review of all papers published as part of the AIED 2025 conference proceedings, determining if and how computational or environmental costs of LLMs are reported. Most projects use LLMs, but few report computational resources used and almost none discuss environmental impacts of LLMs as an ethical concern. To address this lack of standardised reporting practices, we propose an open-source method for systematically measuring and reporting the computational expense of LLMs and environmental impact of running Machine Learning (ML) AIED systems. We provide software solutions to measure the carbon footprint for both local and cloud based hardware. We also provide an easy-to-use formula to calculate the computational expense of frontier LLMs even when the exact number of parameters is not known. Overall, we hope to motivate colleagues to use our method to strive for more transparent reporting of hidden costs of using LLMs in the AIED community.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The survey of AIED 2025 proceedings is impossible given the June 2024 preprint date, so the quantitative claims about reporting gaps have no empirical support.

The core problem is straightforward. The paper states it reviewed all papers from the AIED 2025 conference proceedings to establish that most projects use LLMs but almost none report environmental costs. The arXiv identifier places the preprint in June 2024. Standard conference timing puts AIED 2025 in 2025, so those proceedings did not exist when the review was supposedly done. The counts that drive the motivation for the rest of the work are therefore unsupported.

What the paper actually contributes is a set of practical tools: open-source software to measure carbon footprints for local and cloud hardware, plus a formula for estimating computational cost on frontier models when the exact parameter count is unknown. These pieces address a real operational need in the AIED community and could be used independently of the survey.

The soft spot is that the entire framing depends on the survey results to demonstrate a widespread reporting failure and to justify the call for standardized procedures. Without valid counts, it is unclear how large the gap actually is or whether the absence of standards is the main cause. The paper does not test alternative explanations such as lack of awareness or differing priorities.

This work is aimed at AIED researchers who want concrete ways to track LLM energy use. A reader could extract the software and formula for their own projects, but the paper as written does not supply reliable evidence on current practices. It does not deserve peer review until the survey section is corrected or removed.

Referee Report

1 major / 0 minor

Summary. The manuscript reports a literature review of all papers in the AIED 2025 conference proceedings, finding that most projects use LLMs but few report computational resources and almost none discuss environmental impacts as an ethical concern. It proposes an open-source method with software tools to measure carbon footprint for local and cloud hardware, plus a formula for estimating computational expense of frontier LLMs when the parameter count is unknown.

Significance. If the reporting-gap findings hold, the work could increase awareness of hidden environmental costs in AIED. The open-source software solutions for carbon measurement and the formula for unknown-parameter cases are concrete, adoptable contributions that support reproducible reporting practices.

major comments (1)

[Abstract] The claim of having 'conducted a literature review of all papers published as part of the AIED 2025 conference proceedings' is inconsistent with a June 2024 arXiv submission (arXiv:2606.11215). AIED 2025 proceedings would not have existed, so the quantitative statements ('Most projects use LLMs, but few report computational resources used and almost none discuss environmental impacts') lack an empirical basis. This directly undercuts the motivation for the proposed measurement tools and formula.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and for highlighting this important inconsistency. We address the comment directly below.

Referee: [Abstract] The claim of having 'conducted a literature review of all papers published as part of the AIED 2025 conference proceedings' is inconsistent with a June 2024 arXiv submission (arXiv:2606.11215). AIED 2025 proceedings would not have existed, so the quantitative statements ('Most projects use LLMs, but few report computational resources used and almost none discuss environmental impacts') lack an empirical basis. This directly undercuts the motivation for the proposed measurement tools and formula.

Authors: We agree that the stated claim is inconsistent with the June 2024 submission date. This is an error in the manuscript: the literature review was performed on the AIED 2024 conference proceedings (available prior to submission), and the year '2025' was used by mistake. We will revise the abstract and all other references throughout the paper to 'AIED 2024'. The quantitative findings on LLM usage and reporting practices are based on the 2024 proceedings and remain unchanged. The motivation for the measurement tools and formula is unaffected, as the identified reporting gap is still present in the corrected data.

Revision: yes

Circularity Check

0 steps flagged

No circularity; descriptive survey with independent empirical basis

The paper conducts a literature review of AIED 2025 proceedings and proposes open-source measurement tools plus a formula for computational expense. No equations, fitted parameters, self-citations, or derivations appear in the provided text. The central claims rest on external counts from the proceedings review rather than reducing to self-referential inputs or ansatzes. This matches the default expectation of a non-circular descriptive work; any concerns about review timing or scope fall under empirical validity, not circularity per the analysis rules.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a literature survey and methodological proposal with no mathematical derivations, fitted parameters, background axioms, or new postulated entities.

