
arxiv: 2606.11215 · v1 · pith:UX425VKBnew · submitted 2026-05-03 · 💻 cs.CY · cs.AI

The Environmental Cost of LLMs in AIED: Reporting and Practices

Pith reviewed 2026-07-01 00:50 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords Large Language Models · AI in Education · Environmental Impact · Carbon Footprint · Reporting Practices · Computational Cost · Machine Learning

The pith

Most AIED projects use LLMs but few report computational resources and almost none discuss environmental impacts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews every submission to the AIED 2025 conference proceedings and finds that LLMs appear in most projects. Reporting of the computational resources consumed remains rare, and discussion of environmental impacts as an ethical matter is almost nonexistent. To close the gap, the authors supply open-source software that measures carbon footprints on both local machines and cloud hardware. They also give a simple formula that estimates the computational expense of frontier LLMs even when the exact parameter count is unknown.

Core claim

The central claim is that the absence of standardized measurement and reporting procedures leaves the computational and environmental costs of LLMs hidden in AIED work; supplying accessible software for carbon-footprint tracking and a parameter-light estimation formula will let researchers document these costs routinely and treat them as ethical concerns.

What carries the argument

An open-source measurement method that combines software for local and cloud carbon-footprint tracking with a formula for estimating LLM computational expense without known parameter counts.
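
As a rough illustration of what carbon-footprint tracking boils down to, the sketch below converts an energy estimate into emissions. It is not the authors' software: the function name, the `pue` (data-centre overhead) parameter, and every number in the example are assumptions chosen for illustration. Tools such as CodeCarbon, which appears in the paper's reference list, automate the power measurement and the grid-intensity lookup that are hard-coded here.

```python
def estimate_emissions_kg(avg_power_watts: float,
                          runtime_hours: float,
                          carbon_intensity_g_per_kwh: float,
                          pue: float = 1.0) -> float:
    """Estimate emissions in kg CO2e for a single run.

    avg_power_watts: average hardware power draw (assumed or measured).
    runtime_hours: wall-clock duration of the run.
    carbon_intensity_g_per_kwh: grid carbon intensity in gCO2e per kWh.
    pue: power usage effectiveness; 1.0 means no data-centre overhead.
    """
    # Energy in kWh = power in kW * hours, scaled by facility overhead.
    energy_kwh = avg_power_watts / 1000.0 * runtime_hours * pue
    # Convert grams of CO2e to kilograms.
    return energy_kwh * carbon_intensity_g_per_kwh / 1000.0


if __name__ == "__main__":
    # Hypothetical run: 300 W accelerator, 2 hours, 400 gCO2e/kWh grid.
    local = estimate_emissions_kg(300, 2, 400)
    # Same run in a hypothetical cloud facility with 50% overhead.
    cloud = estimate_emissions_kg(300, 2, 400, pue=1.5)
    print(f"local: {local:.3f} kg CO2e, cloud: {cloud:.3f} kg CO2e")
```

The calculation is deliberately simple: reporting the three inputs alongside the result lets a reader recompute the figure for a different grid or facility.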

If this is right

  • Researchers gain concrete software that calculates carbon emissions for both local hardware and cloud runs.
  • A formula allows cost estimates for large models when the number of parameters is not published.
  • Systematic reporting of these costs becomes feasible for any ML-based AIED system.
  • Environmental impact can be treated as a standard ethical consideration in future AIED studies.
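
The paper's own formula is not reproduced on this page. As a stand-in, the sketch below uses the widely cited rule of thumb from Kaplan et al. (Scaling Laws for Neural Language Models, listed in the references) that a forward pass costs roughly 2 FLOPs per parameter per token, and handles an unpublished parameter count by bracketing it with a low and a high guess. The function names and the example parameter range are hypothetical, and the approximation ignores the attention cost that grows with context length.

```python
def inference_flops(n_params: int, n_tokens: int) -> int:
    """Approximate forward-pass cost: about 2 FLOPs per parameter per token."""
    return 2 * n_params * n_tokens


def inference_flops_range(low_params: int, high_params: int,
                          n_tokens: int) -> tuple[int, int]:
    """Bracket the cost when only a plausible parameter range is known."""
    if low_params > high_params:
        raise ValueError("low_params must not exceed high_params")
    return (inference_flops(low_params, n_tokens),
            inference_flops(high_params, n_tokens))


if __name__ == "__main__":
    # Hypothetical frontier model: somewhere between 100B and 1T parameters,
    # processing 1,000 tokens in total.
    low, high = inference_flops_range(100 * 10**9, 10**12, 1_000)
    print(f"estimated cost: {low:.2e} to {high:.2e} FLOPs")
```

Reporting a range rather than a single number keeps the uncertainty about the model size visible in the final estimate.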

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the method spreads, other AI conferences may adopt similar reporting norms.
  • Repeated use of the tools could identify which common AIED tasks carry the largest hidden environmental loads.
  • Wider availability of the measurements might encourage creation of smaller, lower-cost models specifically for education settings.

Load-bearing premise

That the main reason for low reporting rates is the lack of easy measurement tools, and that supplying those tools will produce noticeably higher rates of transparent reporting.

What would settle it

A follow-up review of AIED conference papers published after the tools are released that shows no rise in the percentage of papers reporting computational resources or environmental impacts.

Figures

Figures reproduced from arXiv: 2606.11215 by Aditi Haiman, André Helgert, Büsra Yapici, Daniel Flood, Lachlan McGinness, Luca Häckert, Lukas Erle, Sabrina C. Eimler.

Figure 1: Diagram of decoder-only transformer architecture demonstrating the number of parameters and FLOPs required for the first token generated by a Large Language Model after receiving input context. Grey arrows indicate the matrix dimensions that can be used to find the number of learned parameters within the model. Blue arrows indicate the number of FLOPs required to generate the first token.
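
The caption describes reading the parameter count off the weight-matrix dimensions. A minimal sketch of that idea follows, assuming a standard decoder-only block with four `d_model × d_model` attention projections and a two-matrix feed-forward layer of width `4 · d_model`. This is the approximation used by Kaplan et al. and may differ from the figure's exact accounting; embeddings, biases, and normalisation weights are ignored, and the function name is hypothetical.

```python
def approx_decoder_params(n_layers: int, d_model: int,
                          ff_multiplier: int = 4) -> int:
    """Approximate non-embedding parameter count of a decoder-only transformer."""
    # Query, key, value, and output projections: four d_model x d_model matrices.
    attention = 4 * d_model * d_model
    # Feed-forward block: one up-projection and one down-projection.
    feed_forward = 2 * d_model * (ff_multiplier * d_model)
    return n_layers * (attention + feed_forward)


if __name__ == "__main__":
    # Published GPT-3 dimensions: 96 layers, model width 12288.
    n = approx_decoder_params(n_layers=96, d_model=12288)
    print(f"approximately {n / 1e9:.0f} billion parameters")
```

With the default multiplier the expression reduces to `12 * n_layers * d_model**2`, which is why two architectural numbers are often enough for an order-of-magnitude estimate.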
Original abstract

Large Language Model (LLM) usage in recent years has become increasingly widespread in the Artificial Intelligence in Education (AIED) community. While LLMs offer unique avenues for learners and educators, using LLMs comes with computational and environmental costs. These costs are mostly hidden due to a lack of standardised procedures to measure and report these impacts. To address this gap, we first conducted a literature review of all papers published as part of the AIED 2025 conference proceedings, determining if and how computational or environmental costs of LLMs are reported. Most projects use LLMs, but few report computational resources used and almost none discuss environmental impacts of LLMs as an ethical concern. To address this lack of standardised reporting practices, we propose an open-source method for systematically measuring and reporting the computational expense of LLMs and environmental impact of running Machine Learning (ML) AIED systems. We provide software solutions to measure the carbon footprint for both local and cloud based hardware. We also provide an easy-to-use formula to calculate the computational expense of frontier LLMs even when the exact number of parameters is not known. Overall, we hope to motivate colleagues to use our method to strive for more transparent reporting of hidden costs of using LLMs in the AIED community.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript reports a literature review of all papers in the AIED 2025 conference proceedings, finding that most projects use LLMs but few report computational resources and almost none discuss environmental impacts as an ethical concern. It proposes an open-source method with software tools to measure carbon footprint for local and cloud hardware, plus a formula for estimating computational expense of frontier LLMs when the parameter count is unknown.

Significance. If the reporting-gap findings hold, the work could increase awareness of hidden environmental costs in AIED. The open-source software solutions for carbon measurement and the formula for unknown-parameter cases are concrete, adoptable contributions that support reproducible reporting practices.

major comments (1)
  1. [Abstract] Abstract: The claim of having 'conducted a literature review of all papers published as part of the AIED 2025 conference proceedings' is inconsistent with a June 2024 arXiv submission (arXiv:2606.11215). AIED 2025 proceedings would not have existed, so the quantitative statements ('Most projects use LLMs, but few report computational resources used and almost none discuss environmental impacts') lack an empirical basis. This directly undercuts the motivation for the proposed measurement tools and formula.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and for highlighting this important inconsistency. We address the comment directly below.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The claim of having 'conducted a literature review of all papers published as part of the AIED 2025 conference proceedings' is inconsistent with a June 2024 arXiv submission (arXiv:2606.11215). AIED 2025 proceedings would not have existed, so the quantitative statements ('Most projects use LLMs, but few report computational resources used and almost none discuss environmental impacts') lack an empirical basis. This directly undercuts the motivation for the proposed measurement tools and formula.

    Authors: We agree that the stated claim is inconsistent with the June 2024 submission date. This is an error in the manuscript: the literature review was performed on the AIED 2024 conference proceedings (available prior to submission), and the year '2025' was used by mistake. We will revise the abstract and all other references throughout the paper to 'AIED 2024'. The quantitative findings on LLM usage and reporting practices are based on the 2024 proceedings and remain unchanged. The motivation for the measurement tools and formula is unaffected, as the identified reporting gap is still present in the corrected data. revision: yes

Circularity Check

0 steps flagged

No circularity; descriptive survey with independent empirical basis

Full rationale

The paper conducts a literature review of AIED 2025 proceedings and proposes open-source measurement tools plus a formula for computational expense. No equations, fitted parameters, self-citations, or derivations appear in the provided text. The central claims rest on external counts from the proceedings review rather than reducing to self-referential inputs or ansatzes. This matches the default expectation of a non-circular descriptive work; any concerns about review timing or scope fall under empirical validity, not circularity per the analysis rules.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a literature survey and methodological proposal with no mathematical derivations, fitted parameters, background axioms, or new postulated entities.

pith-pipeline@v0.9.1-grok · 5785 in / 1165 out tokens · 53114 ms · 2026-07-01T00:50:00.899747+00:00 · methodology


Reference graph

Works this paper leans on

24 extracted references · 6 canonical work pages · 3 internal anchors

  1. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is All you Need. In: Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc. (2017)
  2. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al.: Language Models are Few-Shot Learners. In: Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901. Curran Associates, Inc. (2020)
  3. Courty, B., Schmidt, V., et al.: mlco2/codecarbon: v3.2.3 (v3.2.3). Zenodo (2026). https://doi.org/10.5281/zenodo.18731928
  4. CodeCarbon Contributors: CodeCarbon Documentation. https://docs.codecarbon.io/. Accessed 25 Feb 2026
  5. Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q., Salakhutdinov, R.: Transformer-XL: Attentive Language Models beyond a Fixed-Length Context. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2978–2988 (2019)
  6. DeepSeek-AI: DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model. arXiv:2405.04434 (2024)
  7. Ember: Lifecycle carbon intensity of electricity generation. Our World in Data (2026). Available at: https://ourworldindata.org/energy. Accessed 3 Mar 2026
  8. Faiz, A., Kaneda, S., Wang, R., Osi, R.C., Sharma, P., Chen, F., Jiang, L.: LLMCarbon: Modeling the End-to-End Carbon Footprint of Large Language Models. In: NeurIPS 2023 Workshop on Tackling Climate Change with Machine Learning (2023)
  9. Henderson, P., Hu, J., Romoff, J., Brunskill, E., Jurafsky, D., Pineau, J.: Towards the systematic reporting of the energy and carbon footprints of machine learning. Journal of Machine Learning Research, vol. 21, no. 248, pp. 10039–10081 (2020)
  10. Holmes, W., Porayska-Pomsta, K., Holstein, K., Sutherland, E., Baker, T., Shum, S.B., ..., Koedinger, K.R.: Ethics of AI in education: Towards a community-wide framework. International Journal of Artificial Intelligence in Education, vol. 32, no. 3, pp. 504–526 (2022)
  11. Kaplan, J., McCandlish, S., Henighan, T., Brown, T.B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., Amodei, D.: Scaling Laws for Neural Language Models. arXiv preprint arXiv:2001.08361 (2020)
  12. Kelly, M., Longjohn, R., Nottingham, K.: The UCI Machine Learning Repository. Available at: https://archive.ics.uci.edu. Accessed 27 Jan 2026
  13. Kocher, N., Wassermann, C., Hennig, L., Seng, J., Hoos, H., Kersting, K.: Guidelines for the Quality Assessment of Energy-Aware NAS Benchmarks. In: 2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW)
  14. Liu, P.J., Saleh, M., Pot, E., Goodrich, B., Sepassi, R., Kaiser, L., Shazeer, N.: Generating Wikipedia by Summarizing Long Sequences. In: 6th International Conference on Learning Representations (ICLR 2018)
  15. Lottick, K., Susai, S., Friedler, S.A., Wilson, J.P.: Energy usage reports: Environmental awareness as part of algorithmic accountability. arXiv preprint arXiv:1911.08354 (2019)
  16. Luccioni, S., Jernite, Y., Strubell, E.: Power Hungry Processing: Watts Driving the Cost of AI Deployment? In: Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’24), pp. 85–99. ACM (2024)
  17. McGinness, L., Baumgartner, P.: Can Large Language Models Correctly Interpret Equations with Errors? Physical Review Physics Education Research, vol. 21, 020155
  18. Morrison, J., Na, C., Fernandez, J., Dettmers, T., Strubell, E., Dodge, J.: Holistically evaluating the environmental impact of creating language models. arXiv preprint arXiv:2503.05804 (2025)
  19. OpenAI: GPT-4 Technical Report. Tech. rep., OpenAI (2023). https://api.semanticscholar.org/CorpusID:257532815
  20. Rismanchian, S., Doroudi, S.: The evolution of research on AI and education across four decades: Insights from the AIxEd framework. International Journal of Artificial Intelligence in Education, vol. 35, no. 5, pp. 2797–2820 (2025)
  21. Su, J., Ahmed, M., Lu, Y., Pan, S., Bo, W., Liu, Y.: RoFormer: Enhanced transformer with Rotary Position Embedding. Neurocomputing, vol. 568, Article 127063. Elsevier (2024)
  22. Xiong, R., Yang, Y., He, D., Zheng, K., Zheng, S., Xing, C., Zhang, H., Lan, Y., Wang, L., Liu, T.Y.: On Layer Normalization in the Transformer Architecture. In: Proceedings of the 37th International Conference on Machine Learning (ICML 2020), PMLR 119, pp. 10524–10533 (2020)
  23. Yang, A., Li, A., Yang, B., Zhang, B., Cui, Z., Zhang, Z., Zhou, Z., Qiu, Z.: Qwen3 Technical Report. arXiv preprint arXiv:2505.09388 (2025)
  24. Zhang, B., Sennrich, R.: Root mean square layer normalization. In: Advances in Neural Information Processing Systems, vol. 32, pp. 12360–12371. Curran Associates, Inc. (2019)