The Environmental Cost of LLMs in AIED: Reporting and Practices
Pith reviewed 2026-07-01 00:50 UTC · model grok-4.3
The pith
Most AIED projects use LLMs but few report computational resources and almost none discuss environmental impacts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the absence of standardized measurement and reporting procedures leaves the computational and environmental costs of LLMs hidden in AIED work; supplying accessible software for carbon-footprint tracking and a parameter-light estimation formula will let researchers document these costs routinely and treat them as ethical concerns.
What carries the argument
An open-source measurement method that combines software for local and cloud carbon-footprint tracking with a formula for estimating LLM computational expense without known parameter counts.
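The arithmetic behind carbon-footprint tracking can be sketched in a few lines. This is a minimal illustration, not the paper's software: it assumes the common accounting identity that emissions equal energy consumed times the carbon intensity of the electricity used. The function name, the PUE (power usage effectiveness) overhead factor, and all numbers are hypothetical.

```python
def estimate_emissions_kg(avg_power_watts: float,
                          runtime_hours: float,
                          carbon_intensity_g_per_kwh: float,
                          pue: float = 1.0) -> float:
    """Estimate emissions in kg CO2e for one run.

    Assumption (not from the paper): emissions = energy * grid carbon
    intensity, with energy scaled by a data-centre overhead factor (PUE).
    """
    if avg_power_watts < 0 or runtime_hours < 0 or carbon_intensity_g_per_kwh < 0:
        raise ValueError("power, runtime and carbon intensity must be non-negative")
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    # Watts * hours -> watt-hours; divide by 1000 for kWh, then apply overhead.
    energy_kwh = avg_power_watts * runtime_hours / 1000.0 * pue
    # Grams CO2e -> kilograms CO2e.
    return energy_kwh * carbon_intensity_g_per_kwh / 1000.0


# Hypothetical run: a 300 W accelerator for 10 hours on a 400 gCO2e/kWh grid,
# in a facility with an assumed PUE of 1.5.
example_kg = estimate_emissions_kg(300, 10, 400, pue=1.5)
print(f"{example_kg:.2f} kg CO2e")
```

Measured power draw and a region-specific carbon intensity would replace the placeholder values in practice; tracking tools automate exactly those two inputs.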
If this is right
- Researchers gain concrete software that calculates carbon emissions for both local hardware and cloud runs.
- A formula allows cost estimates for large models when the number of parameters is not published.
- Systematic reporting of these costs becomes feasible for any ML-based AIED system.
- Environmental impact can be treated as a standard ethical consideration in future AIED studies.
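One way to reason about compute when a parameter count is unpublished is to bracket it. The sketch below is not the paper's formula, which this page does not reproduce. It assumes the widely used approximation that a dense transformer's forward pass costs about 2 FLOPs per parameter per token, and it takes a hypothetical lower and upper bound on the parameter count to produce a range. All names and numbers are illustrative.

```python
def inference_flops(params: float, tokens: int) -> float:
    """Approximate forward-pass FLOPs for a dense transformer.

    Assumption: about 2 FLOPs per parameter per processed token,
    ignoring the context-length-dependent attention term.
    """
    if params < 0 or tokens < 0:
        raise ValueError("params and tokens must be non-negative")
    return 2.0 * params * tokens


def inference_flops_range(params_low: float, params_high: float,
                          tokens: int) -> tuple[float, float]:
    """Return (low, high) FLOPs estimates for an assumed parameter range."""
    if params_low > params_high:
        raise ValueError("params_low must not exceed params_high")
    return inference_flops(params_low, tokens), inference_flops(params_high, tokens)


# Hypothetical: a model assumed to have between 100 billion and 1 trillion
# parameters, processing 1,000 tokens.
low, high = inference_flops_range(1e11, 1e12, 1000)
print(f"{low:.1e} to {high:.1e} FLOPs")
```

For mixture-of-experts models the number of parameters active per token, rather than the total, would be the relevant input under this approximation.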
Where Pith is reading between the lines
- If the method spreads, other AI conferences may adopt similar reporting norms.
- Repeated use of the tools could identify which common AIED tasks carry the largest hidden environmental loads.
- Wider availability of the measurements might encourage creation of smaller, lower-cost models specifically for education settings.
Load-bearing premise
That the main reason for low reporting rates is the lack of easy measurement tools, and that supplying those tools will produce noticeably higher rates of transparent reporting.
What would settle it
A follow-up review of AIED conference papers published after the tools are released that shows no rise in the percentage of papers reporting computational resources or environmental impacts.
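The comparison described above amounts to testing whether a proportion rose between two sets of proceedings. A minimal sketch, assuming a standard pooled two-proportion z-test is an acceptable way to frame it; the counts are hypothetical and do not come from the paper.

```python
import math


def two_proportion_z(reported_before: int, total_before: int,
                     reported_after: int, total_after: int) -> float:
    """z statistic for the change in reporting rate (after minus before)."""
    if total_before <= 0 or total_after <= 0:
        raise ValueError("totals must be positive")
    p_before = reported_before / total_before
    p_after = reported_after / total_after
    # Pooled proportion under the null hypothesis of no change.
    pooled = (reported_before + reported_after) / (total_before + total_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_before + 1 / total_after))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    return (p_after - p_before) / se


def one_sided_p_value(z: float) -> float:
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical counts: 10 of 100 papers report compute before the tools
# are released, 30 of 100 afterwards.
z = two_proportion_z(10, 100, 30, 100)
print(f"z = {z:.2f}, one-sided p = {one_sided_p_value(z):.4f}")
```

A z value near zero, or a large p-value, would correspond to the "no rise" outcome that would count against the premise.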
Original abstract
Large Language Model (LLM) usage in recent years has become increasingly widespread in the Artificial Intelligence in Education (AIED) community. While LLMs offer unique avenues for learners and educators, using LLMs comes with computational and environmental costs. These costs are mostly hidden due to a lack of standardised procedures to measure and report these impacts. To address this gap, we first conducted a literature review of all papers published as part of the AIED 2025 conference proceedings, determining if and how computational or environmental costs of LLMs are reported. Most projects use LLMs, but few report computational resources used and almost none discuss environmental impacts of LLMs as an ethical concern. To address this lack of standardised reporting practices, we propose an open-source method for systematically measuring and reporting the computational expense of LLMs and environmental impact of running Machine Learning (ML) AIED systems. We provide software solutions to measure the carbon footprint for both local and cloud based hardware. We also provide an easy-to-use formula to calculate the computational expense of frontier LLMs even when the exact number of parameters is not known. Overall, we hope to motivate colleagues to use our method to strive for more transparent reporting of hidden costs of using LLMs in the AIED community.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports a literature review of all papers in the AIED 2025 conference proceedings, finding that most projects use LLMs but few report computational resources and almost none discuss environmental impacts as an ethical concern. It proposes an open-source method with software tools to measure carbon footprint for local and cloud hardware, plus a formula for estimating computational expense of frontier LLMs when the parameter count is unknown.
Significance. If the reporting-gap findings hold, the work could increase awareness of hidden environmental costs in AIED. The open-source software solutions for carbon measurement and the formula for unknown-parameter cases are concrete, adoptable contributions that support reproducible reporting practices.
major comments (1)
- [Abstract] Abstract: The claim of having 'conducted a literature review of all papers published as part of the AIED 2025 conference proceedings' is inconsistent with a June 2024 arXiv submission (arXiv:2606.11215). AIED 2025 proceedings would not have existed, so the quantitative statements ('Most projects use LLMs, but few report computational resources used and almost none discuss environmental impacts') lack an empirical basis. This directly undercuts the motivation for the proposed measurement tools and formula.
Simulated Author's Rebuttal
We thank the referee for their careful reading and for highlighting this important inconsistency. We address the comment directly below.
- Referee: [Abstract] Abstract: The claim of having 'conducted a literature review of all papers published as part of the AIED 2025 conference proceedings' is inconsistent with a June 2024 arXiv submission (arXiv:2606.11215). AIED 2025 proceedings would not have existed, so the quantitative statements ('Most projects use LLMs, but few report computational resources used and almost none discuss environmental impacts') lack an empirical basis. This directly undercuts the motivation for the proposed measurement tools and formula.
  Authors: We agree that the stated claim is inconsistent with the June 2024 submission date. This is an error in the manuscript: the literature review was performed on the AIED 2024 conference proceedings (available prior to submission), and the year '2025' was used by mistake. We will revise the abstract and all other references throughout the paper to 'AIED 2024'. The quantitative findings on LLM usage and reporting practices are based on the 2024 proceedings and remain unchanged. The motivation for the measurement tools and formula is unaffected, as the identified reporting gap is still present in the corrected data.
  Revision: yes
Circularity Check
No circularity; descriptive survey with independent empirical basis
The paper conducts a literature review of AIED 2025 proceedings and proposes open-source measurement tools plus a formula for computational expense. No equations, fitted parameters, self-citations, or derivations appear in the provided text. The central claims rest on external counts from the proceedings review rather than reducing to self-referential inputs or ansatzes. This matches the default expectation of a non-circular descriptive work; any concerns about review timing or scope fall under empirical validity, not circularity per the analysis rules.