pith. machine review for the scientific record.

arxiv: 2605.01549 · v1 · submitted 2026-05-02 · 💻 cs.CY

Recognition: unknown

Hugging Carbon: Quantifying the Training Carbon Emissions of AI Models at Scale

Jing Qiu, Jinjin Gu, Junhua Zhao, Ruibo Ming, Xinlei Wang

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 13:42 UTC · model grok-4.3

classification 💻 cs.CY
keywords carbon emissions · training · industry · models · open-source · sustainability · accounting

The pith

A FLOPs-based framework with tiered metadata handling estimates that training the most popular open-source models on Hugging Face has emitted approximately 58,000 metric tons of carbon, and the paper introduces the ATCI metric for training carbon efficiency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors treat the Hugging Face platform as a representative sample of open-source AI models and build a method to estimate the total carbon emissions from their training without rerunning any model. They count the computational operations (FLOPs) required for each model and convert that figure into emissions using hardware energy-efficiency and grid carbon-intensity factors. Because many models lack full training details, they apply a tiered system that works with whatever metadata is available, backed by empirical regressions to verify statistical significance. The result is an estimate of roughly 58,000 metric tons of carbon for the most popular models and a new metric, AI training carbon intensity (ATCI), that measures emissions per unit of compute. This approach aims to make large-scale carbon accounting feasible for the AI community.
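The FLOPs-to-emissions pipeline described above can be sketched in a few lines. Every constant here (hardware efficiency, PUE, grid intensity) is an illustrative assumption, not a value calibrated by the paper, and the `6 × params × tokens` FLOPs approximation is a common community heuristic rather than the authors' stated formula.

```python
# Minimal sketch of a FLOPs-based training emissions estimate.
# All constants are illustrative assumptions, not the paper's calibrated values.

def training_emissions_tco2e(
    flops: float,
    hw_efficiency_flops_per_joule: float = 2.0e10,  # assumed effective accelerator efficiency
    pue: float = 1.2,                               # assumed datacenter power usage effectiveness
    grid_intensity_kgco2e_per_kwh: float = 0.4,     # assumed grid carbon intensity
) -> float:
    """Convert training FLOPs to metric tons of CO2e."""
    energy_joules = flops / hw_efficiency_flops_per_joule * pue
    energy_kwh = energy_joules / 3.6e6              # 1 kWh = 3.6e6 J
    return energy_kwh * grid_intensity_kgco2e_per_kwh / 1000.0  # kg -> metric tons

# Example: a hypothetical 7B-parameter model trained on 1T tokens,
# using the common 6 * params * tokens FLOPs approximation.
flops = 6 * 7e9 * 1e12
print(f"{training_emissions_tco2e(flops):.1f} tCO2e")
```

Under these assumed constants the example works out to roughly 280 tCO2e; the point is the structure of the conversion, not the particular numbers.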

Core claim

Our results show that training the most popular open-source models (with over 5,000 downloads) has resulted in approximately 5.8×10^4 metric tons of carbon emissions.

Load-bearing premise

Given that the Hugging Face (HF) platform well represents the broader open-source community, we treat it as a large-scale, publicly accessible, and audit-ready corpus for carbon accounting.

read the original abstract

The scaling-law era has transformed artificial intelligence from research into a global industry, but its rapid growth raises concerns over energy usage, carbon emissions, and environmental sustainability. Unlike traditional sectors, the AI industry still lacks systematic carbon accounting methods that support large-scale estimates without reproducing the original model. This leaves open questions about how large the problem is today and how large it might be in the near future. Given that the Hugging Face (HF) platform well represents the broader open-source community, we treat it as a large-scale, publicly accessible, and audit-ready corpus for carbon accounting. We propose a FLOPs-based framework to estimate aggregate training emissions of HF open-source models. Considering their uneven disclosure quality, we introduce a tiered approach to handle incomplete metadata, supported by empirical regressions that verify the statistical significance. Compute is also converted to AI training carbon intensity (ATCI, emissions per compute), a metric to assess the sustainability efficiency of model training. Our results show that training the most popular open-source models (with over 5,000 downloads) has resulted in approximately $5.8\times10^4$ metric tons of carbon emissions. This paper provides a scalable framework for emission estimations and a practical methodology to guide future standards and sustainability strategies in the AI industry.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a FLOPs-based framework for estimating aggregate training carbon emissions of open-source AI models on the Hugging Face platform. It introduces a tiered approach to handle incomplete metadata via empirical regressions, defines an AI training carbon intensity (ATCI) metric, and reports that training the most popular models (over 5,000 downloads) has produced approximately 5.8×10^4 metric tons of CO2e.

Significance. If the estimates hold after validation, the work supplies a scalable, public-data-driven method for AI carbon accounting that avoids model reproduction. The ATCI metric and tiered handling of disclosure gaps offer practical tools for industry sustainability assessment and future standards. The framing of HF as an audit-ready corpus is a constructive starting point for reproducible large-scale analysis.

major comments (3)
  1. [Tiered approach / methods] The tiered-approach section (regressions that back-fill incomplete FLOPs/hardware metadata) carries the central 5.8×10^4 tCO2e aggregate, yet no equations, training-data sources, coefficient values, or residual statistics are supplied, which prevents any assessment of extrapolation error or of the final sum's sensitivity to these fits.
  2. [Results / aggregate estimate] Results on aggregate emissions: statistical significance is claimed for the regressions, but it is unclear whether this verification uses the same models that contribute to the summed total; without out-of-sample validation or uncertainty propagation to the aggregate, the numerical claim lacks independent support.
  3. [Introduction / discussion] Introduction and discussion of HF representativeness: the assumption that HF models with >5,000 downloads proxy the broader open-source training emissions is load-bearing for the headline figure, but no quantitative comparison to non-HF sources or sensitivity test is provided.
minor comments (2)
  1. [Metrics definition] Clarify the exact definition and units of ATCI (emissions per compute) and how it is computed from the tiered estimates.
  2. [Abstract] The abstract states the final number without referencing the supporting tables or regression diagnostics; add cross-references.
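The referee's second major comment asks for uncertainty propagation to the aggregate. A minimal Monte Carlo sketch of what that could look like is below; the per-model point estimates and the 30% lognormal spread are invented for illustration and are not the paper's data.

```python
# Illustrative Monte Carlo propagation of per-model uncertainty to an
# aggregate emissions estimate. All values are assumed for the sketch.
import random

random.seed(0)

# Hypothetical point estimates (tCO2e) for a handful of models.
point_estimates = [120.0, 45.0, 980.0, 12.0, 310.0]
relative_sigma = 0.3  # assumed lognormal spread per model

def sample_aggregate() -> float:
    # Perturb each model's estimate independently, then sum.
    return sum(e * random.lognormvariate(0.0, relative_sigma) for e in point_estimates)

draws = sorted(sample_aggregate() for _ in range(10_000))
lo, hi = draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))]
print(f"aggregate 95% interval: [{lo:.0f}, {hi:.0f}] tCO2e")
```

Reporting the headline figure with an interval of this kind, rather than as a bare point estimate, is the substance of the referee's request.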

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that Hugging Face adequately represents the open-source AI community and that FLOPs can be reliably mapped to emissions even with incomplete metadata via regressions.

axioms (1)
  • domain assumption: the Hugging Face (HF) platform well represents the broader open-source community
    Explicitly stated in the abstract as the justification for using HF as the corpus.
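The ledger's second point, mapping FLOPs to emissions via regressions despite incomplete metadata, can be illustrated with a toy version of the tiered idea: well-documented models anchor a log-log fit of FLOPs on parameter count, which then back-fills FLOPs for models that disclose only their size. The data points, the single-predictor regression, and the ATCI units (tCO2e per exaFLOP) are all stand-ins chosen for this sketch, since the paper's actual fits are unspecified.

```python
# Toy tiered back-fill: regress log10(FLOPs) on log10(params) over
# documented models, predict FLOPs for undocumented ones, compute ATCI.
# All data points here are invented examples.
import math

# Tier 1: (params, reported training FLOPs) for well-documented models.
documented = [(1.3e9, 2.6e21), (7e9, 4.2e22), (70e9, 8.4e23)]

# Ordinary least squares in log space: log10(flops) = a + b * log10(params).
xs = [math.log10(p) for p, _ in documented]
ys = [math.log10(f) for _, f in documented]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
a = ybar - b * xbar

def predict_flops(params: float) -> float:
    """Tier-2 back-fill for a model that discloses only its parameter count."""
    return 10 ** (a + b * math.log10(params))

def atci(emissions_tco2e: float, flops: float) -> float:
    """AI training carbon intensity: emissions per unit of compute
    (here tCO2e per exaFLOP; the paper does not pin down units)."""
    return emissions_tco2e / (flops / 1e18)

est = predict_flops(13e9)
print(f"estimated training FLOPs for a 13B-param model: {est:.2e}")
```

The referee's first major comment applies directly here: without the real coefficients and residuals, a reader cannot tell how far such a fit can be extrapolated.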

pith-pipeline@v0.9.0 · 5534 in / 1250 out tokens · 83286 ms · 2026-05-09T13:42:23.194793+00:00 · methodology

discussion (0)
