hub Mixed citations

Carbon Emissions and Large Neural Network Training

David Patterson, Joseph Gonzalez, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild · 2021 · cs.LG · arXiv 2104.10350

Mixed citation behavior. Most common role is background (69%).

70 Pith papers citing it

Background 69% of classified citations


abstract

The computation demand for machine learning (ML) has grown rapidly recently, which comes with a number of costs. Estimating the energy cost helps measure its environmental impact and find greener strategies, yet it is challenging without detailed information. We calculate the energy use and carbon footprint of several recent large models (T5, Meena, GShard, Switch Transformer, and GPT-3) and refine earlier estimates for the neural architecture search that found Evolved Transformer. We highlight the following opportunities to improve energy efficiency and CO2 equivalent emissions (CO2e): Large but sparsely activated DNNs can consume <1/10th the energy of large, dense DNNs without sacrificing accuracy despite using as many or even more parameters. Geographic location matters for ML workload scheduling since the fraction of carbon-free energy and resulting CO2e vary ~5X-10X, even within the same country and the same organization. We are now optimizing where and when large models are trained. Specific datacenter infrastructure matters, as Cloud datacenters can be ~1.4-2X more energy efficient than typical datacenters, and the ML-oriented accelerators inside them can be ~2-5X more effective than off-the-shelf systems. Remarkably, the choice of DNN, datacenter, and processor can reduce the carbon footprint up to ~100-1000X. These large factors also make retroactive estimates of energy cost difficult. To avoid miscalculations, we believe ML papers requiring large computational resources should make energy consumption and CO2e explicit when practical. We are working to be more transparent about energy use and CO2e in our future research. To help reduce the carbon footprint of ML, we believe energy usage and CO2e should be a key metric in evaluating models, and we are collaborating with MLPerf developers to include energy usage during training and inference in this industry standard benchmark.
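The abstract does not spell out how energy and CO2e are computed. As a rough sketch of the kind of accounting it describes, training energy is commonly estimated as training time × number of processors × average power per processor × datacenter PUE, and CO2e as that energy × the carbon intensity of the local grid. The function names and every numeric input below are hypothetical placeholders chosen for illustration, not figures from the paper.

```python
def training_energy_kwh(hours, num_processors, avg_power_watts, pue):
    """Estimate training energy in kWh.

    hours            -- wall-clock training time in hours
    num_processors   -- number of accelerators (or CPUs) used
    avg_power_watts  -- measured average power per processor, in watts
    pue              -- datacenter Power Usage Effectiveness (>= 1.0)
    """
    # Watt-hours scaled by PUE, then converted to kilowatt-hours.
    return hours * num_processors * avg_power_watts * pue / 1000.0


def training_co2e_tonnes(energy_kwh, kg_co2e_per_kwh):
    """Convert energy (kWh) to tonnes of CO2e for a given grid carbon intensity."""
    return energy_kwh * kg_co2e_per_kwh / 1000.0


if __name__ == "__main__":
    # Hypothetical run: 100 hours on 64 processors at 300 W each, PUE 1.1.
    energy = training_energy_kwh(
        hours=100, num_processors=64, avg_power_watts=300, pue=1.1
    )
    # Same job on two hypothetical grids with different carbon intensity.
    for label, intensity in [("cleaner grid", 0.08), ("dirtier grid", 0.40)]:
        tonnes = training_co2e_tonnes(energy, intensity)
        print(f"{label}: {energy:.0f} kWh -> {tonnes:.3f} tCO2e")
```

Because the terms multiply, an improvement in any single factor (processor efficiency, PUE, or grid carbon intensity) scales the final figure directly, which is one way to read the abstract's point that the choices compound.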


citation-role summary

background: 12 · method: 1

citation-polarity summary

background: 9 · support: 3 · use method: 1



representative citing papers

Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems

cs.AI · 2026-05-20 · unverdicted · novelty 7.0

Proposes EpG and OOI metrics showing agentic workflows use 4.33x more energy per successful goal than linear baselines due to orchestration structure.

The Economics of AI Inference: Inflation Dynamics, Welfare Costs, and Optimal Monetary Policy under the Inference-Cost Phillips Curve

econ.GN · 2026-05-19 · unverdicted · novelty 7.0

Develops the Inference-Cost Phillips Curve linking AI inference costs to inflation dynamics, derives structural slopes and optimal monetary policy, and reports empirical estimates from US and G7 data that align with theoretical predictions.

Characterizing Learning in Deep Neural Networks using Tractable Algorithmic Complexity Analysis

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

QuBD extends algorithmic complexity estimation to quantized DNN weights, revealing that complexity decreases during learning, increases with overfitting, follows grokking patterns, and correlates with generalization.

An Amortized Efficiency Threshold for Comparing Neural and Heuristic Solvers in Combinatorial Optimization

cs.LG · 2026-05-14 · unverdicted · novelty 7.0 · 2 refs

Introduces the Amortized Efficiency Threshold (AET) to identify the deployment volume at which neural combinatorial optimization solvers achieve lower total energy use than heuristic baselines after accounting for training costs.

DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

DUET improves RLVR by allocating tokens across both prompt selection and rollout length, outperforming full-budget baselines even when using only half the tokens.

Token Arena: A Continuous Benchmark Unifying Energy and Cognition in AI Inference

cs.AI · 2026-05-01 · unverdicted · novelty 7.0

TokenArena is a continuous benchmark for AI inference endpoints that measures output speed, time to first token, blended price, effective context, quality, and modeled energy to produce composites of joules per correct answer, dollars per correct answer, and endpoint fidelity.

SAT: Sequential Agent Tuning for Coordinator Free Plug and Play Multi-LLM Training with Monotonic Improvement Guarantees

cs.LG · 2026-04-17 · unverdicted · novelty 7.0

SAT trains multi-LLM teams with sequential block updates to deliver monotonic gains and plug-and-play model swaps that provably improve performance bounds.

Training single-electron and single-photon stochastic physical neural networks

quant-ph · 2026-04-12 · unverdicted · novelty 7.0

Single-electron and single-photon stochastic physical neural networks achieve over 97% MNIST test accuracy when trained with empirical outputs in the backward pass using few trials per layer.

The Phase Is the Gradient: Equilibrium Propagation for Frequency Learning in Kuramoto Networks

cs.LG · 2026-04-11 · unverdicted · novelty 7.0

In Kuramoto networks at equilibrium, weak nudging makes phase displacement the exact gradient of loss w.r.t. natural frequencies, enabling frequency learning that beats weight learning and resolves convergence via spectral initialization.

Hidden State Poisoning Attacks against Mamba-based Language Models

cs.CL · 2026-01-05 · unverdicted · novelty 7.0

Short input phrases can irreversibly overwrite hidden states in Mamba models, impairing information retrieval on a new benchmark while leaving pure Transformer models unaffected.

Stochastic Thermodynamics of Associative Memory

cond-mat.stat-mech · 2026-01-03 · unverdicted · novelty 7.0

DenseAMs show tradeoffs between entropy production, retrieval accuracy, and speed at intermediate loads, with a new failure mode in higher-order networks at finite temperature.

SAM 3: Segment Anything with Concepts

cs.CV · 2025-11-20 · unverdicted · novelty 7.0

SAM 3 introduces promptable concept segmentation that doubles accuracy of prior systems on images and videos while improving standard SAM segmentation performance.

Segment Anything

cs.CV · 2023-04-05 · unverdicted · novelty 7.0

A promptable model trained on 1B masks achieves competitive zero-shot segmentation performance across tasks and is released publicly with its dataset.

Mass-Editing Memory in a Transformer

cs.CL · 2022-10-13 · conditional · novelty 7.0

MEMIT scales direct memory editing in transformers from single facts to thousands of associations by optimizing MLP weight updates.

OPT: Open Pre-trained Transformer Language Models

cs.CL · 2022-05-02 · unverdicted · novelty 7.0

OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

High-Resolution Image Synthesis with Latent Diffusion Models

cs.CV · 2021-12-20 · conditional · novelty 7.0

Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders with cross-attention conditioning, while cutting computational and

Multitask Prompted Training Enables Zero-Shot Task Generalization

cs.LG · 2021-10-15 · conditional · novelty 7.0

Multitask fine-tuning of an encoder-decoder model on prompted datasets produces zero-shot generalization that often beats models up to 16 times larger on standard benchmarks.

Perhaps PTLMs Should Go to School -- A Task to Assess Open Book and Closed Book QA

cs.CL · 2021-10-04 · unverdicted · novelty 7.0

Proposes a textbook-based true/false QA task where PTLMs score ~50% closed-book even after pre-training on the text and ~60% open-book with retrieval.

Inference Cost Attacks for Retrieval-Augmented Large Language Models

cs.CR · 2026-05-31 · unverdicted · novelty 6.0

Poisoning external knowledge bases with LLM-agent-crafted documents can increase RAG inference token consumption by up to 13.12 times at over 90% success rate while preserving answer quality.

General-Purpose Photonic Computing Primitive for Contemporary Artificial Intelligence

physics.optics · 2026-05-21 · unverdicted · novelty 6.0

DUET is a photonic tensor core paradigm that uses structural symmetry in VODICs to support arbitrary signed operands directly, experimentally tested on image classification, segmentation, and Transformer tasks.

Recasting AI Data Centers as Engines for Carbon Removal

math.OC · 2026-05-13 · unverdicted · novelty 6.0

AI data center waste heat upgraded by heat pumps can drive direct air capture to achieve net CO2 removal and offset operational emissions in several US states under current and 2030 scenarios.

Language-Conditioned Visual Grounding with CLIP Multilingual

cs.CL · 2026-05-09 · unverdicted · novelty 6.0

Fixing the visual encoder in multilingual CLIP isolates text-branch deficits as the cause of lower visual grounding performance for low-resource languages, with model scaling widening some gaps but not others.

A Hardware-aware Hopfield Network with a Nonlinear Memristor Array for Robust Associative Memory with Superlinear Capacity

cond-mat.dis-nn · 2026-05-08 · unverdicted · novelty 6.0

A memristor-array Hopfield network uses device nonlinearity to exceed classical memory capacity with K ~ 0.14N experimentally and superlinear K ~ 0.3 N^1.2 in simulations.

UniSD: Towards a Unified Self-Distillation Framework for Large Language Models

cs.CL · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

UniSD unifies self-distillation components for autoregressive LLMs and its full integrated version improves base models by 5.4 points and baselines by 2.8 points across six benchmarks.

citing papers explorer

Showing 20 of 70 citing papers.

Remember what you did so you know what to do next · cs.CL · 2023-10-30 · unverdicted · none · ref 19 · internal anchor
GPT-J with full action history achieves 3.5x improvement over RL in ScienceWorld and matches a two-stage system using 29x larger models.
DINOv2: Learning Robust Visual Features without Supervision · cs.CV · 2023-04-14 · unverdicted · none · ref 19 · internal anchor
Artificial Adaptive Intelligence: The Missing Stage Between Narrow and General Intelligence · cs.AI · 2026-05-16 · unverdicted · none · ref 14 · internal anchor
Proposes Artificial Adaptive Intelligence as the regime between narrow and general AI, defined by elimination of human-specified hyperparameters, and introduces an adaptivity index plus parametric minimality principle grounded in minimum description length.
From Cradle to Cloud: A Life Cycle Review of AI's Environmental Footprint · cs.CY · 2026-05-06 · unverdicted · none · ref 75 · internal anchor
A review of AI sustainability studies finds inconsistent life cycle definitions and predominant reliance on coarse CO2e proxies, with limited coverage of water, materials, and multi-impact assessments.
Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models · cs.SE · 2026-04-28 · unverdicted · none · ref 39 · internal anchor
CTT is a compression pipeline for LLMs that achieves up to 49x memory reduction, 10x faster inference, 81% lower CO2 emissions, and retains 68-98% accuracy on code clone detection, summarization, and generation tasks.
AI-Native Autonomous Infrastructure (ANAI): A Formal Framework for the Next General-Purpose Technology · eess.SY · 2026-04-27 · unverdicted · none · ref 29 · internal anchor
Introduces ANAI framework with Autonomy Index (AIx), Infrastructure Coupling Coefficient (ICC), and Technological Transition Potential (TTP) to model AI-driven infrastructural transition via nonlinear coevolution and recursive feedback loops.
minAction.net: Energy-First Neural Architecture Design -- From Biological Principles to Systematic Validation · cs.LG · 2026-04-27 · conditional · none · ref 8 · internal anchor
Large-scale experiments show architecture performance depends on task type, not universality, and a single-parameter energy penalty reduces computational energy by ~1000x with negligible accuracy cost.
SymptomWise: A Deterministic Reasoning Layer for Reliable and Efficient AI Systems · cs.AI · 2026-04-07 · unverdicted · none · ref 9 · internal anchor
SymptomWise uses expert knowledge and deterministic rules for diagnosis after LLM-based symptom extraction, achieving 88% top-5 accuracy on 42 challenging pediatric neurology cases.
Quantifying the Climate Risk of Generative AI: Region-Aware Carbon Accounting with G-TRACE and the AI Sustainability Pyramid · cs.CY · 2025-11-06 · unverdicted · none · ref 28 · 2 links · internal anchor
G-TRACE provides region-aware estimates of GenAI carbon emissions including 4309 MWh and 2068 tCO2 for a 2024-2025 image generation trend, paired with a seven-level AI Sustainability Pyramid for policy guidance.
Less LLM, More Documents: Searching for Improved RAG · cs.IR · 2025-10-03 · unverdicted · none · ref 22 · internal anchor
Corpus scaling in RAG frequently matches the accuracy gains from larger LLMs on open-domain QA tasks, with mid-sized models benefiting most due to better passage coverage.
Training speedups via batching for geometric learning: an analysis of static and dynamic algorithms · cs.LG · 2025-02-02 · unverdicted · none · ref 28 · internal anchor
Experiments on QM9 and AFLOW datasets show that static and dynamic batching for GNNs can yield up to 2.7x training speedups depending on data, model, batch size, hardware, and training steps, with occasional differences in learning metrics.
CompPow: A Case for Component-level GPU Power Management · cs.AR · 2026-05-21 · unverdicted · none · ref 25 · internal anchor
CompPow makes the case that component-aware power management inside GPUs can yield 10% higher energy efficiency and 5% better performance for ML workloads.
Transformer Scalability Crisis: The First Comprehensive Empirical Analysis of Performance Walls in Modern Language Models · cs.LG · 2026-05-14 · unverdicted · none · ref 58 · internal anchor
Empirical tests on 118 transformers show success falling from 88.1% at 512 tokens to 0% at 2048 tokens, with compressed models achieving 649.2 tokens/sec/M parameters versus 12.5 for large generative ones.
Analytic Framework for Estimating Memory Cost · cs.ET · 2026-05-03 · unverdicted · none · ref 1 · internal anchor
An analytic framework is introduced to estimate memory-related energy costs of AI models and quantify their ecological footprint.
Unbox Responsible GeoAI: Navigating Climate Extreme and Disaster Mapping · cs.CY · 2026-05-01 · unverdicted · none · ref 11 · internal anchor
Responsible GeoAI for disaster mapping requires governance across data, applications, and society rather than algorithm improvements alone.
AI Infrastructure Sovereignty · cs.NI · 2026-02-11 · unverdicted · none · ref 1 · internal anchor
AI sovereignty requires coordinated design of data centers, optical networks, and real-time control systems to operate within energy availability and sustainability constraints.
A Comprehensive Survey on Semantic Communication in Non-Terrestrial Networks: Architectures, Methodologies, and Challenges · cs.IT · 2026-05-28 · unverdicted · none · ref 211 · internal anchor
A literature survey that pairs NTN limitations with semantic communication properties, organizes work by platform and methodology, and lists open problems for integrated SAGIN systems.
LLMs in Qualitative Research: Opportunities, Limitations, and Practical Considerations · cs.HC · 2026-05-15 · unverdicted · none · ref 4 · internal anchor
The paper outlines opportunities, limitations, and practical parameters for integrating LLMs into qualitative research while aligning with epistemological commitments like reflexivity and interpretive judgment.
Integrated photonic computing: towards high-dimensional information processing · physics.optics · 2026-05-14 · unverdicted · none · ref 24 · internal anchor
A review of integrated photonic computing that organizes low- to high-dimensional architectures and argues that exploiting light's full dimensionality offers a path to scalable, energy-efficient information processing.
Will LLMs Scaling Hit the Wall? Breaking Barriers via Distributed Resources on Massive Edge Devices · cs.DC · 2025-03-11 · unverdicted · none · ref 7 · internal anchor
Position paper claiming that distributed training across massive edge devices can overcome data depletion and centralized compute monopolies in LLM scaling.
