Releases DTE-aerial-train (385K patches) and DTE-aerial-bench (25 global orthoimages) as the first harmonized multi-resolution datasets for joint tree cover and mortality segmentation across biomes.
Quantifying the Carbon Emissions of Machine Learning
Canonical reference. 71% of citing Pith papers cite this work as background.
Abstract
From an environmental standpoint, there are a few crucial aspects of training a neural network that have a major impact on the quantity of carbon that it emits. These factors include: the location of the server used for training and the energy grid that it uses, the length of the training procedure, and even the make and model of hardware on which the training takes place. In order to approximate these emissions, we present our Machine Learning Emissions Calculator, a tool for our community to better understand the environmental impact of training ML models. We accompany this tool with an explanation of the factors cited above, as well as concrete actions that individual practitioners and organizations can take to mitigate their carbon emissions.
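The abstract names three factors: grid location, training length, and hardware. They can be combined in a simple calculation. Energy is the hardware's average power draw multiplied by training time. Emissions are that energy multiplied by the carbon intensity of the local grid. The sketch below is a minimal illustration of this arithmetic under those assumptions. It is not the Machine Learning Emissions Calculator's actual code. The `pue` overhead multiplier and every numeric value are hypothetical inputs chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TrainingRun:
    # Average power draw of the training hardware in kW (assumed, e.g. from spec sheets).
    hardware_power_kw: float
    # Wall-clock length of the training procedure in hours.
    hours: float
    # Carbon intensity of the server's energy grid in kg CO2eq per kWh (location-dependent).
    grid_intensity_kg_per_kwh: float
    # Hypothetical data-center overhead multiplier; 1.0 ignores cooling and other overhead.
    pue: float = 1.0


def energy_kwh(run: TrainingRun) -> float:
    """Energy consumed: power draw x duration x optional overhead factor."""
    return run.hardware_power_kw * run.hours * run.pue


def emissions_kg(run: TrainingRun) -> float:
    """Emissions: consumed energy x carbon intensity of the grid."""
    return energy_kwh(run) * run.grid_intensity_kg_per_kwh


if __name__ == "__main__":
    # Same hypothetical 2.4 kW, 100-hour job on a carbon-heavy vs. a low-carbon grid.
    dirty = TrainingRun(hardware_power_kw=2.4, hours=100, grid_intensity_kg_per_kwh=0.7)
    clean = TrainingRun(hardware_power_kw=2.4, hours=100, grid_intensity_kg_per_kwh=0.05)
    print(f"{emissions_kg(dirty):.1f}")  # 168.0
    print(f"{emissions_kg(clean):.1f}")  # 12.0
```

The example holds hardware and duration fixed and changes only the grid intensity. This isolates the location factor the abstract highlights: the same 240 kWh job differs by more than an order of magnitude in emissions.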
Roles: background (7)
Representative citing papers
Introduces the Amortized Efficiency Threshold (AET) to identify the deployment volume at which neural combinatorial optimization solvers achieve lower total energy use than heuristic baselines after accounting for training costs.
Nearly every arXiv submission leaks hidden sensitive information through its source files, existing cleaners fail, and ALC-NG provides a more reliable fix.
SAM 3 introduces promptable concept segmentation that doubles accuracy of prior systems on images and videos while improving standard SAM segmentation performance.
A promptable model trained on 1B masks achieves competitive zero-shot segmentation performance across tasks and is released publicly with its dataset.
OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.
Multitask fine-tuning of an encoder-decoder model on prompted datasets produces zero-shot generalization that often beats models up to 16 times larger on standard benchmarks.
Nf-PEAK is a containerized method that attributes energy to Nextflow tasks with 6.6% MAPE in isolated runs and 10.9% under co-located load, outperforming Kepler on nf-core workflows.
EnergyLens predicts multi-GPU LLM inference energy consumption with 9-13% MAPE and identifies configurations with up to 52x energy efficiency differences.
PEML co-optimizes continuous prompts and low-rank adaptations to deliver up to 6.67% average accuracy gains over existing multi-task PEFT methods on GLUE, SuperGLUE, and other benchmarks.
Inter-laboratory measurement variance dominates the generalization gap in PROTAC activity prediction, capping LOTO AUROC near 0.67 across models and architectures.
UniSD unifies self-distillation components for autoregressive LLMs and its full integrated version improves base models by 5.4 points and baselines by 2.8 points across six benchmarks.
GPU data centers participating in power grid frequency regulation via the EcoCenter framework can achieve exogenous carbon savings exceeding their operational emissions.
PlantTraitNet is an uncertainty-aware multimodal deep learning framework that infers four plant traits from citizen science images and produces global trait maps that outperform prior products when validated against independent survey data.
A single attacker can use strategic upvoting and downvoting on language model outputs to inject facts, security flaws, or fake news that persist in the model for all users after preference tuning.
SAM 2 delivers more accurate video segmentation with 3x fewer user interactions and 6x faster image segmentation than the original SAM by training a streaming-memory transformer on the largest video segmentation dataset collected to date.
StarCoder2-15B matches or beats CodeLlama-34B on code tasks despite being smaller, and StarCoder2-3B outperforms prior 15B models, with open weights and exact training data identifiers released.
DeepSpeed-Ulysses keeps communication volume constant for sequence-parallel attention when sequence length and device count scale together, delivering 2.5x faster training on 4x longer sequences than prior SOTA.
ART automatically generates multi-step reasoning programs with tool integration for LLMs, yielding substantial gains over few-shot and auto-CoT prompting on BigBench and MMLU while matching hand-crafted CoT on most tasks.
BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.
GPT-NeoX-20B is a publicly released 20B parameter autoregressive language model trained on the Pile that shows strong gains in five-shot reasoning over similarly sized prior models.
MIRAI is a unified index that combines five responsibility dimensions into one score for tabular models, demonstrating that predictive performance does not ensure high overall integrity.
LLM inference should be reframed and evaluated as energy-to-token production with a Token Production Function that accounts for power, cooling, and efficiency ceilings.
A two-step agentic system for extracting insights from VSM simulations achieves up to 86% accuracy with top LLMs by using progressive data discovery and slim context.
Citing papers
Multitask Prompted Training Enables Zero-Shot Task Generalization
Multitask fine-tuning of an encoder-decoder model on prompted datasets produces zero-shot generalization that often beats models up to 16 times larger on standard benchmarks.
Nf-PEAK: Process-Based Energy Attribution for Nextflow Workflows on Kubernetes Clusters
Nf-PEAK is a containerized method that attributes energy to Nextflow tasks with 6.6% MAPE in isolated runs and 10.9% under co-located load, outperforming Kepler on nf-core workflows.
SAM 2: Segment Anything in Images and Videos
SAM 2 delivers more accurate video segmentation with 3x fewer user interactions and 6x faster image segmentation than the original SAM by training a streaming-memory transformer on the largest video segmentation dataset collected to date.
Position: Graph Condensation Needs a Reset -- Move Beyond Full-dataset Training and Model-Dependence
The paper claims current graph condensation approaches are flawed due to full-dataset training requirements, high overhead, poor generalization, and misleading evaluation metrics, calling for a reset toward lightweight and architecture-agnostic methods.