Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models
Pith reviewed 2026-05-07 15:48 UTC · model grok-4.3
The pith
Ordering compression techniques by a carbon-tax principle yields up to 49x memory reduction in language models for software engineering tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CTT operationalizes a computational carbon tax to order compression techniques. The pipeline reports up to 49x memory reduction; inference speedups of 8-10x on clone detection, 3x on summarization, and 4-7x on generation; up to 81% lower CO2 emissions; and accuracy retention of roughly 98% on clone detection, 89% on summarization, and up to 91% on textual metrics with 68% pass@1 on generation.
What carries the argument
The carbon-tax ordering principle in the CTT pipeline, which systematically sequences compression steps to penalize architectural inefficiencies and reward efficient ones across model types.
If this is right
- Large language models become practical to deploy on modest hardware for clone detection, summarization, and generation in software engineering.
- The carbon emissions associated with running these models drop substantially, supporting more sustainable AI use in the field.
- The same pipeline ordering applies to encoder-only, encoder-decoder, and decoder-only architectures without architecture-specific redesign.
- Accuracy remains high enough on standard benchmarks to support real-world SE applications after compression.
Where Pith is reading between the lines
- The carbon-tax ordering might extend to non-SE language model tasks if the penalty metric is recalibrated for different objectives.
- Testing the pipeline on models larger than those evaluated here could reveal whether the gains scale or plateau.
- Economic metaphors like taxation could guide other efficiency optimizations in AI by making tradeoffs explicit and quantifiable.
Load-bearing premise
The assumption that ordering compression techniques according to a computational carbon-tax principle will reliably produce multiplicative efficiency gains without unacceptable accuracy loss across the tested architectures and SE tasks.
What would settle it
An experiment that applies the same compression components in random or alternative orderings and measures whether the efficiency-accuracy tradeoffs match or exceed those of the carbon-tax ordering on the same tasks and models.
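Such a settling experiment could be sketched as a small harness that scores every ordering of the same compression steps and ranks them. Everything here is hypothetical scaffolding, not the paper's code: `evaluate` stands in for actually compressing a model and measuring it, and the toy scoring inside it is invented purely so the sketch runs.

```python
from itertools import permutations

# Hypothetical compression steps; in the paper these would be real
# techniques (e.g., pruning, quantization, distillation).
STEPS = ["prune", "quantize", "distill"]

def evaluate(ordering):
    """Placeholder: run the pipeline in this order and return
    (memory_reduction, accuracy). A real harness would compress a
    model and measure both on a held-out benchmark."""
    # Toy scores so the harness is runnable: reward orderings that
    # quantize last (a common practical heuristic, not a claim
    # from the paper).
    bonus = 1.0 if ordering[-1] == "quantize" else 0.8
    return (len(ordering) * 10 * bonus, 0.9 * bonus)

def rank_orderings(steps):
    """Score every permutation; sort by memory reduction, then accuracy."""
    results = {o: evaluate(o) for o in permutations(steps)}
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)

ranked = rank_orderings(STEPS)
best_order, (mem, acc) = ranked[0]
```

If the carbon-tax ordering sits at or near the top of such a ranking, and random permutations do not, the central claim gains the direct support the review asks for.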
read the original abstract
The accelerating adoption of Large Language Models (LLMs) in software engineering (SE) has brought with it a silent crisis: unsustainable computational cost. While these models demonstrate remarkable capabilities in different SE tasks, they are unmanageably large, slow to deploy, memory-intensive, and carbon-heavy. This reality threatens not only the scalability and accessibility of AI-powered SE, but also its long-term environmental sustainability. The research challenge is clear: we must go beyond accuracy and address efficiency and environmental cost as first-class design constraints. To meet this challenge, we introduce Carbon-Taxed Transformers (CTT), a systematic multi-architectural compression principled pipeline ordering inspired by economic carbon taxation principles. Drawing from the economic concept of carbon pricing, CTT operationalizes a computational carbon tax that penalizes architectural inefficiencies and rewards deployment-ready compression. We evaluate CTT across three core SE tasks: code clone detection, code summarization, and code generation, with models spanning encoder-only, encoder-decoder, and decoder-only architecture. Our results show that CTT delivers on inference: (1) up to 49x memory reduction, (2) time reduction up to 8-10x for clone detection, up to 3x for summarization, and 4-7x for generation, (3) up to 81% reduction in CO2 emissions and (4) CTT retains around 98% accuracy on clone detection, around 89% on summarization, and up to 91% (textual metrics) and 68% (pass@1) for generation. Two ablation studies show that pipeline ordering and individual component contributions are both essential, providing empirical justification for CTT's design and effectiveness. This work establishes a viable path toward responsible AI in SE through aggressive yet performance-preserving compression.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Carbon-Taxed Transformers (CTT), a systematic compression pipeline for LLMs in software engineering that orders techniques according to a computational carbon tax principle inspired by economic carbon pricing. It evaluates the pipeline on code clone detection, code summarization, and code generation tasks using encoder-only, encoder-decoder, and decoder-only models. Reported outcomes include up to 49x memory reduction, inference speedups of 8-10x (clone detection), 3x (summarization), and 4-7x (generation), up to 81% CO2 reduction, and accuracy retention of ~98% (clone detection), ~89% (summarization), and up to 91% textual / 68% pass@1 (generation). Two ablation studies are cited to establish that both the ordering and the individual components are essential.
Significance. If the empirical results prove robust, the work would offer a practical, environmentally-aware approach to LLM compression tailored to SE tasks, with the multi-architecture and multi-task evaluation providing a useful breadth. The framing of compression as a 'carbon-taxed' pipeline is a novel heuristic that could stimulate further research on sustainability-driven design choices. Strengths include the focus on inference metrics and the attempt to justify the pipeline via ablations. However, the absence of a formal definition for the carbon tax, detailed baselines, and statistical validation limits the immediate significance and generalizability of the claimed multiplicative gains.
major comments (2)
- [Section 3 and Section 5] Section 3 (CTT Pipeline) and Section 5 (Ablation Studies): The manuscript provides no formula, metric, or operational definition for the 'computational carbon tax' used to order compression techniques. This is load-bearing for the central claim, because the paper attributes the reported multiplicative efficiency gains (49x memory, 3-10x time, 81% CO2) specifically to this principled ordering, yet the ablations supply no quantitative comparison against alternative sequences or random orderings of the same components.
- [Section 4] Section 4 (Results): The accuracy and efficiency claims (e.g., 98% clone detection accuracy, 68% pass@1 for generation) are stated without error bars, confidence intervals, statistical tests, or per-model baseline tables comparing CTT against uncompressed models and against the same techniques applied in non-carbon-tax orderings. This directly affects verification of whether the carbon-tax ordering is necessary for acceptable accuracy retention across architectures.
minor comments (2)
- [Abstract and Section 4] The abstract and results sections use approximate phrasing ('around 98%', 'up to 49x') without accompanying tables that list exact values, standard deviations, or hardware/measurement details for CO2 and time metrics.
- [Section 3] No description is given of the specific compression techniques included in the pipeline or how their individual carbon costs were estimated prior to ordering.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive comments. We address each major comment point by point below and outline the revisions we will make to strengthen the manuscript.
read point-by-point responses
- Referee: [Section 3 and Section 5] Section 3 (CTT Pipeline) and Section 5 (Ablation Studies): The manuscript provides no formula, metric, or operational definition for the 'computational carbon tax' used to order compression techniques. This is load-bearing for the central claim, because the paper attributes the reported multiplicative efficiency gains (49x memory, 3-10x time, 81% CO2) specifically to this principled ordering, yet the ablations supply no quantitative comparison against alternative sequences or random orderings of the same components.
Authors: We agree that a formal operational definition is needed to make the central claim fully verifiable. Section 3 currently describes the carbon tax as a heuristic inspired by economic carbon pricing that prioritizes techniques with lower computational and environmental cost, but it lacks an explicit formula. In the revised manuscript we will add a precise metric: a carbon-tax score defined as a normalized weighted sum of memory footprint, inference latency, and estimated CO2 emissions for each technique, with techniques ordered by ascending score. For the ablation studies, the existing experiments already compare the full ordered pipeline against versions that omit ordering or individual components; however, we acknowledge the value of additional controls. We will include new results comparing the carbon-tax ordering against random permutations of the same techniques and against alternative heuristics (e.g., ordering by model size alone or by latency alone) to quantify the benefit of the proposed ordering. revision: yes
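The metric the rebuttal promises, a normalized weighted sum of memory footprint, inference latency, and estimated CO2 per technique, ordered ascending, could look like the following sketch. The per-technique cost numbers and the weights are invented for illustration; only the min-max-normalize-then-weight structure comes from the rebuttal's description.

```python
# Illustrative per-technique costs (memory MB, latency ms, gCO2 per
# 1k inferences); these numbers are invented for the sketch.
techniques = {
    "distillation": (500.0, 12.0, 3.0),
    "pruning": (300.0, 8.0, 2.0),
    "quantization": (150.0, 5.0, 1.0),
}
weights = (0.4, 0.4, 0.2)  # assumed relative importance of each cost

def carbon_tax_scores(costs, weights):
    """Min-max normalize each cost dimension across techniques, then
    take a weighted sum; lower score = cheaper, applied first."""
    dims = list(zip(*costs.values()))
    lo = [min(d) for d in dims]
    span = [max(d) - min(d) or 1.0 for d in dims]  # avoid divide-by-zero
    return {
        name: sum(w * (v - l) / s
                  for v, w, l, s in zip(vec, weights, lo, span))
        for name, vec in costs.items()
    }

scores = carbon_tax_scores(techniques, weights)
pipeline = sorted(scores, key=scores.get)  # ascending carbon-tax score
```

With these illustrative inputs the cheapest technique gets score 0 and leads the pipeline; whatever weights the authors ultimately publish, stating them explicitly is what makes the ordering reproducible.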
- Referee: [Section 4] Section 4 (Results): The accuracy and efficiency claims (e.g., 98% clone detection accuracy, 68% pass@1 for generation) are stated without error bars, confidence intervals, statistical tests, or per-model baseline tables comparing CTT against uncompressed models and against the same techniques applied in non-carbon-tax orderings. This directly affects verification of whether the carbon-tax ordering is necessary for acceptable accuracy retention across architectures.
Authors: We accept this criticism on the presentation of empirical results. The current version reports point estimates without measures of variability or formal statistical comparisons. In the revision we will add standard deviations across repeated runs, 95% confidence intervals, and appropriate statistical tests (paired t-tests or Wilcoxon signed-rank tests) for all accuracy and efficiency metrics. We will also expand the baseline tables to show, for each model and task, the uncompressed baseline, the CTT pipeline, and the same compression techniques applied in non-carbon-tax orderings. These additions will allow direct assessment of whether the carbon-tax ordering is required to retain acceptable accuracy while delivering the reported efficiency gains. revision: yes
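The variability reporting promised here is cheap to add. A stdlib-only sketch of a mean with an approximate 95% confidence interval follows; it uses the normal approximation (a t-based interval would be slightly wider for so few runs), and the sample accuracies are hypothetical, not the paper's data.

```python
import statistics as stats

def mean_with_ci(samples, z=1.96):
    """Return (mean, half-width of an approximate 95% CI) using the
    normal approximation; fine for reporting, not a significance test."""
    m = stats.mean(samples)
    se = stats.stdev(samples) / len(samples) ** 0.5  # standard error
    return m, z * se

# Example: clone-detection accuracy over 5 hypothetical repeated runs.
runs = [0.981, 0.979, 0.983, 0.980, 0.982]
mean, half = mean_with_ci(runs)
# Report as e.g. "98.1% +/- 0.1 (95% CI)" rather than a bare point estimate.
```

The paired significance tests the authors name (paired t-test, Wilcoxon signed-rank) are both available in `scipy.stats` for the revised tables.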
Circularity Check
No significant circularity; empirical pipeline without derivations or self-referential reductions
full rationale
The paper presents CTT as an empirical multi-stage compression pipeline for LLMs on SE tasks, with results from experiments and two ablation studies. No equations, derivations, or formal definitions appear in the abstract or described structure. The carbon-tax ordering is described as an inspirational principle operationalized into a pipeline, but without any quoted formula, fitted parameter, or self-citation chain that reduces a claimed result to its own inputs by construction. Ablations are invoked to support ordering and components, yet the absence of mathematical steps means no load-bearing claim reduces to a tautology or renamed fit. This is a standard empirical evaluation whose central claims rest on measured metrics rather than internal theoretical circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: a principled ordering of standard compression techniques can produce multiplicative gains in memory, speed, and emissions while preserving task performance.
invented entities (1)
- Computational carbon tax (no independent evidence)