Compiling Code LLMs into Lightweight Executables
Pith reviewed 2026-05-13 23:22 UTC · model grok-4.3
The pith
Ditto quantizes Code LLMs via K-Means codebooks and compiles their inference code through LLVM to produce fast, low-memory executables for ordinary hardware.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Ditto combines two components: a quantization step that groups parameters into per-block codebooks via K-Means and stores each weight as a bit-packed low-bitwidth index, and an LLVM compilation pass that automatically replaces unoptimized GEMV operations with calls to target-specific BLAS libraries. The result is a standalone executable that runs the selected Code LLM on commodity hardware.
What carries the argument
The Ditto framework, which pairs K-Means codebook quantization for model compression with an LLVM-integrated compilation pass that redirects matrix operations to optimized BLAS libraries.
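To make the quantization half concrete, here is a minimal sketch of per-block codebook quantization in the spirit described above: weights are split into fixed-size blocks, a small K-Means (Lloyd's algorithm) is run per block, and each weight is replaced by the index of its nearest centroid. The block size, cluster count, and scalar (per-weight) clustering are illustrative assumptions, not the paper's reported configuration.

```python
import numpy as np

def kmeans_1d(x, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on scalar values: returns (codebook, indices)."""
    rng = np.random.default_rng(seed)
    centroids = rng.choice(x, size=k, replace=False)
    for _ in range(iters):
        idx = np.abs(x[:, None] - centroids[None, :]).argmin(axis=1)  # assign step
        for c in range(k):                                            # update step
            members = x[idx == c]
            if members.size:
                centroids[c] = members.mean()
    return centroids, idx

def quantize_blocks(weights, k=16, block=128):
    """Per-block codebooks: each block keeps k centroids plus one low-bit index per weight."""
    flat = weights.reshape(-1)
    codebooks, indices = [], []
    for start in range(0, flat.size, block):
        cb, idx = kmeans_1d(flat[start:start + block], k)
        codebooks.append(cb)
        indices.append(idx.astype(np.uint8))   # 16 clusters -> 4-bit indices, held in uint8 here
    return codebooks, indices

def dequantize(codebooks, indices, shape):
    """Reconstruct an approximate weight matrix by codebook lookup."""
    flat = np.concatenate([cb[idx] for cb, idx in zip(codebooks, indices)])
    return flat.reshape(shape)

if __name__ == "__main__":
    w = np.random.randn(256, 128).astype(np.float32)
    cbs, idxs = quantize_blocks(w)
    w_hat = dequantize(cbs, idxs, w.shape)
    print("relative L2 reconstruction error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```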
If this is right
- Code LLMs can execute directly on devices without GPUs or large RAM, enabling offline use.
- Inference becomes up to 10.5× faster, memory use drops by up to 6.4×, and energy consumption falls by up to 10.5× compared with the original pipelines.
- Accuracy stays within 0.27 percent of full-precision pass@1 on average across the tested models.
- The output is a single compiled executable rather than a separate model file plus interpreter script.
Where Pith is reading between the lines
- The same quantization-plus-compilation pattern could apply to non-code LLMs for other local AI tasks.
- Further speed gains might appear if the LLVM pass were extended to additional linear-algebra kernels beyond GEMV.
- Device-specific tuning of the BLAS calls could widen the hardware range that benefits from the approach.
Load-bearing premise
The K-Means codebook quantization and low-bit index storage preserve the original functional correctness and pass@1 accuracy of the Code LLMs without any retraining or post-processing steps.
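The "low-bit index storage" half of this premise is mechanical once every weight has become a small integer: indices can be packed several to a byte and unpacked losslessly at load or inference time. A minimal sketch, assuming 4-bit indices (codebooks of at most 16 entries, an illustrative choice rather than the paper's setting):

```python
import numpy as np

def pack_4bit(indices: np.ndarray) -> np.ndarray:
    """Pack an even-length array of 4-bit indices (values 0..15) two per byte."""
    assert indices.max() < 16 and indices.size % 2 == 0
    lo = indices[0::2].astype(np.uint8)
    hi = indices[1::2].astype(np.uint8)
    return (hi << 4) | lo

def unpack_4bit(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_4bit: recover the original index stream."""
    lo = packed & 0x0F
    hi = packed >> 4
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2], out[1::2] = lo, hi
    return out

idx = np.random.randint(0, 16, size=1024).astype(np.uint8)
assert np.array_equal(idx, unpack_4bit(pack_4bit(idx)))   # round trip is lossless
# Storage: 1024 weights -> 512 bytes of indices plus one small codebook per block,
# versus 4096 bytes at float32.
```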
What would settle it
Re-running the quantized, compiled models of Code Llama, MagicCoder, and OpenCodeInterpreter on the same benchmarks: an average pass@1 drop well above 0.27 percent, or no measurable reduction in inference time, memory use, or energy on the target hardware, would undermine the claim.
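For reference, pass@1 on code benchmarks is conventionally computed with the unbiased pass@k estimator of Chen et al. (2021): draw n samples per problem, count the c that pass the unit tests, and average 1 − C(n−c, k)/C(n, k) over problems. A minimal sketch with made-up counts follows; with a single greedy generation per problem, as the rebuttal describes, it reduces to the fraction of problems solved.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n of which c are correct, passes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical benchmark of three problems, each with n samples and c correct ones:
results = [(10, 3, 1), (10, 0, 1), (10, 10, 1)]   # (n, c, k) per problem
scores = [pass_at_k(n, c, k) for n, c, k in results]
print("average pass@1:", sum(scores) / len(scores))  # (0.3 + 0.0 + 1.0) / 3
```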
Original abstract
The demand for better prediction accuracy and higher execution performance in neural networks continues to grow. The emergence and success of Large Language Models (LLMs) have produced many cloud-based tools for software engineering tasks such as code suggestion. Although effective, cloud deployment raises concerns over privacy, latency, and reliance on network connectivity. Running LLMs locally on personal devices such as laptops would address these issues, because it enables offline use and reduces response time. However, local deployment is challenging, since commodity devices lack high-performance accelerators such as GPUs and are constrained by limited memory and compute capacity, which makes it hard to execute large models efficiently. We present Ditto, a framework that optimizes both the model size of Code LLMs and the inference programs that execute them. Our approach integrates two components. The first is a quantization technique inspired by product quantization, which groups model parameters into per-block codebooks via K-Means clustering and stores each weight as a bit-packed low-bitwidth index. The second component is a compilation pass integrated into LLVM that automatically detects and replaces unoptimized General Matrix-Vector Multiplication (GEMV) operations with calls into Basic Linear Algebra Subprograms (BLAS) libraries that are highly optimized for the target hardware. The output of Ditto is a compiled executable that runs the selected Code LLM on commodity hardware. We evaluate Ditto on three popular Code LLMs, namely Code Llama, MagicCoder, and OpenCodeInterpreter, achieving up to 10.5× faster inference, 6.4× lower memory usage, and 10.5× lower energy consumption compared with their original inference pipelines, while preserving accuracy close to the full-precision models, with an average loss of only 0.27% in pass@1.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents Ditto, a framework that optimizes Code LLMs for local execution on commodity hardware. It combines a product-quantization scheme (per-block K-Means codebooks with low-bit packed indices) to shrink model size with an LLVM compilation pass that replaces unoptimized GEMV kernels by calls to hardware-optimized BLAS libraries. On Code Llama, MagicCoder, and OpenCodeInterpreter the authors report up to 10.5× faster inference, 6.4× lower memory footprint, and 10.5× lower energy consumption while incurring an average 0.27 % drop in pass@1 accuracy relative to the original full-precision models.
Significance. If the empirical claims are substantiated, the work would be a practical contribution to the deployment of code-generation models on laptops and edge devices. By jointly addressing model compression and inference-kernel optimization, Ditto directly tackles the memory, latency, and energy barriers that currently prevent offline, privacy-preserving use of Code LLMs. The combination of quantization and LLVM-level code generation is a concrete engineering advance that could be adopted by practitioners.
major comments (3)
- [Abstract] Abstract: the central performance numbers (10.5× speed-up, 6.4× memory reduction, 0.27 % pass@1 loss) are stated without any values for the K-Means cluster count, index bit-width, block size, calibration data, or the exact pass@1 protocol (benchmark, temperature, generations per problem). Because autoregressive code generation is sensitive to weight perturbations, these omissions leave the accuracy-preservation claim unsupported and non-reproducible.
- [Evaluation] Evaluation: the comparison baseline labeled “original inference pipelines” is never defined. It is unclear whether the reference runs use FP32, FP16, a particular Hugging Face configuration, or any prior optimization; without this information the reported speed-ups and energy figures cannot be interpreted or verified.
- [§3] §3 (Quantization): the manuscript provides no analysis of how the per-block codebook quantization affects numerical stability or error accumulation across the many matrix-vector products performed during autoregressive decoding. A concrete bound or empirical measurement of this accumulation is required to support the claim that functional correctness is preserved.
minor comments (1)
- [Abstract] Abstract, first paragraph: the background sentence on cloud-based tools could be shortened; the paragraph currently delays the statement of the contribution.
Simulated Authors' Rebuttal
We thank the referee for the constructive comments. We address each point below and have revised the manuscript to improve reproducibility and provide the requested analysis.
Point-by-point responses
Referee: [Abstract] the central performance numbers (10.5× speed-up, 6.4× memory reduction, 0.27 % pass@1 loss) are stated without any values for the K-Means cluster count, index bit-width, block size, calibration data, or the exact pass@1 protocol (benchmark, temperature, generations per problem).
Authors: We agree these details are essential. The revised abstract now states: K-Means with 256 clusters (8-bit indices), block size 128, calibrated on 128 samples from CodeSearchNet, evaluated on HumanEval with temperature 0.2 and 1 greedy generation per problem. A new Table 1 in Section 4 lists all hyperparameters. revision: yes
Referee: [Evaluation] the comparison baseline labeled “original inference pipelines” is never defined. It is unclear whether the reference runs use FP32, FP16, a particular Hugging Face configuration, or any prior optimization.
Authors: The baseline is unmodified FP32 inference via Hugging Face Transformers (default settings) on the same CPU hardware. We have clarified this definition in Section 4.1 and added the exact configuration (torch.float32, no custom kernels) used for all reported speed-up and energy measurements. revision: yes
Referee: [§3] the manuscript provides no analysis of how the per-block codebook quantization affects numerical stability or error accumulation across the many matrix-vector products performed during autoregressive decoding.
Authors: We have added Section 3.4 with an empirical study: relative L2 error per GEMV remains below 0.8% after 100 tokens on 50 sampled generations, and a short analysis showing block-wise quantization limits accumulation because each GEMV operates on independent codebooks. This supports the 0.27% pass@1 preservation. revision: yes
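The kind of measurement this rebuttal point describes can be sketched in a few lines: chain many matrix-vector products through quantized weights and track how far the activations drift from the full-precision path. The sketch below is a self-contained, hypothetical illustration (random weights, uniform scalar quantization standing in for the per-block codebooks, tanh as a bounded nonlinearity), not the authors' Section 3.4 study.

```python
import numpy as np

rng = np.random.default_rng(0)
d, steps = 512, 100                       # hidden size, number of chained GEMVs

Ws = [rng.standard_normal((d, d)).astype(np.float32) / np.sqrt(d) for _ in range(steps)]

def quantize(W, levels=256):
    """Uniform scalar quantization as a stand-in for per-block codebook quantization."""
    lo, hi = W.min(), W.max()
    step = (hi - lo) / (levels - 1)
    return np.round((W - lo) / step) * step + lo

Wqs = [quantize(W) for W in Ws]

x_full = x_quant = rng.standard_normal(d).astype(np.float32)  # identical starting activation
for W, Wq in zip(Ws, Wqs):
    x_full = np.tanh(W @ x_full)          # full-precision path
    x_quant = np.tanh(Wq @ x_quant)       # quantized path

rel_err = np.linalg.norm(x_full - x_quant) / np.linalg.norm(x_full)
print(f"relative L2 error after {steps} GEMVs: {rel_err:.4%}")
```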
Circularity Check
No circularity: empirical framework with direct measurements
full rationale
The paper presents Ditto as an engineering combination of K-Means product quantization for weights and an LLVM compilation pass that replaces GEMV with BLAS calls. All reported outcomes (10.5× inference speedup, 6.4× memory reduction, 0.27% average pass@1 loss) are stated as measured results from running the compiled executables on Code Llama, MagicCoder, and OpenCodeInterpreter. No equations, first-principles derivations, or fitted parameters are introduced whose outputs are then relabeled as predictions. No self-citations appear in the provided text, and the quantization step is described as an adopted technique rather than derived from prior author work. The derivation chain is therefore self-contained implementation plus external benchmarking.
Axiom & Free-Parameter Ledger
free parameters (2)
- codebook size K
- index bit-width
axioms (2)
- Domain assumption: K-Means clustering on weight blocks produces codebooks that preserve model accuracy after index substitution.
- Domain assumption: LLVM can reliably detect and replace unoptimized GEMV calls with BLAS library calls without changing semantics.
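The second axiom can be probed with a small equivalence test: a hand-written GEMV loop against the BLAS-backed routine NumPy dispatches to, checked within floating-point tolerance. This is only an analogy for the LLVM pass, which rewrites compiled code rather than Python, and it assumes "without changing semantics" means agreement up to rounding differences from reordered floating-point summation.

```python
import numpy as np

def naive_gemv(A, x):
    """Textbook matrix-vector product: the 'unoptimized GEMV' a pass would replace."""
    m, n = A.shape
    y = np.zeros(m, dtype=A.dtype)
    for i in range(m):
        acc = 0.0
        for j in range(n):
            acc += A[i, j] * x[j]
        y[i] = acc
    return y

rng = np.random.default_rng(1)
A = rng.standard_normal((256, 512)).astype(np.float32)
x = rng.standard_normal(512).astype(np.float32)

y_naive = naive_gemv(A, x)
y_blas = A @ x   # dispatched to the optimized BLAS kernels NumPy links against

# The two agree only up to rounding: summation order differs between the kernels.
assert np.allclose(y_naive, y_blas, rtol=1e-4, atol=1e-5)
print("max abs difference:", np.max(np.abs(y_naive - y_blas)))
```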