Recognition: unknown
An Empirical Study of Speculative Decoding on Software Engineering Tasks
Pith reviewed 2026-05-07 13:32 UTC · model grok-4.3
The pith
Speculative decoding accelerates inference for software engineering tasks, with larger gains on smaller models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Our empirical results indicate that SD demonstrates clear potential for accelerating inference, particularly for smaller models that achieve higher speedups than those of their larger counterparts. We find that the effectiveness of SD methods varies across different task scenarios. Model-based approaches are well-suited for code generation, whereas model-free methods are better adapted to repository-level repair and editing scenarios. Furthermore, we observe that the repetitiveness of SE tasks improves the performance of model-free methods. In contrast to natural language tasks, the higher predictability of SE tasks allows for more aggressive hyperparameters.
What carries the argument
Speculative decoding: a technique in which a smaller draft model proposes multiple tokens that the target large language model then verifies in parallel.
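For intuition, here is a minimal greedy sketch of that loop. The callables `draft_next` and `target_greedy` are hypothetical stand-ins for the two models, not the paper's implementation or any particular library's API.

```python
# Minimal greedy speculative decoding: draft gamma tokens with the cheap
# model, then verify them all in one pass of the expensive model.
# Hypothetical toy interfaces; a sketch, not the paper's implementation.

def speculative_decode(prefix, draft_next, target_greedy, gamma=4, max_new=32):
    tokens = list(prefix)
    produced = 0
    while produced < max_new:
        # 1. Draft: the small model proposes gamma tokens autoregressively.
        ctx, proposal = list(tokens), []
        for _ in range(gamma):
            nxt = draft_next(ctx)          # cheap next-token guess
            proposal.append(nxt)
            ctx.append(nxt)
        # 2. Verify: assumed to return the target model's greedy choice at
        #    each of the gamma positions after `tokens`, computed in a
        #    single parallel forward pass (this is where the speedup lives).
        verified = target_greedy(tokens, proposal)
        # 3. Accept the longest agreeing prefix; at the first disagreement,
        #    substitute the target's token, so the output is exactly what
        #    the target alone would have produced (lossless acceleration).
        n = 0
        while n < len(proposal) and proposal[n] == verified[n]:
            n += 1
        accepted = proposal[:n] if n == len(proposal) else proposal[:n] + [verified[n]]
        tokens.extend(accepted)
        produced += len(accepted)
    return tokens
```

Each verification step emits between one and `gamma` tokens, so the realized speedup tracks the draft's acceptance rate; that is why draft quality and task predictability dominate the results.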
If this is right
- Smaller models obtain higher speedups from speculative decoding than larger models.
- Model-based speculative decoding is effective for function-level code generation tasks.
- Model-free speculative decoding is more suitable for repository-level repair and editing tasks.
- The repetitiveness of software engineering tasks boosts the performance of model-free methods (see the lookup sketch after this list).
- Software engineering tasks permit more aggressive hyperparameter settings than natural language tasks due to higher predictability.
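For contrast with the model-based loop above, here is a minimal sketch of the model-free side, in the spirit of prompt lookup decoding (Saxena, 2023): drafts are copied from an earlier occurrence of the trailing n-gram in the context rather than generated by a draft model. `ngram_size` and `draft_len` are illustrative hyperparameters, not the paper's settings; raising `draft_len` is the kind of "aggressive" choice the findings say SE tasks tolerate.

```python
# Model-free drafting in the style of prompt lookup decoding: reuse spans
# that already appear in the context as the draft. Illustrative sketch with
# made-up hyperparameters; repetitive SE inputs make the lookup hit often.

def lookup_draft(tokens, ngram_size=3, draft_len=8):
    if len(tokens) < ngram_size:
        return []
    key = tuple(tokens[-ngram_size:])
    # Scan backwards for an earlier occurrence of the trailing n-gram.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tuple(tokens[start:start + ngram_size]) == key:
            # Draft = whatever followed that earlier occurrence.
            cont = tokens[start + ngram_size:start + ngram_size + draft_len]
            if cont:
                return cont
    return []  # no repetition found: fall back to ordinary decoding

# Repetitive input (repeated code idioms) yields a long free draft:
ctx = "def f ( x ) : return x def g ( x ) : return".split()
print(lookup_draft(ctx))  # ['x', 'def', 'g', '(', 'x', ')', ':', 'return']
```

The drafted tokens still pass through the same parallel verification, so correctness is preserved; the gain depends only on how often the context repeats itself, matching the finding that repository-level repair and editing (which echo existing code) favor model-free methods.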
Where Pith is reading between the lines
- If adopted, this would mean faster response times in AI coding assistants for developers.
- Hybrid approaches combining model-based and model-free methods could optimize performance across mixed task types.
- These guidelines may apply to other structured prediction tasks beyond software engineering.
- Testing on additional models and larger codebases would help confirm the generalizability of the task preferences.
Load-bearing premise
The selected tasks, models, and evaluation metrics are representative of real-world software engineering workflows and the observed differences will generalize.
What would settle it
Demonstrating that, on a new set of repository-level tasks, a larger model achieves higher speedups with model-based methods than with model-free methods, or that smaller models do not obtain higher speedups than larger ones, would falsify the central findings.
Original abstract
Large Language Models (LLMs) have become widely used for Software Engineering (SE) tasks, spanning from function-level code generation to complex repository-level workflows. However, the high latency of autoregressive inference remains a significant bottleneck, hindering their deployment in interactive environments. While Speculative Decoding (SD) offers a promising technique for lossless acceleration, prior research on long-context repository-level tasks and complex agentic interactions remains limited. To bridge this gap, we present the first systematic empirical study to evaluate the effectiveness of SD in SE tasks. We systematically benchmark a comprehensive spectrum of strategies, encompassing both model-based and model-free methods, across representative generation, editing, and repair scenarios. Our empirical results indicate that SD demonstrates clear potential for accelerating inference, particularly for smaller models that achieve higher speedups than those of their larger counterparts. We find that the effectiveness of SD methods varies across different task scenarios. Model-based approaches are well-suited for code generation, whereas model-free methods are better adapted to repository-level repair and editing scenarios. Furthermore, we observe that the repetitiveness of SE tasks improves the performance of model-free methods. In contrast to natural language tasks, the higher predictability of SE tasks allows for more aggressive hyperparameters. Our findings are summarized as guidelines to help increase inference efficiency for SE scenarios.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents the first systematic empirical study of speculative decoding (SD) for accelerating LLM inference on software engineering tasks. It benchmarks model-based and model-free SD strategies across code generation, editing, and repository-level repair scenarios using multiple models. It reports that SD yields speedups (higher for smaller models), that model-based methods suit generation while model-free methods suit repair and editing, that the repetitiveness and predictability of SE tasks permit more aggressive settings than natural-language tasks do, and that these observations distill into practical guidelines for SE inference efficiency.
Significance. If the empirical patterns hold, the work is significant for filling a gap in SD research by focusing on long-context, agentic SE workflows where latency is a deployment barrier. The multi-strategy, multi-task benchmark and distillation into guidelines provide concrete, actionable value for practitioners using LLMs in interactive SE tools, while highlighting how domain-specific properties (repetitiveness, predictability) interact with acceleration techniques.
Major comments (2)
- Experimental results section: the reported speedups and task-specific differences lack accompanying statistical tests, variance measures, or error analysis; these are load-bearing for the central claims that the 'effectiveness of SD methods varies across different task scenarios' and that the patterns are 'consistent'.
- Task and model selection (methodology section): the assumption that the chosen tasks, models, and metrics are representative of real-world SE workflows is not supported by ablation studies or discussion of selection biases, undermining the generalizability of the guidelines to other codebases and models.
Minor comments (3)
- The abstract and conclusion refer to 'guidelines', but these should be enumerated explicitly in a dedicated table or subsection for easy reference by readers.
- Figures showing speedups would be clearer if they included non-SD baseline latencies and confidence intervals alongside the reported values.
- Ensure all model sizes, exact hyperparameter settings for 'aggressive' configurations, and dataset statistics are tabulated in the experimental setup for reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and recommendation of minor revision. We have addressed the concerns regarding statistical rigor and generalizability by incorporating additional analyses and discussions in the revised manuscript.
Point-by-point responses
- Referee: Experimental results section: the reported speedups and task-specific differences lack accompanying statistical tests, variance measures, or error analysis; these are load-bearing for the central claims that the 'effectiveness of SD methods varies across different task scenarios' and that the patterns are 'consistent'.
  Authors: We concur that the inclusion of statistical tests and variance measures would bolster the reliability of our findings. In the revised version, we now report speedups with standard deviations computed over multiple runs and have included results from statistical significance tests (using paired t-tests) to validate the task-specific differences. Furthermore, we have added an error analysis to examine cases of inconsistency, thereby supporting the central claims more robustly (an illustrative sketch of this style of analysis appears after these responses).
  Revision: yes
- Referee: Task and model selection (methodology section): the assumption that the chosen tasks, models, and metrics are representative of real-world SE workflows is not supported by ablation studies or discussion of selection biases, undermining the generalizability of the guidelines to other codebases and models.
  Authors: We recognize the importance of justifying our selections for broader applicability. Although we did not perform dedicated ablation studies on the choice of tasks and models in the original manuscript, we have now augmented the methodology section with explicit selection criteria drawn from widely used SE benchmarks. A dedicated subsection on threats to validity and limitations has been introduced to discuss potential biases and the scope of generalizability. This addresses the concern without requiring extensive new experiments.
  Revision: partial
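For concreteness, the style of analysis the first response describes might look like the sketch below. All numbers are invented, and pairing runs by index is purely illustrative; the paper's actual pairing unit is not specified here.

```python
# Illustrative variance reporting and paired t-test over per-run speedups.
# Invented numbers; a sketch of the analysis style, not the paper's data.
import statistics
from scipy import stats

gen_speedups    = [2.1, 2.3, 2.0, 2.4, 2.2]  # e.g., code-generation runs
repair_speedups = [1.4, 1.5, 1.3, 1.6, 1.4]  # e.g., repo-level repair runs

# Mean ± standard deviation over repeated runs, as the response proposes.
for name, xs in (("generation", gen_speedups), ("repair", repair_speedups)):
    print(f"{name}: {statistics.mean(xs):.2f}x ± {statistics.stdev(xs):.2f}")

# Paired t-test on matched runs: is the task-level difference significant?
t_stat, p_value = stats.ttest_rel(gen_speedups, repair_speedups)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```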
Circularity Check
No significant circularity: direct empirical measurements
Full rationale
The paper is a standard multi-model, multi-task empirical benchmark study reporting observed speedups, acceptance rates, and task-specific differences for speculative decoding on SE workloads. No equations, fitted parameters, or derived predictions are present; all central claims are direct observational results from experiments. No self-citation chains support load-bearing premises, and no quantities are defined in terms of themselves or renamed as novel predictions. The claims remain checkable against external benchmarks because performance metrics are measured on public models and tasks, without reduction to author-specific constructs.