Advances in Neural Information Processing Systems, 31
3 Pith papers cite this work.
citing papers explorer
- EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
  EAGLE resolves feature-level uncertainty in speculative sampling via one-step token advancement, delivering 2.7x-3.5x speedup on LLaMA2-Chat 70B and doubled throughput across multiple model families and tasks.
- Multi-Drafter Speculative Decoding with Alignment Feedback
  MetaSD integrates multiple heterogeneous drafters into speculative decoding, dynamically selecting them via alignment feedback modeled as a multi-armed bandit to consistently outperform single-drafter baselines.
- A Survey on Efficient Inference for Large Language Models
  The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.