Speculative sampling accelerates LLM decoding by 2-2.5x: a draft model proposes short sequences that the target model scores in parallel, and a modified rejection sampling scheme keeps the output distribution exactly equal to the target model's.
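The modified rejection sampling step can be sketched in Python as below. This is a minimal sketch, not the paper's implementation. It assumes that the draft model has already proposed K tokens along with the distributions q it sampled them from, and that one parallel target-model pass has returned K+1 distributions p over the same vocabulary. The names `speculative_step` and `sample_from` are hypothetical. Each drafted token x is accepted with probability min(1, p(x)/q(x)). At the first rejection, a replacement is drawn from the normalized residual max(0, p - q) and the step ends. If every draft is accepted, one bonus token is drawn from the last target distribution.

```python
import random
from typing import List, Sequence


def sample_from(weights: Sequence[float], rng: random.Random) -> int:
    """Draw an index from a non-negative (possibly unnormalized) weight vector."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have positive mass")
    u = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if u < acc:
            return i
    # Floating-point round-off fallback: last index with positive weight.
    return max(i for i, w in enumerate(weights) if w > 0)


def speculative_step(
    draft_tokens: Sequence[int],
    q_probs: Sequence[Sequence[float]],  # hypothetical: q_probs[t] is the draft distribution for draft_tokens[t]
    p_probs: Sequence[Sequence[float]],  # hypothetical: K+1 target distributions from one parallel pass
    rng: random.Random,
) -> List[int]:
    """Return the 1..K+1 tokens emitted by one speculative sampling step."""
    k = len(draft_tokens)
    assert len(q_probs) == k and len(p_probs) == k + 1
    out: List[int] = []
    for t, x in enumerate(draft_tokens):
        p, q = p_probs[t], q_probs[t]
        # Accept the drafted token with probability min(1, p(x) / q(x)).
        # q[x] > 0 because x was sampled from q.
        if rng.random() < min(1.0, p[x] / q[x]):
            out.append(x)
        else:
            # Rejected: resample from the residual max(0, p - q), normalized.
            residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            out.append(sample_from(residual, rng))
            return out
    # All K drafts accepted: sample one extra token from the final target distribution.
    out.append(sample_from(p_probs[k], rng))
    return out
```

A rejection can only happen where p(x) < q(x), so the residual always has positive mass. Combining acceptance with residual resampling makes each emitted token follow p exactly. The speedup therefore changes only the cost of decoding, not the distribution of the output.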
2 Pith papers cite this work.
Fields: cs.CL (2 citing papers)
Representative citing papers:
- Accelerating Large Language Model Decoding with Speculative Sampling
- TextEconomizer: Enhancing Lossy Text Compression with Denoising Transformers and Entropy Coding
  Presents TextEconomizer, a transformer-based encoder-decoder for lossy text compression that claims a 5.39x compression ratio, near-perfect semantic quality on standard metrics, and 153x fewer parameters than comparable models.