Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding
5 Pith papers cite this work. Polarity classification is still indexing.
Citing papers explorer
-
SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding
SlimSpec replaces the standard LM head in draft models with a low-rank factorization, delivering 4-5x faster speculative decoding while preserving the full vocabulary and competitive acceptance rates.
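A minimal sketch of the core idea, assuming a PyTorch draft model; `LowRankLMHead`, the layer names, and the rank value are illustrative, not SlimSpec's actual API:

```python
import torch
import torch.nn as nn

class LowRankLMHead(nn.Module):
    """Factorize the vocab projection W (hidden x vocab) into A (hidden x r)
    and B (r x vocab). Per-token logit cost drops from O(hidden * vocab)
    to O(r * (hidden + vocab)) while the full vocabulary is kept."""

    def __init__(self, hidden_size: int, vocab_size: int, rank: int):
        super().__init__()
        self.down = nn.Linear(hidden_size, rank, bias=False)  # A: hidden -> r
        self.up = nn.Linear(rank, vocab_size, bias=False)     # B: r -> vocab

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Full-vocabulary logits, computed through the low-rank bottleneck.
        return self.up(self.down(hidden_states))

# e.g. rank 256 instead of a dense 4096 x 128k projection (sizes hypothetical)
head = LowRankLMHead(hidden_size=4096, vocab_size=128_256, rank=256)
logits = head(torch.randn(1, 8, 4096))  # (batch, seq, vocab)
```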
-
BubbleSpec: Turning Long-Tail Bubbles into Speculative Rollout Drafts for Synchronous Reinforcement Learning
BubbleSpec exploits long-tail bubbles in synchronous RL by using faster ranks' idle time to pre-generate rollout drafts for speculative decoding, reducing steps by 50% and raising throughput by up to 1.8x while preserving exact synchrony.
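A toy, single-process illustration of the scheduling idea (not BubbleSpec's implementation): the slowest rank defines the synchronization barrier, and each faster rank's idle "bubble" is converted into pre-drafted tokens.

```python
def bubble_drafts(rank_finish_times, draft_tokens_per_sec):
    """Toy model: in synchronous RL every rank waits for the slowest one,
    so a rank that finishes its rollout early has an idle bubble of
    (barrier - finish_time) seconds it can spend pre-generating draft
    tokens for the next speculative-decoding step."""
    barrier = max(rank_finish_times)  # exact synchrony: everyone waits here
    return {
        rank: int((barrier - t) * draft_tokens_per_sec)  # tokens drafted for free
        for rank, t in enumerate(rank_finish_times)
    }

# Long-tail example: rank 1 straggles; ranks 0, 2, 3 draft while they wait.
print(bubble_drafts([1.2, 3.0, 1.8, 2.5], draft_tokens_per_sec=40))
# -> {0: 72, 1: 0, 2: 48, 3: 20}
```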
-
An Empirical Study of Speculative Decoding on Software Engineering Tasks
Speculative decoding accelerates LLM inference on software engineering (SE) tasks without accuracy loss; model-based methods suit code generation, while model-free methods suit repository-level repair and editing.
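The "without accuracy loss" guarantee comes from the standard speculative verification rule (accept draft token x with probability min(1, p(x)/q(x)), else resample from the residual), which preserves the target model's output distribution exactly; a minimal sketch:

```python
import torch

def verify_drafts(p, q, draft_tokens):
    """Standard lossless speculative verification: p and q are the target's
    and draft's per-position distributions, shape (T, vocab). Each draft
    token x_t is accepted with prob min(1, p[x_t]/q[x_t]); on the first
    rejection we sample a corrected token from the residual max(0, p - q)
    and stop. The output matches sampling from p exactly."""
    out = []
    for t, x in enumerate(draft_tokens):
        if torch.rand(()) < torch.clamp(p[t, x] / q[t, x], max=1.0):
            out.append(int(x))  # draft token accepted
        else:
            residual = torch.clamp(p[t] - q[t], min=0.0)
            out.append(int(torch.multinomial(residual / residual.sum(), 1)))
            break  # everything after the first rejection is discarded
    return out
```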
-
PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding
PARD-2 uses Confidence-Adaptive Token optimization to align draft model training with acceptance length in speculative decoding, enabling dual-mode operation and up to 6.94x lossless speedup on Llama3.1-8B.
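One way to read "aligning training with acceptance length" is to down-weight draft positions that are unlikely to be reached; the sketch below is an illustrative guess at such a weighting, not the paper's actual Confidence-Adaptive Token objective.

```python
import torch
import torch.nn.functional as F

def acceptance_weighted_loss(draft_logits, target_ids):
    """Illustrative, NOT PARD-2's exact objective: position t in a draft
    block only contributes to the accepted length if positions < t are
    accepted first, so its loss is weighted by the running product of the
    draft's confidence in the preceding target tokens."""
    log_p = F.log_softmax(draft_logits, dim=-1)                        # (B, T, V)
    tok_logp = log_p.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)  # (B, T)
    conf = tok_logp.exp()                       # prob assigned to the right token
    reach = torch.cumprod(conf, dim=-1)         # P(block accepted up to t)
    weights = torch.ones_like(reach)
    weights[:, 1:] = reach[:, :-1]              # weight_t = P(position t reached)
    return -(weights.detach() * tok_logp).mean()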
-
SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting
SpecBlock achieves 8-19% higher speedup than EAGLE-3 in LLM speculative decoding by using repeated block expansions with hidden-state inheritance, a dynamic rank head, and a valid-prefix training mask.
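A toy sketch of dynamic tree drafting in this spirit (the names and pruning rule are illustrative, not SpecBlock's code): each block expansion grows every surviving path by one token, and a dynamic rank keeps only the highest joint-probability paths.

```python
import heapq

def draft_tree(candidates_fn, depth, budget):
    """Toy dynamic tree drafting: `candidates_fn(prefix)` yields
    (token, prob) pairs for one more draft step. Each iteration is one
    'block expansion'; the frontier is re-ranked by joint probability and
    pruned to `budget` paths (a stand-in for SpecBlock's dynamic rank head)."""
    frontier = [((), 1.0)]  # (token path, joint probability)
    for _ in range(depth):
        expanded = [
            (path + (tok,), p * q)
            for path, p in frontier
            for tok, q in candidates_fn(path)
        ]
        frontier = heapq.nlargest(budget, expanded, key=lambda node: node[1])
    return frontier  # candidate paths handed to the target model for verification

# Toy draft distribution: two candidate tokens per step.
paths = draft_tree(lambda path: [(0, 0.7), (1, 0.3)], depth=3, budget=4)
```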