PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

years

2026 6

representative citing papers

SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting

cs.CL · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

SpecBlock achieves 8-13% higher mean speedup than EAGLE-3 at 44-52% of the drafting cost, using block-iterative drafting with hidden-state inheritance, dynamic rank-head branching, valid-prefix masking, and optional cost-aware bandit adaptation.

An Interpretable Latency Model for Speculative Decoding in LLM Serving

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

The paper presents an interpretable latency model for speculative decoding that infers effective batch size via Little's Law and decomposes demand to predict and explain performance across serving loads, validated on vLLM measurements.

Test-Time Speculation

cs.CL · 2026-05-10 · unverdicted · novelty 6.0 · 2 refs

TTS adapts speculator models online via target model verifications to improve acceptance lengths by up to 72% over prior methods, with gains increasing for longer generations.
