Windserve: Efficient Phase-Disaggregated LLM Serving with Stream-Based Dynamic Scheduling
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2). Verdicts: 2, both unverdicted.
Citing papers explorer
- Enhancing Instruction Prefetching via Cache and TLB Management
  IP-CaT jointly optimizes TLB and cache management for L1I prefetching via a translation prefetch buffer and a trimodal replacement policy, yielding an 8.7% geomean speedup over EPI across 105 server workloads.
- SPECTRE: Hybrid Ordinary-Parallel Speculative Serving for Resource-Efficient LLM Inference
  SPECTRE achieves up to a 2.28x speedup for large-model LLM serving by running speculative draft generation and target verification in parallel using idle tail-model services.
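The overlap that the SPECTRE summary describes, drafting the next chunk of candidate tokens while the large target model is still verifying the current chunk, can be sketched with a thread pool. This is a toy illustration, not SPECTRE's actual implementation: `draft_tokens` and `verify_tokens` are hypothetical stand-ins for a cheap draft model and the large target model, and the toy verifier accepts every candidate so the pipeline never has to roll back.

```python
# Toy sketch of parallel speculative serving: overlap drafting of
# chunk k+1 with verification of chunk k. The model functions are
# hypothetical placeholders; only the pipelining pattern is the point.
from concurrent.futures import ThreadPoolExecutor

def draft_tokens(prefix, k=4):
    # Hypothetical draft model: cheaply propose k candidate tokens.
    return [f"tok{len(prefix) + i}" for i in range(k)]

def verify_tokens(prefix, candidates):
    # Hypothetical target model: score and accept candidates. This toy
    # accepts all of them; a real verifier would truncate at the first
    # rejection, and the pipeline would discard the stale next draft.
    return prefix + candidates

def serve(prompt, steps=3):
    out = list(prompt)
    with ThreadPoolExecutor(max_workers=2) as pool:
        pending = pool.submit(draft_tokens, out)
        for _ in range(steps):
            candidates = pending.result()
            # While the (slow) target model verifies this chunk, an
            # otherwise-idle worker already drafts the next one,
            # speculating that every candidate will be accepted.
            pending = pool.submit(draft_tokens, out + candidates)
            out = verify_tokens(out, candidates)
    return out

tokens = serve(["<s>"])
print(len(tokens))  # 1 prompt token + 3 steps of 4 accepted tokens = 13
```

Because verification of the large model dominates the step time, the draft for the next chunk comes essentially for free on the idle worker; on a misspeculation a real system would throw that draft away and redraft from the verified prefix.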