Nexus: Proactive Intra-GPU Disaggregation of Prefill and Decode in LLM Serving

Xiaoxiang Shi, Colin Cai, Junjia Du, Zhihao Jia · 2025 · arXiv 2507.06608

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Towards Load-Aware Prefill Deflection for Disaggregated LLM Serving

cs.DC · 2026-07-02 · unverdicted · novelty 6.0

A load-aware prefill deflection scheduler for disaggregated LLM serving reduces P95 TTFT by up to 81% by interleaving chunked prefill on decode nodes and eliminating KV-cache transfers.

FlexNPU: Transparent NPU Virtualization for Dynamic LLM Prefill-Decode Co-location

cs.DC · 2026-06-03 · unverdicted · novelty 5.0

FlexNPU is a transparent virtualization system for Ascend NPUs that supports dynamic prefill-decode co-location in LLM serving and reports throughput gains plus large TTFT reductions versus static baselines.

citing papers explorer

Towards Load-Aware Prefill Deflection for Disaggregated LLM Serving · cs.DC · 2026-07-02 · unverdicted · none · ref 20
FlexNPU: Transparent NPU Virtualization for Dynamic LLM Prefill-Decode Co-location · cs.DC · 2026-06-03 · unverdicted · none · ref 12
