Review history

arxiv: 2606.26666 · 2 revisions

PersistentKV: Page-Aware Decode Scheduling for Long-Context LLM Serving on Commodity GPUs

  1. 2026-07-02 UNVERDICTED LOW v0.9.1-grok novelty 6.0
    36431 ms 5864 in 1366 out 2026-07-02T21:02:06.460527+00:00
  2. 2026-06-26 UNVERDICTED LOW v0.9.1-grok novelty 5.0
    37495 ms 5847 in 1334 out 2026-06-26T05:20:50.579646+00:00