CompressKV: Semantic retrieval heads know what tokens are not important before generation. arXiv preprint arXiv:2508.02401, 2025

Xiaolin Lin, Jingcun Wang, Olga Kondrateva, Yiyu Shi, Bing Li, Grace Li Zhang · 2025 · arXiv 2508.02401

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

RedKnot: Efficient Long-Context LLM Serving with Head-Aware KV Reuse and SegPagedAttention

cs.AI · 2026-06-04 · unverdicted · novelty 6.0

RedKnot decomposes the KV cache by attention heads to enable position-independent reuse, prefix compression, hot/cold separation, and distributed placement for long-context LLM serving without model changes.

citing papers explorer

RedKnot: Efficient Long-Context LLM Serving with Head-Aware KV Reuse and SegPagedAttention

cs.AI · 2026-06-04 · unverdicted · none · ref 36

RedKnot decomposes the KV cache by attention heads to enable position-independent reuse, prefix compression, hot/cold separation, and distributed placement for long-context LLM serving without model changes.
