Cache what lasts: Token retention for memory-bounded KV cache in LLMs · 2025 · arXiv preprint arXiv:2512.03324

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

IntentKV: Cross-Turn Intent-Aware KV Cache Pruning for Agent Inference

cs.LG · 2026-06-06 · unverdicted · novelty 6.0

IntentKV prunes the KV cache using cross-turn intent memory and attention scoring, achieving up to a 77.8% reduction in worst-case peak tokens and a 92.6% reduction in KV reads at an 8k budget, with a negligible accuracy drop on Qwen models.

SPHERICAL KV: Angle-Domain Attention and Rate-Distortion Retention for Efficient Long-Context Inference

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

Spherical KV combines angle-domain attention using spherical key codes with rate-distortion retention to cut KV cache residency and HBM traffic while keeping a paged, fusion-friendly decode path.

Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

A unified learnable KV eviction policy with cross-layer calibration reduces memory and matches or exceeds full-cache performance on long-context tasks by retaining useful tokens and limiting attention dilution.

Protection Is (Nearly) All You Need: Structural Protection Dominates Scoring in Globally Capped KV Eviction

cs.LG · 2026-05-18 · unverdicted · novelty 4.0

Structural protection of boundary tokens in globally capped KV cache eviction recovers 69-90% of full-cache quality at 13% retention and dominates differences among scoring policies.

citing papers explorer

Showing 4 of 4 citing papers.

IntentKV: Cross-Turn Intent-Aware KV Cache Pruning for Agent Inference · cs.LG · 2026-06-06 · unverdicted · none · ref 4
SPHERICAL KV: Angle-Domain Attention and Rate-Distortion Retention for Efficient Long-Context Inference · cs.LG · 2026-05-13 · unverdicted · none · ref 2
Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction · cs.LG · 2026-05-10 · unverdicted · none · ref 2
Protection Is (Nearly) All You Need: Structural Protection Dominates Scoring in Globally Capped KV Eviction · cs.LG · 2026-05-18 · unverdicted · none · ref 9
