CAS trains composable per-document KV cache cartridges via dynamic distractor mixing and a rotating budget manager, scaling to million-token collections with 10-31 point gains over monolithic cartridges and matching RAG at 3-4x lower token cost.
arXiv preprint arXiv:2507.08143 , year=
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
VaSE improves KV cache eviction accuracy for reasoning models by over 4% versus prior eviction methods at 4x compression through value-magnitude protection and stochastic diversity.
Document LoRA acts as decoding-time parametric memory that recovers 13-21 ROUGE-L points under heavy KV cache compression in QA, performing best when the base model encodes the document and the adapter is used only at generation with QA supervision.
citing papers explorer
-
Cartridges at Scale: Training Modular KV Caches over Large Document Collections
CAS trains composable per-document KV cache cartridges via dynamic distractor mixing and a rotating budget manager, scaling to million-token collections with 10-31 point gains over monolithic cartridges and matching RAG at 3-4x lower token cost.
-
Rethinking LoRA Memory Through the Lens of KV Cache Compression
Document LoRA acts as decoding-time parametric memory that recovers 13-21 ROUGE-L points under heavy KV cache compression in QA, performing best when the base model encodes the document and the adapter is used only at generation with QA supervision.