arXiv preprint arXiv:2507.08143

Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores · 2025 · arXiv 2507.08143

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Cartridges at Scale: Training Modular KV Caches over Large Document Collections

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

CAS trains composable per-document KV cache cartridges via dynamic distractor mixing and a rotating budget manager, scaling to million-token collections with 10-31 point gains over monolithic cartridges and matching RAG at 3-4x lower token cost.

Value-Aware Stochastic KV Cache Eviction for Reasoning Models

cs.LG · 2026-06-02 · unverdicted · novelty 6.0

VaSE improves KV cache eviction accuracy for reasoning models by over 4% versus prior eviction methods at 4x compression through value-magnitude protection and stochastic diversity.

Rethinking LoRA Memory Through the Lens of KV Cache Compression

cs.CL · 2026-06-04 · unverdicted · novelty 5.0

Document LoRA acts as decoding-time parametric memory that recovers 13-21 ROUGE-L points under heavy KV cache compression in QA, performing best when the base model encodes the document and the adapter is used only at generation with QA supervision.

citing papers explorer

Cartridges at Scale: Training Modular KV Caches over Large Document Collections cs.CL · 2026-06-03 · unverdicted · none · ref 2
Rethinking LoRA Memory Through the Lens of KV Cache Compression cs.CL · 2026-06-04 · unverdicted · none · ref 17
