Judge Q: Trainable queries for optimized information retention in KV cache eviction. arXiv preprint arXiv:2509.10798
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Citing papers
- TGV-KV: Text-Grounded KV Eviction for Vision-Language Models
  TGV-KV uses text-vision budgeting, weighted ranking, and prioritised retention to evict KV cache in VLMs while retaining 99.2% accuracy at 5% budget on VizWiz-VQA.
- EchoKV: Efficient KV Cache Compression via Similarity-Based Reconstruction
  EchoKV compresses LLM KV caches by reconstructing missing components from partial data via inter- and intra-layer attention similarities, outperforming prior methods on LongBench and RULER while supporting on-demand full-cache inference.
- Meta-Soft: Leveraging Composable Meta-Tokens for Context-Preserving KV Cache Compression
  Meta-Soft dynamically synthesizes targeted soft tokens from a learnable meta-library using Gumbel-Softmax and applies attention-flow integration to compress KV cache while attempting to preserve evicted context information.