Judge Q: Trainable Queries for Optimized Information Retention in KV Cache Eviction. arXiv preprint arXiv:2509.10798

Judge Q: Trainable Queries for Optimized Information Retention in KV Cache Eviction · arXiv 2509.10798

3 Pith papers cite this work. Polarity classification is still being indexed.


representative citing papers

TGV-KV: Text-Grounded KV Eviction for Vision-Language Models

cs.CV · 2026-06-02 · unverdicted · novelty 6.0

TGV-KV uses text-vision budgeting, weighted ranking, and prioritised retention to evict KV cache entries in VLMs, retaining 99.2% accuracy at a 5% KV budget on VizWiz-VQA.

EchoKV: Efficient KV Cache Compression via Similarity-Based Reconstruction

cs.CL · 2026-03-24 · unverdicted · novelty 6.0

EchoKV compresses LLM KV caches by reconstructing missing components from partial data via inter- and intra-layer attention similarities, outperforming prior methods on LongBench and RULER while supporting on-demand full-cache inference.

Meta-Soft: Leveraging Composable Meta-Tokens for Context-Preserving KV Cache Compression

cs.AI · 2026-05-21 · unverdicted · novelty 4.0 · 2 refs

Meta-Soft dynamically synthesizes targeted soft tokens from a learnable meta-library using Gumbel-Softmax and applies attention-flow integration to compress the KV cache while attempting to preserve information from evicted context.

citing papers explorer


TGV-KV: Text-Grounded KV Eviction for Vision-Language Models cs.CV · 2026-06-02 · unverdicted · none · ref 13
EchoKV: Efficient KV Cache Compression via Similarity-Based Reconstruction cs.CL · 2026-03-24 · unverdicted · none · ref 13
Meta-Soft: Leveraging Composable Meta-Tokens for Context-Preserving KV Cache Compression cs.AI · 2026-05-21 · unverdicted · none · ref 18 · 2 links
