IndexMem proposes a learned KV importance predictor paired with a latent memory module that keeps the KV cache within a bounded size during long-context inference. It reports gains on RULER, Needle-in-a-Haystack, and LongBench across multiple LLMs.
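The summary gives only the general idea. A fixed-budget cache scores each KV entry, evicts the least important one once the budget is exceeded, and keeps some trace of what was evicted. The sketch below illustrates that idea in plain Python under stated assumptions. It is not IndexMem's implementation. The class name `BoundedKVCache` is hypothetical. The per-entry `score` is supplied by the caller as a stand-in for the learned importance predictor. A running mean of evicted values is a crude placeholder for the latent memory module.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class BoundedKVCache:
    """Hypothetical fixed-budget KV cache (illustration only, not IndexMem's code).

    Keeps the `budget` highest-scoring entries; each evicted value is folded
    into a running-mean vector that stands in for a latent memory module.
    """
    budget: int
    # Min-heap of (score, position, key, value); the lowest score sits at index 0.
    entries: List[Tuple[float, int, List[float], List[float]]] = field(default_factory=list)
    memory: Optional[List[float]] = None
    evicted: int = 0

    def insert(self, position: int, key: List[float], value: List[float], score: float) -> None:
        # `score` is assumed to come from a learned importance predictor.
        heapq.heappush(self.entries, (score, position, key, value))
        if len(self.entries) > self.budget:
            # Evict the least important entry instead of growing the cache.
            _, _, _, old_value = heapq.heappop(self.entries)
            self._absorb(old_value)

    def _absorb(self, value: List[float]) -> None:
        # Incremental mean over all evicted values: a placeholder for a
        # learned latent memory that summarizes what was dropped.
        self.evicted += 1
        if self.memory is None:
            self.memory = list(value)
        else:
            n = self.evicted
            self.memory = [m + (v - m) / n for m, v in zip(self.memory, value)]

    def positions(self) -> List[int]:
        # Token positions still held in the cache, in sequence order.
        return sorted(pos for _, pos, _, _ in self.entries)

cache = BoundedKVCache(budget=2)
stream = [([1.0, 0.0], 0.9), ([0.0, 1.0], 0.1), ([2.0, 2.0], 0.5), ([4.0, 0.0], 0.3)]
for pos, (vec, score) in enumerate(stream):
    cache.insert(pos, key=vec, value=vec, score=score)
print(cache.positions(), cache.memory)  # [0, 2] [2.0, 0.5]
```

The cache never holds more than `budget` entries, so memory stays bounded regardless of sequence length. The unique position in each heap tuple breaks score ties, which keeps the heap from ever comparing the key or value lists.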
2 Pith papers cite this work.
Citing papers by year: 2026 (2). Verdicts: 2 unverdicted.

Representative citing papers:
Meta-Soft: Leveraging Composable Meta-Tokens for Context-Preserving KV Cache Compression
Meta-Soft dynamically synthesizes targeted soft tokens from a learnable meta-library using Gumbel-Softmax and applies attention-flow integration to compress KV cache while attempting to preserve evicted context information.
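The summary says Meta-Soft builds soft tokens by selecting from a learnable meta-library with Gumbel-Softmax. The sketch below shows only the forward pass of that kind of composition in plain Python. It draws Gumbel noise, forms temperature-scaled softmax weights over the library entries, and returns the weighted sum as a soft token. The function names, the toy library, the logits, and the temperature `tau` are all hypothetical. Gradients and the paper's attention-flow integration are not modeled. In a deep-learning framework, PyTorch's `torch.nn.functional.gumbel_softmax` provides the same relaxation.

```python
import math
import random
from typing import List, Optional, Tuple

def gumbel_softmax_weights(logits: List[float], tau: float = 0.5,
                           rng: Optional[random.Random] = None) -> List[float]:
    """Relaxed one-hot sample over library entries (Gumbel-Softmax)."""
    rng = rng or random.Random()
    perturbed = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        perturbed.append((logit + gumbel) / tau)
    peak = max(perturbed)  # subtract the max before exp for numerical stability
    exps = [math.exp(x - peak) for x in perturbed]
    total = sum(exps)
    return [e / total for e in exps]

def compose_soft_token(meta_library: List[List[float]], logits: List[float],
                       tau: float = 0.5, rng: Optional[random.Random] = None
                       ) -> Tuple[List[float], List[float]]:
    """Soft token = Gumbel-Softmax-weighted sum of meta-token vectors."""
    weights = gumbel_softmax_weights(logits, tau, rng)
    dim = len(meta_library[0])
    token = [sum(w * vec[d] for w, vec in zip(weights, meta_library)) for d in range(dim)]
    return token, weights

# Toy library: three one-hot 3-d meta-tokens, so the token equals the weights.
library = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
token, weights = compose_soft_token(library, logits=[2.0, 0.5, -1.0],
                                    tau=0.5, rng=random.Random(0))
print([round(w, 3) for w in weights])
```

Lower values of `tau` push the weights toward a one-hot selection of a single meta-token. Higher values blend several entries more evenly.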