RetentiveKV uses entropy to drive state-space model transitions that retain and reactivate low-attention visual tokens in a continuous memory instead of pruning them, delivering 5x KV cache compression and 1.5x faster decoding.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it