Keep the Cost Down: A Review on Methods to Optimize LLM’s KV-Cache Consumption

14 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3 · method 1

citation-polarity summary

representative citing papers

Rethinking KV Cache Eviction via a Unified Information-Theoretic Objective

cs.LG · 2026-04-28 · unverdicted · novelty 7.0

KV cache eviction is unified under an information-capacity maximization principle derived from a linear-Gaussian attention surrogate, with CapKV proposed as a leverage-score-based implementation that outperforms prior heuristics in experiments.

FlowNar: Scalable Streaming Narration for Long-Form Videos

cs.CV · 2026-05-30 · unverdicted · novelty 6.0

FlowNar achieves bounded memory and 3x higher throughput for streaming narration on Ego4D, EgoExo4D, and EpicKitchens100 by combining dynamic historical context removal with a Cross Linear Attentive Memory module.

OjaKV: Context-Aware Online Low-Rank KV Cache Compression

cs.CL · 2025-09-25 · unverdicted · novelty 6.0

OjaKV introduces hybrid full-rank storage for key tokens combined with online low-rank KV cache compression via Oja's algorithm to support memory-efficient long-context LLM inference.
