FastKV decouples prefill context reduction via Token-Selective Propagation from independent KV cache selection, delivering up to 1.82x prefill and 2.87x decoding speedups while matching decoding-only accuracy.
GemFilter: Discovering Gems in Early Layers for Accelerated Long-Context LLMs
4 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 4representative citing papers
StructKV compresses LLM KV caches by tracking global in-degree centrality across network depth and dynamically selecting compression layers to preserve long-range dependencies better than local pruning methods.
K-VEC is a coverage-aware KV-cache eviction strategy using cross-head and cross-layer modules that improves performance by up to 10.35 points over prior methods on LongBench subsets at fixed memory budget.
A pre-execution size filter cuts repository tokens by 80-89% at sub-millisecond cost and raises file-level accuracy from 25% to 72% in a small CodeLlama evaluation.
citing papers explorer
-
FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration
FastKV decouples prefill context reduction via Token-Selective Propagation from independent KV cache selection, delivering up to 1.82x prefill and 2.87x decoding speedups while matching decoding-only accuracy.
-
StructKV: Preserving the Structural Skeleton for Scalable Long-Context Inference
StructKV compresses LLM KV caches by tracking global in-degree centrality across network depth and dynamically selecting compression layers to preserve long-range dependencies better than local pruning methods.
-
Coverage-Driven KV Cache Eviction for Efficient and Improved Inference of LLM
K-VEC is a coverage-aware KV-cache eviction strategy using cross-head and cross-layer modules that improves performance by up to 10.35 points over prior methods on LongBench subsets at fixed memory budget.
-
Correctness-Aware Repository Filtering Under Maximum Effective Context Window Constraints
A pre-execution size filter cuts repository tokens by 80-89% at sub-millisecond cost and raises file-level accuracy from 25% to 72% in a small CodeLlama evaluation.