KV offloading degrades accuracy on context-intensive tasks due to low-rank key projections and unreliable landmarks; a simpler alternative improves results across models and benchmarks.
Kvsharer: Efficient inference via layer-wise dissimilar kv cache sharing
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 3roles
background 1polarities
background 1representative citing papers
ScaleSearch optimizes block floating point scales via fine-grained search to cut quantization error by 27% for NVFP4, improving PTQ by up to 15 points on MATH500 for Qwen3-8B and attention PPL by 0.77 on Llama 3.1 70B.
citing papers explorer
-
KV Cache Offloading for Context-Intensive Tasks
KV offloading degrades accuracy on context-intensive tasks due to low-rank key projections and unreliable landmarks; a simpler alternative improves results across models and benchmarks.
-
Search Your Block Floating Point Scales!
ScaleSearch optimizes block floating point scales via fine-grained search to cut quantization error by 27% for NVFP4, improving PTQ by up to 15 points on MATH500 for Qwen3-8B and attention PPL by 0.77 on Llama 3.1 70B.
- GHOST: Geometry-Hierarchical Online Streaming Token Eviction for Efficient 3D Reconstruction