GAPE augments RoPE with query- and key-dependent gates to stabilize attention and improve long-context performance in language models.
Exploring the limits of transfer learning with a unified text-to-text trans- former
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
other 1
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2representative citing papers
LaProx reformulates KV cache eviction as an output-aware matrix approximation, enabling a unified global token selection strategy that preserves LLM performance at 5% cache size across long-context benchmarks.
citing papers explorer
-
Remember to Forget: Gated Adaptive Positional Encoding
GAPE augments RoPE with query- and key-dependent gates to stabilize attention and improve long-context performance in language models.
-
Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
LaProx reformulates KV cache eviction as an output-aware matrix approximation, enabling a unified global token selection strategy that preserves LLM performance at 5% cache size across long-context benchmarks.