Key-Value Means: Transformers with Expandable Block-Recurrent Compressed Memory
KVM is a novel block-recurrent compressed memory for attention that unifies expandable transformer context with linear RNN efficiency, enabling competitive long-context performance with released code and models.
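The released code is the authoritative reference; purely as a rough illustration, below is a minimal sketch of what a block-recurrent compressed memory of this kind might look like, assuming (as the title suggests) that each processed block's keys and values are mean-pooled into a single slot of an expandable memory that later blocks attend over. The function name, block size, and single-head layout are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch only: block-recurrent attention with an expandable
# compressed memory, where each finished block is summarized by the mean of
# its keys and values. Not the released KVM code.
import math
import torch
import torch.nn.functional as F


def block_recurrent_attention(x, wq, wk, wv, block_size=128):
    """Causal attention over the current block plus a growing memory of
    per-block mean keys/values (one compressed slot per past block)."""
    mem_k, mem_v = [], []          # expandable compressed memory
    outputs = []
    d = wq.shape[1]
    for start in range(0, x.shape[0], block_size):
        blk = x[start:start + block_size]          # (B, d_model)
        q, k, v = blk @ wq, blk @ wk, blk @ wv     # project current block
        # Keys/values visible to this block: compressed memory + current block.
        if mem_k:
            k_all = torch.cat(mem_k + [k], dim=0)
            v_all = torch.cat(mem_v + [v], dim=0)
        else:
            k_all, v_all = k, v
        n_mem = k_all.shape[0] - k.shape[0]
        # Memory slots are always visible; in-block tokens are causally masked.
        scores = q @ k_all.T / math.sqrt(d)
        mask = torch.ones(q.shape[0], k_all.shape[0], dtype=torch.bool)
        mask[:, n_mem:] = torch.tril(torch.ones(q.shape[0], k.shape[0])).bool()
        scores = scores.masked_fill(~mask, float("-inf"))
        outputs.append(F.softmax(scores, dim=-1) @ v_all)
        # Compress the finished block into a single mean key/value slot.
        mem_k.append(k.mean(dim=0, keepdim=True))
        mem_v.append(v.mean(dim=0, keepdim=True))
    return torch.cat(outputs, dim=0)


if __name__ == "__main__":
    torch.manual_seed(0)
    d_model = 64
    x = torch.randn(512, d_model)
    wq, wk, wv = (torch.randn(d_model, d_model) / math.sqrt(d_model) for _ in range(3))
    print(block_recurrent_attention(x, wq, wk, wv).shape)  # torch.Size([512, 64])
```

Under these assumptions the memory grows by one slot per block rather than one per token, which is where the "expandable transformer context with linear RNN efficiency" framing comes from.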
2 Pith papers (2026) cite this work; polarity classification is still indexing.
Representative citing papers:
- Training-Inference Consistent Segmented Execution for Long-Context LLMs
  A training-inference consistent segmented execution framework for long-context LLMs matches full-context performance with substantially lower peak memory at very long lengths (a generic sketch of segmented execution follows below).
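The cited paper's framework is not reproduced here; the following is only a generic sketch of segmented (chunked) prefill against a persistent KV cache, the standard pattern this summary evokes, in which peak activation memory scales with the chunk length rather than the full context while the result stays identical to full-context attention. The `chunked_prefill` name, `chunk_len` value, and toy single-head layer are illustrative assumptions.

```python
# Generic chunked-prefill sketch, not the cited paper's specific framework.
import math
import torch
import torch.nn.functional as F


def chunked_prefill(x, wq, wk, wv, chunk_len=256):
    """Process a long sequence chunk by chunk, attending to all cached keys/values."""
    k_cache, v_cache = [], []
    outputs = []
    d = wq.shape[1]
    for start in range(0, x.shape[0], chunk_len):
        chunk = x[start:start + chunk_len]
        q, k, v = chunk @ wq, chunk @ wk, chunk @ wv
        k_cache.append(k)
        v_cache.append(v)
        k_all = torch.cat(k_cache, dim=0)
        v_all = torch.cat(v_cache, dim=0)
        # Causal mask: each query sees every cached token up to its own position.
        n_prev = k_all.shape[0] - k.shape[0]
        scores = q @ k_all.T / math.sqrt(d)
        mask = torch.ones(q.shape[0], k_all.shape[0], dtype=torch.bool)
        mask[:, n_prev:] = torch.tril(torch.ones(q.shape[0], k.shape[0])).bool()
        scores = scores.masked_fill(~mask, float("-inf"))
        outputs.append(F.softmax(scores, dim=-1) @ v_all)
    # Same result as full-context attention, but the full (len x len) score
    # matrix is never materialized at once.
    return torch.cat(outputs, dim=0)
```

Under this assumed setup, a 32k-token prompt with chunk_len=256 never materializes more than a 256 x 32k score matrix instead of 32k x 32k, which is the kind of peak-memory reduction segmented execution targets.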