Transformers are limited to a linearly growing number of accessible output sequences with prompt length, with exponential decay in accessible proportion beyond a critical point, even under unbounded context.
Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
years
2026 4representative citing papers
A training-inference consistent segmented execution framework for long-context LLMs matches full-context performance with substantially lower peak memory at very long lengths.
KVM is a new block-recurrent compressed KV attention that turns transformers into O(N) chunked RNNs or growable sublinear-memory models while remaining implementable with standard operations.
Absorber LLM introduces causal synchronization to absorb context into parameters for memory-efficient long-context LLM inference while preserving causal effects.
citing papers explorer
-
How Many Different Outputs Can a Transformer Generate?
Transformers are limited to a linearly growing number of accessible output sequences with prompt length, with exponential decay in accessible proportion beyond a critical point, even under unbounded context.
-
Training-Inference Consistent Segmented Execution for Long-Context LLMs
A training-inference consistent segmented execution framework for long-context LLMs matches full-context performance with substantially lower peak memory at very long lengths.
-
Key-Value Means: Transformers with Expandable Block-Recurrent Compressed Memory
KVM is a new block-recurrent compressed KV attention that turns transformers into O(N) chunked RNNs or growable sublinear-memory models while remaining implementable with standard operations.
-
Absorber LLM: Harnessing Causal Synchronization for Test-Time Training
Absorber LLM introduces causal synchronization to absorb context into parameters for memory-efficient long-context LLM inference while preserving causal effects.