Hierarchical vs. Flat Iteration in Shared-Weight Transformers
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
We present an empirical study of whether hierarchically structured, shared-weight recurrence can match the representational quality of independent-layer stacking in a Transformer-based language model. HRM-LM replaces L independent Transformer layers with a two-speed recurrent pair: a Fast module operating at every step for local refinement, and a Slow module operating every T steps for global compression. This recurrent hierarchy is unrolled for M = N × T steps with shared parameters. The central and most robust finding, supported by a parameter-matched Universal Transformer ablation (UniTF, 1.2B) across five independent runs, is a sharp empirical gap between the two approaches.
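The two-speed schedule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`unroll_two_speed`, `toy_fast`, `toy_slow`), the way each module conditions on the other's state, and the choice to run the Slow update immediately after the Fast update on every T-th step are all assumptions made for illustration. The toy modules are simple averaging functions standing in for Transformer blocks.

```python
from typing import Callable, List, Tuple

State = List[float]
Module = Callable[[State, State], State]


def unroll_two_speed(
    fast: Module,
    slow: Module,
    z_fast: State,
    z_slow: State,
    n_cycles: int,  # N in the abstract
    period: int,    # T in the abstract
) -> Tuple[State, State, List[str]]:
    """Unroll a Fast/Slow pair for M = N * T steps.

    The same two callables are reused at every step, which is what
    "shared parameters" means in this sketch.
    """
    trace: List[str] = []
    total_steps = n_cycles * period  # M = N x T
    for step in range(1, total_steps + 1):
        # Fast module: runs at every step (local refinement).
        z_fast = fast(z_fast, z_slow)
        trace.append("F")
        # Slow module: runs once every `period` steps (global compression).
        # Assumption: it fires right after the Fast update on that step.
        if step % period == 0:
            z_slow = slow(z_slow, z_fast)
            trace.append("S")
    return z_fast, z_slow, trace


# Hypothetical stand-ins for the real modules.
def toy_fast(z_fast: State, z_slow: State) -> State:
    return [0.5 * f + 0.5 * s for f, s in zip(z_fast, z_slow)]


def toy_slow(z_slow: State, z_fast: State) -> State:
    return [0.9 * s + 0.1 * f for s, f in zip(z_slow, z_fast)]


if __name__ == "__main__":
    zf, zs, trace = unroll_two_speed(
        toy_fast, toy_slow, [1.0, 0.0], [0.0, 1.0], n_cycles=2, period=3
    )
    print("".join(trace))
```

With N = 2 and T = 3 the loop performs six Fast updates and two Slow updates, so the effective depth grows with M while the parameter count stays that of one Fast and one Slow module.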
fields
cs.LG 1
years
2026 1
verdicts
UNVERDICTED 1
representative citing papers
CART: Context-Anchored Recurrent Transformer -- A Parameter-Efficient Architecture with Learned Stability
CART is a recurrent transformer with a shared core, frozen prelude KV tensors, and an LTI stability gate; it fails to beat dense baselines at parameter parity across the tested widths.