Pith · machine review for the scientific record

arxiv: 2602.21204 · v4 · submitted 2026-02-24 · 💻 cs.LG · cs.AI · cs.CV

Recognition: unknown

Test-Time Training with KV Binding Is Secretly Linear Attention

Authors on Pith: no claims yet
classification 💻 cs.LG · cs.AI · cs.CV
keywords attention · linear · form · test-time · binding · learned · multiple · training
0 comments
read the original abstract

Test-time training (TTT) with KV binding as a sequence-modeling layer is commonly interpreted as a form of online meta-learning that memorizes a key-value mapping at test time. However, our analysis reveals multiple phenomena that contradict this memorization-based interpretation. Motivated by these findings, we revisit the formulation of TTT and show that a broad class of TTT architectures can be expressed as a form of learned linear attention operator. Beyond explaining previously puzzling model behaviors, this perspective yields multiple practical benefits: it enables principled architectural simplifications, admits fully parallel formulations that preserve performance while improving efficiency, and provides a systematic reduction of diverse TTT variants to a standard linear attention form. Overall, our results reframe TTT not as test-time memorization, but as learned linear attention with enhanced representational capacity. Project page: https://research.nvidia.com/labs/sil/projects/tttla/.
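The TTT-as-linear-attention equivalence the abstract describes can be illustrated with a minimal sketch. Assumptions (mine, not the paper's exact layer): a linear fast-weight layer W, one online gradient step per token on an inner-product binding loss L(W) = -⟨v_t, W k_t⟩ with learning rate 1. Under that specific choice, the TTT readout coincides exactly with unnormalized causal linear attention; the paper's layers are more general.

```python
import numpy as np

def ttt_kv_binding(K, V, Q, lr=1.0):
    """TTT view: fast weights W updated online, one gradient step per token,
    on the illustrative binding loss L(W) = -<v_t, W k_t> (an assumption;
    not necessarily the paper's exact objective)."""
    W = np.zeros((V.shape[1], K.shape[1]))
    out = []
    for k, v, q in zip(K, V, Q):
        W = W + lr * np.outer(v, k)   # gradient step: dL/dW = -v k^T
        out.append(W @ q)             # read out with the current query
    return np.array(out)

def causal_linear_attention(K, V, Q):
    """Unnormalized causal linear attention: o_t = sum_{s<=t} (q_t . k_s) v_s,
    computed via the running state S_t = sum_{s<=t} v_s k_s^T."""
    S = np.zeros((V.shape[1], K.shape[1]))
    out = []
    for k, v, q in zip(K, V, Q):
        S = S + np.outer(v, k)
        out.append(S @ q)
    return np.array(out)

rng = np.random.default_rng(0)
T, d = 8, 4
K, V, Q = rng.normal(size=(T, d)), rng.normal(size=(T, d)), rng.normal(size=(T, d))
print(np.allclose(ttt_kv_binding(K, V, Q), causal_linear_attention(K, V, Q)))  # True
```

With a squared binding loss ||W k_t - v_t||² instead, the same loop yields a delta-rule update, which is one way TTT can exceed plain linear attention in capacity.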

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration

    cs.AI 2026-04 unverdicted novelty 6.0

    LLM agents trained with a task-success reward on self-generated knowledge can spontaneously explore and adapt to new environments without any rewards or instructions at inference, yielding 20% gains on web tasks and a...

  2. Fast Spatial Memory with Elastic Test-Time Training

    cs.CV 2026-04 unverdicted novelty 6.0

    Elastic Test-Time Training stabilizes test-time updates via an elastic prior and moving-average anchor, enabling Fast Spatial Memory for scalable long-sequence 4D reconstruction with reduced memory use and fewer shortcuts.