Absorber LLM: Harnessing Causal Synchronization for Test-Time Training

2026 · cs.LG · arXiv 2604.20915

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

The computational cost of self-attention in Transformers grows with sequence length, so inference over long streams becomes prohibitive in memory. Constant-memory alternatives such as RNNs and SSMs compress history into fixed-size states and thus lose long-tail dependencies, while methods that memorize context in parameters, such as Test-Time Training (TTT), are prone to overfitting to token-level projection and fail to preserve the causal effect of context in pretrained LLMs. We propose Absorber LLM, which formulates long-context retention as self-supervised causal synchronization: after absorbing historical context into its parameters, a contextless model should match the original model with full context on future generations. We optimize this objective by synchronizing the internal behaviors of the updated model with those of the original, ensuring both context absorption and generalization. Experiments on long-context and streaming benchmarks show that Absorber LLM reduces inference memory and improves accuracy over prior parameter-as-memory baselines.
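One way to read the synchronization objective described in the abstract is sketched below. This is an illustrative reading, not the paper's own formulation: the symbols $\theta$ (pretrained parameters), $\tilde\theta$ (candidate updated parameters), $c$ (historical context), $x$ (a future continuation), $f$ (the model's behavior on the continuation positions, for example hidden states or next-token distributions), $d$ (a discrepancy measure), and the expectation over continuations are all assumptions introduced here, since the abstract does not specify them.

```latex
% Hypothetical notation; none of these symbols are taken from the paper.
% f_theta(c \oplus x): behavior of the original model on x, given context c.
% f_{\tilde\theta}(x): behavior of the updated, contextless model on x alone.
\theta' \;=\; \arg\min_{\tilde\theta}\;
  \mathbb{E}_{x}\Big[\, d\big(\, f_{\tilde\theta}(x),\; f_{\theta}(c \oplus x) \,\big) \Big]
```

Under this reading, the original model with full context acts as a fixed target, and only the contextless copy is updated. Because the abstract speaks of synchronizing internal behaviors, $f$ would plausibly cover intermediate activations rather than output tokens alone, though that choice is also an assumption here.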

representative citing papers

Beyond Perplexity: A Behavioral Evaluation Framework for Deployment-Memory Claims in LLM Test-Time Training

cs.CL · 2026-07-01 · unverdicted · novelty 7.0

Proposes a claim-calibrated evidence ladder and evaluation protocol with explicit-memory baselines to assess whether TTT produces deployment-usable behavioral memory rather than just proxy metric gains.

citing papers explorer


Beyond Perplexity: A Behavioral Evaluation Framework for Deployment-Memory Claims in LLM Test-Time Training

cs.CL · 2026-07-01 · unverdicted · none · ref 37 · internal anchor
