arXiv preprint arXiv:2501.12352 , year=

Ke Alexander Wang, Jiaxin Shi, Emily B · 2025 · arXiv 2501.12352

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

representative citing papers

Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.

OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

OSDN adds online diagonal preconditioning to the Delta Rule, preserving chunkwise parallelism while proving super-geometric convergence and delivering 32-39% recall gains at 340M-1.3B scales.

Scal3R: Scalable Test-Time Training for Large-Scale 3D Reconstruction

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

Scal3R achieves better accuracy and consistency in large-scale 3D scene reconstruction by maintaining a compressed global context through test-time adaptation of lightweight neural networks on long video sequences.

Fast Spatial Memory with Elastic Test-Time Training

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

Elastic Test-Time Training stabilizes test-time updates via an elastic prior and moving-average anchor, enabling Fast Spatial Memory for scalable long-sequence 4D reconstruction with reduced memory use and fewer shortcuts.

In-Place Test-Time Training

cs.LG · 2026-04-07 · conditional · novelty 6.0

In-Place TTT adapts LLM MLP projection matrices at test time with a next-token-aligned objective and chunk-wise updates, enabling better long-context performance as a drop-in enhancement.

Kaczmarz Linear Attention

cs.LG · 2026-05-09 · unverdicted · novelty 5.0

Kaczmarz Linear Attention replaces the empirical coefficient in Gated DeltaNet with a key-norm-normalized step size derived from the online regression objective, yielding lower perplexity and better needle-in-haystack performance.

MDN: Parallelizing Stepwise Momentum for Delta Linear Attention

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

MDN parallelizes stepwise momentum for delta linear attention using geometric reordering and dynamical systems analysis, yielding performance gains over Mamba2 and GDN on 400M and 1.3B models.

citing papers explorer

Showing 7 of 7 citing papers.

Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences cs.LG · 2026-04-22 · unverdicted · none · ref 56
Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.
OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention cs.LG · 2026-05-13 · unverdicted · none · ref 61
OSDN adds online diagonal preconditioning to the Delta Rule, preserving chunkwise parallelism while proving super-geometric convergence and delivering 32-39% recall gains at 340M-1.3B scales.
Scal3R: Scalable Test-Time Training for Large-Scale 3D Reconstruction cs.CV · 2026-04-09 · unverdicted · none · ref 79
Scal3R achieves better accuracy and consistency in large-scale 3D scene reconstruction by maintaining a compressed global context through test-time adaptation of lightweight neural networks on long video sequences.
Fast Spatial Memory with Elastic Test-Time Training cs.CV · 2026-04-08 · unverdicted · none · ref 51
Elastic Test-Time Training stabilizes test-time updates via an elastic prior and moving-average anchor, enabling Fast Spatial Memory for scalable long-sequence 4D reconstruction with reduced memory use and fewer shortcuts.
In-Place Test-Time Training cs.LG · 2026-04-07 · conditional · none · ref 55
In-Place TTT adapts LLM MLP projection matrices at test time with a next-token-aligned objective and chunk-wise updates, enabling better long-context performance as a drop-in enhancement.
Kaczmarz Linear Attention cs.LG · 2026-05-09 · unverdicted · none · ref 44
Kaczmarz Linear Attention replaces the empirical coefficient in Gated DeltaNet with a key-norm-normalized step size derived from the online regression objective, yielding lower perplexity and better needle-in-haystack performance.
MDN: Parallelizing Stepwise Momentum for Delta Linear Attention cs.LG · 2026-05-07 · unverdicted · none · ref 84
MDN parallelizes stepwise momentum for delta linear attention using geometric reordering and dynamical systems analysis, yielding performance gains over Mamba2 and GDN on 400M and 1.3B models.

arXiv preprint arXiv:2501.12352 , year=

fields

years

verdicts

representative citing papers

citing papers explorer