
Log-Linear Attention

4 Pith papers cite this work. Polarity classification for these citations is still in progress.


fields: cs.LG (3) · cs.CL (1)

years: 2026 (3) · 2025 (1)

verdicts: unverdicted (4)

representative citing papers

Kaczmarz Linear Attention

cs.LG · 2026-05-09 · unverdicted · novelty 5.0

Kaczmarz Linear Attention replaces the empirical coefficient in Gated DeltaNet with a key-norm-normalized step size derived from the online regression objective, yielding lower perplexity and better needle-in-haystack performance.
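The "key-norm-normalized step size" is the classical Kaczmarz projection step for the online regression objective ‖S kₜ − vₜ‖². A minimal NumPy sketch of that idea, assuming the plain delta-rule recurrence S ← S + β (v − S k) kᵀ and omitting Gated DeltaNet's decay gate; all names here are illustrative, not the paper's code:

```python
import numpy as np

def delta_step(S, k, v, beta):
    """One delta-rule update: S <- S + beta * (v - S k) k^T."""
    return S + beta * np.outer(v - S @ k, k)

def kaczmarz_step(S, k, v, eps=1e-6):
    """Kaczmarz step size beta = 1/||k||^2: the update exactly projects the
    state onto the constraint S' k = v, i.e. it is the row-wise Kaczmarz
    iteration for the online regression objective ||S k - v||^2."""
    return delta_step(S, k, v, 1.0 / (k @ k + eps))

rng = np.random.default_rng(0)
d_k, d_v = 8, 8
S = np.zeros((d_v, d_k))
k, v = rng.normal(size=d_k), rng.normal(size=d_v)
S = kaczmarz_step(S, k, v)
print(np.allclose(S @ k, v))  # True: the newest association is fit exactly
```

With this step size the update is parameter-free where Gated DeltaNet's coefficient is learned empirically, which is the substitution the summary above describes.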

citing papers explorer

Showing 4 of 4 citing papers.

  • Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences cs.LG · 2026-04-22 · unverdicted · polarity: none · ref 18

    Preconditioned delta-rule models with a diagonal curvature approximation improve on standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective (a sketch of the idea follows the list).

  • Kimi Linear: An Expressive, Efficient Attention Architecture cs.CL · 2025-10-30 · unverdicted · polarity: none · ref 35

    Kimi Linear hybridizes linear attention with a new KDA module, outperforming full attention across tasks while cutting the KV cache by up to 75% and speeding decoding by up to 6x (the cache arithmetic is sketched after the list).

  • Kaczmarz Linear Attention cs.LG · 2026-05-09 · unverdicted · polarity: none · ref 14

    Kaczmarz Linear Attention replaces the empirical coefficient in Gated DeltaNet with a key-norm-normalized step size derived from the online regression objective, yielding lower perplexity and better needle-in-haystack performance (see the step-size sketch under the representative entry above).

  • MDN: Parallelizing Stepwise Momentum for Delta Linear Attention cs.LG · 2026-05-07 · unverdicted · polarity: none · ref 90

    MDN parallelizes the stepwise momentum recurrence of delta linear attention using geometric reordering and a dynamical-systems analysis, yielding gains over Mamba2 and GDN at the 400M and 1.3B scales (the sequential semantics are sketched after the list).
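On Preconditioned DeltaNet's diagonal curvature approximation: one plausible reading is that the diagonal of the per-step regression Hessian k kᵀ is tracked as a running second moment of the keys and used to precondition the update direction. A hedged NumPy illustration under those assumptions, not the paper's exact recurrence:

```python
import numpy as np

def preconditioned_delta_step(S, H, k, v, rho=0.99, eps=1e-6):
    """Delta-rule update preconditioned by a diagonal curvature estimate.

    H tracks second moments of the keys, approximating the diagonal of the
    regression Hessian k k^T; the direction P k (P = diag(1/H)) rescales
    each key coordinate, and the 1/(k . P k) factor keeps the step an exact
    projection onto the constraint S' k = v."""
    H = rho * H + (1.0 - rho) * k * k           # running diagonal curvature
    pk = k / (H + eps)                          # preconditioned direction P k
    S = S + np.outer(v - S @ k, pk) / (k @ pk)  # oblique projection: S' k = v
    return S, H

rng = np.random.default_rng(0)
k, v = rng.normal(size=8), rng.normal(size=8)
S, H = preconditioned_delta_step(np.zeros((8, 8)), np.ones(8), k, v)
print(np.allclose(S @ k, v))  # True: still an exact projection, along P k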
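On Kimi Linear's 75% KV-cache figure: the number is consistent with a hybrid stack in which only a fraction of layers use full (KV-cached) attention while the rest use a KDA-style linear module with constant-size state. The 3:1 ratio of linear to full layers below is an assumption, used only to show the arithmetic:

```python
# Hedged back-of-envelope: in a hybrid stack where only 1 in every 4 layers
# is full attention, only those layers grow a per-token KV cache; the linear
# (KDA-style) layers keep a fixed-size recurrent state instead.
def kv_cache_fraction(n_layers, full_attn_every=4):
    full_layers = n_layers // full_attn_every
    return full_layers / n_layers  # fraction of a pure-attention model's cache

print(kv_cache_fraction(48))  # 0.25 -> a 75% KV-cache reduction
```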
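On MDN's stepwise momentum: read literally, it smooths successive delta-rule updates with a momentum accumulator, and the paper's contribution is making that inherently sequential recurrence run in parallel. A hedged NumPy sketch of only the sequential semantics (names and coefficients are illustrative):

```python
import numpy as np

def momentum_delta_scan(S, M, ks, vs, beta=0.5, gamma=0.9):
    """Sequential reference semantics for a momentum delta rule: the state
    moves along an exponential moving average M of past delta-rule updates.
    This naive loop is O(T) sequential; the cited paper reorders the
    recurrence so chunks of steps can be computed in parallel."""
    for k, v in zip(ks, vs):
        delta = beta * np.outer(v - S @ k, k)  # plain delta-rule update
        M = gamma * M + delta                  # stepwise momentum accumulator
        S = S + M
    return S, M

rng = np.random.default_rng(0)
d = 8
S, M = momentum_delta_scan(np.zeros((d, d)), np.zeros((d, d)),
                           rng.normal(size=(16, d)), rng.normal(size=(16, d)))
```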