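Proof. (1) Row-sum control. Since $\hat{S} = S + \Delta$ and $\hat{D} = \operatorname{diag}(\hat{S}\mathbf{1})$, we rewrite $E := \hat{D} - D = \operatorname{diag}\big((\hat{S} - S)\mathbf{1}\big) = \operatorname{diag}(\Delta\mathbf{1})$.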

If $\|E\|_2 \le s_{\min}/2$, then $\|\hat{D}^{-1}\|_2 \le 2/s_{\min}$.
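The row-sum step can be checked numerically. The NumPy sketch below is only an illustration, not part of the proof: it assumes that $s_{\min}$ denotes the smallest row sum of $S$ (the smallest diagonal entry of $D = \operatorname{diag}(S\mathbf{1})$) and that all row sums are positive, neither of which is stated in the excerpt. The function name `row_sum_check` and the test matrices are hypothetical.

```python
import numpy as np

def row_sum_check(S, Delta):
    """Return (||E||_2, s_min, ||D_hat^{-1}||_2) for S_hat = S + Delta."""
    ones = np.ones(S.shape[1])
    d = S @ ones                    # diagonal of D = diag(S 1)
    d_hat = (S + Delta) @ ones      # diagonal of D_hat = diag(S_hat 1)
    e = d_hat - d                   # diagonal of E = D_hat - D
    # Identity from the proof: E = diag(Delta 1)
    assert np.allclose(e, Delta @ ones)
    s_min = d.min()                 # ASSUMPTION: s_min is the smallest row sum of S
    # For a diagonal matrix, the spectral norm is the largest |diagonal entry|.
    E_norm = np.abs(e).max()
    inv_norm = 1.0 / np.abs(d_hat).min()
    return E_norm, s_min, inv_norm

# Hypothetical example: positive S with a small perturbation Delta.
rng = np.random.default_rng(0)
S = rng.uniform(0.5, 1.5, size=(6, 6))
Delta = rng.uniform(-0.01, 0.01, size=(6, 6))

E_norm, s_min, inv_norm = row_sum_check(S, Delta)
if E_norm <= s_min / 2:
    # Each diagonal entry of D_hat is at least s_min - ||E||_2 >= s_min / 2.
    assert inv_norm <= 2.0 / s_min
```

Under these assumptions the bound follows because every diagonal entry of $\hat{D}$ differs from the corresponding entry of $D$ by at most $\|E\|_2$, so none can drop below $s_{\min}/2$.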


representative citing papers

RACE Attention: A Strictly Linear-Time Attention Layer for Training on Outrageously Large Contexts

cs.LG · 2025-10-05 · unverdicted · novelty 7.0

RACE Attention is a strictly linear-time attention mechanism that approximates softmax attention outputs using Gaussian projections and soft LSH to enable training on contexts up to 12 million tokens.


