DeepCrossAttention: Supercharging transformer residual connections. arXiv preprint arXiv:2502.06785

Lucas Heddes et al. · arXiv 2502.06785

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Gradient Boosting within a Single Attention Layer

cs.LG · 2026-04-03 · conditional · novelty 7.0

Gradient-boosted attention applies a corrective second attention pass within a single layer, mapping to Friedman's gradient boosting and improving perplexity by 5.6-6.0% on WikiText-103 and OpenWebText subsets over standard attention.

Hyperloop Transformers

cs.LG · 2026-04-23

citing papers explorer

Showing 2 of 2 citing papers after filters.

Gradient Boosting within a Single Attention Layer cs.LG · 2026-04-03 · conditional · none · ref 2
Hyperloop Transformers cs.LG · 2026-04-23 · unreviewed · ref 10
