DeepCrossAttention: Supercharging transformer residual connections. arXiv preprint arXiv:2502.06785

2 Pith papers cite this work. Polarity classification is still indexing.

fields: cs.LG (2)

years: 2026 (2)

representative citing papers

Gradient Boosting within a Single Attention Layer

cs.LG · 2026-04-03 · conditional · novelty 7.0

Gradient-boosted attention applies a corrective second attention pass within a single layer, mapping to Friedman's gradient boosting and improving perplexity by 5.6-6.0% on WikiText-103 and OpenWebText subsets over standard attention.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • Gradient Boosting within a Single Attention Layer · cs.LG · 2026-04-03 · conditional · none · ref 2

  • Hyperloop Transformers · cs.LG · 2026-04-23 · unreviewed · ref 10