Citing papers explorer: 4 representative citing papers.
-
Gating Enables Curvature: A Geometric Expressivity Gap in Attention
Gated attention can induce non-flat, positively curved geometries on the Fisher-Rao manifold of representations, a regime that ungated attention cannot reach.
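A minimal sketch of the architectural contrast behind this claim, assuming "gating" means an input-dependent elementwise sigmoid gate on the attention output; the gate parameterization Wg is an illustrative assumption, not necessarily the paper's construction:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, Wq, Wk, Wv):
    # Standard (ungated) scaled dot-product self-attention.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return A @ V

def gated_attention(X, Wq, Wk, Wv, Wg):
    # Same head, but an input-dependent sigmoid gate in (0, 1) modulates
    # the output elementwise; per the summary, this extra nonlinearity is
    # what separates the gated geometry from the ungated one.
    G = 1.0 / (1.0 + np.exp(-(X @ Wg)))
    return G * attention(X, Wq, Wk, Wv)
```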
-
How Does Attention Help? Insights from Random Matrices on Signal Recovery from Sequence Models
Attention pooling produces a bulk spectrum given by free multiplicative convolution and exhibits two phase transitions for signal recovery; the optimal pooling weights are the top eigenvector of the positional correlation matrix R.
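A numerical sketch of the eigenvector claim under an assumed rank-one data model (a planted positional profile u plus noise; the model and all variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, n = 16, 64, 500                    # positions, feature dim, samples
u = rng.standard_normal(T)
u /= np.linalg.norm(u)                   # planted positional signal profile

# Each sequence carries u in a random feature direction, plus noise.
X = (u[None, :, None] * rng.standard_normal((n, 1, d))
     + 0.5 * rng.standard_normal((n, T, d)))

# Empirical positional correlation matrix R (T x T), averaged over
# samples and feature dimensions.
R = np.einsum('ntd,nsd->ts', X, X) / (n * d)

# Pooling weights = top eigenvector of R, per the summary's claim.
w = np.linalg.eigh(R)[1][:, -1]
pooled = np.einsum('t,ntd->nd', w, X)    # weighted pooling over positions
print(abs(w @ u))                        # close to 1: w recovers u
```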
-
Continuous transformations of probability measures and their transport representations
Transformations F of probability measures that are Lipschitz continuous with respect to the Wasserstein distance admit continuous transport maps f(·, μ) such that F(μ) = f(·, μ)_# μ, i.e., F(μ) is the pushforward of μ under f(·, μ).
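A toy instance of the transport representation, with measures encoded as sample arrays; the specific map f(x, μ) = x + mean(μ) is an illustrative choice (the theorem asserts the existence of such an f for any Wasserstein-Lipschitz F):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x, mu_samples):
    # Transport map for this toy example: shift by the measure's mean.
    return x + mu_samples.mean()

def F(mu_samples):
    # Measure-level transformation, realized as the pushforward of mu
    # under f(., mu): apply the transport map to every sample.
    return f(mu_samples, mu_samples)

mu = rng.normal(loc=2.0, scale=1.0, size=10_000)  # empirical measure
nu = F(mu)                                        # F(mu) = f(., mu)_# mu
print(nu.mean(), 2 * mu.mean())                   # equal: both ~4.0
```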
-
Progressive Approximation in Deep Residual Networks: Theory and Validation
Residual networks admit progressive approximation trajectories with monotonically decreasing error, enabling useful predictions from any depth after a single training run via the LPA principle.
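A minimal sketch of the anytime-prediction pattern the summary describes: tap the representation after every residual block and decode each tap with one shared readout. The tanh blocks and linear readout are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(2)
d, depth = 8, 6
Ws = [0.1 * rng.standard_normal((d, d)) for _ in range(depth)]

def forward_with_taps(x):
    # Return the representation after every residual block, so a
    # prediction is available at any depth from a single trained model.
    taps, h = [], x
    for W in Ws:
        h = h + np.tanh(W @ h)           # residual update x_{l+1} = x_l + h_l(x_l)
        taps.append(h.copy())
    return taps

readout = rng.standard_normal(d)         # shared linear readout (assumption)
x = rng.standard_normal(d)
preds = [readout @ h for h in forward_with_taps(x)]
print(preds)                             # one usable prediction per depth
```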