Sparse attention arises from compact kernel regression, with Epanechnikov and similar kernels mapping to normalized ReLU, sparsemax, and alpha-entmax attention.
Ryumei Nakada and Masaaki Imaizumi
7 Pith papers cite this work. Polarity classification is still indexing.
Citation-role and citation-polarity summary
Verdicts: UNVERDICTED 7
Roles: method 1
Polarities: use method 1
Representative citing papers:
Transformers perform kernel-based prediction for Hölder regression on manifolds and achieve intrinsic-dimension-dependent minimax rates with sufficient training tasks.
A direct plug-in kernel estimator for Schrödinger bridge time-series drifts achieves uniform non-asymptotic bounds, pointwise CLT under undersmoothing, and minimax-rate optimal adaptive selection.
VeloTree infers differentiation trees from RNA velocity fields by defining cell dissimilarity as the squared varifold distance between integral curves of the velocity field.
Random slicing for subsampling combined with Nadaraya-Watson smoothing enables faster and improved persistence-based topological optimization of point clouds in 2D and 3D.
A Bayesian optimal experimental design framework with Gaussian approximation of expected information gain and surrogate Fisher information enables optimized uniaxial tests that significantly improve identifiability of history-dependent constitutive parameters over random designs.
Gradient-boosted models with SHAP analysis find word familiarity as the dominant predictor of English vocabulary difficulty across Spanish, German, and Chinese L1 learners, with orthographic transfer adding value only for the first two groups.