Understanding in-context learning on structured manifolds: Bridging attention to kernel methods
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG · Years: 2026 · Verdicts: unverdicted · Representative citing papers: 2
Representative citing papers
- Transformers Can Implement Preconditioned Richardson Iteration for In-Context Gaussian Kernel Regression
  Standard softmax-attention transformers can approximate the Gaussian kernel ridge regression predictor by implementing preconditioned Richardson iteration during their forward pass (see the numerical sketch after this list).
- Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer
  Transformers can be built to act as nonlinear featurizers via attention, supporting in-context regression with proven generalization bounds on synthetic tasks (a toy attention-as-featurizer sketch also follows this list).
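As a rough illustration of the mechanism named in the first citing paper, here is a minimal NumPy sketch of preconditioned Richardson iteration applied to Gaussian kernel ridge regression. It is not the paper's transformer construction: the function names (`gauss_kernel`, `richardson_krr`), the Jacobi preconditioner, and the step-size choice are assumptions made for this sketch.

```python
# Hedged sketch: preconditioned Richardson iteration for Gaussian kernel
# ridge regression. Illustrative only; the preconditioner and step size
# are assumptions, not taken from the cited paper.
import numpy as np

def gauss_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix: k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def richardson_krr(X, y, lam=0.1, gamma=1.0, steps=100):
    """Approximately solve (K + lam*I) alpha = y by Richardson iteration."""
    K = gauss_kernel(X, X, gamma)
    A = K + lam * np.eye(len(X))
    P = np.diag(1.0 / np.diag(A))            # Jacobi (diagonal) preconditioner
    omega = 1.0 / np.linalg.norm(P @ A, 2)   # conservative step size
    alpha = np.zeros_like(y)
    for _ in range(steps):
        alpha = alpha + omega * (P @ (y - A @ alpha))   # Richardson update
    return alpha

# In-context regression: "fit" on the context pairs, then predict the query.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.sin(X).sum(axis=1)
alpha = richardson_krr(X, y)
x_query = rng.normal(size=(1, 3))
y_hat = gauss_kernel(x_query, X) @ alpha     # Gaussian KRR prediction
```

With enough steps this converges to the exact kernel ridge regression coefficients (K + λI)⁻¹y; the citing paper's claim is that a fixed number of softmax-attention layers can approximate such an iteration within the forward pass.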
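A companion sketch for the second citing paper's framing of attention as a nonlinear featurizer: one softmax-attention pass turns the context into features, and a ridge readout fit on those features makes the in-context prediction. The value projection, feature width, and ridge strength here are illustrative assumptions, not the paper's construction.

```python
# Hedged sketch: softmax attention as a nonlinear featurizer for in-context
# regression, with a ridge readout. Illustrative only; attention_features,
# the random value projection, and the ridge strength are assumptions.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_features(Q, X, V, scale=1.0):
    """Softmax attention of queries Q over keys X with values V."""
    scores = scale * Q @ X.T                  # similarity scores
    return softmax(scores, axis=-1) @ V       # attention-weighted features

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))                  # in-context inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=32)

W_v = rng.normal(size=(4, 16))                # random value projection
Phi = attention_features(X, X, X @ W_v)       # nonlinear features of context
w = np.linalg.solve(Phi.T @ Phi + 1e-2 * np.eye(16), Phi.T @ y)  # ridge readout

x_query = rng.normal(size=(1, 4))
phi_q = attention_features(x_query, X, X @ W_v)
y_hat = phi_q @ w                             # in-context prediction
```

The design point mirrored here is that the nonlinearity comes entirely from the softmax attention map, while the readout stays linear, which is what makes generalization bounds of the kind the paper proves tractable on synthetic tasks.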