Random features for large-scale kernel machines
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Nearly Optimal Attention Coresets
  ε-coresets for attention exist of size O(√d e^{ρ+o(ρ)}/ε) for unit-norm keys/values and queries of norm ≤ρ, nearly matching the Ω(√d e^ρ/ε) lower bound.
- Bayesian Nonparametric Mixed-Effect ODEs with Gaussian Processes
  MEGPODE decomposes subject-specific ODE vector fields into population and individual Gaussian process priors and uses Kalman smoothing with virtual collocation to enable efficient Bayesian mixed-effects inference for heterogeneous dynamical systems.