arxiv: 1811.04504 · v2 · pith:Q3J6KQVJ · submitted 2018-11-11 · cs.LG · cs.AI · stat.ML

SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient

Aaron Mishkin, Frederik Kunstner, Didrik Nielsen, Mark Schmidt, Mohammad Emtiyaz Khan

classification cs.LG · cs.AI · stat.ML

keywords methods · gradient · slang · uncertainty · approximation · covariance · deep · diagonal

Uncertainty estimation in large deep-learning models is a computationally challenging task, where it is difficult to form even a Gaussian approximation to the posterior distribution. In such situations, existing methods usually resort to a diagonal approximation of the covariance matrix, despite the fact that these matrices are known to result in poor uncertainty estimates. To address this issue, we propose a new stochastic, low-rank, approximate natural-gradient (SLANG) method for variational inference in large, deep models. Our method estimates a "diagonal plus low-rank" structure based solely on back-propagated gradients of the network log-likelihood. This requires strictly fewer gradient computations than methods that compute the gradient of the whole variational objective. Empirical evaluations on standard benchmarks confirm that SLANG enables faster and more accurate estimation of uncertainty than mean-field methods, and performs comparably to state-of-the-art methods.
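
To make the "diagonal plus low-rank" structure in the abstract concrete, here is a minimal NumPy sketch. It is not the SLANG algorithm: it assumes, purely for illustration, that the structure is placed on the covariance as diag(d) + U Uᵀ (the abstract does not say whether the covariance or the precision is parametrized), and the function names, dimensions, and random values are hypothetical. It shows that sampling and matrix-vector products need only the D-vector `d` and the D×L factor `U`, never the full D×D matrix.

```python
import numpy as np


def sample_diag_plus_low_rank(mean, d, U, n_samples, rng):
    """Draw samples from N(mean, diag(d) + U @ U.T) without forming the D x D matrix.

    Assumption for this sketch: the structure is on the covariance itself.
    """
    D, L = U.shape
    eps_diag = rng.standard_normal((n_samples, D))  # noise for the diagonal part
    eps_low = rng.standard_normal((n_samples, L))   # noise for the low-rank part
    # Independent terms add their covariances: diag(d) + U U^T.
    return mean + eps_diag * np.sqrt(d) + eps_low @ U.T


def matvec_diag_plus_low_rank(d, U, v):
    """Compute (diag(d) + U @ U.T) @ v in O(D * L) time."""
    return d * v + U @ (U.T @ v)


# Hypothetical toy sizes: D parameters, rank-L correction.
rng = np.random.default_rng(0)
D, L = 6, 2
mean = np.zeros(D)
d = rng.uniform(0.5, 1.5, size=D)
U = rng.standard_normal((D, L))

samples = sample_diag_plus_low_rank(mean, d, U, 200_000, rng)

# The dense matrix is built here only to check the sketch on a tiny problem.
dense_cov = np.diag(d) + U @ U.T
empirical_cov = np.cov(samples, rowvar=False)
max_abs_err = np.max(np.abs(empirical_cov - dense_cov))

v = rng.standard_normal(D)
matvec_fast = matvec_diag_plus_low_rank(d, U, v)
matvec_dense = dense_cov @ v
```

Storage here is D(L + 1) numbers rather than D², which is what makes such a structure attractive for large models compared with a full covariance, while still capturing correlations that a purely diagonal (mean-field) approximation discards.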

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Robust Filter Attention: Self-Attention as Precision-Weighted State Estimation
cs.LG 2025-09 unverdicted novelty 7.0

Robust Filter Attention models self-attention as consistency-based state estimation under a linear SDE for token trajectories, matching standard attention complexity while showing lower perplexity and better zero-shot...

RT-Transformer: The Transformer Block as a Spherical State Estimator
cs.LG 2026-05 unverdicted novelty 6.0

Transformer components arise as the natural solution to precision-weighted directional state estimation on the hypersphere.