Spectral clipping of leading singular values in gradient matrices stabilizes SGD for non-convex problems with heavy-tailed noise and achieves the optimal convergence rate O(K^{(2-2α)/(3α-2)}).
The Singular Value Decomposition, Applications and Beyond
3 Pith papers cite this work. Polarity classification is still indexing.
abstract
The singular value decomposition (SVD) is not only a classical theory in matrix computation and analysis, but also is a powerful tool in machine learning and modern data analysis. In this tutorial we first study the basic notion of SVD and then show the central role of SVD in matrices. Using majorization theory, we consider variational principles of singular values and eigenvalues. Built on SVD and a theory of symmetric gauge functions, we discuss unitarily invariant norms, which are then used to formulate general results for matrix low rank approximation. We study the subdifferentials of unitarily invariant norms. These results would be potentially useful in many machine learning problems such as matrix completion and matrix data classification. Finally, we discuss matrix low rank approximation and its recent developments such as randomized SVD, approximate matrix multiplication, CUR decomposition, and Nystrom approximation. Randomized algorithms are important approaches to large scale SVD as well as fast matrix computations.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
Eliot is a query-time clustering and temporal visualization system for arXiv literature, evaluated via offline metrics on eight domains and a user survey showing 85% meaningful cluster labels.
On-policy distillation gains efficiency from early foresight in module allocation and update directions, which the proposed EffOPD method exploits for 3x faster training with comparable performance.
citing papers explorer
-
Gradient Clipping Beyond Vector Norms: A Spectral Approach for Matrix-Valued Parameters
Spectral clipping of leading singular values in gradient matrices stabilizes SGD for non-convex problems with heavy-tailed noise and achieves the optimal convergence rate O(K^{(2-2α)/(3α-2)}).
-
Eliot: Interactively $\underline{E}$xploring Fast-Changing Scientific $\underline{Li}$terature Trends with $\underline{O}$nline Da$\underline{t}$a and Learning
Eliot is a query-time clustering and temporal visualization system for arXiv literature, evaluated via offline metrics on eight domains and a user survey showing 85% meaningful cluster labels.
-
Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation
On-policy distillation gains efficiency from early foresight in module allocation and update directions, which the proposed EffOPD method exploits for 3x faster training with comparable performance.