pith. machine review for the scientific record.

arxiv: 1405.3080 · v1 · submitted 2014-05-13 · 📊 stat.ML · cs.LG · math.OC

Recognition: unknown

Accelerating Minibatch Stochastic Gradient Descent using Stratified Sampling

Authors on Pith: no claims yet
classification 📊 stat.ML · cs.LG · math.OC
keywords minibatch · sampling · stratified · clusters · descent · gradient · method · often
Original abstract

Stochastic Gradient Descent (SGD) is a popular optimization method which has been applied to many important machine learning tasks such as Support Vector Machines and Deep Neural Networks. In order to parallelize SGD, minibatch training is often employed. The standard approach is to sample a minibatch uniformly at each step, which often leads to high variance. In this paper we propose a stratified sampling strategy, which divides the whole dataset into clusters with low within-cluster variance; we then take examples from these clusters using a stratified sampling technique. We show that the algorithm significantly improves the convergence rate, and encouraging experimental results confirm the effectiveness of the proposed method.
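The core idea of the abstract — form each minibatch by drawing from low-variance clusters in proportion to their size, rather than uniformly from the whole dataset — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `stratified_minibatch`, the proportional allocation rule, and the use of random cluster labels as a stand-in for an actual clustering step (the paper's method requires clusters with low within-cluster variance, e.g. from k-means) are all assumptions for the sketch.

```python
import numpy as np

def stratified_minibatch(X, labels, batch_size, rng):
    """Draw a minibatch by stratified sampling: from each cluster,
    take a number of examples proportional to the cluster's size
    (at least one), so every stratum is represented."""
    idx = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        # proportional allocation; hypothetical choice for this sketch
        n_c = max(1, round(batch_size * len(members) / len(X)))
        idx.append(rng.choice(members, size=n_c, replace=False))
    return np.concatenate(idx)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
# stand-in cluster labels; in the paper these come from a clustering
# step that minimizes within-cluster variance
labels = rng.integers(0, 10, size=1000)
batch = stratified_minibatch(X, labels, batch_size=100, rng=rng)
```

Because clusters are sampled without replacement and are disjoint, the returned indices are distinct; the batch size may differ slightly from the target due to rounding. The variance reduction comes from the stratification: the gradient estimate averages over every stratum instead of occasionally missing some clusters entirely, as uniform sampling can.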

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Convergence of Riemannian Stochastic Gradient Descents: Varying Batch Sizes And Nonstandard Batch Forming

    math.OC · 2026-04 · unverdicted · novelty 6.0

    Convergence theorems are established for Riemannian SGD with iteration-varying probability spaces, applying to varying batch sizes and unbiased batch forming schemes.

  2. Variance Matters: Improving Domain Adaptation via Stratified Sampling

    cs.LG · 2025-12 · unverdicted · novelty 6.0

    VaRDASS improves unsupervised domain adaptation by using stratified sampling to reduce variance in discrepancy estimation for measures like correlation alignment and MMD, with derived error bounds, an optimality proof...