An overview of gradient descent optimization algorithms

Sebastian Ruder

classification cs.LG

keywords algorithms, descent, gradient, optimization, different, overview, additional, aims

Gradient descent optimization algorithms, while increasingly popular, are often used as black-box optimizers, as practical explanations of their strengths and weaknesses are hard to come by. This article aims to provide the reader with intuitions with regard to the behaviour of different algorithms that will allow her to put them to use. In the course of this overview, we look at different variants of gradient descent, summarize challenges, introduce the most common optimization algorithms, review architectures in a parallel and distributed setting, and investigate additional strategies for optimizing gradient descent.

Forward citations

Cited by 15 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Coherent-State Propagation: A Computational Framework for Simulating Bosonic Quantum Systems
quant-ph 2026-04 unverdicted novelty 8.0

Coherent-state propagation enables quasi-polynomial classical simulation of bosonic circuits with logarithmically many Kerr gates at exponentially small trace-distance error, with polynomial runtime in the weak-nonlin...
Revisiting Shadow Detection from a Vision-Language Perspective
cs.CV 2026-05 unverdicted novelty 7.0

SVL uses language embeddings aligned with global image representations via shadow ratio regression and global-to-local coupling to improve shadow detection robustness in ambiguous cases.
Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions
stat.ML 2026-05 unverdicted novelty 7.0

ABGD parametrizes piecewise linear functions as difference of max-affine functions and converges linearly to an epsilon-accurate solution with O(d max(sigma/epsilon,1)^2) samples under sub-Gaussian noise, which is min...
Efficient classical training of model-free quantum photonic reservoir
quant-ph 2026-04 unverdicted novelty 7.0

Classical light training of photonic quantum reservoirs enables accurate model-free estimation of single-qubit observables and two-qubit entanglement witnesses on unseen quantum states.
Large Spikes in Stochastic Gradient Descent: A Large-Deviations View
cs.LG 2026-03 unverdicted novelty 7.0

Large loss spikes in SGD are polynomially likely and serve as the dominant mechanism for escaping sharp minima toward flatter solutions in the NTK regime.
UniISP: A Unified ISP Framework for Both Human and Machine Vision
cs.CV 2026-05 unverdicted novelty 6.0

UniISP unifies ISP processing with a Hybrid Attention Module and Feature Adapter to produce images that are both visually pleasing for humans and informative for computer vision models.
Hyperspectral Anomaly Detection Using Einstein Fuzzy Computing and Quantum Neural Network
eess.IV 2026-05 unverdicted novelty 6.0

HyFuHAD fuses classical Einstein fuzzy detection from multiple membership functions with quantum fuzzy detection to achieve claimed state-of-the-art performance in unsupervised hyperspectral anomaly detection.
BOOOM: Loss-Function-Agnostic Black-Box Optimization over Orthonormal Manifolds for Machine Learning and Statistical Inference
math.OC 2026-04 unverdicted novelty 6.0

BOOOM parametrizes Stiefel manifold optimization into Euclidean angle space using global Givens rotations and solves it with recursive modified pattern search for loss-agnostic black-box problems.
Efficient optimisation of multi-parameter quantum control protocols for strongly-coupled systems
quant-ph 2026-04 unverdicted novelty 6.0

Gradient-based optimization of SUPER and FTPE pulse protocols via auto-differentiation and uniTEMPO yields higher preparation fidelities than resonant pi-pulses or standard two-photon excitation, with the advantage in...
Co-evolving Agent Architectures and Interpretable Reasoning for Automated Optimization
cs.AI 2026-04 unverdicted novelty 6.0

EvoOR-Agent co-evolves agent architectures as AOE-style networks with graph-mediated recombination and knowledge-base-assisted mutation to outperform fixed LLM pipelines on OR benchmarks.
Material-Agnostic Zero-Shot Thermal Inference for Metal Additive Manufacturing via a Parametric PINN Framework
cs.LG 2026-04 unverdicted novelty 6.0

A decoupled parametric PINN with conditional modulation and Rosenthal-derived output scaling achieves zero-shot thermal inference across arbitrary metal alloys in laser powder bed fusion.
Getting large-scale quantum neural networks ready for quantum hardware
quant-ph 2026-04 unverdicted novelty 5.0

Physics-informed quantum neural networks trained on noisy measurements can construct nontrivial decision boundaries to classify quantum states via order parameters and are suited for NISQ hardware due to links with Ma...
Micro-DualNet: Dual-Path Spatio-Temporal Network for Micro-Action Recognition
cs.CV 2026-04 unverdicted novelty 5.0

Micro-DualNet employs dual ST and TS pathways with entity-level adaptive routing and Mutual Action Consistency loss to achieve competitive results on MA-52 and state-of-the-art on iMiGUE for micro-action recognition.
Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
cs.LG 2026-05 unverdicted novelty 3.0

This survey organizes LLM optimizer literature into categories and argues the field is shifting toward rigorous, multi-factor comparisons of convergence, memory, stability, and complexity.
Split and Aggregation Learning for Foundation Models Over Mobile Embodied AI Network (MEAN): A Comprehensive Survey
cs.IT 2026-05 unverdicted novelty 3.0

The paper surveys split and aggregation learning for foundation models in 6G networks to improve efficiency, resource use, and data privacy in distributed AI.