Adam: A Method for Stochastic Optimization

Diederik P. Kingma, Jimmy Ba · 2014 · cs.LG · arXiv 1412.6980

Mixed citation behavior. Most common role is method (50%).

2073 Pith papers citing it

Method 50% of classified citations

abstract

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
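
As a rough illustration of the "adaptive estimates of lower-order moments" the abstract mentions, the following is a minimal pure-Python sketch of the Adam update with bias correction. The function name `adam_minimize`, the toy objective, and `toy_grad` are hypothetical and chosen only for this example; the defaults (`alpha=0.001`, `beta1=0.9`, `beta2=0.999`, `eps=1e-8`) follow the values suggested in the paper.

```python
import math

def adam_minimize(grad, theta, steps=2000, alpha=0.001,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam on a list of parameters, given a gradient function."""
    theta = list(theta)
    m = [0.0] * len(theta)  # running estimate of the gradient mean (first moment)
    v = [0.0] * len(theta)  # running estimate of the squared gradient (second moment)
    for t in range(1, steps + 1):
        g = grad(theta)
        for i, gi in enumerate(g):
            m[i] = beta1 * m[i] + (1 - beta1) * gi
            v[i] = beta2 * v[i] + (1 - beta2) * gi * gi
            # Bias correction: m and v start at zero, so early estimates are too small.
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            theta[i] -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Hypothetical toy objective: f(x, y) = (x - 3)^2 + (y + 1)^2
def toy_grad(p):
    return [2 * (p[0] - 3), 2 * (p[1] + 1)]

result = adam_minimize(toy_grad, [0.0, 0.0], steps=5000, alpha=0.01)
print(result)  # should end up close to the minimizer (3, -1)
```

Because each step divides the mean estimate by the square root of the second-moment estimate, rescaling a gradient coordinate by a constant leaves the step nearly unchanged, and the very first step has magnitude close to `alpha` in every coordinate.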

citation-role summary

method 117 · background 97 · other 9 · baseline 8 · dataset 2
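
The "Method 50% of classified citations" figure can be checked against these counts. The sketch below assumes the percentage is taken over the sum of the role counts listed here (233 classified citations) rather than over all 2073 citing papers; the variable names are illustrative.

```python
# Role counts as listed in the citation-role summary.
role_counts = {"method": 117, "background": 97, "other": 9, "baseline": 8, "dataset": 2}

total = sum(role_counts.values())  # number of classified citations
shares = {role: round(100 * n / total, 1) for role, n in role_counts.items()}

for role, pct in shares.items():
    print(f"{role}: {pct}%")
```

Under that assumption the method share is 117 / 233, or about 50.2%, which matches the rounded 50% shown on the page.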

citation-polarity summary

use method 117 · background 86 · unclear 20 · baseline 8 · use dataset 2

representative citing papers

GAIA: Geometry-Adaptive Operator Learning for Forward and Inverse Problems

cs.LG · 2026-07-01 · conditional · novelty 8.0

GAIA introduces a geometry-adaptive integral autoencoder that unifies forward, boundary-value, and inverse PDE operator learning on arbitrary domains via geometry tokens and cross-attention.

ShardNet: Training Neural Controllers with Hard, Non-Convex Constraints

eess.SY · 2026-06-29 · unverdicted · novelty 8.0

ShardNet enforces non-convex polyhedral safety constraints in neural controllers by construction via a differentiable projection layer, achieving 100% verified safety and over 3x larger safe sets than prior methods on double integrator benchmarks.

Adam Converges in Nonsmooth Nonconvex Optimization

math.OC · 2026-06-21 · unverdicted · novelty 8.0

The paper establishes the first finite-time convergence rate of 1/T^{2/13} for classical Adam (with bias correction, no extra steps) in nonsmooth nonconvex optimization under heavy-tailed noise with β1=β2.

Efficient AI-Inspired Reduction of Feynman Integrals via Tube Seeding

hep-ph · 2026-06-09 · unverdicted · novelty 8.0

Machine learning discovers a tube-seeding strategy for IBP reduction of Feynman integrals that scales linearly with numerator power, demonstrated on rank-20 2-loop 5-point integrals.

Test-time Adversarial Takeover: A Real-time Hijacking Interface against Robotic Diffusion Policies

cs.RO · 2026-06-09 · unverdicted · novelty 8.0

TAKO demonstrates real-time adversarial takeover of robotic diffusion policies via reusable universal patches on visual inputs, achieving 100% success in steering attacker-chosen trajectories across multiple tasks, encoders, and diffusion methods.

Adaptive directional gradients for parameterised quantum circuits

quant-ph · 2026-06-08 · unverdicted · novelty 8.0

Forward gradient framework for PQCs unifies SPSA and parameter-shift as limits, introduces QUIVER adaptive optimizer with closed-form measurement allocation, and demonstrates efficient training of 60-qubit circuits on ECG5000 and MNIST.

A multimodal dataset of photoplethysmography and continuous behavioral responses to ASMR and nature videos

cs.LG · 2026-05-30 · unverdicted · novelty 8.0

Introduces REST-ASMR multimodal dataset of PPG, stimuli, and continuous annotations for ASMR research, validated with 97% responder rate, significant agreement, PPG deceleration, and BiLSTM achieving 75.51% frame-level accuracy under strict subject-video independent 4-fold CV.

Neutron Star Equation of State via Physics Informed Neural Network

astro-ph.HE · 2026-05-29 · unverdicted · novelty 8.0

PINNs are used to non-parametrically infer the neutron star EOS from NICER and pulsar data, producing M_max = 2.06 M_sun, R_1.4 = 12.85 km, and a reproducible speed-of-sound softening at 2-4 rho_0 consistent with quark-hadron crossover.

Not All Inputs Are Valid: Towards Open-Set Video Moment Retrieval Using Language

cs.CV · 2026-05-28 · unverdicted · novelty 8.0

OpenVMR uses normalizing flow to detect out-of-distribution queries and performs moment retrieval only on in-distribution queries.

Canonical Regularisation of Wide Feature-Learning Neural Networks

stat.ML · 2026-05-18 · unverdicted · novelty 8.0

Derives geodesic ridge regularization and Riemannian Gibbs Process prior for feature-learning wide neural networks, generalizing kernel-regime results via function-space axiomatization.

ENSEMBITS: an alphabet of protein conformational ensembles

cs.LG · 2026-05-13 · unverdicted · novelty 8.0 · 2 refs

Ensembits is the first tokenizer of protein conformational ensembles that outperforms static tokenizers on RMSF prediction and matches them on function and mutation tasks while using less pretraining data.

Spherical Boltzmann machines: a solvable theory of learning and generation in energy-based models

cs.LG · 2026-05-09 · unverdicted · novelty 8.0

In the high-dimensional limit the spherical Boltzmann machine admits exact equations for training dynamics, Bayesian evidence, and cascades of phase transitions tied to mode alignment with data, which connect to generative phenomena including double descent and out-of-equilibrium biases.

Convergent Stochastic Training of Attention and Understanding LoRA

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

Attention and LoRA regression losses induce Poincaré inequalities under mild regularization, so SGD-mimicking SDEs converge to minimizers with no assumptions on data or model size.

SLayerGen: a Crystal Generative Model for all Space and Layer Groups

cond-mat.mtrl-sci · 2026-05-07 · unverdicted · novelty 8.0

SLayerGen generates crystals invariant to any space or layer group via autoregressive lattice and Wyckoff sampling plus equivariant diffusion, achieving gains over bulk models on diperiodic materials after correcting a prior loss inconsistency for hexagonal groups.

3DSS: 3D Surface Splatting for Inverse Rendering

cs.GR · 2026-05-07 · unverdicted · novelty 8.0 · 3 refs

3DSS is the first differentiable surface splatting renderer that recovers shape, spatially-varying BRDF materials, and HDR illumination from multi-view images via a coverage-based compositing model derived from reconstruction kernels.

A Parameter-Free First-Order Algorithm for Non-Convex Optimization with $\tilde{\mkern1mu O}(\epsilon^{-5/3})$ Global Rate

math.OC · 2026-05-04 · conditional · novelty 8.0

PF-AGD is the first parameter-free deterministic accelerated first-order method with Õ(ε^{-5/3} log(1/ε)) complexity for smooth non-convex optimization.

STARE: Step-wise Temporal Alignment and Red-teaming Engine for Multi-modal Toxicity Attack

cs.CR · 2026-05-01 · unverdicted · novelty 8.0

STARE uses step-wise RL to attack multimodal models, achieving 68% higher attack success rate while revealing that adversarial optimization concentrates conceptual toxicity early and detail toxicity late in the generation trajectory.

Qvine: Vine Structured Quantum Circuits for Loading High Dimensional Distributions

quant-ph · 2026-04-29 · unverdicted · novelty 8.0

Qvine uses vine copula-inspired quantum circuit structures to achieve linear or quadratic depth scaling for loading high-dimensional distributions with high approximation quality.

Neural Spectral Bias and Conformal Correlators I: Introduction and Applications

hep-th · 2026-04-20 · unverdicted · novelty 8.0

Neural networks optimized solely on crossing symmetry reconstruct CFT correlators from minimal input data to few-percent accuracy across generalized free fields, minimal models, Ising, N=4 SYM, and AdS diagrams.

MMGait: Towards Multi-Modal Gait Recognition

cs.CV · 2026-04-17 · conditional · novelty 8.0

MMGait provides a new multi-sensor gait dataset and OmniGait baseline to support single-modal, cross-modal, and unified multi-modal person identification from walking patterns.

Proton Structure from Neural Simulation-Based Inference at the LHC

hep-ph · 2026-04-14 · unverdicted · novelty 8.0

Neural simulation-based inference on unbinned top-quark pair data at 13 TeV yields improved gluon PDF precision over traditional binned analyses while incorporating experimental and theoretical uncertainties.

Adam-HNAG: A Convergent Reformulation of Adam with Accelerated Rate

math.OC · 2026-04-09 · unverdicted · novelty 8.0

Adam-HNAG is a splitting-based reformulation of Adam that yields the first convergence proof for Adam-type methods, including accelerated rates, in convex smooth optimization.

CMCC-ReID: Cross-Modality Clothing-Change Person Re-Identification

cs.CV · 2026-04-03 · unverdicted · novelty 8.0

The paper introduces the CMCC-ReID task, constructs the SYSU-CMCC benchmark dataset, and proposes the PIA network with disentangling and prototype modules that outperforms prior methods on combined modality and clothing variations.

Traces of Helium Detected in Type Ic Supernova 2014L

astro-ph.HE · 2026-03-31 · accept · novelty 8.0

Quantitative Bayesian inference using a deep-learning emulator detects 0.018-0.020 M_sun of helium in the Type Ic supernova 2014L.

citing papers explorer

Showing 44 of 44 citing papers after filters.

Adam Converges in Nonsmooth Nonconvex Optimization math.OC · 2026-06-21 · unverdicted · none · ref 10 · internal anchor
The paper establishes the first finite-time convergence rate of 1/T^{2/13} for classical Adam (with bias correction, no extra steps) in nonsmooth nonconvex optimization under heavy-tailed noise with β1=β2.
A Parameter-Free First-Order Algorithm for Non-Convex Optimization with $\tilde{\mkern1mu O}(\epsilon^{-5/3})$ Global Rate math.OC · 2026-05-04 · conditional · none · ref 23 · internal anchor
PF-AGD is the first parameter-free deterministic accelerated first-order method with Õ(ε^{-5/3} log(1/ε)) complexity for smooth non-convex optimization.
Adam-HNAG: A Convergent Reformulation of Adam with Accelerated Rate math.OC · 2026-04-09 · unverdicted · none · ref 1 · internal anchor
Adam-HNAG is a splitting-based reformulation of Adam that yields the first convergence proof for Adam-type methods, including accelerated rates, in convex smooth optimization.
Symbolic Discovery of Iterative Algorithms: A Continuous Latent Space Bayesian Optimization Framework math.OC · 2026-07-02 · unverdicted · none · ref 13 · internal anchor
A VAE-plus-Bayesian-optimization framework discovers new symbolic iterative optimization algorithms without assuming update function forms and faster than prior mathematical programming methods.
Accelerating SAV-based optimization via randomized low-rank Hessian approximation math.OC · 2026-06-09 · unverdicted · none · ref 17 · internal anchor
N-RSAV accelerates RSAV optimization via randomized Nyström low-rank Hessian approximations with eigenvalue truncation, adaptive reuse, and convergence guarantees under the PL condition.
Learning Approximate Solutions to Multiparametric Generalized Nash Equilibrium Problems math.OC · 2026-05-27 · unverdicted · none · ref 24 · internal anchor
A learning approach trains neural networks to approximate solutions of multiparametric GNEPs using NI gap loss with value surrogates, achieving large speedups and providing new existence conditions for continuous selections.
Can Adaptive Gradient Methods Converge under Heavy-Tailed Noise? A Case Study of AdaGrad math.OC · 2026-05-18 · unverdicted · none · ref 13 · 2 links · internal anchor
AdaGrad achieves the first provable convergence rate under heavy-tailed noise (4/3 < p ≤ 2) in non-convex settings without knowing p, plus an algorithm-dependent lower bound and an improved rate for AdaGrad-Norm under a mild extra assumption.
Stochastic Non-Smooth Convex Optimization with Unbounded Gradients math.OC · 2026-05-15 · unverdicted · none · ref 1 · internal anchor
Clipped AdamW with exponentially weighted accumulation achieves superior global convergence rates for convex stochastic generalized Lipschitz optimization compared to SGD and AdaGrad.
Avoiding Bias in Clipped SGD for Overparameterized Models under Generalized Smoothness math.OC · 2026-05-14 · unverdicted · none · ref 29 · internal anchor
Clipped and normalized SGD converge without bias in overparameterized interpolating models under (L0,L1)-smoothness, with improved rates and extensions to heavy-tailed noise and weaker smoothness.
Convergence of difference inclusions via a diameter criterion math.OC · 2026-05-14 · unverdicted · none · ref 289 · internal anchor
A diameter criterion tied to a potential function certifies convergence of difference inclusions, enabling discrete proofs for first-order optimization methods with diminishing steps.
Stochastic global optimization of continuous functions via random walks on Grassmannians math.OC · 2026-05-13 · unverdicted · none · ref 12 · internal anchor
A stochastic global optimizer samples random k-dimensional subspaces, solves the restricted problem on each, and moves to the improved point, with rate controlled by a gap parameter on the distribution of restricted minima.
Newton methods beyond Hessian Lipschitz continuity: A nonlinear preconditioning approach math.OC · 2026-05-12 · unverdicted · none · ref 19 · internal anchor
Nonlinear preconditioning extends Newton methods to objectives lacking Hessian Lipschitz continuity by analyzing a transformed mapping under a relaxed smoothness condition, with superlinear convergence and O(ε^{-3/2}) iteration complexity.
Implicit Neural Optimal Transport via Fixed-Point Optimization math.OC · 2026-05-11 · unverdicted · none · ref 171 · internal anchor
A single-network implicit neural optimal transport method that solves the c-transform via proximal fixed-point iteration for stable, non-adversarial training.
Mamba Sequence Modeling meets Model Predictive Control math.OC · 2026-04-15 · unverdicted · none · ref 24 · internal anchor
Mamba-MPC stabilizes and tracks references on SISO and MIMO systems in simulation and hardware while outperforming LSTM-MPC with faster computation.
Control Forward-Backward Consistency: Quantifying the Accuracy of Koopman Control Family Models math.OC · 2026-03-29 · unverdicted · none · ref 26 · internal anchor
The relative root-mean-square error of finite-dimensional Koopman Control Family predictors is strictly upper-bounded by the square root of the largest eigenvalue of the newly defined control forward-backward consistency matrix.
Global Stability and Step Size Robustness of RMSProp math.OC · 2026-03-16 · unverdicted · none · ref 4 · internal anchor
An input-to-state Lyapunov function is introduced to prove global asymptotic stability of RMSProp for constant step sizes and robustness to arbitrary bounded time-varying step size rules.
Decentralized Nonconvex Optimization under Heavy-Tailed Noise: Normalization and Optimal Convergence math.OC · 2025-05-06 · conditional · none · ref 40 · internal anchor
GT-NSGDm achieves the optimal non-asymptotic convergence rate O(1/T^{(p-1)/(3p-2)}) for decentralized nonconvex stochastic optimization under zero-mean heavy-tailed noise with p-th moment.
A Fully Data-Driven Value Iteration for Stochastic LQR: Convergence, Robustness and Stability math.OC · 2025-05-05 · unverdicted · none · ref 36 · internal anchor
Establishes convergence and stability of fully data-driven value iteration for stochastic LQR with unknown dynamics and introduces a robust ADP algorithm requiring no initial admissible policy.
Constrained Variable Projection for Structured Problems math.OC · 2026-06-22 · unverdicted · none · ref 11 · internal anchor
Extends variable projection to constrained separable nonlinear least-squares via bilevel collapse, yielding exact reduced gradients and a convergent conditional-gradient algorithm.
bAdag: an adaptive block coordinate gradient method for smooth nonconvex functions math.OC · 2026-06-10 · unverdicted · none · ref 49 · internal anchor
Introduces bAdag, an AdaGrad-based block coordinate gradient method with ergodic sublinear convergence proofs for smooth nonconvex objectives under block Lipschitz gradient assumptions, covering cyclic, uniform random, and Gauss-Southwell selection plus box constraints.
A stochastic gradient algorithm for non-separable optimization with convergence guarantee math.OC · 2026-06-09 · unverdicted · none · ref 4 · internal anchor
Presents a stochastic gradient algorithm for non-separable optimization with local convergence guarantees under smoothness assumptions.
In-Expectation Convergence of Stochastic Gradient Methods under Heavy-Tailed Noise math.OC · 2026-05-30 · unverdicted · none · ref 21 · internal anchor
New in-expectation convergence guarantees for SMD, ASMD (convex) and SGD, SGDM (nonconvex) under heavy-tailed noise without bounded-domain restrictions or algorithmic modifications.
Wall-Clock Complexity for Zeroth-Order Optimization with Tunable Oracle Fidelity math.OC · 2026-05-29 · unverdicted · none · ref 18 · internal anchor
Develops wall-clock complexity analysis for zeroth-order optimization with tunable oracle fidelity, deriving optimal fidelity schedules and showing accelerated schemes can be inferior in total time.
Global Convergence and Error Propagation in Neural Gradient Flows: A Riemannian Optimization Framework math.OC · 2026-05-26 · unverdicted · none · ref 8 · internal anchor
Establishes Riemannian gradient flow equivalence for neural MMS steps, linear convergence under convexity conditions, and O(δ) tracking bounds for inexact iterates.
A Differentiable Interior-Point Method in Single Precision math.OC · 2026-05-18 · conditional · none · ref 62 · internal anchor
An alternative complementarity formulation for primal-dual interior-point methods keeps linear systems spectrally bounded near the solution, enabling stable single-precision solves and differentiation for bilevel and end-to-end learning.
Low-Order Explicit Hessian Imitation Method for Large-Scale Supervised Machine Learning math.OC · 2026-05-07 · unverdicted · none · ref 6 · internal anchor
New optimizer uses auxiliary loss to imitate low-order Hessian information, replacing gradient squares in Adam-like training with convergence guarantee and some experimental gains.
IRON: Implicit Resolvent Optimization under Noise math.OC · 2026-05-06 · unverdicted · none · ref 10 · internal anchor
Fully implicit resolvent discretization of noisy accelerated gradient dynamics produces a Lyapunov mean-square recursion whose contraction factor improves and stationary error scales as O(1/α), vanishing for large α under accurate inner solves.
A Line-search-free Method for Adaptive Decentralized Optimization math.OC · 2026-05-01 · unverdicted · none · ref 19 · internal anchor
New adaptive decentralized algorithms select stepsizes from local curvature estimates derived from a Lyapunov function, delivering sublinear convergence for convex problems and linear rates for strongly convex ones.
Adaptive Regularization within Trust Region Methods for Stochastic Nonconvex Optimization math.OC · 2026-04-16 · unverdicted · none · ref 27 · internal anchor
Reg-ASTRO achieves almost sure Õ(ε^{-1.5}) iteration complexity for stochastic nonconvex problems with mean-zero subexponential noise by coupling adaptive sampling with an adaptively regularized local model.
Parametric Nonconvex Optimization via Convex Surrogates math.OC · 2026-04-07 · unverdicted · none · ref 31 · internal anchor
A surrogate for parametric nonconvex optimization is constructed as the minimum of convex-monotonic function compositions and solved via parallel convex optimization, with a proof-of-concept on path tracking.
Accelerating Full-Scale Nonlinear Model Predictive Control via Surrogate Dynamics Optimization math.OC · 2026-04-07 · unverdicted · none · ref 23 · internal anchor
SDO uses an ML surrogate to solve a lightweight auxiliary problem that provides warm starts for full-scale NMPC, yielding faster convergence and two orders of magnitude less training data than behavior cloning in a 24-hour pressurized water reactor load-following case.
MPC and System Identification with Differentiable Physics: Fluid System and Particle Beam Control math.OC · 2026-04-06 · unverdicted · none · ref 7 · internal anchor
A framework for simultaneous model predictive control and online parameter estimation is introduced by treating differentiable physics simulators as computational objects for gradient-based joint optimization.
Optimal Projection-Free Adaptive SGD for Matrix Optimization math.OC · 2026-04-02 · unverdicted · none · ref 3 · internal anchor
Proving stability of Leon's preconditioner enables the first tuning-free Nesterov-accelerated projection-free adaptive SGD variant with improved non-smooth non-convex rates.
INTHOP: A Second-Order Globally Convergent Method for Nonconvex Optimization math.OC · 2025-10-25 · unverdicted · none · ref 34 · internal anchor
INTHOP is a second-order method that bounds the difference between an approximate positive definite Hessian and the exact one within an interval, reuses the approximation when iterates stay inside it, and proves global convergence while showing fewer evaluations than steepest descent or quasi-Newton
Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction math.OC · 2026-05-09 · unverdicted · none · ref 223 · internal anchor
Rennala MVR improves time complexity over Rennala SGD for smooth nonconvex stochastic optimization in heterogeneous parallel systems under a mean-squared smoothness assumption.
Importance Sampling in Expensive Finite-Sum Optimization via Contextual Bandit Methods math.OC · 2026-04-22 · unverdicted · none · ref 18 · 2 links · internal anchor
The paper frames subset selection in SAM optimization as a contextual bandit problem and applies the Exp4 algorithm to generate sampling distributions, with preliminary synthetic numerical results.
Momentum Stability and Adaptive Control in Stochastic Reconfiguration math.OC · 2026-04-20 · unverdicted · none · ref 16 · internal anchor
Convergence holds for momentum μ less than 1 in SPRING under mild assumptions, but μ=1 risks divergence; PRIME-SR adapts momentum via spectral dimension and subspace overlap to match tuned performance with better robustness.
Stochastic versus Deterministic in Stochastic Gradient Descent math.OC · 2025-09-03 · unverdicted · none · ref 12 · internal anchor
Treating stochastic and deterministic gradients separately in mini-batch SGD yields faster convergence and smaller error radius than uniform treatment, with further gains under strong convexity.
An Efficient Stochastic Subgradient Method for the Global Placement Problem in Very Large-Scale Integration Circuits math.OC · 2024-12-29 · unverdicted · none · ref 25 · internal anchor
A ReLU-penalty formulation for VLSI global placement is solved via stochastic subgradient descent, with the first claimed convergence proof for ReLU-type nonsmooth nonconvex problems.
Quasi-Quadratic Gradient: A New Direction for Accelerating the BFGS Method in Quasi-Newton Optimization math.OC · 2026-04-27 · unverdicted · none · ref 5 · internal anchor
The Quasi-Quadratic Gradient is proposed as a new search direction that multiplies the BFGS inverse-Hessian approximation by the gradient to accelerate convergence over standard BFGS.
Stochastic Optimization and Data Science math.OC · 2026-05-16 · unverdicted · none · ref 26 · internal anchor
The paper motivates stochastic optimization problems from statistical perspectives and describes offline and online approaches to solve expectation minimization problems.
Introduction to stochastic gradient methods math.OC · 2026-06-02 · unverdicted · none · ref 25 · internal anchor
Lecture notes on convergence theory for deterministic gradient descent and stochastic gradient methods under standard assumptions.
Communication-Efficient Decentralized Stochastic Minimax Optimization math.OC · 2025-07-29 · unreviewed · ref 70 · internal anchor
Multi-Iteration Stochastic Optimizers math.OC · 2020-11-03 · unreviewed · ref 32 · internal anchor