GAIA introduces a geometry-adaptive integral autoencoder that unifies forward, boundary-value, and inverse PDE operator learning on arbitrary domains via geometry tokens and cross-attention.
Adam: A Method for Stochastic Optimization
Mixed citation behavior. Most common role is method (50%).
abstract
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
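The update that the abstract describes can be sketched in a few lines. The block below is a minimal illustration, not the authors' reference code: it keeps exponential moving averages of the gradient and the squared gradient, applies the bias correction, and takes a per-coordinate scaled step. The function names `adam_minimize` and `quad_grad`, the toy quadratic objective, and the step size `lr=0.01` are assumptions made for this example; the values `beta1=0.9`, `beta2=0.999`, and `eps=1e-8` follow the defaults suggested in the Adam paper, which suggests a step size of 0.001.

```python
import math

def adam_minimize(grad, x0, steps=2000, lr=0.01,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a function given its gradient, using Adam-style updates."""
    x = list(x0)
    m = [0.0] * len(x)  # first-moment estimate (moving average of gradients)
    v = [0.0] * len(x)  # second-moment estimate (moving average of squared gradients)
    for t in range(1, steps + 1):
        g = grad(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] * g[i]
            # Bias correction: both averages start at zero, so early
            # estimates are scaled up to compensate.
            m_hat = m[i] / (1.0 - beta1 ** t)
            v_hat = v[i] / (1.0 - beta2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

def quad_grad(x):
    # Hypothetical test objective: f(x) = (x0 - 3)^2 + 10 * (x1 + 1)^2.
    # The two coordinates have gradients of very different scale.
    return [2.0 * (x[0] - 3.0), 20.0 * (x[1] + 1.0)]

result = adam_minimize(quad_grad, [0.0, 0.0])
print(result)
```

On this toy problem the iterate should end close to the minimizer (3, -1). Because each coordinate's step is divided by the root of its own second-moment estimate, rescaling one coordinate's gradient by a constant leaves its trajectory essentially unchanged, which is the diagonal-rescaling invariance the abstract mentions.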
representative citing papers
ShardNet enforces non-convex polyhedral safety constraints in neural controllers by construction via a differentiable projection layer, achieving 100% verified safety and over 3x larger safe sets than prior methods on double integrator benchmarks.
The paper establishes the first finite-time convergence rate of 1/T^{2/13} for classical Adam (with bias correction, no extra steps) in nonsmooth nonconvex optimization under heavy-tailed noise with β1=β2.
Machine learning discovers a tube-seeding strategy for IBP reduction of Feynman integrals that scales linearly with numerator power, demonstrated on rank-20 2-loop 5-point integrals.
TAKO demonstrates real-time adversarial takeover of robotic diffusion policies via reusable universal patches on visual inputs, achieving 100% success in steering attacker-chosen trajectories across multiple tasks, encoders, and diffusion methods.
Forward gradient framework for PQCs unifies SPSA and parameter-shift as limits, introduces QUIVER adaptive optimizer with closed-form measurement allocation, and demonstrates efficient training of 60-qubit circuits on ECG5000 and MNIST.
Introduces REST-ASMR multimodal dataset of PPG, stimuli, and continuous annotations for ASMR research, validated with 97% responder rate, significant agreement, PPG deceleration, and BiLSTM achieving 75.51% frame-level accuracy under strict subject-video independent 4-fold CV.
PINNs are used to non-parametrically infer the neutron star EOS from NICER and pulsar data, producing M_max = 2.06 M_sun, R_1.4 = 12.85 km, and a reproducible speed-of-sound softening at 2-4 rho_0 consistent with quark-hadron crossover.
OpenVMR uses normalizing flow to detect out-of-distribution queries and performs moment retrieval only on in-distribution queries.
Derives geodesic ridge regularization and Riemannian Gibbs Process prior for feature-learning wide neural networks, generalizing kernel-regime results via function-space axiomatization.
Ensembits is the first tokenizer of protein conformational ensembles that outperforms static tokenizers on RMSF prediction and matches them on function and mutation tasks while using less pretraining data.
In the high-dimensional limit the spherical Boltzmann machine admits exact equations for training dynamics, Bayesian evidence, and cascades of phase transitions tied to mode alignment with data, which connect to generative phenomena including double descent and out-of-equilibrium biases.
Attention and LoRA regression losses induce Poincaré inequalities under mild regularization, so SGD-mimicking SDEs converge to minimizers with no assumptions on data or model size.
SLayerGen generates crystals invariant to any space or layer group via autoregressive lattice and Wyckoff sampling plus equivariant diffusion, achieving gains over bulk models on diperiodic materials after correcting a prior loss inconsistency for hexagonal groups.
3DSS is the first differentiable surface splatting renderer that recovers shape, spatially-varying BRDF materials, and HDR illumination from multi-view images via a coverage-based compositing model derived from reconstruction kernels.
PF-AGD is the first parameter-free deterministic accelerated first-order method with Õ(ε^{-5/3} log(1/ε)) complexity for smooth non-convex optimization.
STARE uses step-wise RL to attack multimodal models, achieving 68% higher attack success rate while revealing that adversarial optimization concentrates conceptual toxicity early and detail toxicity late in the generation trajectory.
Qvine uses vine copula-inspired quantum circuit structures to achieve linear or quadratic depth scaling for loading high-dimensional distributions with high approximation quality.
Neural networks optimized solely on crossing symmetry reconstruct CFT correlators from minimal input data to few-percent accuracy across generalized free fields, minimal models, Ising, N=4 SYM, and AdS diagrams.
MMGait provides a new multi-sensor gait dataset and OmniGait baseline to support single-modal, cross-modal, and unified multi-modal person identification from walking patterns.
Neural simulation-based inference on unbinned top-quark pair data at 13 TeV yields improved gluon PDF precision over traditional binned analyses while incorporating experimental and theoretical uncertainties.
Adam-HNAG is a splitting-based reformulation of Adam that yields the first convergence proof for Adam-type methods, including accelerated rates, in convex smooth optimization.
The paper introduces the CMCC-ReID task, constructs the SYSU-CMCC benchmark dataset, and proposes the PIA network with disentangling and prototype modules that outperforms prior methods on combined modality and clothing variations.
Quantitative Bayesian inference using a deep-learning emulator detects 0.018-0.020 M_sun of helium in the Type Ic supernova 2014L.
citing papers explorer
- Adam Converges in Nonsmooth Nonconvex Optimization
The paper establishes the first finite-time convergence rate of 1/T^{2/13} for classical Adam (with bias correction, no extra steps) in nonsmooth nonconvex optimization under heavy-tailed noise with β1=β2.
- A Parameter-Free First-Order Algorithm for Non-Convex Optimization with $\tilde{O}(\epsilon^{-5/3})$ Global Rate
PF-AGD is the first parameter-free deterministic accelerated first-order method with Õ(ε^{-5/3} log(1/ε)) complexity for smooth non-convex optimization.
- Adam-HNAG: A Convergent Reformulation of Adam with Accelerated Rate
Adam-HNAG is a splitting-based reformulation of Adam that yields the first convergence proof for Adam-type methods, including accelerated rates, in convex smooth optimization.
- Symbolic Discovery of Iterative Algorithms: A Continuous Latent Space Bayesian Optimization Framework
A VAE-plus-Bayesian-optimization framework discovers new symbolic iterative optimization algorithms without assuming update function forms and faster than prior mathematical programming methods.
- Accelerating SAV-based optimization via randomized low-rank Hessian approximation
N-RSAV accelerates RSAV optimization via randomized Nyström low-rank Hessian approximations with eigenvalue truncation, adaptive reuse, and convergence guarantees under the PL condition.
- Learning Approximate Solutions to Multiparametric Generalized Nash Equilibrium Problems
A learning approach trains neural networks to approximate solutions of multiparametric GNEPs using NI gap loss with value surrogates, achieving large speedups and providing new existence conditions for continuous selections.
- Can Adaptive Gradient Methods Converge under Heavy-Tailed Noise? A Case Study of AdaGrad
AdaGrad achieves the first provable convergence rate under heavy-tailed noise (4/3 < p ≤ 2) in non-convex settings without knowing p, plus an algorithm-dependent lower bound and an improved rate for AdaGrad-Norm under a mild extra assumption.
- Stochastic Non-Smooth Convex Optimization with Unbounded Gradients
Clipped AdamW with exponentially weighted accumulation achieves superior global convergence rates for convex stochastic generalized Lipschitz optimization compared to SGD and AdaGrad.
- Avoiding Bias in Clipped SGD for Overparameterized Models under Generalized Smoothness
Clipped and normalized SGD converge without bias in overparameterized interpolating models under (L0,L1)-smoothness, with improved rates and extensions to heavy-tailed noise and weaker smoothness.
- Convergence of difference inclusions via a diameter criterion
A diameter criterion tied to a potential function certifies convergence of difference inclusions, enabling discrete proofs for first-order optimization methods with diminishing steps.
- Stochastic global optimization of continuous functions via random walks on Grassmannians
A stochastic global optimizer samples random k-dimensional subspaces, solves the restricted problem on each, and moves to the improved point, with rate controlled by a gap parameter on the distribution of restricted minima.
- Newton methods beyond Hessian Lipschitz continuity: A nonlinear preconditioning approach
Nonlinear preconditioning extends Newton methods to objectives lacking Hessian Lipschitz continuity by analyzing a transformed mapping under a relaxed smoothness condition, with superlinear convergence and O(ε^{-3/2}) iteration complexity.
- Implicit Neural Optimal Transport via Fixed-Point Optimization
A single-network implicit neural optimal transport method that solves the c-transform via proximal fixed-point iteration for stable, non-adversarial training.
- Mamba Sequence Modeling meets Model Predictive Control
Mamba-MPC stabilizes and tracks references on SISO and MIMO systems in simulation and hardware while outperforming LSTM-MPC with faster computation.
- Control Forward-Backward Consistency: Quantifying the Accuracy of Koopman Control Family Models
The relative root-mean-square error of finite-dimensional Koopman Control Family predictors is strictly upper-bounded by the square root of the largest eigenvalue of the newly defined control forward-backward consistency matrix.
- Global Stability and Step Size Robustness of RMSProp
An input-to-state Lyapunov function is introduced to prove global asymptotic stability of RMSProp for constant step sizes and robustness to arbitrary bounded time-varying step size rules.
- Decentralized Nonconvex Optimization under Heavy-Tailed Noise: Normalization and Optimal Convergence
GT-NSGDm achieves the optimal non-asymptotic convergence rate O(1/T^{(p-1)/(3p-2)}) for decentralized nonconvex stochastic optimization under zero-mean heavy-tailed noise with p-th moment.
- A Fully Data-Driven Value Iteration for Stochastic LQR: Convergence, Robustness and Stability
Establishes convergence and stability of fully data-driven value iteration for stochastic LQR with unknown dynamics and introduces a robust ADP algorithm requiring no initial admissible policy.
- Constrained Variable Projection for Structured Problems
Extends variable projection to constrained separable nonlinear least-squares via bilevel collapse, yielding exact reduced gradients and a convergent conditional-gradient algorithm.
- bAdag: an adaptive block coordinate gradient method for smooth nonconvex functions
Introduces bAdag, an AdaGrad-based block coordinate gradient method with ergodic sublinear convergence proofs for smooth nonconvex objectives under block Lipschitz gradient assumptions, covering cyclic, uniform random, and Gauss-Southwell selection plus box constraints.
- A stochastic gradient algorithm for non-separable optimization with convergence guarantee
Presents a stochastic gradient algorithm for non-separable optimization with local convergence guarantees under smoothness assumptions.
- In-Expectation Convergence of Stochastic Gradient Methods under Heavy-Tailed Noise
New in-expectation convergence guarantees for SMD, ASMD (convex) and SGD, SGDM (nonconvex) under heavy-tailed noise without bounded-domain restrictions or algorithmic modifications.
- Wall-Clock Complexity for Zeroth-Order Optimization with Tunable Oracle Fidelity
Develops wall-clock complexity analysis for zeroth-order optimization with tunable oracle fidelity, deriving optimal fidelity schedules and showing accelerated schemes can be inferior in total time.
- Global Convergence and Error Propagation in Neural Gradient Flows: A Riemannian Optimization Framework
Establishes Riemannian gradient flow equivalence for neural MMS steps, linear convergence under convexity conditions, and O(δ) tracking bounds for inexact iterates.
- A Differentiable Interior-Point Method in Single Precision
An alternative complementarity formulation for primal-dual interior-point methods keeps linear systems spectrally bounded near the solution, enabling stable single-precision solves and differentiation for bilevel and end-to-end learning.
- Low-Order Explicit Hessian Imitation Method for Large-Scale Supervised Machine Learning
New optimizer uses auxiliary loss to imitate low-order Hessian information, replacing gradient squares in Adam-like training with convergence guarantee and some experimental gains.
- IRON: Implicit Resolvent Optimization under Noise
Fully implicit resolvent discretization of noisy accelerated gradient dynamics produces a Lyapunov mean-square recursion whose contraction factor improves and stationary error scales as O(1/α), vanishing for large α under accurate inner solves.
- A Line-search-free Method for Adaptive Decentralized Optimization
New adaptive decentralized algorithms select stepsizes from local curvature estimates derived from a Lyapunov function, delivering sublinear convergence for convex problems and linear rates for strongly convex ones.
- Adaptive Regularization within Trust Region Methods for Stochastic Nonconvex Optimization
Reg-ASTRO achieves almost sure Õ(ε^{-1.5}) iteration complexity for stochastic nonconvex problems with mean-zero subexponential noise by coupling adaptive sampling with an adaptively regularized local model.
- Parametric Nonconvex Optimization via Convex Surrogates
A surrogate for parametric nonconvex optimization is constructed as the minimum of convex-monotonic function compositions and solved via parallel convex optimization, with a proof-of-concept on path tracking.
- Accelerating Full-Scale Nonlinear Model Predictive Control via Surrogate Dynamics Optimization
SDO uses an ML surrogate to solve a lightweight auxiliary problem that provides warm starts for full-scale NMPC, yielding faster convergence and two orders of magnitude less training data than behavior cloning in a 24-hour pressurized water reactor load-following case.
- MPC and System Identification with Differentiable Physics: Fluid System and Particle Beam Control
A framework for simultaneous model predictive control and online parameter estimation is introduced by treating differentiable physics simulators as computational objects for gradient-based joint optimization.
- Optimal Projection-Free Adaptive SGD for Matrix Optimization
Proving stability of Leon's preconditioner enables the first tuning-free Nesterov-accelerated projection-free adaptive SGD variant with improved non-smooth non-convex rates.
- INTHOP: A Second-Order Globally Convergent Method for Nonconvex Optimization
INTHOP is a second-order method that bounds the difference between an approximate positive definite Hessian and the exact one within an interval, reuses the approximation when iterates stay inside it, and proves global convergence while showing fewer evaluations than steepest descent or quasi-Newton
- Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction
Rennala MVR improves time complexity over Rennala SGD for smooth nonconvex stochastic optimization in heterogeneous parallel systems under a mean-squared smoothness assumption.
- Importance Sampling in Expensive Finite-Sum Optimization via Contextual Bandit Methods
The paper frames subset selection in SAM optimization as a contextual bandit problem and applies the Exp4 algorithm to generate sampling distributions, with preliminary synthetic numerical results.
- Momentum Stability and Adaptive Control in Stochastic Reconfiguration
Convergence holds for momentum μ less than 1 in SPRING under mild assumptions, but μ=1 risks divergence; PRIME-SR adapts momentum via spectral dimension and subspace overlap to match tuned performance with better robustness.
- Stochastic versus Deterministic in Stochastic Gradient Descent
Treating stochastic and deterministic gradients separately in mini-batch SGD yields faster convergence and smaller error radius than uniform treatment, with further gains under strong convexity.
- An Efficient Stochastic Subgradient Method for the Global Placement Problem in Very Large-Scale Integration Circuits
A ReLU-penalty formulation for VLSI global placement is solved via stochastic subgradient descent, with the first claimed convergence proof for ReLU-type nonsmooth nonconvex problems.
- Quasi-Quadratic Gradient: A New Direction for Accelerating the BFGS Method in Quasi-Newton Optimization
The Quasi-Quadratic Gradient is proposed as a new search direction that multiplies the BFGS inverse-Hessian approximation by the gradient to accelerate convergence over standard BFGS.
- Stochastic Optimization and Data Science
The paper motivates stochastic optimization problems from statistical perspectives and describes offline and online approaches to solve expectation minimization problems.
- Introduction to stochastic gradient methods
Lecture notes on convergence theory for deterministic gradient descent and stochastic gradient methods under standard assumptions.
- Communication-Efficient Decentralized Stochastic Minimax Optimization
- Multi-Iteration Stochastic Optimizers