Adam: A Method for Stochastic Optimization
Citing papers show mixed citation behavior; the most common citation role is background (57%).
abstract
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, requires little memory, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms that inspired Adam are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
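For concreteness, here is a minimal NumPy sketch of the two update rules the abstract describes: the Adam step with bias-corrected first- and second-moment estimates, and the AdaMax step, which replaces the second moment with an exponentially weighted infinity norm. Default hyper-parameters follow the paper; the function and variable names, and the small eps in the AdaMax denominator, are ours.

```python
import numpy as np

def adam_step(theta, g, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * g          # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g      # biased second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def adamax_step(theta, g, m, u, t, alpha=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdaMax update: the second moment is replaced by an infinity norm."""
    m = beta1 * m + (1 - beta1) * g
    u = np.maximum(beta2 * u, np.abs(g))     # exponentially weighted infinity norm
    # eps guards the first steps where u may still be zero; the paper omits it.
    theta = theta - (alpha / (1 - beta1 ** t)) * m / (u + eps)
    return theta, m, u
```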
authors
Diederik P. Kingma, Jimmy Ba
representative citing papers
- REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reasoning models.
- Presents the first online learning-to-defer algorithm with regret bounds O((n + n_e) T^{2/3}) in general and O((n + n_e) sqrt(T)) under low noise for multiclass classification with varying experts.
- In the high-dimensional limit, the spherical Boltzmann machine admits exact equations for training dynamics, Bayesian evidence, and cascades of phase transitions tied to mode alignment with data, connecting to generative phenomena including double descent and out-of-equilibrium biases.
- Attention and LoRA regression losses induce Poincaré inequalities under mild regularization, so SGD-mimicking SDEs converge to minimizers with no assumptions on data or model size.
- SLayerGen generates crystals invariant to any space or layer group via autoregressive lattice and Wyckoff sampling plus equivariant diffusion, achieving gains over bulk models on diperiodic materials after correcting a prior loss inconsistency for hexagonal groups.
- 3DSS is the first differentiable surface splatting renderer that recovers shape, spatially varying BRDF materials, and HDR illumination from multi-view images via a coverage-based compositing model derived from reconstruction kernels.
- H^{-1} norm equivalence to expected squared evaluations on domain-dependent random test functions enables SV-PINNs that recover accurate solutions to challenging second-order elliptic PDEs faster than standard PINNs.
- PF-AGD is the first parameter-free deterministic accelerated first-order method with Õ(ε^{-5/3} log(1/ε)) complexity for smooth non-convex optimization.
- Local attention strictly enlarges the class of regular languages recognizable by fixed-precision transformers by adding a second past operator in linear temporal logic, with global and local attention being expressively complementary.
- STARE uses step-wise RL to attack multimodal models, achieving a 68% higher attack success rate while revealing that adversarial optimization concentrates conceptual toxicity early and detail toxicity late in the generation trajectory.
- Qvine uses vine-copula-inspired quantum circuit structures to achieve linear or quadratic depth scaling for loading high-dimensional distributions with high approximation quality.
- Neural networks optimized solely on crossing symmetry reconstruct CFT correlators from minimal input data to few-percent accuracy across generalized free fields, minimal models, Ising, N=4 SYM, and AdS diagrams.
- MMGait provides a new multi-sensor gait dataset and the OmniGait baseline to support single-modal, cross-modal, and unified multi-modal person identification from walking patterns.
- Neural simulation-based inference on unbinned top-quark pair data at 13 TeV yields improved gluon PDF precision over traditional binned analyses while incorporating experimental and theoretical uncertainties.
- Adam-HNAG is a splitting-based reformulation of Adam that yields the first convergence proof for Adam-type methods, including accelerated rates, in convex smooth optimization.
- The paper introduces the CMCC-ReID task, constructs the SYSU-CMCC benchmark dataset, and proposes the PIA network, whose disentangling and prototype modules outperform prior methods under combined modality and clothing variations.
- Quantitative Bayesian inference using a deep-learning emulator detects 0.018-0.020 M_sun of helium in the Type Ic supernova 2014L.
- LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.
- Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step (see the rectified-flow loss sketch after this list).
- IQL achieves policy improvement in offline RL by implicitly estimating optimal action values through state-conditional upper expectiles of value functions, without querying Q-functions on out-of-distribution actions (see the expectile-loss sketch after this list).
- Fine-tuning BERT for query-passage relevance classification achieves state-of-the-art results on TREC-CAR and MS MARCO, with a 27% relative gain in MRR@10 over prior methods.
- Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations (see the coupling-layer sketch after this list).
- ACT lets RNNs dynamically adapt computation depth per input via a differentiable halting unit, yielding large gains on synthetic tasks and structural insights on language data.
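As referenced in the rectified flow entry above, here is a minimal sketch of the rectified-flow training objective as commonly stated: the velocity model is regressed onto the straight-line displacement between paired samples. The function names are ours, and v_fn stands in for whatever network parameterizes the velocity.

```python
import numpy as np

def rectified_flow_loss(x0, x1, v_fn, t):
    """Regress the velocity field onto the straight path from x0 to x1.

    x0, x1: paired samples from the source and target distributions.
    t:      interpolation times in [0, 1], broadcastable against x0/x1.
    v_fn:   velocity model v(x_t, t); any callable works for this sketch.
    """
    xt = t * x1 + (1.0 - t) * x0              # point on the straight path
    target = x1 - x0                          # constant straight-line velocity
    return np.mean((v_fn(xt, t) - target) ** 2)
```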
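Also as referenced above, a sketch of the asymmetric expectile loss at the core of IQL: with tau > 0.5, regressing V(s) onto Q(s, a) under this loss pushes V toward an upper expectile of Q over dataset actions, so out-of-distribution actions are never queried. Names and the default tau are ours.

```python
import numpy as np

def expectile_loss(u, tau=0.7):
    """Asymmetric squared loss |tau - 1(u < 0)| * u^2, applied to u = Q(s, a) - V(s)."""
    weight = np.where(u < 0.0, 1.0 - tau, tau)   # penalize underestimates more
    return weight * u ** 2
```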
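Finally, as referenced in the Real NVP entry, a sketch of the affine coupling layer: one half of the input conditions a scale-and-shift of the other half, which keeps the transform trivially invertible with a cheap log-determinant. s_fn and t_fn stand in for arbitrary networks; the function names are ours.

```python
import numpy as np

def coupling_forward(x1, x2, s_fn, t_fn):
    """y1 = x1, y2 = x2 * exp(s(x1)) + t(x1); returns log|det J| as well."""
    s, t = s_fn(x1), t_fn(x1)
    y2 = x2 * np.exp(s) + t
    return x1, y2, np.sum(s, axis=-1)         # log-det is just the sum of scales

def coupling_inverse(y1, y2, s_fn, t_fn):
    """Exact inverse: recompute s, t from the unchanged half and undo the affine map."""
    s, t = s_fn(y1), t_fn(y1)
    return y1, (y2 - t) * np.exp(-s)
```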
citing papers explorer
- Online Learning-to-Defer with Varying Experts
  Presents the first online learning-to-defer algorithm with regret bounds O((n + n_e) T^{2/3}) in general and O((n + n_e) sqrt(T)) under low noise for multiclass classification with varying experts.
- On Hallucinations in Inverse Problems: Fundamental Limits and Provable Assessment Methods
  Hallucinations in inverse problem reconstructions are a fundamental consequence of ill-posedness, with necessary and sufficient conditions and computable bounds that depend only on the forward model.
- Optimality of Sub-network Laplace Approximations: New Results and Methods
  Sub-network Laplace approximations always underestimate full-model predictive variance, and two new gradient-based and greedy selection rules provide theoretically grounded improvements.
- The Interplay of Data Structure and Imbalance in the Learning Dynamics of Diffusion Models
  Higher-variance classes are learned first in diffusion models; strong class imbalance reverses the order and imposes distinct delayed learning times on minority classes.
- Tuning Derivatives for Causal Fairness in Machine Learning
  A new framework formalizes causal fairness for continuous protected attributes via path-specific derivatives and introduces a tuning algorithm for fair predictors.
- Graph Attention Networks
  Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs (see the attention-head sketch after this list).
- Keeping Score: Efficiency Improvements in Neural Likelihood Surrogate Training via Score-Augmented Loss Functions
  Score-augmented loss functions for neural likelihood surrogates in SBI deliver downstream inference performance equivalent to 10x more training data at under 1.1x training-time cost on network and spatial-process models.
- Spatial Adapter: Structured Spatial Decomposition and Closed-Form Covariance for Frozen Predictors
  The Spatial Adapter equips frozen predictors with a spatially regularized orthonormal basis for residuals and derives a closed-form low-rank-plus-noise covariance for spatial prediction and kriging.
- Amortized Variational Inference for Joint Posterior and Predictive Distributions in Bayesian Uncertainty Quantification
  An amortized variational framework jointly targets the posterior and posterior-predictive distributions via a KL upper bound and moment regularization, yielding more accurate predictions at lower online cost than two-stage variational inference.
- Laplace Approximation for Bayesian Tensor Network Kernel Machines
  LA-TNKM applies a linearized Laplace approximation to tensor network kernel machines for Bayesian inference, matching or exceeding Gaussian processes and Bayesian neural networks on UCI regression tasks.
- FedSPDnet: Geometry-Aware Federated Deep Learning with SPDnet
  FedSPDnet uses manifold projections and retractions to average Stiefel-constrained parameters in federated SPDnet, outperforming standard federated EEGnet on EEG motor-imagery benchmarks in F1 score and robustness.
- Ensemble-Based Dirichlet Modeling for Predictive Uncertainty and Selective Classification
  An ensemble-based method of moments on softmax outputs produces stable Dirichlet predictive distributions that improve uncertainty-guided tasks such as selective classification over evidential deep learning.
- Consistency Regularised Gradient Flows for Inverse Problems
  A consistency-regularized Euclidean-Wasserstein-2 gradient flow performs joint posterior sampling and prompt optimization in latent space for efficient low-NFE inverse problem solving with diffusion models.
- Missingness-aware Data Imputation via AI-powered Bayesian Generative Modeling
  MissBGM jointly models data generation and missingness in a Bayesian neural generative framework to produce consistent imputations with principled posterior uncertainty.
- Probabilistic Graphical Model using Graph Neural Networks for Bayesian Inversion of Discrete Structural Component States
  A probabilistic graphical model framework with graph-neural-network inference computes Bayesian posteriors for discrete structural states, claimed to match traditional Bayesian results while scaling to high-dimensional problems via topology-informed learning and scale-adaptive training.
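As referenced in the Graph Attention Networks entry above, a dense NumPy sketch of a single attention head: scores come from a shared linear map and a LeakyReLU-activated attention vector, are softmax-normalized over each node's neighborhood, and weight the feature aggregation. Shapes and names are ours; real implementations use sparse operations and multiple heads.

```python
import numpy as np

def gat_head(H, A, W, a, slope=0.2):
    """One GAT attention head on dense inputs.

    H: (N, F) node features; A: (N, N) adjacency with self-loops;
    W: (F, Fp) shared linear map; a: (2 * Fp,) attention vector.
    """
    Z = H @ W                                           # (N, Fp) transformed features
    Fp = Z.shape[1]
    e = (Z @ a[:Fp])[:, None] + (Z @ a[Fp:])[None, :]   # e_ij = a^T [z_i || z_j]
    e = np.where(e > 0, e, slope * e)                   # LeakyReLU activation
    e = np.where(A > 0, e, -np.inf)                     # restrict to neighbors
    att = np.exp(e - e.max(axis=1, keepdims=True))      # numerically stable softmax
    att = att / att.sum(axis=1, keepdims=True)
    return att @ Z                                      # attention-weighted aggregation
```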